mariadb/strings/string.doc
Alexander Barkov fd247cc21f MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp()
This patch also fixes:
  MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
  MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
  MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
  MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
  MDEV-33088 Cannot create triggers in the database `MYSQL`
  MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
  MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
  MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
  MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
  MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER

- Adding a wrapper function CHARSET_INFO::streq(), to compare
  two strings for equality. For now it calls strnncoll() internally.
  In the future it will turn into a virtual function.

- Adding new accent sensitive case insensitive collations:
    - utf8mb4_general1400_as_ci
    - utf8mb3_general1400_as_ci
  They implement accent sensitive case insensitive comparison.
  The weight of a character is equal to the code point of its
  upper case variant. These collations use Unicode-14.0.0 casefolding data.

  The result of
     my_charset_utf8mb3_general1400_as_ci.strcoll()
  is very close to the former
     my_charset_utf8mb3_general_ci.strcasecmp()

  There is only a difference in a couple dozen rare characters, because:
    - the switch from "tolower" to "toupper" comparison, to make
      utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
    - the switch from Unicode-3.0.0 to Unicode-14.0.0
  This difference should be tolarable. See the list of affected
  characters in the MDEV description.

  Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
  Unlike utf8mb4_general_ci, it does not treat all BMP characters
  as equal.

- Adding classes representing names of the file based database objects:

    Lex_ident_db
    Lex_ident_table
    Lex_ident_trigger

  Their comparison collation depends on the underlying
  file system case sensitivity and on --lower-case-table-names
  and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.

- Adding classes representing names of other database objects,
  whose names have case insensitive comparison style,
  using my_charset_utf8mb3_general1400_as_ci:

  Lex_ident_column
  Lex_ident_sys_var
  Lex_ident_user_var
  Lex_ident_sp_var
  Lex_ident_ps
  Lex_ident_i_s_table
  Lex_ident_window
  Lex_ident_func
  Lex_ident_partition
  Lex_ident_with_element
  Lex_ident_rpl_filter
  Lex_ident_master_info
  Lex_ident_host
  Lex_ident_locale
  Lex_ident_plugin
  Lex_ident_engine
  Lex_ident_server
  Lex_ident_savepoint
  Lex_ident_charset
  engine_option_value::Name

- All the mentioned Lex_ident_xxx classes implement a method streq():

  if (ident1.streq(ident2))
     do_equal();

  This method works as a wrapper for CHARSET_INFO::streq().

- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
  in class members and in function/method parameters.

- Replacing all calls like
    system_charset_info->coll->strcasecmp(ident1, ident2)
  to
    ident1.streq(ident2)

- Taking advantage of the c++11 user defined literal operator
  for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
  data types. Use example:

  const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;

  is now a shorter version of:

  const Lex_ident_column primary_key_name=
    Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
2024-04-18 15:22:10 +04:00

143 lines
6.4 KiB
Text

Speciella användbara nya string-rutiner:
bcmp(s1, s2, len) returns 0 if the "len" bytes starting at "s1" are
identical to the "len" bytes starting at "s2", non-zero if they are
different.
bfill(dst, len, fill) moves "len" fill characters to "dst".
Thus to set a buffer to 80 spaces, do bfill(buff, 80, ' ').
bmove(dst, src, len) moves exactly "len" bytes from the source "src"
to the destination "dst". It does not check for NUL characters as
strncpy() and strnmov() do.
bmove_upp(dst, src, len) moves exactly "len" bytes from the source
"src-len" to the destination "dst-len" counting downwards.
bzero(dst, len) moves "len" 0 bytes to "dst".
Thus to clear a disc buffer to 0s do bzero(buffer, BUFSIZ).
int2str(dst, radix, val)
converts the (long) integer "val" to character form and moves it to
the destination string "dst" followed by a terminating NUL. The
result is normally a pointer to this NUL character, but if the radix
is dud the result will be NullS and nothing will be changed.
If radix is -2..-62, val is taken to be SIGNED.
If radix is 2.. 62, val is taken to be UNSIGNED.
That is, val is signed if and only if radix is. You will normally
use radix -10 only through itoa and ltoa, for radix 2, 8, or 16
unsigned is what you generally want.
m_ctype.h
A better inplementation of the UNIX ctype(3) library.
Notes: global.h should be included before ctype.h
- Se efter i filen \c\local\include\m_ctype.h
- Används istället för ctype.h för att klara internationella karakterer.
m_string.h
Använd instället för string.h för att supporta snabbare strängfunktioner.
strintstr(src, from, pat) looks for an instance of pat in src
backwards from pos from. pat is not a regex(3) pattern, it is a literal
string which must be matched exactly.
The result 0 if the pattern was not found else it is the start char of
the pattern counted from the beginning of the string.
strappend(dest, len, fill) appends fill-characters to a string so that
the result length == len. If the string is longer than len it's
trunked. The des+len character is allways set to NULL.
strcat(s, t) concatenates t on the end of s. There had better be
enough room in the space s points to; strcat has no way to tell.
Note that strcat has to search for the end of s, so if you are doing
a lot of concatenating it may be better to use strmov, e.g.
strmov(strmov(strmov(strmov(s,a),b),c),d)
rather than
strcat(strcat(strcat(strcpy(s,a),b),c),d).
strcat returns the old value of s.
- Använd inte strcat, använd strmov (se ovan).
strcend(s, c) returns a pointer to the first place in s where c
occurs, or a pointer to the end-null of s if c does not occur in s.
strcont(str, set) if str contanies any character in the string set.
The result is the position of the first found character in str, or NullS
if there isn't anything found.
strend(s) returns a character pointer to the NUL which ends s. That
is, strend(s)-s == strlen(s). This is useful for adding things at
the end of strings. It is redundant, because strchr(s,'\0') could
strfill(dest, len, fill) makes a string of fill-characters. The result
string is of length == len. The des+len character is allways set to NULL.
strfill() returns pointer to dest+len;
strfind(src, pat) looks for an instance of pat in src. pat is not a
regex(3) pattern, it is a literal string which must be matched exactly.
The result is a pointer to the first character of the located instance,
or NullS if pat does not occur in src.
strmake(dst,src,length) moves length characters, or until end, of src to
dst and appends a closing NUL to dst.
strmake() returns pointer to closing null;
strmov(dst, src) moves all the characters of src (including the
closing NUL) to dst, and returns a pointer to the new closing NUL in
dst. The similar UNIX routine strcpy returns the old value of dst,
which I have never found useful. strmov(strmov(dst,a),b) moves a//b
into dst, which seems useful.
strnmov(dst,src,length) moves length characters, or until end, of src to
dst and appends a closing NUL to dst if src is shorter than length.
The result is a pointer to the first NUL in dst, or is dst+n if dst was
truncated.
strrchr(s, c) returns a pointer to the last place in s where c
occurs, or NullS if c does not occur in s. This function is called
rindex in V7 and 4.?bsd systems.
strrchr looks for single characters, not for sets or strings.
strxmov(dst, src1, ..., srcn, NullS)
moves the concatenation of src1,...,srcn to dst, terminates it
with a NUL character, and returns a pointer to the terminating NUL.
It is just like strmov except that it concatenates multiple sources.
Beware: the last argument should be the null character pointer.
Take VERY great care not to omit it! Also be careful to use NullS
and NOT to use 0, as on some machines 0 is not the same size as a
character pointer, or not the same bit pattern as NullS.
strxnmov(dst, len, src1, ..., srcn, NullS)
moves the first len characters of the concatenation of src1,...,srcn
to dst. If there aren't that many characters, a NUL character will
be added to the end of dst to terminate it properly. This gives the
same effect as calling strxcpy(buff, src1, ..., srcn, NullS) with a
large enough buffer, and then calling strnmov(dst, buff, len).
It is just like strnmov except that it concatenates multiple sources.
Beware: the last argument should be the null character pointer.
Take VERY great care not to omit it! Also be careful to use NullS
and NOT to use 0, as on some machines 0 is not the same size as a
character pointer, or not the same bit pattern as NullS.
Note: strxnmov is like strnmov in that it always moves EXACTLY len
characters; dst will be padded on the right with NUL characters as
needed. strxncpy does the same. strxncat, like strncat, does NOT.
I mysys:
stripp_sp(string str)
Strips end-space from string and returns new length.
strlength(const string str)
Return length of string with end-space:s not counted.
void caseup _A((string str,uint length));
void casedn _A((string str,uint length));
Converts strings or part of string to upper or lower-case.
void case_sort _A((string str,uint length));
Converts string to a string with can be compared with strcmp() to
get strings in rigth order.
string strcfind(str,search)
find string in another with no case_sensivity