into parts when converting to Unicode.
m_ctype.h:
Reorganizing mb_wc return codes to be able
to return "an unassigned N-byte-long character".
sql_string.cc:
Adding code to detect and properly handle
unassigned characters (i.e. the those character
which are correctly formed according to the
character specifications, but don't have Unicode
mapping).
Many files:
Fixing conversion function to return new codes.
ctype_ujis.test, ctype_gbk.test, ctype_big5.test:
Adding a test case.
ctype_ujis.result, ctype_gbk.result, ctype_big5.result:
Fixing results accordingly.
In cp932, '\' character can be the second byte in a
multi-byte character stream. This makes it difficult to use
mysql_escape_string. Added flag to indicate which languages allow
'\' as second byte of multibyte sequence so that when putting a prepared
statement into the binlog we can decide at runtime whether hex encoding
is really needed.
ctype-cp932.c:
ctype-gbk.c:
ctype-mb.c:
ctype-simple.c:
ctype-sjis.c:
ctype-ucs2.c:
ctype-ujis.c:
ctype-utf8.c:
Adding explicit cast to return type
in pointer substructions to avoid
warnings from some compilers.
Fixing tests accordingly.
ctype-ucs2.c:
The same fix for UCS2.
ctype-utf8.c:
Bug #9557
MyISAM utf8 table crash
The problem was that my_strnncollsp_xxx could
return big value in the range 0..0xffff.
for some constant pairs it could return 32738,
which is defined as MI_FOUND_WRONG_KEY in
myisamdef.h. As a result, table considered to
be crashed.
Fix to return -1,0 or 1.
Don't read character set files if we are using only the default charset. In most cases the user will not anymore get a warning about missing character set files
Compare strings with space extend instead of space strip. Now the following comparisons holds: "a" == "a " and "a\t" < "a". (Bug #3152).
Note: Because of the above fix, one has to do a REPAIR on any table that has an ascii character < 32 last in a CHAR/VARCHAR/TEXT columns.
Added more DBUG statements
Ensure that we are comparing end space with BINARY strings
Use 'any_db' instead of '' to mean any database. (For HANDLER command)
Only strip ' ' when comparing CHAR, not other space-like characters (like \t)
sequence boundaries in functions LIKE and LOCATE in
the case of "binary" collation. Comparison was done
like if the strings were just a binary strings without
character set assumption.