TRASH was mapped to TRASH_FREE and was supposed to be used for memory
that should not be accessed anymore, while TRASH_ALLOC() is to be
used for uninitialized but to-be-used memory.
But sometimes TRASH() was used in the latter sense.
Remove TRASH() macro, always use explicit TRASH_ALLOC() or TRASH_FREE().
Partially backporting MDEV-9874 from 10.2 to 10.0
READ_INFO::read_field() raised the ER_INVALID_CHARACTER_STRING error
when reading an escape character followed by a multi-byte character.
Raising wellformedness errors in READ_INFO::read_field() was wrong,
because the main goal of READ_INFO::read_field() is to *unescape* the
data which was presumably escaped using mysql_real_escape_string(),
using the same character set with the one specified in
"LOAD DATA INFILE ... CHARACTER SET ..." (or assumed by default).
During LOAD DATA, multi-byte characters are not always scanned as a single
entity! In case of escaped data, parts of a multi-byte character can be
scanned on different loop iterations. So the old code erroneously tested
welformedness in the middle of a multi-byte character.
Moreover, the data after unescaping can go into a BLOB field, not a text field.
Wellformedness tests are meaningless in this case.
Ater this patch, wellformedness is only checked later, during
Field::store(str,length,cs) time. The loop that scans bytes only
makes sure to revert the changes made by mysql_real_escape_string().
Note, in some cases users can supply data which did not really go through
mysql_real_escape_string() and was escaped by some other means,
or was not escaped at all. The file reported in this MDEV contains
the string "\ä", which is an example of such improperly escaped data, as
- either there should be two backslashes: "\\ä"
- or there should be no backslashes at all: "ä"
mysql_real_escape_string() could not generate "\ä".
Item_string::clone_item() creates a new Item_string that
points exactly to the same buffer that the original one does.
Later, Item_string::print() uses c_ptr() for the original Item_string,
which reallocs the original buffer, and the clone remain with
the old freed buffer.
Refactoring the code not to use c_ptr() in Item_string::print().
- Changed ER(ER_...) to ER_THD(thd, ER_...) when thd was known or if there was many calls to current_thd in the same function.
- Changed ER(ER_..) to ER_THD_OR_DEFAULT(current_thd, ER...) in some places where current_thd is not necessary defined.
- Removing calls to current_thd when we have access to thd
Part of this is optimization (not calling current_thd when not needed),
but part is bug fixing for error condition when current_thd is not defined
(For example on startup and end of mysqld)
Notable renames done as otherwise a lot of functions would have to be changed:
- In JOIN structure renamed:
examined_rows -> join_examined_rows
record_count -> join_record_count
- In Field, renamed new_field() to make_new_field()
Other things:
- Added DBUG_ASSERT(thd == tmp_thd) in Item_singlerow_subselect() just to be safe.
- Removed old 'tab' prefix in JOIN_TAB::save_explain_data() and use members directly
- Added 'thd' as argument to a few functions to avoid calling current_thd.
Adding a new virtual function MY_CHARSET_HANDLER::copy_abort().
Moving character set specific code into the correspoding implementations
(for simple, multi-byte and mbmaxlen>1 character sets).
1. @@boolean_var differs from SHOW VARIABLES
2. @@str_var ignored variable charset (which is wrong
for path variables that use filesystem charset)
3. @@signed_int_var in the string context was printed
as unsigned
The Item_string constructors called set_name() on the source string,
which was wrong because in case of UCS2/UTF16/UTF32 the source value
might be a not well formed string (e.g. have incomplete leftmost character).
Now set_name() is called on str_value after its copied
(with optionally left zero padding) from the source string.
- MDEV-6694 Illegal mix of collation with a PS parameter
Item_param::convert_str_value() did not set repertoire.
Introducing a new structure MY_STRING_METADATA to collect
character length and repertoire of a string in a single loop,
to avoid two separate loops. Adding a new class Item_basic_value::Metadata
as a convenience wrapper around MY_STRING_METADATA, to reuse the
code between Item_string and Item_param.
Item_string::eq() and Item_param::eq() in string context behaved differently.
Introducing a new class Item_basic_value to share the eq() code between
literals (Item_int, Item_double, Item_string, Item_null) and Item_param.
MDEV-6666 Malformed result for CONCAT(utf8_column, binary_string)
Item_static_string_func::safe_charset_converter() and
Item_hex_string::safe_charset_converter() did not
handle character sets with mbminlen>1 properly, as well as
did not handle conversion from binary to multi-byte well.
Introducing Item::const_charset_converter(), to reuse it in a number
of Item_*::safe_charset_converter().
The problem was in the validation of the input data for blob types.
When assigned binary data, the character blob types were only checking if
the length of these data is a multiple of the minimum char length for the
destination charset.
And since e.g. UTF-8's minimum character length is 1 (becuase it's
variable length) even byte sequences that are invalid utf-8 strings (e.g.
wrong leading byte etc) were copied verbatim into utf-8 columns when
coming from binary strings or fields.
Storing invalid data into string columns was having all kinds of ill effects
on code that assumed that the encoding data are valid to begin with.
Fixed by additionally checking the incoming binary string for validity when
assigning it to a non-binary string column.
Made sure the conversions to charsets with no known "invalid" ranges
are not covered by the extra check.
Removed trailing spaces.
Test case added.
MDEV-4489 "Replication of big5, cp932, gbk, sjis strings makes wrong values on slave"
has been fixed.
Problem:
String constants of some Asian charsets (big5,cp932,gbk,sjis)
can have backslash '\' (0x5C) in the second byte of multi-byte characters.
Replicating of such constants using the standard '\'-escaping is dangerous.
Therefore, constants of these charsets are replicated using hex notation:
INSERT INTO t1 (a) VALUES (0x815C);
However, 0xHHHH constants do not work well in some cases,
because they can behave as strings and as numbers, depending on context
(for example, depending on the data type of the column in an INSERT statement).
This SQL script was not replicated correctly with statement-based replication:
SET NAMES gbk;
PREPARE STMT FROM 'INSERT INTO t1 (a) VALUES (?)';
SET @a = '1';
EXECUTE STMT USING @a;
The INSERT statement was replicated as:
INSERT INTO t1 (a) VALUES (0x31);
'1' was correctly converted to the number 1 on master.
But the 0x31 constant was treated as number 49 on slave.
Fix:
1. Binary log now uses X'HHHH' instead of 0xHHHH constants.
2. The X'HHHH' constants now work always as strings, in all contexts.
This is the SQL standard compliant behaviour.
After the fix, the above statement is replicated as:
INSERT INTO t1 (a) VALUES (X'31');
X'31' is treated as string '1' on slave, and is correctly converted to 1.
modified:
@ mysql-test/r/ctype_cp932_binlog_stm.result
@ mysql-test/r/select.result
@ mysql-test/r/select_jcl6.result
@ mysql-test/r/select_pkeycache.result
@ mysql-test/r/user_var-binlog.result
@ mysql-test/r/varbinary.result
@ mysql-test/suite/binlog/r/binlog_stm_ctype_ucs.result
@ mysql-test/suite/binlog/r/binlog_stm_mix_innodb_myisam.result
@ mysql-test/suite/rpl/r/rpl_charset_sjis.result
@ mysql-test/suite/rpl/r/rpl_mdev382.result
@ mysql-test/suite/rpl/t/rpl_charset_sjis.test
@ mysql-test/t/ctype_cp932_binlog_stm.test
@ mysql-test/t/select.test
@ mysql-test/t/varbinary.test
Adding and updating tests
@ sql/item.cc
@ sql/item.h
@ sql/sql_yacc.yy
@ sql/sql_lex.cc
Splitting the implementations of X'HH' and 0xHH constants into two
separate classes. Fixing the parser to distinguish the two syntaxes.
@ sql/log_event.cc
Using X'HH' instead of 0xHH for binary logging for string constants
of the "dangerous" charsets.
@ sql/sql_string.h
Adding a helped method String::append_hex().
COLUMNS ARE USED INSIDE A STORED PROCEDURE
Problem: When 'SET' type columns are used in a DML
inside a stored procedure and a NULL value is passed
to that column, replication is breaking.
Analysis: All stored procedure variables used inside
a DML will be substituted with NAME_CONST functions.
While NAME_CONST are used in this particular scenario,
i.e., when NULL value is passed then charset is copied
from 'empty_set_string' member of Field_set class.
The operator '=' overload method inside 'String' class
is not coping str_charset from R.H.S object to L.H.S object.
Hence charset is wrongly copied in the string assignment
Fix: Handle coping str_charset member in operator '=' overload
method.
sql/sql_string.h:
Handled coping str_charset member in operator '=' overload
method.
COLUMNS ARE USED INSIDE A STORED PROCEDURE
Problem: The operator '=' overload method inside
'String' class is not coping str_charset member from
R.H.S object to L.H.S object. Hence charset is wrongly
set while using string assignments
Analaysis: The above mentioned problem is
identified while doing the analaysis of bug#14593883.
Though the test scenario mentioned in the bug page
is not an issue in mysql-5.1 code, the actual root cause
ie., "str_charset member is not copied" exists in the
mysql-5.1 code base.
Fix: Handle coping str_charset member in operator '=' overload
method.
sql/sql_string.h:
Handled coping str_charset member in operator '=' overload
method.