c3cf7f47f0 reverted the patch
for BUG#24487120. After merging the reverting patch from MySQL
to MariaDB the problems described in MDEV-11079 and MDEV-11631 disappeared.
Adding test cases only.
Partially backporting MDEV-9874 from 10.2 to 10.0
READ_INFO::read_field() raised the ER_INVALID_CHARACTER_STRING error
when reading an escape character followed by a multi-byte character.
Raising wellformedness errors in READ_INFO::read_field() was wrong,
because the main goal of READ_INFO::read_field() is to *unescape* the
data which was presumably escaped using mysql_real_escape_string(),
using the same character set with the one specified in
"LOAD DATA INFILE ... CHARACTER SET ..." (or assumed by default).
During LOAD DATA, multi-byte characters are not always scanned as a single
entity! In case of escaped data, parts of a multi-byte character can be
scanned on different loop iterations. So the old code erroneously tested
welformedness in the middle of a multi-byte character.
Moreover, the data after unescaping can go into a BLOB field, not a text field.
Wellformedness tests are meaningless in this case.
Ater this patch, wellformedness is only checked later, during
Field::store(str,length,cs) time. The loop that scans bytes only
makes sure to revert the changes made by mysql_real_escape_string().
Note, in some cases users can supply data which did not really go through
mysql_real_escape_string() and was escaped by some other means,
or was not escaped at all. The file reported in this MDEV contains
the string "\ä", which is an example of such improperly escaped data, as
- either there should be two backslashes: "\\ä"
- or there should be no backslashes at all: "ä"
mysql_real_escape_string() could not generate "\ä".
- Moving the new my_charlen()-based code handling multi-byte characters
from READ_INFO::field_field() to a new method READ_INFO::read_mbtail()
- Reusing read_mbtail() in READ_INFO::read_value(), instead of the old
my_mbcharlen()-based code which did not catch broken byte sequences