Item_func_group_concat::print() did not take into account
that Item_func_group_concat::separator can be of a different character set
than the "String *str" (when the printing is being done to).
Therefore, printing did not work correctly for:
- non-ASCII separators when GROUP_CONCAT is done on 8bit data
or multi-byte data with mbminlen==1.
- all separators (even including simple ones like comma)
when GROUP_CONCAT is done on ucs2/utf16/utf32 data (mbminlen>1).
Because of this problem, VIEW definitions did not print correctly to
their FRM files. This later led to a wrong SELECT and SHOW CREATE output.
Fix:
- Adding new String methods:
bool append_for_single_quote_using_mb_wc(const char *str, size_t length,
CHARSET_INFO *cs);
bool append_for_single_quote_opt_convert(const char *str,
size_t length,
CHARSET_INFO *cs)
which perform both escaping and character set conversion at the same time.
- Adding a new String method escaped_wc_for_single_quote(),
to reuse the code between the old and the new methods.
- Fixing Item_func_group_concat::print() to use the new
method append_for_single_quote_opt_convert().
Problem:
Item_func_date_format::val_str() and make_date_time() did not take into
account that the format string and the result string
(separately or at the same time) can be of a tricky character set
like UCS2, UTF16, UTF32. As a result, DATE_FORMAT() could generate
an ill-formed result which crashed on DBUG_ASSERTs testing well-formedness
in other parts of the code.
Fix:
1. class String changes
Removing String::append_with_prefill(). It was not compatible with
tricky character sets. Also it was inconvenient to use and required
too much duplicate code on the caller side.
Adding String::append_zerofill() instead. It's compatible with tricky
character sets and is easier to use.
Adding helper methods Static_binary_string::q_append_wc() and
String::append_wc(), to append a single wide character
(a Unicode code point in my_wc_t).
2. storage/spider changes
Removing spider_string::append_with_prefill().
It used String::append_with_prefix() inside, but it was unused itself.
3. Changing tricky charset incompatible code pieces in make_date_time()
to compatible replacements:
- Fixing the loop scanning the format string to iterate in terms
of Unicode code points (using mb_wc()) rather than in terms
of "char" items.
- Using append_wc(my_wc_t) instead of append(char) to append
a single character to the result string.
- Using append_zerofill() instead of append_with_prefill() to
append date/time numeric components to the result string.
The assert inside String::copy() prevents copying from from "str"
if its own String::Ptr also points to the same memory.
The idea of the assert is that copy() performs memory reallocation,
and this reallocation can free (and thus invalidate) the memory pointed by Ptr,
which can lead to further copying from a freed memory.
The assert was incomplete: copy() can free the memory pointed by its Ptr
only if String::alloced is true!
If the String is not alloced, it is still safe to copy even from
the location pointed by Ptr.
This scenario demonstrates a safe copy():
const char *tmp= "123";
String str1(tmp, 3);
String str2(tmp, 3);
// This statement is safe:
str2.copy(str1->ptr(), str1->length(), str1->charset(), cs_to, &errors);
Inside the copy() the parameter "str" is equal to String::Ptr in this example.
But it's still ok to reallocate the memory for str2, because str2
was a constant before the copy() call. Thus reallocation does not
make the memory pointed by str1->ptr() invalid.
Adjusting the assert condition to allow copying for constant strings.
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
Passing a null pointer to a nonnull argument is not only undefined
behaviour, but it also grants the compiler the permission to optimize
away further checks whether the pointer is null. GCC -O2 at least
starting with version 8 may do that, potentially causing SIGSEGV.
Problem:
The crash happened in FORMAT(double, dec>=31, 'de_DE').
The patch for MDEV-23118 (commit 0041dacc1b)
did not take into account that String::set_real() has a limit of 31
(FLOATING_POINT_DECIMALS) fractional digits. So for the range of 31..38
digits, set_real() switches to use:
- my_fcvt() - decimal point notation, e.g. 1.9999999999
- my_gcvt() - scientific notation, e.g. 1e22
my_gcvt() returned a shorter string than Item_func_format::val_str_ascii()
expected to get after the my_fcvt() call, so it crashed on assert.
Solution:
We cannot extend set_real() to use the my_fcvt() mode for the range of
31..38 fractional digits, because set_real() is used in a lot of places
and such a change will break everything.
Introducing String::set_fcvt() which always prints using my_fcvt()
for the whole range of decimals 0..38, supported by the FORMAT() function.
The code did not take into account that:
- U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis)
- Some character sets do not have a code for U+005C (e.g. swe7)
Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to
cover all special cases easier.
Introduced a print_key_value function to makes sure that the trace prints data in readable format
for readable characters and the rest of the characters are printed as hexadecimal.
copy_if_not_alloced() did not handle situations when
"from" is a constant string pointing to a substring of "to",
so this code part freed "to" but then tried to copy its old (already freed)
content to a new buffer:
if (to->realloc(from_length))
return from;
if ((to->str_length=MY_MIN(from->str_length,from_length)))
memcpy(to->Ptr,from->Ptr,to->str_length);
Adding a new code piece that catches such constant substrings
and propery reallocs "to" to preserve its important part referenced
by "from".
Bug was introduced in this commit:
commit: a9ca819897
Call alloc() instead of realloc()
Use alloc() if we don't need original string (avoid copy)
Removed not needed test of str_length in sql_string.cc
copy_if_not_alloced() was forgotten when changing realloc()'s to alloc()'s.
Changing it now.
This warning come from a copy() operation of type:
memcpy(ptr, ptr+A, B), which is safe but produces a warning
when run with valgrind.
To avoid the warning, I added copy_or_move() method which uses
memmove() instead of memcpy().
In 10.3 the change in item_strfunc::Item_func_concat() has to be mirroed
in Item_func_concat_oracle() to avoid future valgrind warnings.