Problem:
Item_func_date_format::val_str() and make_date_time() did not take into
account that the format string and the result string
(separately or at the same time) can be of a tricky character set
like UCS2, UTF16, UTF32. As a result, DATE_FORMAT() could generate
an ill-formed result which crashed on DBUG_ASSERTs testing well-formedness
in other parts of the code.
Fix:
1. class String changes
Removing String::append_with_prefill(). It was not compatible with
tricky character sets. Also it was inconvenient to use and required
too much duplicate code on the caller side.
Adding String::append_zerofill() instead. It's compatible with tricky
character sets and is easier to use.
Adding helper methods Static_binary_string::q_append_wc() and
String::append_wc(), to append a single wide character
(a Unicode code point in my_wc_t).
2. storage/spider changes
Removing spider_string::append_with_prefill().
It used String::append_with_prefix() inside, but it was unused itself.
3. Changing tricky charset incompatible code pieces in make_date_time()
to compatible replacements:
- Fixing the loop scanning the format string to iterate in terms
of Unicode code points (using mb_wc()) rather than in terms
of "char" items.
- Using append_wc(my_wc_t) instead of append(char) to append
a single character to the result string.
- Using append_zerofill() instead of append_with_prefill() to
append date/time numeric components to the result string.
1. Adding a separate MY_COLLATION_HANDLER
my_collation_ucs2_general_mysql500_ci_handler
implementing a proper order for ucs2_general_mysql500_ci
The problem happened because ucs2_general_mysql500_ci
erroneously used my_collation_ucs2_general_ci_handler.
2. Cosmetic changes: Renaming:
- plane00_mysql500 to my_unicase_mysql500_page00
- my_unicase_pages_mysql500 to my_unicase_mysql500_pages
to use the same naming style with:
- my_unicase_default_page00
- my_unicase_defaul_pages
3. Moving code fragments from
- handler::check_collation_compatibility() in handler.cc
- upgrade_collation() in table.cc
into new methods in class Charset, to reuse the code easier.
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .
Code style changes have been done on top. The result of this change
leads to the following improvements:
1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.
2. Compiler can better understand the intent of the code, thus it leads
to more optimization possibilities. Additionally it enabled detecting
unused variables that had an empty default constructor but not marked
so explicitly.
Particular change required following this patch in sql/opt_range.cc
result_keys, an unused template class Bitmap now correctly issues
unused variable warnings.
Setting Bitmap template class constructor to default allows the compiler
to identify that there are no side-effects when instantiating the class.
Previously the compiler could not issue the warning as it assumed Bitmap
class (being a template) would not be performing a NO-OP for its default
constructor. This prevented the "unused variable warning".
Allow materialization strategy when collations on the
inner and outer sides of an IN subquery are the same and the
character set of the inner side is a proper subset of the character
set on the outer side.
This allows conversion from utf8mb3 to utf8mb4
as the former is a subset of the later.
This is only allowed when IN predicate is converted to an IN subquery
Backported part of the patch (d6a00d9b18) of MDEV-17905.
Invoking memcpy() on a NULL pointer is undefined behaviour
(even if the length is 0) and gives the compiler permission to
assume that the pointer is nonnull. Recent versions of GCC
(starting with version 8) are more aggressively optimizing away
checks for NULL pointers. This undefined behaviour would cause
a SIGSEGV in the test main.func_encrypt on an optimized debug build
on GCC 10.2.0.
Passing a null pointer to a nonnull argument is not only undefined
behaviour, but it also grants the compiler the permission to optimize
away further checks whether the pointer is null. GCC -O2 at least
starting with version 8 may do that, potentially causing SIGSEGV.
These problems were caught in a WITH_UBSAN=ON build with the
Bug#7024 test in main.view.
Problem:
The crash happened in FORMAT(double, dec>=31, 'de_DE').
The patch for MDEV-23118 (commit 0041dacc1b)
did not take into account that String::set_real() has a limit of 31
(FLOATING_POINT_DECIMALS) fractional digits. So for the range of 31..38
digits, set_real() switches to use:
- my_fcvt() - decimal point notation, e.g. 1.9999999999
- my_gcvt() - scientific notation, e.g. 1e22
my_gcvt() returned a shorter string than Item_func_format::val_str_ascii()
expected to get after the my_fcvt() call, so it crashed on assert.
Solution:
We cannot extend set_real() to use the my_fcvt() mode for the range of
31..38 fractional digits, because set_real() is used in a lot of places
and such a change will break everything.
Introducing String::set_fcvt() which always prints using my_fcvt()
for the whole range of decimals 0..38, supported by the FORMAT() function.
- Adding optional qualifiers to data types:
CREATE TABLE t1 (a schema.DATE);
Qualifiers now work only for three pre-defined schemas:
mariadb_schema
oracle_schema
maxdb_schema
These schemas are virtual (hard-coded) for now, but may turn into real
databases on disk in the future.
- mariadb_schema.TYPE now always resolves to a true MariaDB data
type TYPE without sql_mode specific translations.
- oracle_schema.DATE translates to MariaDB DATETIME.
- maxdb_schema.TIMESTAMP translates to MariaDB DATETIME.
- Fixing SHOW CREATE TABLE to use a qualifier for a data type TYPE
if the current sql_mode translates TYPE to something else.
The above changes fix the reported problem, so this script:
SET sql_mode=ORACLE;
CREATE TABLE t2 AS SELECT mariadb_date_column FROM t1;
is now replicated as:
SET sql_mode=ORACLE;
CREATE TABLE t2 (mariadb_date_column mariadb_schema.DATE);
and the slave can unambiguously treat DATE as the true MariaDB DATE
without ORACLE specific translation to DATETIME.
Similar,
SET sql_mode=MAXDB;
CREATE TABLE t2 AS SELECT mariadb_timestamp_column FROM t1;
is now replicated as:
SET sql_mode=MAXDB;
CREATE TABLE t2 (mariadb_timestamp_column mariadb_schema.TIMESTAMP);
so the slave treats TIMESTAMP as the true MariaDB TIMESTAMP
without MAXDB specific translation to DATETIME.
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
We must relax too strict debug assertions. For latin1_swedish_ci,
mtype=DATA_CHAR or mtype=DATA_VARCHAR will be used instead of
mtype=DATA_MYSQL or mtype=DATA_VARMYSQL. Likewise, some changes of
dtype_get_charset_coll() do not affect the data type encoding,
but only any indexes that are defined on the column.
Charset::same_encoding(): Check whether two charset-collations have
the same character set encoding.
dict_col_t::same_encoding(): Check whether two character columns
have the same character set encoding.
dict_col_t::same_type(): Check whether two columns have a compatible
data type encoding.
dict_col_t::same_format(), dict_table_t::instant_column(): Do not
compare mtype or the charset-collation of prtype directly.
Rely on dict_col_t::same_type() instead.
dtype_get_charset_coll(): Narrow the return type to uint16_t.
This is a refined version of a fix that was developed by
Thirunarayanan Balathandayuthapani.
This change takes into account a column's GENERATED ALWAYS AS
expression dependcy on sql_mode's PAD_CHAR_TO_FULL_LENGTH and
NO_UNSIGNED_SUBTRACTION flags.
Indexed virtual columns as well as persistent generated columns are
now not allowed to have such dependencies to avoid inconsistent data
or index files on sql_mode changes.
So an error is now returned in cases like this:
CREATE OR REPLACE TABLE t1
(
a CHAR(5),
v VARCHAR(5) AS (a) PERSISTENT -- CHAR->VARCHAR or CHAR->TEXT = ERROR
);
Functions RPAD() and RTRIM() can now remove dependency on
PAD_CHAR_TO_FULL_LENGTH. So this can be used instead:
CREATE OR REPLACE TABLE t1
(
a CHAR(5),
v VARCHAR(5) AS (RTRIM(a)) PERSISTENT
);
Note, unlike CHAR->VARCHAR and CHAR->TEXT this still works,
not RPAD(a) is needed:
CREATE OR REPLACE TABLE t1
(
a CHAR(5),
v CHAR(5) AS (a) PERSISTENT -- CHAR->CHAR is OK
);
More sql_mode flags may affect values of generated columns.
They will be addressed separately.
See comments in sql_mode.h for implementation details.
Patch is about two cases:
1) On some collate changes it's possible to rebuild only secondary indexes
2) For non-indexed columns collate can be changed INSTANTly
Implemented mostly in Field_{string,varstring,blob}::is_equal().
Make this method return how exactly collationa differs.
This information is later used by fill_alter_inplace_info() to pass
correct info to engine.
Introduced a print_key_value function to makes sure that the trace prints data in readable format
for readable characters and the rest of the characters are printed as hexadecimal.
This patch fixes:
- MDEV-19284 INSTANT ALTER with ucs2-to-utf16 conversion produces bad data
- MDEV-19285 INSTANT ALTER from ascii_general_ci to latin1_general_ci produces corrupt data
These regressions were introduced in 10.4.3 by:
- MDEV-15564 Avoid table rebuild in ALTER TABLE on collation or charset changes
Changes:
1. Cleanup: Adding a helper method
Field_longstr::csinfo_change_allows_instant_alter(),
to remove some duplicate code in field.cc.
2. Cleanup: removing Type_handler::Charsets_are_compatible() and static
function charsets_are_compatible() and
introducing new methods in the recently added class Charset instead:
- encoding_allows_reinterpret_as()
- encoding_and_order_allow_reinterpret_as()
3. Bug fix: Removing the code that allowed instant conversion for
ascii-to->8bit and ucs2-to->utf16.
This actually fixes MDEV-19284 and MDEV-19285.
4. Bug fix: Adding a helper method Charset::collation_specific_name().
The old corresponding code in Type_handler::Charsets_are_compatible()
was not safe against (badly named) user-defined collations whose
character set name can be longer than collation name.
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.
In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.
This warning come from a copy() operation of type:
memcpy(ptr, ptr+A, B), which is safe but produces a warning
when run with valgrind.
To avoid the warning, I added copy_or_move() method which uses
memmove() instead of memcpy().
In 10.3 the change in item_strfunc::Item_func_concat() has to be mirroed
in Item_func_concat_oracle() to avoid future valgrind warnings.
Consider an IN predicate with ROW-type arguments:
predicant IN (value1, ..., valueM)
where predicant and all values consist of N elements.
When performing IN for these arguments, at every position i (1..N)
only data type of i-th element of predicant was taken into account,
while data types on i-th elements of value1..valueM were not taken.
These led to bad comparison data type detection, e.g. when
mixing unsigned and signed integer values.
After this change all element data types are taken into account.
So, for example, a mixture of unsigned and signed values is
now calculated using decimal and does not overflow any more.
Detailed changes:
1. All comparators for ROW elements are now created recursively
at fix_fields() time, inside cmp_item_row::prepare_comparators().
Previously prepare_comparators() installed comparators only
for temporal data types, while comparators for other types were
installed at execution time, in cmp_item_row::store_value().
2. Removing comparator creating code from cmp_item_row::store_value().
It was responsible for non-temporal data types.
3. Removing find_date_time_item(). It's not needed any more.
All ROW-element data types are now covered by
cmp_item_row::prepare_comparators().
4. Adding a helper method Item_args::alloc_and_extract_row_elements()
to extract elements from an array of ROW-type Items, from the given
position. Using this method to collect elements from the i-th
position and further pass them to
Type_handler_hybrid_field_type::aggregate_for_comparison().
5. Moving the call for alloc_comparators() inside
cmp_item_row::prepare_comparators(). This helps
to call prepare_comparators() for ROW elements recursively
(if elements appear to be ROWs again).
Moving alloc_comparators() from "public" to "private".
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().