Problem:
=======
SHOW BINLOG EVENTS FROM <"random"-pos> caused a variety of failures as
reported in MDEV-18046. They are fixed but that approach is not future-proof
as well as is not optimal to create extra check for being constructed event
parameters.
Analysis:
=========
"show binlog events from <pos>" code considers the user given position as a
valid event start position. The code starts reading data from this event start
position onwards and tries to map it to a set of known events. Each event has
a specific event structure and asserts have been added to ensure that, read
event data, satisfies the event specific requirements. When a random position
is supplied to "show binlog events command" the event structure specific
checks will fail and they result in assert.
For example: https://jira.mariadb.org/browse/MDEV-18046
In the bug description user executes CREATE TABLE/INSERT and ALTER SQL
commands.
When a crazy offset like "SHOW BINLOG EVENTS FROM 365" is provided code
assumes offset 365 as valid event begin and proceeds to EVENT_LEN_OFFSET reads
some random length and comes up with a crazy event which didn't exits in the
binary log. In this quoted example scenario, event read at offset 365 is
considered as "Update_rows_log_event", which is not present in binary log.
Since this is a random event its validation fails and code results in
assert/segmentation fault, as shown below.
mysqld: /data/src/10.4/sql/log_event.cc:10863: Rows_log_event::Rows_log_event(
const char*, uint, const Format_description_log_event*):
Assertion `var_header_len >= 2' failed.
181220 15:27:02 [ERROR] mysqld got signal 6 ;
#7 0x00007fa0d96abee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x000055e744ef82de in Rows_log_event::Rows_log_event (this=0x7fa05800d390,
buf=0x7fa05800d080 "", event_len=254, description_event=0x7fa058006d60) at
/data/src/10.4/sql/log_event.cc:10863
#9 0x000055e744f00cf8 in Update_rows_log_event::Update_rows_log_event
Since we are reading random data repeating the same command SHOW BINLOG EVENTS
FROM 365 produces different types of crashes with different events. MDEV-18046
reported 10 such crashes.
In order to avoid such scenarios user provided starting offset needs to be
validated for its correctness. Best way of doing this is to make use of
checksums if they are available. MDEV-18046 fix introduced the checksum based
validation.
The issue still remains in cases where binlog checksums are disabled. Please
find the following bug reports.
MDEV-22473: binlog.binlog_show_binlog_event_random_pos failed in buildbot,
server crashed in read_log_event
MDEV-22455: Server crashes in Table_map_log_event,
binlog.binlog_invalid_read_in_rotate failed in buildbot
Fix:
====
When binlog checksum is disabled, perform scan(via reading event by event), to
validate the requested FROM <pos> offset. Starting from offset 4 read the
event_length of next_event in the binary log. Using the next_event length
advance current offset to point to next event. Repeat this process till the
current offset is less than or equal to crazy offset. If current offset is
higher than crazy offset provide appropriate invalid input offset error.
This piece of the code in Item_func_or_sum::agg_item_set_converter:
if (!conv && ((*arg)->collation.repertoire == MY_REPERTOIRE_ASCII))
conv= new (thd->mem_root) Item_func_conv_charset(thd, *arg, coll.collation, 1);
was wrong because:
1. It could change Item_cache to Item_func_conv_charset
(with the old Item_cache in args[0]).
Such Item type change is not always supported, e.g.
the code in Item_singlerow_subselect::reset() expects only Item_cache,
to be able to call Item_cache::set_null(). So it erroneously
reinterpreted Item_func_conv_charset to Item_cache and called
a non-existing method set_null(), which crashed the server.
2. The 1 in the last parameter to Item_func_conv_charset() was also
a problem. In MariaDB versions where the reported query did not crash,
it erroneously returned "empty set" instead of one row, because
the 1 made subselects execute too earlier and return NULL.
Fix:
1. Removing the above two lines from Item_func_or_sum::agg_item_set_converter()
2. Adding the repertoire test inside the constructor of Item_func_conv_charset,
so it now detects itself as "safe" in more cases than before.
This is needed to avoid new "Illegal mix of collations" after
removing the wrong code in various scenarios when character set
conversion from pure ASCII happens, including the reported scenario.
So now this sequence:
Item_cache -> Item_func_concat
is replaced to this compatible sequence (the top Item is still Item_cache):
new Item_cache -> Item_func_conv_charset -> Item_func_concat
Before the fix it was replaced to this incompatible sequence:
Item_func_conv_charset -> old Item_cache -> Item_func_concat
Passing a null pointer to a nonnull argument is not only undefined
behaviour, but it also grants the compiler the permission to optimize
away further checks whether the pointer is null. GCC -O2 at least
starting with version 8 may do that, potentially causing SIGSEGV.
These problems were caught in a WITH_UBSAN=ON build with the
Bug#7024 test in main.view.
Passing a null pointer to the "%s" argument of a printf-like
function is undefined behaviour. In the GNU libc implementation
of the printf() family of functions, it happens to work.
GCC 10.2.0 would diagnose this with -Wformat-overflow -Og.
In -fsanitize=undefined (WITH_UBSAN=ON) builds, a runtime error
would be generated. In some other builds, GCC 8 or later might infer
that the parameter is nonnull and optimize away further checks whether
the parameter is null, leading to SIGSEGV.
(This commit is exclusively for 10.1 branch, do not merge it to upper ones)
In case of a pattern of non-STMT_END-marked Rows-log-event (A) followed by
a STMT_END marked one (B) mysqlbinlog mixes up the base64 encoded rows events
with their pseudo sql representation produced by the verbose option:
BINLOG '
base64 encoded data for A
### verbose section for A
base64 encoded data for B
### verbose section for B
'/*!*/;
In effect the produced BINLOG '...' query is not valid and is rejected with the error.
Examples of this way malformed BINLOG could have been found in binlog_row_annotate.result
that gets corrected with the patch.
The issue is fixed with introduction an auxiliary IO_CACHE to hold on the verbose
comments until the terminal STMT_END event is found. The new cache is emptied
out after two pre-existing ones are done at that time.
The correctly produced output now for the above case is as the following:
BINLOG '
base64 encoded data for A
base64 encoded data for B
'/*!*/;
### verbose section for A
### verbose section for B
Thanks to Alexey Midenkov for the problem recognition and attempt to tackle,
Venkatesh Duggirala who produced a patch for the upstream whose
idea is exploited here, as well as to MDEV-23077 reporter LukeXwang who
also contributed a piece of a patch aiming at this issue.
Extra: mysqlbinlog_row_minimal refined to not produce mutable numeric values into the result file.
The issue here was that the query was using ORDER BY LIMIT optimzation where
the access method was changed from EQ_REF access to an index scan (index that would
resolve the ORDER BY clause).
But the parameter READ_RECORD::unlock_row was not reset to rr_unlock_row, which is
used when the access method is not EQ_REF access.
Since MDEV-18778, timezone tables get changed to innodb
to allow them to be replicated to other galera nodes.
Even without galera, timezone tables could be declared innodb.
With the standalone innodb tables, the mysql_tzinfo_to_sql takes
approximately 27 seconds.
With the transactions enabled in this patch, 1.2 seconds is
the approximate load time.
While explicit checks for the engine of the time zone tables could be
done, or checks against !opt_skip_write_binlog, non-transactional
storage engines will just ignore the transactional state without
even a warning so its safe to enact globally.
Leap seconds are pretty much ignored as they are a single insert
statement and have gone out of favour as they have caused MariaDB
stalls in the past.
Removing the ORDER BY clause from the UNION when UNION is inside an IN/ALL/ANY/EXISTS subquery.
The rewrites are done for subqueries but this rewrite is not done for the fake_select of
the UNION.
Problem:- rpl_parallel2 was failing non-deterministically
Analysis:-
When FLUSH TABLES WITH READ LOCK is executed, it will allow all worker
threads to complete their ongoing transactions and then it will pause them.
At this state FTWRL will proceed to acquire global read lock. FTWRL first
blocks threads from starting new commits, then upgrades the lock to block
commit of existing transactions.
Step1:
FLUSH TABLES WITH READ LOCK - Blocks new commits
Step2:
* STOP SLAVE command enables 'force_abort=1' which unblocks workers,
they continue to execute events.
* T1: Waits in 'record_gtid' call to update 'gtid_slave_pos' table with
its current GTID, but it is blocked becuase of Step1.
* T2: Holds COMMIT lock and waits for T1 to commit.
Step3:
FLUSH TABLES WITH READ LOCK - Waiting to get BLOCK_COMMIT.
This results in deadlock. When STOP SLAVE command allows paused workers to
proceed, workers should skip the execution of all further events, similar
to 'conservative' parallel mode.
Solution:-
We will assign 1 to skip_event_group when we are aborted in do_ftwrl_wait.
rpl_parallel_entry->pause_sub_id is only reset when force_abort is off in
rpl_pause_after_ftwrl.
Do not collect EITS statistics for this statement:
ALTER TABLE t ANALYZE PARTITION p
EITS stats are currently global, not per-partition.
Collecting global stats when we are asked to process just one partition
causes issues for DBAs.
Fix prefix key comparison in partitioning. Comparions must
take into account no more than prefix_len characters.
It used to compare prefix_len*mbmaxlen bytes.
* Fix the crash: IN-to-EXISTS rewrite causes an error (and so
JOIN::optimize() fails with an error, too), don't call
update_used_tables(). Terminate the query execution instead.
* Fix the cause of the error in the IN-to-EXISTS rewrite: don't do
the rewrite if doing it will cause an error of this kind:
This version of MariaDB doesn't yet support 'SUBQUERY in ROW in left
expression of IN/ALL/ANY'
* Fix another issue exposed by this testcase:
JOIN::setup_subquery_caches() may be invoked before any select has
saved its query plan, and will crash because none of the SELECTs
has called create_explain_query_if_not_exists() to create the Explain
Data Structure for this SELECT.
TODO: When merging this to 10.2, remove the poorly-placed call to
create_explain_query_if_not_exists made by fix for M_D_E_V-16153
The issue occurs when the subquery_cache is enabled.
When there is a cache miss the division was leading to a value with scale 9.
In the case of cache hit the value returned was of scale 9 and due to the different
values for the scales the where condition evaluated to FALSE, hence the output
was incomplete.
To fix this problem we need to round up the decimal to the limit mentioned in
Item::decimals. This would make sure the values are compared with the same
scale.
Server auto-sets lower_case_file_system value based on default
datadir's behavior instead of instead of using the directory specified
by the user through the configuration file or command line options.
This patch fixes this problem.
An oveflow was happening on windows because on Windows sizeof(ulong) is 4 bytes
while it is 8 bytes on Linux.
Switched avg_frequency and avg length for column statistics to ulonglong.
Switched avg_frequency for index statistics to ulonglong.
failed in Diagnostics_area::set_ok_status on FUNCTION replace
When there is REPLACE in the statement, sp_drop_routine_internal() returns
0 (SP_OK) on success which is then assigned to ret. So ret becomes false
and the error state is lost. The expression inside DBUG_ASSERT()
evaluates to false and thus the assertion failure.
While trying to detect datadir, take into account that one can use
Windows service name as section name in options file, for Windows service.
The historical obscurity is being used by WAMP installations.
Make sure that the sort_buffer that is allocated has atleast space for MERGEBUFF2 keys.
The issue here was that the record length is quite high and sort buffer size is very small,
due to which we end up with zero number of keys in the sort buffer. The Sort_param::max_keys_per_buffer
was zero in such a case, due to which we were flushing empty sort_buffer to the disk.
- service not using "--defaults-file" can have any name not just "MySQL"
- service with "--defaults-file", without datadir in them
use default datadir (install_root\data)
The issue here was that the left expr and right expr of the ANY subquery
had different character sets, so we were converting the left expr to utf8 character set.
So when this conversion was happening we were actually converting the item inside the cache,
it looked like <cache>(convert(t1.l1 using utf8)), which is incorrect.
To fix this problem we are going to store the reference of the left expr and convert that
to utf8 character set, it would look like convert(<cache>(`test`.`t1`.`l1`) using utf8)
Problem:
========
Relay_log_info::flush reports following MSAN issue.
==17820==WARNING: MemorySanitizer: use-of-uninitialized-value is reported
#5 0x00005584f0981441 in my_write (Filedes=22,
Buffer=0x72500003e818 "5\n./slave-relay-bin.000003\n21385\n
master-bin.000001\n21643\n0\n", '\245' <repeats 141 times>..., Count=118,
MyFlags=532) at /home/sujatha/bug_repo/test-10.5-msan/mysys/my_write.c:49
Analysis:
=========
In parallel replication at the end of each statement execution the worker execution
status is updated in 'relay-log.info' file. When two workers try to flush
the status at the same time, since the write to cache is not serialized both
workers write to the same address simultaneously and increment the
length twice. Because of this the length of buffer is more than actual data.
When flush code tries to read the buffer beyond valid data length MSAN
reports uninitialized values error.
Fix:
===
Serialize the relay log flush operation using "rli->data_lock".
Analysis:
========
When "Profiling" is enabled, server collects the resource usage of each
statement that gets executed in current session. Profiling doesn't support
nested statements. In order to ensure this behavior when profiling is enabled
for a statement, there should not be any other active query which is being
profiled. This active query information is stored in 'current' variable. When
a nested query arrives it finds 'current' being not NULL and server aborts.
When 'init_connect' and 'init_slave' system variables are set they contain a
set of statements to be executed. "execute_init_command" is the function call
which invokes "dispatch_command" for each statement provided in
'init_connect', 'init_slave' system variables. "execute_init_command" invokes
"start_new_query" and it passes the statement list to "dispatch_command". This
"dispatch_command" intern invokes "start_new_query" which leads to nesting of
queries. Hence '!current' assert is triggered.
Fix:
===
Remove profiling from "execute_init_command" as it will be done within
"dispatch_command" execution.
The code in fill_schema_schemata() did not take into account that
make_db_list() can leave empty db_names if the requested database
name was too long, so the call for db_names.at(0) crashed on assert.
- Moving the code testing if the database directory exists
into a separate function verify_database_directory_exists()
- Modifying the test to check if db_names is not empty
Problem:
=======
The "Start binlog_dump" message hasn't been updated to include the slave's
requested GTID position:
20:05:05 139836760311552 [Note] Start binlog_dump to slave_server(2), pos(, 4)
For diagnostic purposes, it would be helpful if the GTID position were
included.
Fix:
===
Imporve "Start binlog_dump" print message to include "using_gtid" and
"GTID position" requested by slave.
Ex:
[Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1),
gtid('1-1-201,2-2-100')
[Note] Start binlog_dump to slave_server(3), pos('mariadb-bin.004142',
507988273), using_gtid(0), gtid('')
The code erroneously used buff[100] in a fiew places to make
a GRANTEE value in the form:
'user'@'host'
Fix:
- Fixing the code to use (USER_HOST_BUFF_SIZE + 6) instead of 100.
- Adding a DBUG_ASSERT to make sure the buffer is enough
- Wrapping the code into a class Grantee_str, to reuse it easier in 4 places.
The real problem was that attempt to roll back cahnes after end of memory in QC was made incorrectly and lead to using uninitialized memory.
(bug has nothing to do with resize operation, it is just lack of resources erro processed incorrectly)
In case of SELECT without tables which returns either 0 or 1 rows,
JOIN::exec_inner() did not check if the flag representing SQL_CALC_FOUND_ROWS
is set or not and send_records was direclty assigned 0. So SELECT FOUND_ROWS()
was giving 0 in the output. Now it checks if the flag is set, if it is set
send_record=1 else 0. 1 is the number of rows that could have been sent
to the client if the SELECT query had SQL_CALC_FOUND_ROWS.
It is 0 when no rows were sent because the SELECT query did not have
SQL_CALC_FOUND_ROWS.
For low sort_buffer_size, in the cost calculation of using the Unique object the elements in the tree were evaluated to 0, make sure to have atleast 1 element in the Unique tree.
Also for the function Unique::get allocate memory for atleast MERGEBUFF2+1 keys.
For DECIMAL[(M[,D])] datatype max_sort_length was not being honoured which was leading to buffer
overflow while making the sort key. The fix to this problem would be to create sort keys for decimals
with atmost max_sort_key bytes
Important:
The minimum value of max_sort_length has been raised to 8 (previously was 4),
so fixed size datatypes like DOUBLE and BIGINIT are not truncated for
lower values of max_sort_length.
Backported the support for aborting and replaying stored procedure and fix for trigger
key assigments from 10.4 version.
Backported also two mtr tests: wsrep_sp_bf_abort and MDEV-20225