The problem happens when MariaDB master replicates writes for only non InnoDB
tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster,
can apply these writes successfully, but it will, in the end, write gtid position in
mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and
this write makes innodb handlerton part of the replicated "transaction".
Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated
and skips appending wsrep keys for these writes. However, as InnoDB was present
in the transaction, and there are replication events (for MyISAM table) in transaction
cache, but there are no appended keys, wsrep raises an error, and this makes the söave
thread to stop.
The fix is simply to not treat it as an error if async slave tries to replicate a write
set with binlog events, but no keys. We just skip wsrep replication and return successfully.
This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be
of InnoDB engine, and executes MyISAM only write through asyn replication.
There is additional fix for declaring IO and background slave threads as non wsrep.
These threads should not write anything for wsrep replication, and this is just a safeguard
to make sure nothing leaks into cluster from these slave threads.
fix bug for spider where using "not like" (#890)
test case:
t1 is a spider engine table;
CREATE TABLE `t1` (
`id` int(11) NOT NULL DEFAULT '0',
`name` char(64) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=SPIDER
query: "select * from t1 where name not like 'x%' " would dispatch "select xxx name name like 'x%' " to remote mysqld, is wrong
The fix consists of three commits backported from 10.3:
1) Cleanup isnan() portability checks
(cherry picked from commit 7ffd7fe962)
2) Cleanup isinf() portability checks
Original problem reported by Wlad: re-compilation of 10.3 on top of 10.2
build would cache undefined HAVE_ISINF from 10.2, whereas it is expected
to be 1 in 10.3.
std::isinf() seem to be available on all supported platforms.
(cherry picked from commit bc469a0bdf)
3) Use std::isfinite in C++ code
This is addition to parent revision fixing build failures.
(cherry picked from commit 54999f4e75)
This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster.
The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure.
PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster.
This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.
Make sure that the sort buffers can store atleast one sort key.
This is needed to make sure that all merge buffers are read else
with no sort keys some merge buffers are skipped because the code
makes a conclusion there is no data to be read.
The assert indicates that the current transaction got caught uncleaned from
the semisync master's cache when it is signaled to proceed upon its
ack receive.
The reason of missed cleanup turns out to be a flaw in the gtid
connect mode.
A submitted by connecting slave value of its last received event's
binlog file *name* was adopted into
{{Repl_semi_sync_master::m_reply_file_name}} as a part of semisync
initialization.
Notice that the initialization still refines the position part of the
submitted last received event's binlog coordinates.
The master side binlog filename:pos refinement is
specific to the gtid connect mode for purpose of computing the latest
binlog file to resume slave feeding from.
Effectively in the gtid connect mode the computed resumption filename:pos
may appear smaller in which case a new post-connect time committing
transaction may be logged with its filename:pos also less than the
submitted coordinates and that triggers the assert.
Fixed with making the semisync initialization to use the refined filename:pos.
It is guaranteed to be less than any new generated transaction's binlog:pos.
The issue here is the wrong estimate of the cardinality of a partial join,
the cardinality is too high because the function table_cond_selectivity()
returns an absurd number 100 while selectivity cannot be greater than 1.
When accessing table t by outer reference t1.a via index we do not perform any
range analysis for t. Yet we see TABLE::quick_key_parts[key] and
TABLE->quick_rows[key] contain a non-zero value though these should have been
remained untouched and equal to 0.
Thus real cause of the problem is that TABLE::init does not clean the arrays
TABLE::quick_key_parts[] and TABLE::>quick_rows[].
It should have done it because the TABLE structure created for any
instance of a table can be reused for many queries.
In the function prev_record_reads where one finds the different row combinations for a
subset of partial join, it did not take into account the selectivity of tables
involved in the subset of partial join.
Unfortunate DROP TEMPORARY..IF EXISTS on a regular table may allow
subsequent CREATE TABLE statements to steal away the PFS_table_share
instance from the dropped table.
* use compile_time_assert instead of DBUG_ASSERT
* don't use thd->clear_error(), because
* the error was already consumed by the error handler, so there is
nothing to clear
* it's dangerous to clear errors indiscriminately, if the error came
from outside of read_statistics_for_tables() it must not be cleared
mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server point of view the following ha_write_row()-s happened outside
of a transactions, and the server didn't bother to commit them.
The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
Basicaly it's an uninitialized read. 165 is 0xa5 which comes from TRASH_ALLOC()
Fix by calling a class ctor which initializes problematic
TMP_TABLE_PARAM::force_copy_fields field
Lock wait can happen on secondary index when doing FK checks for wsrep.
We should just return error to upper layer and applier will retry
operation when needed.
Problem was in a combination of LOCK TABLES on several Aria
tables followed by an ALTER TABLE. After the ALTER TABLE there
was a force close + reopen of the alter table. The close failed
because the table was still part of a transaction.
Fixed by calling extra(HA_EXTRA_PREPARE_FOR_FORCED_CLOSE) as
part of closing the table, which ensures that the table is not
anymore part of the current transaction.
Proper C-style type erasure is done via void*, not via char* or something else.
free_key_cache()
free_rpl_filter(): types were fixed to avoid function pointer type cast which
is still undefined behavior.
Note, that casting from void* to any other pointer type is safe and correct.
Analysis
Mysqlbinlog output for encrypted binary log
#Q> insert into tab1 values (3,'row 003')
#190912 17:36:35 server id 10221 end_log_pos 980 CRC32 0x53bcb3d3 Table_map: `test`.`tab1` mapped to number 19
# at 940
#190912 17:36:35 server id 10221 end_log_pos 1026 CRC32 0xf2ae5136 Write_rows: table id 19 flags: STMT_END_F
Here we can see Table_map_log_event ends at 980 but Next event starts at 940.
And the reason for that is we do not send START_ENCRYPTION_EVENT to the slave
Solution:-
Send Start_encryption_log_event as Ignorable_log_event to slave(mysqlbinlog),
So that mysqlbinlog can update its log_pos.
Since Slave can request multiple FORMAT_DESCRIPTION_EVENT while master does not
have so We only update slave master pos when master actually have the
FORMAT_DESCRIPTION_EVENT. Similar logic should be applied for START_ENCRYPTION_EVENT.
Also added the test case when new server reads the data from old server which
does not send START_ENCRYPTION_EVENT to slave.
Master Slave Upgrade Scenario.
When Slave is updated first, Slave will have extra logic of handling
START_ENCRYPTION_EVENT But master willnot be sending START_ENCRYPTION_EVENT.
So there will be no issue.
When Master is updated first, It will send START_ENCRYPTION_EVENT to
slave , But slave will ignore this event in queue_event.
read_statistics_for_tables_if_needed
Regression after 279a907, read_statistics_for_tables_if_needed() was
called after open_normal_and_derived_tables() failure.
Fixed by moving read_statistics_for_tables() call to a branch of
get_schema_stat_record() where result of open_normal_and_derived_tables()
is checked.
Removed THD::force_read_stats, added read_statistics_for_tables() instead.
Simplified away statistics_for_command_is_needed().
For release builds, do not declare unused variables.
unpack_row(): Omit a debug-only variable from WSREP diagnostic message.
create_wsrep_THD(): Fix -Wmaybe-uninitialized for the PSI_thread_key.
Analysis:
========
In general if there are three groups.
1 - Inserts 32 which fails due to local entry '32' on slave.
2 - Inserts 33
3 - Inserts 34
Each group considers itself as a waiter and it waits for prior group 'waitee'.
This is done in 'register_wait_for_prior_event_group_commit'. If there is no
other parallel group being scheduled then no waitee will be there.
Let us assume 3 groups are being scheduled in parallel.
3-> waits for 2-> waits for->1
'1' upon completion it checks is there any registered subsequent waiter. If
so it wakes up the subsequent waiter with its execution status. This execution
status is stored in wakeup_error.
If '1' failed then it sends corresponding wakeup_error to 2. Then '2' aborts
and it propagates error to '3'. So all further commits are aborted. This
mechanism works only when all transactions reach a stage where they are
waiting for their prior commit to complete.
In case of optimistic following scenario occurs.
1,2,3 are scheduled in parallel.
3 - Reaches group_commit_code waits for 2 to complete.
1 - errors out sets stop_on_error_sub_id=1.
When a group execution results in error its corresponding sub_id is set to
'stop_on_error_sub_id'. Any new groups queued for execution will check if
their sub_id is > stop_on_error_sub_id. If it is true their execution will be
skipped as prior group execution failed. 'skip_event_group=1' will be set.
Since the execution of SQL thread is about to stop we just skip execution of
all the following event groups. We still do all the normal waiting and wakeup
processing between the event groups as a simple way to ensure that everything
is stopped and cleaned up correctly.
Upon error '1' transaction checks for registered waiters. Since no one is
there it simply goes away.
2 - Starts the execution. It checks do I have a waitee.
Since wait_commit_sub_id == entry->last_committed_sub_id no waitee is set.
Secondly: 'entry->stop_on_error_sub_id' is set by '1'st execution. Now
'handle_parallel_thread' code checks if the current group 'sub_id' is greater
than the 'sub_id' set within 'stop_on_error_sub_id'.
Since the above is true 'skip_event_group=true' is set. Simply call
'wait_for_prior_commit' to wakeup all waiters. Group '2' didn't had any
waitee and its execution is skipped. Hence its wakeup_error=0.It sends a
positive wakeup signal to '3'. Which commits. This results in a missed
transaction. i.e 33 is missed and 34 is committed.
Fix:
===
When a worker learns that an earlier transaction execution has failed, and it
should not proceed for further execution, it should mark its own execution
status as failed so that it alerts its followers to abort as well.
For CMAKE_BUILD_TYPE=Debug, the default MYSQL_MAINTAINER_MODE=AUTO
implies -Werror along with other flags in cmake/maintainer.cmake,
which would break the debug builds when CMAKE_CXX_FLAGS include -O2.
This fix includes a backport of 6dd3f24090
from MariaDB 10.3.
Also fixes:
MDEV-20560 Assertion `precision > 0' failed in decimal_bin_size upon SELECT with MOD short unsigned decimal
Changing the way how Item_func_mod calculates its max_length.
It now uses decimal_precision(), decimal_scale() and unsigned_flag
of its arguments, like all other Item_num_op descendants do.
In the function test_if_cheaper_ordering we make a decision if using an index is better than
using filesort for ordering. If we chose to do range access then in test_quick_select we
should make sure that cost for table scan is set to DBL_MAX so that it is not picked.