Problem:
========
When attempting to delay a Slave attached with GTID, there appears to be an
extra delay applied initially. For example, this output reflects a Slave that is
already delayed by 43200 seconds. When switching to GTID replication,
replication is paused until SQL_Remaining_Delay counts down to 0:
CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO
MASTER_DELAY=43200;
Seconds_Behind_Master: 44847
Using_Gtid: Current_Pos
SQL_Delay: 43200
SQL_Remaining_Delay: 43089
Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master
executed event
Analysis:
=========
When slave initiates a GTID based connection request to master, the master sends
two GTID_LIST events. The first one is actual GTID_LIST event and the second
one is a fake GTID_LIST event. This is sent by master to provide its current
binary log file position. The fake GTID_LIST events will have their ev->when=0.
'when' (the timestamp) is set to 0 so that slave could distinguish between real
and fake Rotate events.
On slave side when MASTER_DELAY is configured to "X" the applier will ensure
that there is a time delay of "X" seconds before the event is applied.
General behaviour of MASTER_DELAY example:-
Master
timestamp of event e1=10
timestamp of event e2=11
On slave MASTER_DELAY=5
Event e1 will be applied at = 15
e2 will be applied at =16
In bug scenario:-
On Master: With GTIDs
timestamp of event e1=10
timestamp of event e2=0
On Slave:
e1 will be applied at = 10 + 5 =15
For e2, since "e2->when=0", e2->when is set to the current timestamp.
That is, since e2->when and the current timestamp on the slave are the same,
the applier waits for an additional master_delay=5 seconds. The ev->when value
contributes to "rli->last_master_timestamp":
rli->last_master_timestamp= ev->when + (time_t) ev->exec_time;
Fake events should not update the "ev->when" to "current timestamp" on slave.
Fix:
===
Remove the assignment of current timestamp to "ev->when" when "ev->when=0".
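The arithmetic above can be sketched as follows. This is only an illustrative
model, not the server code: Event, remaining_delay() and the substitute_now
flag are hypothetical names, and the flag stands for the assignment of the
current timestamp that the fix removes.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for a replicated event; only the master timestamp matters here.
struct Event {
  std::time_t when;  // 0 marks a fake event sent by the master
};

// Seconds the applier still has to wait before applying `ev`.
// `substitute_now` models the old behaviour of replacing when == 0
// with the slave's current time.
std::time_t remaining_delay(Event ev, std::time_t master_delay,
                            std::time_t now, bool substitute_now) {
  assert(master_delay >= 0);
  if (ev.when == 0 && substitute_now)
    ev.when = now;  // the assignment that the fix removes
  const std::time_t apply_at = ev.when + master_delay;
  return apply_at > now ? apply_at - now : 0;
}
```

With MASTER_DELAY=5 and a slave clock of 12, a real event stamped 10 waits
3 seconds, a fake event waits the full 5 seconds when its timestamp is
replaced, and does not wait at all when it is left at 0.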
This reverts commit 21b2fada7a
and commit 81d71ee6b2.
The MDEV-18464 change introduces a few data race issues. Contrary to
the documentation, the field trx_t::victim is not always being protected
by lock_sys_t::mutex and trx_t::mutex. Most importantly, it seems
that KILL QUERY could wrongly avoid acquiring both mutexes when
invoking lock_trx_handle_wait_low(), in case another thread had
already set trx->victim=true.
We also revert MDEV-12009, because it should depend on the MDEV-18464
fix being present.
To fix the crash, we need to make sure that the server stores the
statistical values in the statistical tables in a multi-byte safe way.
Also, there is no need to throw warnings if values are truncated while
being stored from statistical fields.
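As a rough illustration of what "multi-byte safe" truncation means, here is a
minimal sketch that assumes valid UTF-8 input. The function name truncate_utf8
is made up for this example; the server itself relies on its own character set
routines rather than on code like this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Truncate valid UTF-8 text to at most max_bytes without splitting a character.
std::string truncate_utf8(const std::string& s, std::size_t max_bytes) {
  if (s.size() <= max_bytes) return s;
  std::size_t cut = max_bytes;
  // Continuation bytes look like 10xxxxxx; step back to the start of the
  // character that would otherwise be cut in half.
  while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80) --cut;
  assert(cut <= max_bytes);
  return s.substr(0, cut);
}
```

Truncating the three-byte string "a" followed by a two-byte character to two
bytes keeps only "a" instead of leaving half a character behind.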
As noted on kill_one_thread, SUPER should be able to kill even
system threads, i.e. threads/queries flagged as high priority or
the wsrep applier thread. A normal user should not be able to kill
threads/queries flagged as high priority (BF) or the wsrep applier
thread.
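The rule can be written as a small predicate. This sketch covers only the
distinction described above; TargetThread and may_kill() are hypothetical
names, and the real check also involves conditions (such as thread ownership)
that are not modelled here.

```cpp
// Hypothetical description of the thread that KILL targets.
struct TargetThread {
  bool high_priority;  // flagged as BF
  bool wsrep_applier;  // wsrep applier thread
};

// SUPER may kill anything; other users may not kill BF or applier threads.
bool may_kill(bool has_super, const TargetThread& target) {
  if (has_super) return true;
  return !(target.high_priority || target.wsrep_applier);
}
```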
ignore FK-prelocked tables when looking for write-prelocked tables
with auto-increment to complain about "Statement is unsafe because
it invokes a trigger or a stored function that inserts into an
AUTO_INCREMENT column"
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
select from I_S
Problem:
========
When applier thread tries to access 'variable_name' of
INFORMATION_SCHEMA.SESSION_VARIABLES table through triggers, it results in an
abnormal exit of slave server.
Analysis:
========
At the time of replication of stored routines and triggers, their associated
security context will be sent by the master. The applier thread on the slave
server will use this information to set the required security context for the
execution of stored routines and triggers. This is achieved as follows.
->The stored routine object has a member named 'm_security_ctx' which holds the
security context received from master.
->The applier thread's security_ctx is stored into a 'backup' object.
->Set the applier thread's security_ctx to 'm_security_ctx'.
->Upon the completion of stored routine execution restore the original security
context of applier thread from the backup.
During the above process the 'm_security_ctx' object is not initialized
properly. Hence the 'external_user' member of 'm_security_ctx' holds an invalid
value, and accessing it results in an abnormal exit of the server.
Fix:
===
Invoke the Security_context::init() call from the constructor of stored routine
so that 'm_security_ctx' gets initialized properly.
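The backup/switch/restore sequence and the constructor-time initialization can
be sketched like this. SecurityCtx, StoredRoutine and ContextSwitch are
simplified, hypothetical stand-ins; they are not the server's
Security_context or stored routine classes.

```cpp
// Hypothetical, much simplified stand-in for a security context.
struct SecurityCtx {
  const char* user;
  const char* external_user;
  void init() { user = nullptr; external_user = nullptr; }
};

// Hypothetical stored routine: the constructor initializes its context,
// so no member is ever read while still holding garbage.
struct StoredRoutine {
  SecurityCtx m_security_ctx;
  StoredRoutine() { m_security_ctx.init(); }
};

// Switches a thread's current context for the duration of a scope and
// restores the backed-up original afterwards.
class ContextSwitch {
 public:
  ContextSwitch(SecurityCtx*& slot, SecurityCtx* routine_ctx)
      : slot_(slot), backup_(slot) { slot_ = routine_ctx; }
  ~ContextSwitch() { slot_ = backup_; }
 private:
  SecurityCtx*& slot_;   // the applier thread's current context pointer
  SecurityCtx* backup_;  // the context to restore on scope exit
};
```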
Item_cond::eval_not_null_tables(): Use Item::eval_const_cond(),
just like Item_cond::fix_fields().
This inconsistency was found while merging to 10.3, where the
Microsoft compiler is configured to report an error for comparing
longlong to bool.
Simulate slow statements only for COM_QUERY and COM_STMT_EXECUTE commands,
to exclude mysqld_stmt_prepare() and mysqld_stmt_close() entries from the log,
as they are not relevant for log_slow_debug.test. This simplifies the test.
Adding an intermediate volatile variable to avoid using co-processor registers
on some platforms (e.g. 32-bit x86).
This change makes test results stable across all platforms.
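A minimal sketch of the technique, with a made-up function name: storing the
intermediate result in a volatile double forces it through a 64-bit memory
slot, so a value held in a wider co-processor register is rounded to double
before it is used.

```cpp
// Multiply two doubles and force the result through a 64-bit memory slot.
double rounded_product(double a, double b) {
  volatile double tmp = a * b;  // the store rounds any extended-precision value
  return tmp;
}
```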
This patch contains a fix for the MDEV-17262/17243 issues and
new mtr test.
These issues (MDEV-17262/17243) have two causes:
1) After an intermediate commit, a transaction loses its status
of "transaction that registered in the MySQL for 2pc coordinator"
(in InnoDB) because, since version 10.2, the
write_row() function (which is located in ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of the started transaction
(m_prebuilt->sql_stat_start) after intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) call made from the
wsrep_load_data_split() function (which is located in sql_load.cc)
will also do this, but it will be too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost, or the function may be called with an invalid transaction identifier.
2) If a transaction with the LOAD DATA statement is divided into
logical mini-transactions (of the 10K rows) and binlog is rotated,
then in rare cases due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
splitting of the LOAD DATA into mini-transactions is technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of binlog in the middle of processing LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
In the function make_cond_for_table_from_pred a call of fix_fields()
did not check the return code. As a result, an extracted constant
condition could be ill-formed, which caused an assertion failure.
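The pattern of checking the return code can be sketched as below, assuming the
usual convention that a true return value signals an error. The Item type and
prepare_condition() here are hypothetical and far simpler than the server's
classes.

```cpp
// Hypothetical item; `valid` decides whether preparation succeeds.
struct Item {
  bool valid;
  bool fixed;
  explicit Item(bool v) : valid(v), fixed(false) {}
  // Returns true on error, false on success.
  bool fix_fields() {
    if (!valid) return true;
    fixed = true;
    return false;
  }
};

// Returns the prepared condition, or nullptr if preparation failed,
// so that callers never see a half-built condition.
Item* prepare_condition(Item* cond) {
  if (cond == nullptr || cond->fix_fields()) return nullptr;
  return cond;
}
```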
Includes:
MDEV-17302 Add support for ALTER USER command in prepared statement
and
MDEV-17673 main.cte_recursive fails in bb-10.4-ps branch in --ps
Set correct SELECT_LEX linkage for recursive CTEs.
Do not delegate this job to TABLE_LIST::set_as_with_table,
because it is only run on prepare, while With_element::move_anchors_ahead
is run both on prepare and execute (fix by Igor)
If an IN-subquery is used in a table-less select the current code
should never consider it as candidate for semi-join optimizations.
Yet the function check_and_do_in_subquery_rewrites() improperly
checked the property "to be a table-less select". As a result,
if such a select in an IN subquery was used in INSERT .. SELECT, then
the IN subquery was by mistake registered as a semi-join subquery
and convert_subq_to_sj() was called for it. However the code of
this function does not assume that the parent select of the subquery
could be a table-less select.
There were two newly enabled warnings:
1. casts of function pointers. Affected sql_analyse.h, mi_write.c
and ma_write.cc, mf_iocache-t.cc, mysqlbinlog.cc, encryption.cc, etc
2. memcpy/memset of nontrivial structures. Fixed as:
* the warning disabled for InnoDB
* TABLE, TABLE_SHARE, and TABLE_LIST got a new method reset() which
does the bzero(), which is safe for these classes, but any other
bzero() will still cause a warning
* Table_scope_and_contents_source_st uses `TABLE_LIST *` (trivial)
instead of `SQL_I_List<TABLE_LIST>` (not trivial) so it's safe to
bzero now.
* added casts in debug_sync.cc and sql_select.cc (for JOIN)
* move assignment method for MDL_request instead of memcpy()
* PARTIAL_INDEX_INTERSECT_INFO::init() instead of bzero()
* remove constructor from READ_RECORD() to make it trivial
* replace some memcpy() with c++ copy assignments
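A minimal sketch of the reset() idea, using a hypothetical TableLike structure
rather than the real TABLE, TABLE_SHARE or TABLE_LIST: the raw zero-fill lives
in one method of a class for which it is known to be safe, and copies use
plain assignment instead of memcpy().

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical structure with a user-provided constructor: not trivial,
// but still trivially copyable, so zero-filling its bytes is well defined.
struct TableLike {
  int id;
  const char* alias;
  TableLike() : id(-1), alias("unnamed") {}
  // The cast to void* marks the raw write as intentional.
  void reset() { std::memset(static_cast<void*>(this), 0, sizeof(*this)); }
};

static_assert(!std::is_trivial<TableLike>::value,
              "the user-provided constructor makes the type non-trivial");
static_assert(std::is_trivially_copyable<TableLike>::value,
              "reset() relies on a trivially copyable layout");

// Plain copy assignment instead of memcpy().
void copy_table(TableLike& dst, const TableLike& src) { dst = src; }
```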
The problem was that we skipped background persistent statistics calculation
on applier nodes if the thread is marked as high priority (a.k.a. BF).
However, on applier nodes all replicated DDL is executed
as high priority, i.e. BF.
Fixed by allowing background persistent statistics calculation on
applier nodes even when the thread is marked as BF. This could lead to
BF lock waits, but queries on that node need those statistics.
Removed redundant plugin_thdvar_cleanup() from end_connection(): called by
THD::free_connection(), which always follows end_connection().
Saves at least one lock(LOCK_plugin) and one
rdlock(LOCK_system_variables_hash).
Benchmarked on a 2socket/20core/40threads Broadwell system using sysbench
connect benchmark @40 threads (with select 1 disabled).
10.2 shows moderate improvement: 136219.93 -> 137766.31 CPS.
10.3 improvement is somewhat better: 93018.29 -> 101379.77 CPS.
Also backported MyRocks memory leak fix from 10.4, which turned out to
be unrelated.
The issue here was that when we had a subquery and a window function in an
expression in the select list, the subquery was computed after the window
function computation. This produced incorrect results because the subquery was
correlated and the fields in the subquery were pointing to the base table
instead of the temporary table.
The approach to fix this was to have an additional field in the temporary table
for the subquery and to execute the subquery before window function execution.
After execution, the values for the subquery are stored in the temporary table,
and when we need to calculate the expression, all we do is read the values
from the temporary table for the subquery.