NON-EXISTS RECORDS
Problem:
========
In RBR replication, master deletes a record but the record
don't exist on slave. when slave tries to apply the
Delete_row_log_event from master, it will result in an
assert on slave.
Analysis:
========
This problem exists not only with Delete_rows event but also
with Update_rows event as well. Trying to update a non
existing row on the slave from the master will cause the
same assert. This assert occurs only for the tables that
doesn't have primary keys and which basically require
sequential scan to be done to locate a record. This bug
occurs only with innodb engine not with myisam.
When update or delete rows is executed on a slave on a table
which doesn't have primary key the updated record is stored
in a buffer named table->record[0] and the same is copied to
table->record[1] so that during sequential scan
table->record[0] can reloaded with fetched data from the
table and compared against table->record[1]. In a special
case where there is no record on the slave side scan will
result in EOF in that case we reinit the scan and we try to
compare record[0] with record[1] which are basically the
same. This comparison is incorrect. Since they both are the
same record_compare() will report that record is found and
we try to go ahead and try to update/delete non existing
row. Ideally if the scan results in EOF means no data found
hence no need to do a record_compare() at all.
Fix:
===
Avoid comparision of records on EOF.
sql/log_event.cc:
Avoid record comparison on end of file.
sql/log_event_old.cc:
Avoid record comparison on end of file.
Some variables weren't cleared properly so consequitive embedded server start/stop failed.
Cleanups added. Also mysql_client_test.c extended to test that (taken from Mattias Johnson's patch)
When plugin=mysql_native_password (or mysql_old_password) take the password
from *either* password *or* authentication_string, whichever is set.
This makes no sense, but alas, that's what MySQL-5.6 does.
replication causing replication to fail.
Remove the temporary fix for MDEV-5914, which used READ COMMITTED for parallel
replication worker threads. Replace it with a better, more selective solution.
The issue is with certain edge cases of InnoDB gap locks, for example between
INSERT and ranged DELETE. It is possible for the gap lock set by the DELETE to
block the INSERT, if the DELETE runs first, while the record lock set by
INSERT does not block the DELETE, if the INSERT runs first. This can cause a
conflict between the two in parallel replication on the slave even though they
ran without conflicts on the master.
With this patch, InnoDB will ask the server layer about the two involved
transactions before blocking on a gap lock. If the server layer tells InnoDB
that the transactions are already fixed wrt. commit order, as they are in
parallel replication, InnoDB will ignore the gap lock and allow the two
transactions to proceed in parallel, avoiding the conflict.
Improve the fix for MDEV-6020. When InnoDB itself detects a deadlock, it now
asks the server layer for any preferences about which transaction to roll
back. In case of parallel replication with two transactions T1 and T2 fixed to
commit T1 before T2, the server layer will ask InnoDB to roll back T2 as the
deadlock victim, not T1. This helps in some cases to avoid excessive deadlock
rollback, as T2 will in any case need to wait for T1 to complete before it can
itself commit.
Also some misc. fixes found during development and testing:
- Remove thd_rpl_is_parallel(), it is not used or needed.
- Use KILL_CONNECTION instead of KILL_QUERY when a parallel replication
worker thread is killed to resolve a deadlock with fixed commit
ordering. There are some cases, eg. in sql/sql_parse.cc, where a KILL_QUERY
can be ignored if the query otherwise completed successfully, and this
could cause the deadlock kill to be lost, so that the deadlock was not
correctly resolved.
- Fix random test failure due to missing wait_for_binlog_checkpoint.inc.
- Make sure that deadlock or other temporary errors during parallel
replication are not printed to the the error log; there were some places
around the replication code with extra error logging. These conditions can
occur occasionally and are handled automatically without breaking
replication, so they should not pollute the error log.
- Fix handling of rgi->gtid_sub_id. We need to be able to access this also at
the end of a transaction, to be able to detect and resolve deadlocks due to
commit ordering. But this value was also used as a flag to mark whether
record_gtid() had been called, by being set to zero, losing the value. Now,
introduce a separate flag rgi->gtid_pending, so rgi->gtid_sub_id remains
valid for the entire duration of the transaction.
- Fix one place where the code to handle ignored errors called reset_killed()
unconditionally, even if no error was caught that should be ignored. This
could cause loss of a deadlock kill signal, breaking deadlock detection and
resolution.
- Fix a couple of missing mysql_reset_thd_for_next_command(). This could
cause a prior error condition to remain for the next event executed,
causing assertions about errors already being set and possibly giving
incorrect error handling for following event executions.
- Fix code that cleared thd->rgi_slave in the parallel replication worker
threads after each event execution; this caused the deadlock detection and
handling code to not be able to correctly process the associated
transactions as belonging to replication worker threads.
- Remove useless error code in slave_background_kill_request().
- Fix bug where wfc->wakeup_error was not cleared at
wait_for_commit::unregister_wait_for_prior_commit(). This could cause the
error condition to wrongly propagate to a later wait_for_prior_commit(),
causing spurious ER_PRIOR_COMMIT_FAILED errors.
- Do not put the binlog background thread into the processlist. It causes
too many result differences in mtr, but also it probably is not useful
for users to pollute the process list with a system thread that does not
really perform any user-visible tasks...
SLOW/CRASHES SEMAPHORE
Problem:
There are 2 lakh tables - fk_000001, fk_000002 ... fk_200000. All of them
are related to the same parent_table through a foreign key constraint.
When the parent_table is loaded into the dictionary cache, all the child table
will also be loaded. This is taking lot of time. Since this operation happens
when the dictionary latch is taken, the scenario leads to "long semaphore wait"
situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is because of the
dict_foreign_find() function. It does a linear search on two linked list
table->foreign_list and table->referenced_list, looking for a particular
foreign key object based on foreign->id as the key. This is called two
times for each foreign key object.
Solution:
Introduce a rb tree in table->foreign_rbt and table->referenced_rbt, which
are some sort of index on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures will be
solely used by dict_foreign_find().
rb#5599 approved by Vasil
The method JOIN_CACHE::init may fail (return 1) if some conditions on the
used join buffer is not satisfied. For example it fails if join_buffer_size
is greater than join_buffer_space_limit. The conditions should be checked
when running the EXPLAIN command for the query. That's why the method
JOIN_CACHE::init has to be called for EXPLAIN commands as well.
Part#1.
table_cond_selectivity() should discount selectivity of table'
conditions only when ity counts that selectivity to begin with.
For non-ref-based access methods (ALL/range/index_merge/etc),
we start with sel=1.0 and hence do not need to discount any
selectivities.
- Key_value_records_iterator::get_next() should pass pointer to the key
to handler->ha_index_next_same(). Because of a typo bug, pointer-to-pointer
was passed instead in certain cases.
- When range optimizer cannot the lookup value into [VAR]CHAR(n) column,
it should produce:
= "Impossible range" for equality
= "no range" for non-equalities.