These tests use search_pattern_in_file.inc to search the error log for
expected output. However, search_pattern_in_file.inc by default searched only
the first 50000 bytes, so if the error log grew too big the tests would fail.
This patch extends search_pattern_in_file.inc with an option to specify how
much of the file to search, and whether to search from the start of the file
or from the end. Then the rpl.rpl_checksum and rpl.rpl_gtid_errorlog test
cases are fixed to search the last 50000 bytes of the error log, which will
work no matter how large prior tests have made it.
The direct cause of the assertion was missing error handling in
record_gtid(). If ha_commit_trans() fails for the statement commit, there was
missing code to catch the error and do ha_rollback_trans() in this case; this
caused close_thread_tables() to assert.
Normally, this error case is not hit, but in this case it was triggered due to
another bug: When a transaction T1 fails during parallel replication, the code
would signal following transactions that they could start to run without
properly marking the error condition. This caused subsequent transactions to
incorrectly start replicating, only to get an error later during their own
commit step. This was particularly serious if the subsequent transactions were
DDL or MyISAM updates, which cannot be rolled back and would leave replication
in an inconsistent state.
Fixed by 1) in case of error, only signal following transactions to continue
once the error has been properly marked and those transactions will know not
to start; and 2) implement proper error handling in record_gtid() in the case
that statement commit fails.
If replication breaks in GTID mode, it is not trivial to determine the GTID of
the failing event group. This is a problem, as such GTID is needed eg. to
explicitly set @@gtid_slave_pos to skip to after that event group, or to
compare errors on different servers, etc.
Fix by ensuring that relevant slave errors logged to the error log include the
GTID of the event group containing the problem event.
This is MySQL Bug#59123. The message string stored in an INCIDENT event was
not zero-terminated. This caused any following checksum bytes (if enabled on
the master) to be output to the error log as trailing garbage when the message
was printed to the error log.
Backport the patch from MySQL 5.6:
revno: 2876.228.200
revision-id: zhenxing.he@sun.com-20110111051323-w2xnzvcjn46x6h6u
committer: He Zhenxing <zhenxing.he@sun.com>
timestamp: Tue 2011-01-11 13:13:23 +0800
message:
BUG#59123 rpl_stm_binlog_max_cache_size fails sporadically with found warnings
Also add a test case.
Bug#16415173 CRLF INSTEAD OF LF IN SQL-BENCH SCRIPTS
Correct perms and converts from Windows style to UNIX style line endings on some files.
Fix perms on installed ini files.
(MySQL 5.5 version)
- "ANALYZE $stmt" should discard select's output, but it should still
evaluate the output columns (otherwise, subqueries in select list
are not executed)
- SHOW EXPLAIN's code practice of calling JOIN::save_explain_data()
after JOIN::exec() is disastrous for ANALYZE, because it resets
all counters after the first execution. It is stopped
= "Late" test_if_skip_sort_order() calls explicitly update their part
of the query plan.
= Also, I had to rewrite I_S optimization to actually have optimization
and execution stages.
MySQL 5.6 implemented WL#344, which is about a MASTER_DELAY option to CHANGE
MASTER. But as part of this worklog, the format of the realy-log.info file was
changed. The new format is not understood by earlier versions, and nor by
MariaDB 10.0, so changing server to those versions would cause the slave to
abort with an error due to reading incorrect data out of relay-log.info.
Fix this by backporting from the WL#344 patch just the code that understands
the new relay-log.info format. We still write out the old format, and none of
the MASTER_DELAY feature is backported with this commit.
CORRUPTS FRM
Analysis:
---------
ALTER TABLE on a partitioned table resulted in the wrong
engine being written into the table's FRM file and displayed
in SHOW CREATE TABLE.
The prep_alter_part_table() modifies the partition_info object
for TABLE instance representing the old version of table.
If the ALTER TABLE ENGINE statement fails, the partition_info
object for the TABLE contains the altered storage engine name.
The SHOW CREATE TABLE uses the TABLE object to display the table
information, hence displays incorrect storage engine for the table.
Also a subsequent successful ALTER TABLE operation will write the
incorrect engine information into the FRM file.
Fix:
---
A copy of the partition_info object is created before modification so
that any changes would not cause the the original partition_info object
to be modified if the ALTER TABLE fails.(Backported part of the code
provided as fix for bug#14156617 in mysql-5.6.6).
* enum values to index different ACL tables, instead of hard-coded numbers
(even different in diffent functions).
* move TABLE_LIST initialization into open_grant_tables()
and use it everywhere
* change few my_bool's to bool's
* Don't write frm for tmp tables
* pass frm image down to open_table_uncached, when possible
* don't use truncate-by-recreate for temp tables - cannot recreate
without frm, and delete_all_rows is faster anyway
Auto-generate the allowed list of values for enum/set/flagset options
in --help output. But don't do that when the help text already has them.
Also, remove lists of values from help strings of various options, where
they were simply listed without any additional information.
it checked that the default is lz4. Which only worked on systems that
had lz4 and did not have lzo. Now it checks for the default to be zlib,
which works on systems that has neither lz4 or lzo. Like our package
builders in buildbot. This is intentional, we don't want introduce
additional dependencies (lz4, lzo) for our packages just yet.
This can (and will) be reconsidered, and this test can (and will)
be updated again.
The INCIDENT_EVENT always caused slave error and abort, without checking
--slave-skip-errors.
Now, if error 1590, ER_SLAVE_INCIDENT is included in the --slave-skip-errors
list, incident events will be ignored.
This is a merge of this MySQL 5.6 patch:
revision-id: frazer@mysql.com-20110314170916-ypgin17otj3ucx95
committer: Frazer Clement <frazer@mysql.com>
timestamp: Mon 2011-03-14 17:09:16 +0000
message:
Bug#11799671 NOT POSSIBLE TO SKIP INCIDENT ERRORS
DISCONNECT CON1 AND CON2
Problem:
The test suite/binlog/t/binlog_killed.test makes the connections
con1 and con2 but forgets to disconnect them + wait till that
operation is finished at test end.
This mistake has the potential to harm subsequent tests in
case these tests depend on the content of the processlist.
Solution:
Added disconnect + wait_until_disconnected.inc
within the test cleanup.
- Filesort has an optmization where it reads only columns that are
needed before the sorting is done.
- When ref(_or_null) is picked by the join optimizer, it may remove parts
of WHERE clause that are guaranteed to be true.
- However, if we use quick select, we must put all of the range columns into the
read set. Not doing so will may cause us to fail to detect the end of the range.
That particular part of slave connect to master was missing code to handle
retry in case of network errors. The same problem is present in MySQL 5.5, but
fixed in MySQL 5.6.
Fixed with this patch, by adding the code (mostly identical to MySQL 5.6), and
also adding a test case.
I checked other queries done towards master during slave connect, and they now
all seem to handle reconnect in case of network failures.
Adapt default_tmp_storage_engine implementation from mysql-5.6
New feature (as compared to 5.6), default_tmp_storage_engine=NULL
means that temporary tables will use default_storage_engine value.
This makes the behavior backward compatible.
When plugin=mysql_native_password (or mysql_old_password) take the password
from *either* password *or* authentication_string, whichever is set.
This makes no sense, but alas, that's what MySQL-5.6 does.
replication causing replication to fail.
Remove the temporary fix for MDEV-5914, which used READ COMMITTED for parallel
replication worker threads. Replace it with a better, more selective solution.
The issue is with certain edge cases of InnoDB gap locks, for example between
INSERT and ranged DELETE. It is possible for the gap lock set by the DELETE to
block the INSERT, if the DELETE runs first, while the record lock set by
INSERT does not block the DELETE, if the INSERT runs first. This can cause a
conflict between the two in parallel replication on the slave even though they
ran without conflicts on the master.
With this patch, InnoDB will ask the server layer about the two involved
transactions before blocking on a gap lock. If the server layer tells InnoDB
that the transactions are already fixed wrt. commit order, as they are in
parallel replication, InnoDB will ignore the gap lock and allow the two
transactions to proceed in parallel, avoiding the conflict.
Improve the fix for MDEV-6020. When InnoDB itself detects a deadlock, it now
asks the server layer for any preferences about which transaction to roll
back. In case of parallel replication with two transactions T1 and T2 fixed to
commit T1 before T2, the server layer will ask InnoDB to roll back T2 as the
deadlock victim, not T1. This helps in some cases to avoid excessive deadlock
rollback, as T2 will in any case need to wait for T1 to complete before it can
itself commit.
Also some misc. fixes found during development and testing:
- Remove thd_rpl_is_parallel(), it is not used or needed.
- Use KILL_CONNECTION instead of KILL_QUERY when a parallel replication
worker thread is killed to resolve a deadlock with fixed commit
ordering. There are some cases, eg. in sql/sql_parse.cc, where a KILL_QUERY
can be ignored if the query otherwise completed successfully, and this
could cause the deadlock kill to be lost, so that the deadlock was not
correctly resolved.
- Fix random test failure due to missing wait_for_binlog_checkpoint.inc.
- Make sure that deadlock or other temporary errors during parallel
replication are not printed to the the error log; there were some places
around the replication code with extra error logging. These conditions can
occur occasionally and are handled automatically without breaking
replication, so they should not pollute the error log.
- Fix handling of rgi->gtid_sub_id. We need to be able to access this also at
the end of a transaction, to be able to detect and resolve deadlocks due to
commit ordering. But this value was also used as a flag to mark whether
record_gtid() had been called, by being set to zero, losing the value. Now,
introduce a separate flag rgi->gtid_pending, so rgi->gtid_sub_id remains
valid for the entire duration of the transaction.
- Fix one place where the code to handle ignored errors called reset_killed()
unconditionally, even if no error was caught that should be ignored. This
could cause loss of a deadlock kill signal, breaking deadlock detection and
resolution.
- Fix a couple of missing mysql_reset_thd_for_next_command(). This could
cause a prior error condition to remain for the next event executed,
causing assertions about errors already being set and possibly giving
incorrect error handling for following event executions.
- Fix code that cleared thd->rgi_slave in the parallel replication worker
threads after each event execution; this caused the deadlock detection and
handling code to not be able to correctly process the associated
transactions as belonging to replication worker threads.
- Remove useless error code in slave_background_kill_request().
- Fix bug where wfc->wakeup_error was not cleared at
wait_for_commit::unregister_wait_for_prior_commit(). This could cause the
error condition to wrongly propagate to a later wait_for_prior_commit(),
causing spurious ER_PRIOR_COMMIT_FAILED errors.
- Do not put the binlog background thread into the processlist. It causes
too many result differences in mtr, but also it probably is not useful
for users to pollute the process list with a system thread that does not
really perform any user-visible tasks...
SLOW/CRASHES SEMAPHORE
Problem:
There are 2 lakh tables - fk_000001, fk_000002 ... fk_200000. All of them
are related to the same parent_table through a foreign key constraint.
When the parent_table is loaded into the dictionary cache, all the child table
will also be loaded. This is taking lot of time. Since this operation happens
when the dictionary latch is taken, the scenario leads to "long semaphore wait"
situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is because of the
dict_foreign_find() function. It does a linear search on two linked list
table->foreign_list and table->referenced_list, looking for a particular
foreign key object based on foreign->id as the key. This is called two
times for each foreign key object.
Solution:
Introduce a rb tree in table->foreign_rbt and table->referenced_rbt, which
are some sort of index on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures will be
solely used by dict_foreign_find().
rb#5599 approved by Vasil
The method JOIN_CACHE::init may fail (return 1) if some conditions on the
used join buffer is not satisfied. For example it fails if join_buffer_size
is greater than join_buffer_space_limit. The conditions should be checked
when running the EXPLAIN command for the query. That's why the method
JOIN_CACHE::init has to be called for EXPLAIN commands as well.
Part#1.
table_cond_selectivity() should discount selectivity of table'
conditions only when ity counts that selectivity to begin with.
For non-ref-based access methods (ALL/range/index_merge/etc),
we start with sel=1.0 and hence do not need to discount any
selectivities.