Currently, running mtr with an incorrect (for example, new or
obsolete) version of wsrep_provider (for example, with version 26
of libgalera_smm.so) leads to the failure of tests in
several suites with vague error diagnostics.
As for the galera_3nodes suite, mtr also does not effectively
check all the prerequisites after the merge with the MDEV-18426 fixes.
For example, tests that use mariabackup do not check for the presence
of ss and socat/nc. This is due to improper handling of relative
paths in mtr scripts.
In addition, some tests in different suites can be run without
setting the environment variables such as MTR_GALERA_TFMT, XBSTREAM,
and so on.
To eliminate all these issues, this patch makes the following changes:
1. Added an auxiliary wsrep_mtr_check utility (located in the
mysql-test/lib/My/SafeProcess subdirectory), which compares the
versions of the wsrep API used by the server and by the wsrep
provider library. It does this comparison safely, without
accessing the API if the versions do not match.
2. All checks related to the presence of mariabackup and the utilities
necessary for its operation have been moved from the local directories
of different mtr suites (from the suite.pm files) to the main suite.pm
file. This not only reduces the amount of code and eliminates duplication
of identical code fragments, but also avoids problems due to the inability
of mtr to consider relative paths to include files when checking skip
combinations.
3. Setting the values of auxiliary environment variables that
are necessary for Galera, SST scripts and mariabackup (to work
properly) is moved to the main mysql-test-run.pl script, so as
not to duplicate this code in different suites, and to avoid
partial corrections of the same errors for different suites
(while other suites remain uncorrected).
4. Fixed duplication of the have_file_key_management.inc and
have_filekeymanagement.inc files between different suites;
these checks are also moved to the top level.
5. Added garbd presence check and garbd path variable.
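The prerequisite check described in item 2 can be pictured with a small sketch. It is written in Python rather than the Perl used by the suite.pm files, and the function name missing_sst_tools and its which parameter are hypothetical. Only the tool names ss, socat and nc come from the description above; the rule "ss plus at least one of socat or nc" is an assumed reading of "ss and socat/nc".

```python
import shutil


def missing_sst_tools(which=shutil.which):
    """Return the prerequisites that could not be found on PATH.

    Assumption: mariabackup-based tests need `ss` and at least one of
    `socat` or `nc`. The lookup function is injectable so the check can
    be exercised without touching the real PATH.
    """
    missing = []
    if which("ss") is None:
        missing.append("ss")
    if which("socat") is None and which("nc") is None:
        missing.append("socat or nc")
    return missing


# Usage: an empty list means the tests may run; otherwise skip them.
print(missing_sst_tools())
```

Doing the lookup in one shared place mirrors the intent of the patch: every suite gets the same answer, and no suite depends on a relative include path to reach the check.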
https://jira.mariadb.org/browse/MDEV-18565
The problem was that tests selected processes from
INFORMATION_SCHEMA.PROCESSLIST by user 'system user' and empty state. Thus,
there was no clear state for slave threads.
Changes:
- Added new status variables that store the current number of applier threads
(wsrep_applier_thread_count) and rollbacker threads
(wsrep_rollbacker_thread_count). This makes clear how many slave threads
of a certain type there are.
- Added THD state "wsrep applier idle" when applier slave thread is
waiting for work. This makes finding slave/applier threads easier.
- Added a force-restart option for mtr to always restart servers between tests
to avoid a race at the start of the test.
- Added wait_condition_with_debug to wait until the passed statement returns
true or the operation times out. If the operation times out, an additional error
statement is executed.
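The behaviour of wait_condition_with_debug can be sketched as a polling loop. The real helper is an mtr include file; the Python below is only an analogy, and the parameter names, the 30-second default and the 0.1-second interval are assumptions, not values taken from the patch.

```python
import time


def wait_condition_with_debug(condition, debug, timeout=30.0, interval=0.1):
    """Poll condition() until it returns true or `timeout` seconds pass.

    On timeout, debug() is called once (standing in for the additional
    error statement) and False is returned. True is returned as soon as
    the condition holds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            debug()  # executed only when the wait has timed out
            return False
        time.sleep(interval)
```

The debug callback is what separates this from a plain wait: when the wait fails, the test log contains the state that explains why.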
Changes to be committed:
new file: mysql-test/include/force_restart.inc
new file: mysql-test/include/wait_condition_with_debug.inc
modified: mysql-test/mysql-test-run.pl
modified: mysql-test/suite/galera/disabled.def
modified: mysql-test/suite/galera/r/MW-336.result
modified: mysql-test/suite/galera/r/galera_kill_applier.result
modified: mysql-test/suite/galera/r/galera_var_slave_threads.result
new file: mysql-test/suite/galera/t/MW-336.cnf
modified: mysql-test/suite/galera/t/MW-336.test
modified: mysql-test/suite/galera/t/galera_kill_applier.test
modified: mysql-test/suite/galera/t/galera_parallel_autoinc_largetrx.test
modified: mysql-test/suite/galera/t/galera_parallel_autoinc_manytrx.test
modified: mysql-test/suite/galera/t/galera_var_slave_threads.test
modified: mysql-test/suite/wsrep/disabled.def
modified: mysql-test/suite/wsrep/r/variables.result
modified: mysql-test/suite/wsrep/t/variables.test
modified: sql/mysqld.cc
modified: sql/wsrep_mysqld.cc
modified: sql/wsrep_mysqld.h
modified: sql/wsrep_thd.cc
modified: sql/wsrep_var.cc
Problem:
========
There can be more concurrent DMLs while the
ALTER TABLE thread is waiting to upgrade to MDL_EXCLUSIVE before the commit phase.
In commit phase, InnoDB acquires dict_operation_lock and it already holds MDL_EXCLUSIVE
on the table. After that, InnoDB applies the concurrent DML logs in commit phase.
This could lead to blocking of the following things:
1) DML on the particular table (due to MDL_EXCLUSIVE on the table)
2) InnoDB DDLs (due to dict_operation_lock)
3) Purge thread, stats thread, the master thread (due to dict_operation_lock)
Fix:
====
Apply the concurrent DML logs in the commit phase but before acquiring
dict_operation_lock. This makes sure that (2) and (3) can't be
blocked for a long time.
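The ordering in the fix can be illustrated with a minimal sketch. This is Python, not InnoDB code: dict_operation_lock here is an ordinary threading.Lock standing in for the real lock, and commit_alter and its two callbacks are hypothetical names.

```python
import threading

# Stand-in for InnoDB's dict_operation_lock (assumption: a plain mutex
# is enough to show the ordering).
dict_operation_lock = threading.Lock()


def commit_alter(apply_dml_log, finish_commit):
    """Apply the logged DML first, then lock only for the final step."""
    apply_dml_log()            # potentially long; the lock is not held here
    with dict_operation_lock:
        finish_commit()        # short critical section under the lock
```

Because the long-running log application happens outside the lock, other users of the lock wait only for the short final step.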
Changes to be committed:
modified: suite/galera/r/galera_kill_ddl.result
modified: suite/galera/r/galera_sync_wait_show.result
modified: suite/galera/t/galera_kill_ddl.test
Basic idea of the patch: disallow creating tables that permit
rows which are too big to insert. In other words, a user who created a table
should never see an error like 'can not insert row as it is too big for current
page size'.
SET innodb_strict_mode=OFF; will allow creating very long tables, and only a
warning will be issued.
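The intended behaviour (reject at CREATE time in strict mode, warn otherwise) can be sketched as follows. This is a simplified Python illustration: RowTooBigError, check_table_definition and the byte limits are hypothetical. The real limit depends on page size and row format, and the real check also accounts for fields stored off-page, which this sketch ignores.

```python
import warnings


class RowTooBigError(Exception):
    """Hypothetical error for a rejected table definition."""


def check_table_definition(max_column_bytes, max_record_bytes, strict=True):
    """Reject or warn about a definition whose worst-case row cannot fit.

    max_column_bytes: worst-case stored size of each column, in bytes.
    max_record_bytes: illustrative limit supplied by the caller.
    """
    worst_case = sum(max_column_bytes)
    if worst_case <= max_record_bytes:
        return
    message = f"row size {worst_case} exceeds limit {max_record_bytes}"
    if strict:
        raise RowTooBigError(message)   # strict mode: refuse to create
    warnings.warn(message)              # non-strict mode: create, but warn
```

Checking the worst case when the table is defined is what moves the failure from a later INSERT to the CREATE TABLE statement.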
dict_table_t::get_overflow_field_local_len(): this function returns the maximum
local field length for overflow fields for every file and row format.
innobase_check_column_length(): renamed to too_big_key_part_length()
and reused in a different part of the code.
create_table_info_t::prepare_create_table(): add check for maximum allowed
key part length to keep ALGORITHM=COPY behavior similar to ALGORITHM=INPLACE
behavior. The affected test is innodb.strict_mode.
Rename dict_index_too_big_for_tree() to
dict_index_t::rec_potentially_too_big(): copy overflow-related size computation
from dtuple_convert_big_rec(). A lot of tests were changed because of that.
I wonder whether users will complain about it?
Test innodb.max_record_size tests dict_index_t::rec_potentially_too_big()
for different row formats and page sizes.
for passing ones.
Changes to be committed:
new file: mysql-test/std_data/galera-cert.pem
new file: mysql-test/std_data/galera-key.pem
new file: mysql-test/std_data/galera-upgrade-ca-cert.pem
new file: mysql-test/std_data/galera-upgrade-server-cert.pem
new file: mysql-test/std_data/galera-upgrade-server-key.pem
modified: mysql-test/suite/galera/disabled.def
modified: mysql-test/suite/galera/r/MW-416.result
modified: mysql-test/suite/galera/r/MW-44.result
modified: mysql-test/suite/galera/r/galera_sst_mysqldump_with_key,debug.rdiff
modified: mysql-test/suite/galera/r/galera_sst_mysqldump_with_key.result
modified: mysql-test/suite/galera/t/MW-416.test
modified: mysql-test/suite/galera/t/galera_kill_applier.test
- Ported the MySQL Bug#20597981 test case to mariadb-10.2
- InnoDB never used fts_doc_id_in_read_set. Basically it tells
InnoDB to read the fts_doc_id from the index record itself.
Problem:
=======
Executing the test with the following options results in a test failure:
./mtr rpl.kill_race_condition{,,,,,,,,,,} --repeat=10 --par 12 --mem
Fix:
====
The test simulates an applier thread kill scenario while applying a row event. But it
doesn't wait for the applier to stop on the error.
Added: wait_for_slave_sql_error.inc to catch the error.
The test uses START SLAVE as a final step and doesn't wait for both threads to
start.
Added: start_slave.inc
The test allowed non-deterministic execution because of the unresettable status
variable Slave_connections.
Fixed by expecting a correct value for Slaves_connected.
- Introduce a new variable called innodb_encrypt_temporary_tables which is
a boolean variable. It decides whether to encrypt the temporary tablespace.
- Encrypts the temporary tablespace based on full checksum format.
- Introduced a new counter to track encrypted and decrypted temporary
tablespace pages.
- A warning is issued if temporary table creation specifies a value that
conflicts with innodb_encrypt_temporary_tables.
- Added a new test case which reads and writes the pages from/to temporary
tablespace.
Added a condition in the innochecksum tool to check for page id mismatch.
This can catch write corruption caused by InnoDB.
Added the debug insert inside fil_io() to check whether it writes
the page to the wrong offset.
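A page id mismatch check of this kind can be sketched in a few lines. This is a Python illustration, not the innochecksum code. It assumes the page number is stored as a 4-byte big-endian integer at byte offset 4 of each page (FIL_PAGE_OFFSET in the InnoDB sources) and a 16 KiB default page size; the function name is hypothetical.

```python
import struct

FIL_PAGE_OFFSET = 4  # assumed byte offset of the stored page number


def find_page_id_mismatches(data, page_size=16384):
    """Return (expected, stored) pairs for pages written at the wrong place.

    A page read from position N * page_size should carry page number N.
    Limitation of this sketch: all-zero (never written) pages carry
    number 0 and would be reported for every N other than 0.
    """
    mismatches = []
    for start in range(0, len(data) - page_size + 1, page_size):
        expected = start // page_size
        (stored,) = struct.unpack_from(">I", data, start + FIL_PAGE_OFFSET)
        if stored != expected:
            mismatches.append((expected, stored))
    return mismatches
```

A page whose stored number disagrees with its position in the file is exactly what a write to the wrong offset would leave behind.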
Since the purpose of the event is just to see on the second node whether it is
created or not, and we are not going to execute the event,
instead of setting GLOBAL event_scheduler=ON and then turning it off, we can just
disable the warning.
The problem was that the code in maria_extra assumed that there could be
only one table open when doing maria_extra(MA_FORCE_REOPEN).
However, in the case of triggers, there can be multiple copies of
the table open.
Fixed by removing the assert.
Also, move part of the test back to innodb.innodb_mysql
and another part to a new test innodb.purge.
Last but not least, merge the tests innodb_zip.4k and innodb_zip.8k
to innodb_zip.page_size.
The test cases for the MDEV found several independent bugs
in MariaDB server and Aria:
- If a temporary table was marked as crashed, it could never
be deleted.
- Opening of a crashed temporary table gave an error message
but the error was never forwarded to the caller which caused
an assert() in my_ok()
- init_read_record() did mmap of all temporary tables, which is
probably not a good idea as this area can potentially be
very big. Changed code to only mmap internal temporary tables.
- mmap-ed tables were not unmapped in case of repair/optimize,
which caused bad data in the table and crashes if the original
table files were replaced with new ones (as the old mmap
was still in place). Fixed by removing the mmap in case
of repair.
- Cleaned up usage of code that disabled mmap in Aria
There were two separate problems:
- Aria pagecache didn't properly handle re-reading of blocks
that have given errors before (this triggered an assert)
- temporary tables that were opened several times were
not properly closed in ALTER, REPAIR or OPTIMIZE table
Other things
- Added a couple of asserts that will make it easier to
find problems like this in the future.
Problem:
=========
One of the purge threads accesses the corrupted page and tries to remove it from
the LRU list. In the meantime, other purge threads are waiting for the same page
in buf_wait_for_read(). Assertion (buf_fix_count == 0) fails for the
purge thread which tries to remove the page from the LRU list.
Solution:
========
- Set the page id as FIL_NULL to indicate the page is corrupted before
removing the block from LRU list. Acquire hash lock for the particular
page id and wait for the other threads to release buf_fix_count
for the block.
- Added the error check for btr_cur_open() in row_search_on_row_ref().
Before killing the server, ensure that the incomplete state of
the transaction will be made durable and will be applied and
rolled back on recovery, so that each time, roughly the same
amount of work will be done.
Remove DML statements after the recovery, and execute
CHECK TABLE instead.
Remove the test, because it easily fails with a result difference.
Analysis by Thirunarayanan Balathandayuthapani:
By default, innodb_encrypt_tables=0.
1) Test case creates 100 tables in innodb_encrypt_1.
2) creates another 100 unencrypted tables (encryption=off) in innodb_encrypt_2
3) creates another 100 encrypted tables (encryption=on) in innodb_encrypt_3
4) enabling innodb_encrypt_tables=1 and checking that only
100 encrypted tables exist. (already we have 100 in dictionary)
5) opening all tables again (no idea why)
6) After that, set innodb_encrypt_tables=0 and wait for 100 tables
to be decrypted (already we have 100 unencrypted tables)
7) dropping all databases
Sporadic failure happens because after step 4, it could encrypt the
normal table too, because innodb_encryption_threads=4.
This test was added in MDEV-9931, which was about InnoDB startup being
slow due to all .ibd files being opened. There have been a number of
later fixes to this problem. Currently the latest one is
commit cad56fbaba, in which some tests
(in particular the test innodb.alter_kill) could fail if all InnoDB
.ibd files are read during startup. That could make this test redundant.
Let us remove the test, because it is big, slow, unreliable, and
does not seem to reliably catch the problem that all files are being
read on InnoDB startup.
Problem:
=======
fil_iterate() writes the imported tablespace page0 as it is to the discarded
tablespace. The space id wasn't even changed. Opening the tablespace
then fails with a space id mismatch error.
Fix:
====
fil_iterate() copies the page0 with discarded space id to imported
tablespace.
fix MDEV-18750: failed to flashback large-size binlog file
fix mysqlbinlog flashback failure caused by reading io_cache without MY_FULL_IO flag
fix MDEV-18750: mysqlbinlog flashback failure on large binlog
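The failure mode behind the MY_FULL_IO fix is a short read: a single read call may return fewer bytes than requested, and a caller that does not loop sees a truncated event. The Python below is only an analogy under that assumption. read_fully and ShortReader are hypothetical names and do not correspond to functions in mysqlbinlog or mysys.

```python
import io


def read_fully(stream, size):
    """Keep reading until `size` bytes are collected or input ends."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:              # end of input: return what we have
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class ShortReader:
    """Helper that never returns more than `limit` bytes per read call."""

    def __init__(self, data, limit):
        self._buf = io.BytesIO(data)
        self._limit = limit

    def read(self, size):
        return self._buf.read(min(size, self._limit))


# A single read of a large record is cut short; the loop is not.
single = ShortReader(b"x" * 10000, 4096).read(10000)
full = read_fully(ShortReader(b"x" * 10000, 4096), 10000)
print(len(single), len(full))
```

Short reads become more likely as requests grow, which is consistent with the failure being reported for large binlog files.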