Remove the exception that InnoDB does not report auto-increment lock waits
to parallel replication.
There was an assumption that these waits could not cause conflicts with
in-order parallel replication and thus need not be reported. However, this
assumption is wrong and it is possible to get conflicts that lead to hangs
for the duration of --innodb-lock-wait-timeout. This can be seen with three
transactions:
1. T1 is waiting for T3 on an autoinc lock
2. T2 is waiting for T1 to commit
3. T3 is waiting on a normal row lock held by T2
Here, T3 needs to be deadlock killed on the wait by T1.
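As an illustration only (a standalone sketch, not InnoDB's lock-sys code; all
names are made up), the three waits above form a cycle once the commit-order
wait is included, which is exactly what stays undetected if the autoinc wait
is never reported:

  // Standalone sketch: model the three waits above as a wait-for graph.
  // If the autoinc wait T1 -> T3 is never reported, the cycle below is
  // invisible and the applier hangs until --innodb-lock-wait-timeout.
  #include <cstdio>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  using Graph= std::map<std::string, std::vector<std::string>>;

  bool find_cycle(const Graph &g, const std::string &node,
                  std::set<std::string> &path)
  {
    if (!path.insert(node).second)
      return true;                      // node already on the path: cycle
    auto it= g.find(node);
    if (it != g.end())
      for (const auto &next : it->second)
        if (find_cycle(g, next, path))
          return true;
    path.erase(node);
    return false;
  }

  int main()
  {
    Graph waits_for= {
      {"T1", {"T3"}},  // T1 waits for T3 on the autoinc lock
      {"T2", {"T1"}},  // T2 waits for T1 to commit (in-order replication)
      {"T3", {"T2"}},  // T3 waits for a row lock held by T2
    };
    std::set<std::string> path;
    if (find_cycle(waits_for, "T1", path))
      std::printf("deadlock: kill T3, the wait that closes the cycle\n");
    return 0;
  }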
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
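A simplified sketch of that victim rule (the struct and field names here are
hypothetical, not the actual InnoDB transaction members): when both deadlocked
transactions are ordered replication workers, the one that must commit later
is always chosen as the victim.

  // Hypothetical types; only illustrates the ordering rule.
  struct Trx
  {
    bool is_ordered_replica;        // part of in-order parallel replication?
    unsigned long long commit_seq;  // required position in the commit order
  };

  const Trx *choose_deadlock_victim(const Trx *a, const Trx *b)
  {
    if (a->is_ordered_replica && b->is_ordered_replica)
      return a->commit_seq < b->commit_seq ? b : a;  // later committer dies
    return b;  // otherwise fall back to the engine's normal heuristic
  }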
Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with a test case that reliably reproduces it:
MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Field_varstring::get_copy_func() did not take into account
that functions do_varstring1[_mb], do_varstring2[_mb] do not support
compressed data.
Changing the return value of Field_varstring::get_copy_func()
to `do_field_string` if there is a compression and truncation
at the same time. This fixes the problem, so now it works as follows:
- val_str() uncompresses the data
- The prefix is then calculated on the uncompressed data
Additionally, introducing two new copying functions
- do_varstring1_no_truncation()
- do_varstring2_no_truncation()
Using new copying functions in cases when:
- a Field_varstring with length_bytes==1 is changing to a longer
Field_varstring with length_bytes==1
- a Field_varstring with length_bytes==2 is changing to a longer
Field_varstring with length_bytes==2
In these cases we need not worry about either compression or
multi-byte prefixes: the entire data gets fully copied
from the source column to the target column as is.
This is a kind of new optimization, but this also was needed
to preserve existing MTR test results.
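A standalone sketch of the no-truncation case (plain buffers, not the server's
Field_varstring objects): when source and target use the same number of length
bytes and the target is at least as long, the length prefix and data can be
copied verbatim, regardless of compression or character set.

  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  // A VARCHAR value stored as <length prefix><bytes>, length_bytes 1 or 2
  // (2-byte prefix little-endian). No truncation can happen, so the bytes
  // are copied as-is, compressed or not.
  void copy_varstring_no_truncation(const uint8_t *from, uint8_t *to,
                                    unsigned length_bytes)
  {
    unsigned len= length_bytes == 1
        ? from[0]
        : (unsigned) from[0] | ((unsigned) from[1] << 8);
    std::memcpy(to, from, length_bytes + len);
  }

  int main()
  {
    uint8_t from[1 + 5]= {5, 'h', 'e', 'l', 'l', 'o'};
    uint8_t to[1 + 10]= {0};            // longer target, same length_bytes
    copy_varstring_no_truncation(from, to, 1);
    std::printf("%.*s\n", to[0], (const char *) (to + 1));
    return 0;
  }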
This is also related to
MDEV-31348 Assertion `last_key_entry >= end_pos' failed in virtual bool
JOIN_CACHE_HASHED::put_record()
Valgrind exposed a problem with the join_cache for hash joins:
==25636== Conditional jump or move depends on uninitialised value(s)
==25636== at 0xA8FF4E: JOIN_CACHE_HASHED::init_hash_table()
(sql_join_cache.cc:2901)
The reason for this was that avg_record_length contained a random value
if one had used SET optimizer_switch='optimize_join_buffer_size=off'.
This causes either 'random size' memory to be allocated (up to
join_buffer_size) which can increase memory usage or, if avg_record_length
is less than the row size, memory overwrites in thd->mem_root, which is
bad.
Fixed by setting avg_record_length in JOIN_CACHE_HASHED::init()
before it's used.
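The failure mode as a minimal sketch (member names are illustrative, not the
real JOIN_CACHE_HASHED fields): a size member that is only computed on one
code path is later used to size a buffer, so the fix is to set it
unconditionally before anything reads it.

  #include <cstddef>
  #include <vector>

  struct HashedCacheSketch
  {
    size_t avg_record_length= 0;  // must be set before the hash table is sized
    std::vector<char> buffer;

    void init(size_t row_size, bool optimize_buffer_size)
    {
      // The fix: compute avg_record_length unconditionally, not only on
      // the optimize_join_buffer_size=on path.
      avg_record_length= row_size;
      if (optimize_buffer_size)
      {
        // ... shrink the join buffer based on avg_record_length ...
      }
      init_hash_table();
    }

    void init_hash_table()
    {
      // Reads avg_record_length; in the old code this was uninitialized
      // when optimizer_switch='optimize_join_buffer_size=off' was set.
      buffer.resize(avg_record_length * 16);
    }
  };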
There is no test case for MDEV-31893, as running join_cache_notasan
under valgrind checks that.
I added a test case for MDEV-31348.
The test case accessed slave-relay-bin.000003 without waiting for the IO
thread to write it first. If the IO thread was slow, this could fail.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Revert the old work-around for buggy fdatasync() on Linux ext3. This bug was
fixed in Linux more than 10 years ago, at least as far back as kernel version 3.0.
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Remove redundant delete_explain_query() calls in
sp_instr_set::exec_core(), sp_instr_set_row_field::exec_core(),
sp_instr_set_row_field_by_name::exec_core().
These calls are made before the SP instruction's tables are
"closed" by close_thread_tables() call.
When we call close_thread_tables() after that, we can no longer
collect the engine's counter variables, as they are stored in
structures that are part of the EXPLAIN data.
Also, these delete_explain_query() calls are redundant, as
sp_lex_keeper::reset_lex_and_exec_core() has another
delete_explain_query() call, which is located in the right
location after the close_thread_tables() call.
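A condensed sketch of the call ordering being relied on (stub functions, not
the real server code): the EXPLAIN data must outlive close_thread_tables(),
which is why the early delete_explain_query() calls in the instructions were
wrong.

  void instr_exec_core()      { /* sp_instr_set*::exec_core() body */ }
  void close_thread_tables()  { /* engine counters are collected here */ }
  void delete_explain_query() { /* frees the EXPLAIN data structures */ }

  void reset_lex_and_exec_core_sketch()
  {
    instr_exec_core();        // no longer calls delete_explain_query()
    close_thread_tables();    // counters land in the EXPLAIN structures
    delete_explain_query();   // only now is it safe to free that data
  }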
The problem was an incorrect condition for when the node should
resume and resync at backup_end. Simplified the condition to fix
the problem, and added a missing test case for
wsrep_mode = BF_ABORT_MARIABACKUP.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
- InnoDB aborts when a column is being dropped from the table. This is
caused by 5f09b53bdb (MDEV-31086).
While iterating over the altered table's fields, we fail to consider
the dropped columns.
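A minimal sketch of the missing check (hypothetical field structure, not the
InnoDB one): while walking the altered table's fields, entries that are being
dropped have to be skipped.

  #include <vector>

  struct FieldSketch { bool dropped; int some_metadata; };

  int walk_remaining_fields(const std::vector<FieldSketch> &fields)
  {
    int total= 0;
    for (const FieldSketch &f : fields)
    {
      if (f.dropped)
        continue;            // the check that was missing caused the abort
      total+= f.some_metadata;
    }
    return total;
  }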
There were two related problems:
(1) A Galera node that is defined as a slave to an async MariaDB
master might do an SST (state transfer) at restart, and as
part of that it will copy the mysql.gtid_slave_pos table.
The problem is that updates on that table are not replicated
in the cluster. Therefore, the table from a donor that is not a
slave is copied, the joiner loses the gtid position it was at,
and it starts executing events from the wrong position in the binlog.
This incorrect position could break replication and
cause the node to be dropped, requiring user action.
(2) The slave SQL thread might start executing events before
Galera is ready (wsrep_ready=ON), and that could also
cause the node to be dropped from the cluster.
In this fix we enable replication of the mysql.gtid_slave_pos
table in the cluster. This way all nodes in the cluster
know the gtid slave position, and even after an SST the joiner
knows the correct gtid position to start from.
Furthermore, we wait for Galera to be ready before the slave
SQL thread executes any events, to prevent too-early
execution.
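A minimal, illustrative sketch of the second part of the fix (not the server's
wsrep code; names invented): the slave SQL thread blocks until the node reports
ready before applying any event.

  #include <condition_variable>
  #include <mutex>

  struct WsrepReadySketch
  {
    std::mutex m;
    std::condition_variable cv;
    bool ready= false;                  // corresponds to wsrep_ready=ON

    void wait_until_ready()             // called by the slave SQL thread
    {
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk, [this] { return ready; });
    }

    void set_ready()                    // called when the node has joined
    {
      { std::lock_guard<std::mutex> lk(m); ready= true; }
      cv.notify_all();
    }
  };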
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
when validating vcols (default, check, etc) in ALTER TABLE,
vcol_info->flags are modified in place. This means that if ALTER TABLE
fails for any reason, we need to restore them to their original values.
(mroonga was freeing the memory on ::reset() but not on ::close())
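A small sketch of the save/restore pattern this implies (illustrative types
only, not the server's Virtual_column_info): remember the original flags before
validation rewrites them, and put them back if the ALTER fails.

  struct VcolFlagsSketch { unsigned flags= 0; };

  bool validate_vcol(VcolFlagsSketch &v)
  {
    v.flags|= 0x1;        // validation modifies the flags in place
    return false;         // pretend the ALTER TABLE fails later
  }

  void alter_table_sketch(VcolFlagsSketch &v)
  {
    const unsigned saved_flags= v.flags;
    if (!validate_vcol(v))
      v.flags= saved_flags;   // restore the original value on failure
  }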
After MDEV-21580 the truncation of SORT_FIELD::length
  set_if_smaller(sortorder->length, thd->variables.max_sort_length)
became conditional:
  if (is_variable_sized())
    set_if_smaller(length, thd->variables.max_sort_length)
To provide correct functioning of is_variable_sized() SORT_FIELD::type
must be set properly. This commit adds the necessary initialization
of SORT_FIELD::type to JOIN_TAB::remove_duplicates() as it is done
in filesort's sortlength() function.
A DBUG_ASSERT is added to sortlength() just in case, to prevent
a possible uint32 overflow.
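A sketch of the dependency this fixes (hypothetical names, not the real
SORT_FIELD definition): is_variable_sized() can only give a meaningful answer
once the field's type has been set, so remove_duplicates() must initialize it
the same way sortlength() does.

  enum class SortTypeSketch { FIXED, VARIABLE };

  struct SortFieldSketch
  {
    SortTypeSketch type= SortTypeSketch::FIXED;  // must be set up front
    unsigned length= 0;

    bool is_variable_sized() const { return type == SortTypeSketch::VARIABLE; }
  };

  void cap_sort_length(SortFieldSketch &sf, unsigned max_sort_length)
  {
    // set_if_smaller(length, max_sort_length), but only for variable-sized
    // fields; meaningless if type was never initialized.
    if (sf.is_variable_sized() && sf.length > max_sort_length)
      sf.length= max_sort_length;
  }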
We used to run `systemctl set-environment` to pass
_WSREP_START_POSITION. This is bad because:
* it clutters systemd's environment (yes, pid 1)
* it requires root privileges
* options (like LimitNOFILE=) are not applied
Let's just create an environment file in ExecStartPre= that is read
before ExecStart= kicks in. That way _WSREP_START_POSITION is available
to the main process without any of these downsides.
make TRANSACTIONAL table option behave similarly to other engine-defined
table options. If the engine doesn't support it:
* if specified explicitly in CREATE or ALTER - it's ER_UNKNOWN_OPTION
* an error or a warning depending on sql_mode IGNORE_BAD_TABLE_OPTIONS
* in ALTER TABLE from the engine that supports it to the engine that
doesn't - silently preserved (no warning)
* it is commented out in SHOW CREATE unless IGNORE_BAD_TABLE_OPTIONS
* invoke check_expression() for all vcol_info's in
mysql_prepare_create_table() to check for FK CASCADE
* also check for SET NULL and SET DEFAULT
* to check against existing FKs when a vcol is added in ALTER TABLE,
old FKs must be added to the new_key_list just like other indexes are
* check columns recursively, if vcol1 references vcol2,
flags of vcol2 must be taken into account (see the sketch after this list)
* remove check_table_name_processor(), put that logic under
check_vcol_func_processor() to avoid walking the tree twice
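Illustrating the recursive check from the list above with a small sketch
(hypothetical structures, not the server's Virtual_column_info): when vcol1
references vcol2, the flags of vcol2 are folded into the result for vcol1.

  #include <vector>

  struct VcolSketch
  {
    unsigned flags= 0;                    // e.g. "depends on an FK CASCADE column"
    std::vector<const VcolSketch*> refs;  // other vcols used by this expression
  };

  unsigned effective_flags(const VcolSketch &col)
  {
    unsigned flags= col.flags;
    for (const VcolSketch *ref : col.refs)
      flags|= effective_flags(*ref);      // recurse: vcol1 -> vcol2 -> ...
    return flags;
  }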
mark old keys in the ALTER TABLE with the `old` flag, not with
the `key_create_info.check_for_duplicate_indexes`.
This allows old foreign keys to be marked too.
HA_UNIQUE_CHECK was
* only used internally by MyISAM/Aria
* only used for internal temporary tables (for DISTINCT)
* never saved in frm
* saved in MYI/MAD but only for temporary tables
* only set, never checked
it's safe to remove it and free the bit (there are only 16 of them)
Do not attempt to produce "r_engine_stats" on the temporary (=work) tables.
These tables may be
- re-created during the query execution
- freed during the query execution (This is done e.g. in JOIN::cleanup(),
before we produce ANALYZE FORMAT=JSON output).
- (Also, make save_explain_data() functions not set handler_for_stats
to point to handler objects that do not have handler->handler_stats set.
If the storage engine is not collecting handler_stats, it will not have
them when we're producing ANALYZE FORMAT=JSON output, either).
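A sketch of the guarded assignment described in the last point (names are
illustrative, not the actual Explain structures): a handler is only remembered
for "r_engine_stats" if it is actually collecting stats.

  struct HandlerSketch { const void *handler_stats= nullptr; };

  struct ExplainNodeSketch
  {
    const HandlerSketch *handler_for_stats= nullptr;

    void save_explain_data(const HandlerSketch *h, bool is_work_table)
    {
      // Work tables may be freed or re-created before ANALYZE FORMAT=JSON
      // output is produced, and engines that do not collect stats have
      // nothing useful to report, so skip both cases.
      if (!is_work_table && h && h->handler_stats)
        handler_for_stats= h;
      else
        handler_for_stats= nullptr;
    }
  };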