int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)
CREATE TABLE [SELECT|REPLACE SELECT] is CTAS and idea was that
we force ROW format. However, it was not correctly enforced
and keys were appended before wsrep transaction was started.
At THD::decide_logging_format we should force used stmt binlog
format to ROW in CTAS case and produce a warning if used
binlog format was not ROW.
At ha_innodb::update_row we should not append keys similarly
as in ha_innodb::write_row if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging on ::write_row, ::update_row and ::delete_row
if wsrep key append fails.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit checks the validity of value change of wsrep_sst_method variable.
The validity check is same as happens in donor node when incoming SST request
is parsed.
The commit has also a mtr test: wsrep.wsrep_variables_sst_method which verifies
that wsrep_sst_method can be succesfully changed to acceptable values and that
the SET command results in error if invalid value was entered.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
* mariadb-service-convert to use mariadbd-safe
* galera_recovery to use mariadbd
* mtr - wsrep use mariadb executables
* debian/mariadb-server.mariadb.init use mariadbd-safe
* debian/tests/smoke uses mariadb instead of mysql as client.
Co-Author: Daniel Black <daniel@mariadb.org>
The --skip-write-binlog message was confusing that it only had
an effect if the galera was enabled. There are uses beyond galera
so we apply SET SESSION SQL_LOG_BIN=0 as implied by the option
without being conditional on the wsrep status.
Remove wsrep.mysql_tzinfo_to_sql_symlink{,_skip} tests as they offered
no additional coverage beyond main.mysql_tzinfo_to_sql_symlink as no
server testing was done.
Introduced a variant of the galera.mariadb_tzinfo_to_sql as
galera.mysql_tzinfo_to_sql, which does testing using the mysql client
rather than directly importing into the server via mysqltest.
Update man page and mysql_tzinfo_to_sql to having a --skip-write-binlog
option.
merge notes:
10.4:
- conflicts in tztime.cc can revert to this version of --help text.
- tztime.cc - merge execute immediate @prep1, and leave %s%s trunc_tables, lock_tables
after that.
10.6:
- Need to remove the not_embedded.inc in mysql_tzinfo_to_sql.test and
replace it with no_protocol.inc
- leave both mysql_tzinfo_to_sql.test and mariadb_tzinfo_to_sql.sql
tests.
- sql/tztime.cc - keep entirely 10.6 version.
The --skip-write-binlog message was confusing that it only had
an effect if the galera was enabled. There are uses beyond galera
so we apply SET SESSION SQL_LOG_BIN=0 as implied by the option
without being conditional on the wsrep status.
We also with --skip-write-binlog actually check the session @@WSREP_ON
variable rather than the global server variable.
Since 10.6, the wsrep_mode could replicate Aria and MyISAM, in which
case no change to innodb and back is needed.
By removing the conditions, we can use LOCK TABLES in a general case
improving the load speed of Aria (MDEV-23326), regardless of the
skip-write-binlog flag. The only case where we don't use LOCK TABLES is
when we are replicating via Innodb, because wsrep_on=1 and wsrep_mode
doesn't contain REPLICATE_ARIA{,MYISAM}. This uses an Innodb transaction
instead. When replicating via InnoDB we change the table engine type
back to what it was originally.
By removing the \d and other syntax that requires parsing by
the mariadb client, we can use the generated SQL more generally, like
in the embedded server.
We also save and restore the SQL_LOG_BIN and WSREP_ON session server
variables so this can be included in the same session as other data
without taking into changes in state.
Remove wsrep.mysql_tzinfo_to_sql_symlink{,_skip} tests as they offered
no additional coverage beyond main.mysql_tzinfo_to_sql_symlink (no
server testing was done).
Add galera.mariadb_tzinfo_to_sql to actually test the replication
of tzinfo data through galera.
The conditional executable comment around /*M!100602 ...
START TRANSACTION .. LOCK TABLES.. */ is so that we can provide tzinfo
files (MDEV-27113, MDBF-389) and in the case that a user uses it on a
pre-10.6 server version it will still work. Both START TRANSACTION and
LOCK TABLES are not supported in prepared statements in MariaDB versions
earlier than 10.6.2.
Reviewed by Brandon Nesterenko
The --skip-write-binary-log added to mysql_tzinfo_to_sql in
MDEV-18778 was only effective if galera was enabled on the server.
This is because it tied together three concepts under one option:
1. binary logging
2. wsrep replication, and
3. using innodb as a transitional table type.
Change 1: small change in help option to reflect this.
To solve the performance problem with Aria tables, LOCK TABLES WRITE
is used to eliminate the need to fdatasync until the UNLOCK TABLES.
If galera isn't enabled, then we also want to use the LOCK TABLE WRITE
mechanism.
The START TRANSACTION added in MDEV-23440 needed to be moved to
before LOCK TABLES otherwise it would cancel their effect.
TRUNCATE TABLE statements also need to be before the LOCK TABLES.
When changing back from InnoDB to Aria, include the ORDER BY that
was originally there in 6aaccbcbf7 and matching the final ALTER
TABLE in the timezonedir branch.
Running: mariadb-tzinfo-to-sql --skip-write-binlog /usr/share/zoneinfo
now generates 16 Aria_transaction_log_syncs from 7053.
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This patch also fixes mutex locking order and unprotected
THD member accesses on bf aborting case. We try to hold
THD::LOCK_thd_data during bf aborting. Only case where it
is not possible is at wsrep_abort_transaction before
call wsrep_innobase_kill_one_trx where we take InnoDB
mutexes first and then THD::LOCK_thd_data.
This will also fix possible race condition during
close_connection and while wsrep is disconnecting
connections.
Added wsrep_bf_kill_debug test case
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
galera_var_wsrep_on_off : Add wait conditions to make sure DDL is
replicated before continuing.
wsrep.[variables|variables_debug] : Remove unnecessary parts
and add check to correct number of variables or skip
galera_ssl_reload: Add version check and SSL checks.
galera_var_wsrep_on_off : Add wait conditions to make sure DDL is
replicated before continuing.
wsrep.[variables|variables_debug] : Remove unnecessary parts
and add check to correct number of variables or skip
galera_ssl_reload: Add version check and SSL checks.
* Disallow setting wsrep_on = 1 if wsrep_provider is unset. Also, move
wsrep_on_basic from sys_vars to wsrep suite: this test now requires
to run with wsrep_provider set
* Disallow setting @@session.wsrep_on = 1 when @@global.wsrep_on = 0
* Handle the case where a new connection turns @@global.wsrep_on from
off to on. In this case we would miss a call to wsrep_open, causing
unexpected states in wsrep::client_state (causing assertions).
* Disable wsrep.MDEV-22443 because it is no longer possible to enable
wsrep_on, if server is started with wsrep_provider='none'
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.
This pull request has following fixes to handle concurrent write load from
multiple nodes:
In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.
Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.
wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.
BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.
The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().
MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.
With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
Introduced two new wsrep_mode options
* REPLICATE_MYISAM
* REPLICATE_ARIA
Depracated wsrep_replicate_myisam parameter and we use
wsrep_mode = REPLICATE_MYISAM instead.
This required small refactoring of wsrep_check_mode_after_open_table
so that both MyISAM and Aria are handled on required DML cases.
Similarly, added Aria to wsrep_should_replicate_ddl to handle DDL
for Aria tables using TOI. Added test cases and improved MyISAM testing.
Changed use of wsrep_replicate_myisam to wsrep_mode = REPLICATE_MYISAM