Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.
Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.
Notes:
1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
don't initialize error_log_handler_list in set_handlers()
* error_log_handler_list is initialized to LOG_FILE early, in init_base()
* set_handlers always reinitializes it to LOG_FILE, so it's pointless
* after init_base() concurrent threads start using sql_log_warning,
so following set_handlers() shouldn't modify error_log_handler_list
without some protection
extra2_read_len resolved by keeping the implementation
in sql/table.cc by exposed it for use by ha_partition.cc
Remove identical implementation in unireg.h
(ref: bfed2c7d57)
commit '6de482a6fefac0c21daf33ed465644151cdf879f'
10.3 no longer errors in truncate_notembedded.test
but per comments, a non-crash is all that we are after.
Fixed typo in my_malloc_size_cb_func. There is no max-thread-mem-used
sys variable in MariaDB, only max-session-mem-used. The relevant entry
in sys_vars.cc is also fixed.
Added a fallback case in case we could allocate the 256 bytes for the
error message containing the exact setting.
it's not printed, not cleaned up without perfschema,
so isn't supposed to be written into either
this fixes "Memory not freed" warnings when early command line
options produce warnings in non-perfschema builds
MySQL 5.5 in commit 177d8b0c12
introduced a configuration parameter innodb_force_load_corrupted
whose purpose was to allow a corrupted table to be dropped.
Given that MDEV-11412 in MariaDB 10.5.4 aims to allow any metadata
for a missing or corrupted table to be dropped, and given that
MDEV-17567 and MDEV-25506 and related tasks made DDL operations
crash-safe, the parameter no longer serves any purpose.
Because this obscure parameter was read-only (not settable by a client),
it seems that we can simply declare it with MARIADB_REMOVED_OPTION
(commit 1bc9cce702) without breaking
any upgrades.
DICT_ERR_IGNORE_INDEX: Replaces DICT_ERR_IGNORE_INDEX_ROOT and
DICT_ERR_IGNORE_CORRUPT, which were always set equally.
dict_load_indexes(): Report "No indexes found for table" in
a uniform way, and only when the DICT_ERR_IGNORE_INDEX flag is
not set.
If the clustered index is marked corrupted, and the operation
is DICT_ERR_IGNORE_DROP (we are about to drop the table), we will
load the metadata; else, we will return DB_INDEX_CORRUPT.
If SYS_INDEXES.PAGE is FIL_NULL, report an error or warning
unless we are about to drop the table.
dict_load_table_one(): Simplify the logic.
There is a server startup option --gdb a.k.a. --debug-gdb that requests
signals to be set for more convenient debugging. Most notably, SIGINT
(ctrl-c) will not be ignored, and you will be able to interrupt the
execution of the server while GDB is attached to it.
When we are debugging, the signal handlers that would normally display
a terse stack trace are useless.
When we are debugging with rr, the signal handlers may interfere with
a SIGKILL that could be sent to the process by the environment, and ruin
the rr replay trace, due to a Linux kernel bug
https://lkml.org/lkml/2021/10/31/311
To be able to diagnose bugs in kill+restart tests, we may really need
both a trace before the SIGKILL and a trace of the failure after a
subsequent server startup. So, we had better avoid hitting the problem
by simply not installing those signal handlers.
The crash happened because my_isalnum() does not support character
sets with mbminlen>1.
The value of "ft_boolean_syntax" is converted to utf8 in do_string_check().
So calling my_isalnum() is combination with "default_charset_info" was wrong.
Adding new parameters (size_t length, CHARSET_INFO *cs) to
ft_boolean_check_syntax_string() and passing self->charset(thd)
as the character set.
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This patch also fixes mutex locking order and unprotected
THD member accesses on bf aborting case. We try to hold
THD::LOCK_thd_data during bf aborting. Only case where it
is not possible is at wsrep_abort_transaction before
call wsrep_innobase_kill_one_trx where we take InnoDB
mutexes first and then THD::LOCK_thd_data.
This will also fix possible race condition during
close_connection and while wsrep is disconnecting
connections.
Added wsrep_bf_kill_debug test case
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
because the name was misleading, it counts not threads, but THDs,
and as THD_count is the only way to increment/decrement it, it
could as well be declared inside THD_count.
The issue was that we sent two different signals to different threads
after each other. The DEBUG_SYNC functionality cannot handle this (as
the signal is stored in a global variable) and the first one can get
lost.
Fixed by using the same signal for both threads.
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)
Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
On shutdown, to kill remaining connections, do the same thing that
server does during KILL CONNECTION, i.e thd->awake().
The stripped-down KILL version, that was used prior to this patch
for shutdown, missed the engine specific part (ha_kill_query)