Commit graph

467 commits

Author SHA1 Message Date
Teemu Ollakka
6966d7fe4b MDEV-29293 MariaDB stuck on starting commit state
This is a backport from 10.5.

The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
  global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
  victim's LOCK_thd_kill.

The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.

Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
  client connection is closed. This error message will then pop
  up for the next client session issuing first SQL statement.
  This problem raised with test galera.galera_bf_kill.
  The fix is to reset wsrep client error state, before a THD is
  reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
  innodb mutexes. This guarantees same locking order as with applier
  BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
  side first, and only then do the BF abort on InnoDB side. This
  removes the need to call back from InnoDB for BF aborts which originate
  from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
  The manipulation of the wsrep_aborter can be done solely on
  server side. Moreover, it is now debug only variable and
  could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
  fine grained locking for SR BF abort which may require locking
  of victim LOCK_thd_kill. Added explicit call for
  wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
  locking for BF abort calls.

Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
  be removed (MDEV-30855).
* Record galera_gcache_recover_manytrx as result file was incomplete.
  Trivial change.
* Make galera_create_table_as_select more deterministic:
  Wait until CTAS execution has reached MDL wait for multi-master
  conflict case. Expected error from multi-master conflict is
  ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
  wsrep transaction when it is waiting for MDL, query gets interrupted
  instead of BF aborted. This should be addressed in separate task.
* A new test galera_kill_group_commit to verify correct behavior
  when KILL is executed while the transaction is committing.

Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-22 00:33:37 +02:00
Sergei Golubchik
2ac832838f post fix for "move alloca() definition from all *.h files to one new header file" 2023-03-08 17:36:36 +01:00
Julius Goryavsky
46a7e96339 move alloca() definition from all *.h files to one new header file 2023-03-07 03:15:54 +01:00
Marko Mäkelä
fb0808c450 Merge 10.3 into 10.4 2023-01-03 16:10:02 +02:00
musvaage
e9e6c7a3c5 header typos 2022-12-20 08:55:48 +11:00
Marko Mäkelä
fdf43b5c78 Merge 10.3 into 10.4 2022-12-13 11:37:33 +02:00
Alexander Barkov
6216a2dfa2 MDEV-29473 UBSAN: Signed integer overflow: X * Y cannot be represented in type 'int' in strings/dtoa.c
Fixing a few problems relealed by UBSAN in type_float.test

- multiplication overflow in dtoa.c

- uninitialized Field::geom_type (and Field::srid as well)

- Wrong call-back function types used in combination with SHOW_FUNC.
  Changes in the mysql_show_var_func data type definition were not
  properly addressed all around the code by the following commits:
    b4ff64568c
    18feb62fee
    0ee879ff8a

  Adding a helper SHOW_FUNC_ENTRY() function and replacing
  all mysql_show_var_func declarations using SHOW_FUNC
  to SHOW_FUNC_ENTRY, to catch mysql_show_var_func in the future
  at compilation time.
2022-11-17 17:51:01 +04:00
Marko Mäkelä
667d3fbbb5 Merge 10.3 into 10.4 2022-10-25 10:04:37 +03:00
Brad Smith
5f25a91140
Cleanup the alloca.h header handling to further reduce hardcoded OS lists (#2289) 2022-10-16 18:44:51 +01:00
Vlad Lesin
f6f055a191 Merge 10.3 into 10.4 2022-02-21 14:10:27 +03:00
Nayuta Yanagisawa
66f55a018b MDEV-27730 Add PLUGIN_VAR_DEPRECATED flag to plugin variables
The sys_var class has the deprecation_substitute member to mark the
deprecated variables. As it's set, the server produces warnings when
these variables are used. However, the plugin has no means to utilize
that functionality.

So, the PLUGIN_VAR_DEPRECATED flag is introduced to set the
deprecation_substitute with the empty string. A non-empty string can
make the warning more informative, but there's no nice way seen to
specify it, and not that needed at the moment.
2022-02-18 13:10:20 +09:00
Sergei Golubchik
7b555ff2c5 MDEV-27341 Use SET PASSWORD to change PAM service
SET PASSWORD = PASSWORD('foo') would fail for pam plugin with

ERROR HY000: SET PASSWORD is ignored for users authenticating via pam plugin

but SET PASSWORD = 'foo' would not.

Now it will.
2022-01-17 18:19:29 +01:00
sjaakola
c1846c4fcf MDEV-26803 PA unsafety with FK cascade delete operation
This commit has a mtr test where two two transactions delete a row from
two separate tables, which will cascade a FK delete for the same row in
a third table. Second replica node is configured with 2 applier threads,
and the test will fail if these two transactions are applied in parallel.

The actual fix, in this commit, is to mark a transaction as unsafe for
parallel applying when it traverses into cascade delete operation.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-12-17 09:38:23 +02:00
Marko Mäkelä
0464761126 Merge 10.3 into 10.4 2021-08-31 09:22:21 +03:00
Marko Mäkelä
e835cc851e Merge 10.2 into 10.3 2021-08-31 08:36:59 +03:00
Marko Mäkelä
fda704c82c Fix GCC 11 -Wmaybe-uninitialized for PLUGIN_PERFSCHEMA
init_mutex_v1_t: Stop lying that the mutex parameter is const.
GCC 11.2.0 assumes that it is and could complain about any mysql_mutex_t
being uninitialized even after mysql_mutex_init() as long as
PLUGIN_PERFSCHEMA is enabled.

init_rwlock_v1_t, init_cond_v1_t: Remove untruthful const qualifiers.

Note: init_socket_v1_t is expecting that the socket fd has already
been created before PSI_SOCKET_CALL(init_socket), and therefore that
parameter really is being treated as a pointer to const.
2021-08-30 11:52:59 +03:00
Michael Okoko
6cd3588f0e Improve documentation of json parser functions
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2021-07-22 21:51:49 +03:00
Vladislav Vaintroub
b81803f065 MDEV-22221: MariaDB with WolfSSL doesn't support AES-GCM cipher for SSL
Enable AES-GCM for SSL (only).

AES-GCM for encryption plugins remains disabled (aes-t fails, on some bug
in GCM or CTR padding)
2021-06-09 15:44:55 +02:00
Marko Mäkelä
44d70c01f0 Merge 10.3 into 10.4 2021-03-19 11:42:44 +02:00
Marko Mäkelä
19052b6deb Merge 10.2 into 10.3 2021-03-18 12:34:48 +02:00
Julius Goryavsky
7345d37141 MDEV-24853: Duplicate key generated during cluster configuration change
Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
2021-03-08 11:15:08 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
2676c9aad7 galera fixes related to THD::LOCK_thd_kill
Since 2017 (c2118a08b1) THD::awake() no longer requires LOCK_thd_data.
It uses LOCK_thd_kill, and this latter mutex is used to prevent
a thread of dying, not LOCK_thd_data as before.
2021-02-02 10:02:17 +01:00
Oleksandr Byelkin
478b83032b Merge branch '10.3' into 10.4 2020-12-25 09:13:28 +01:00
Oleksandr Byelkin
25561435e0 Merge branch '10.2' into 10.3 2020-12-23 19:28:02 +01:00
Sergei Golubchik
e189faf0b3 document that a fulltext parser plugin can replace mysql_add_word callback 2020-12-10 08:45:20 +01:00
Marko Mäkelä
3a423088ac Merge 10.3 into 10.4 2020-09-21 12:29:00 +03:00
Marko Mäkelä
cbcb4ecabb Merge 10.2 into 10.3 2020-09-21 11:04:04 +03:00
Jan Lindström
224c950462 MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled
Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
and move condition to correct callers. Add a function to report
BF lock waits and assert if incorrect BF-BF lock wait happens.

wsrep_report_bf_lock_wait
	Add a new function to report BF lock wait.

wsrep_assert_no_bf_bf_wait
	Add a new function to check do we have a
	BF-BF wait and if we have report this case
	and assert as it is a bug.

lock_rec_has_to_wait
	Use new wsrep_assert_bf_wait to check BF-BF wait.

lock_rec_create_low
lock_table_create
	Use new function to report BF lock waits.

lock_rec_insert_by_trx_age
lock_grant_and_move_on_page
lock_grant_and_move_on_rec
	Assert that trx is not Galera as VATS is not compatible
	with Galera.

lock_rec_add_to_queue
	If there is conflicting lock in a queue make sure that
	transaction is BF.

lock_rec_has_to_wait_in_queue
	Remove incorrect BF handling. If there is conflicting
	locks in a queue all transactions must wait.

lock_rec_dequeue_from_page
lock_rec_unlock
	If there is conflicting lock make sure it is not
	BF-BF case.

lock_rec_queue_validate
	Add Galera record locking rules comment and use
	new function to report BF lock waits.

All attempts to reproduce the original assertion have been
failed. Therefore, there is no test case on this commit.
2020-09-10 13:18:12 +03:00
Julius Goryavsky
956f21c3b0 Merge remote-tracking branch 'origin/bb-10.4-MDEV-21910' into 10.4 2020-07-16 13:03:29 +02:00
Marko Mäkelä
9936cfd531 Merge 10.3 into 10.4 2020-07-15 10:17:15 +03:00
Marko Mäkelä
8a0944080c Merge 10.2 into 10.3 2020-07-14 22:59:19 +03:00
Marko Mäkelä
646a6005e7 Merge 10.1 into 10.2 2020-07-14 15:10:59 +03:00
Daniel Black
3efdac2064 MDEV-22173: socket accept - test for failure
accept might return an error, including SOCKET_EAGAIN/
SOCKET_EINTR. The caller, usually handle_connections_sockets
can these however and invalid file descriptor isn't something
to call fcntl on.

Thanks to Etienne Guesnet (ATOS) for diagnosis,
sample patch description and testing.
2020-07-06 12:33:35 +02:00
sjaakola
5a7794d3a8 MDEV-21910 Deadlock between BF abort and manual KILL command
When high priority replication slave applier encounters lock conflict in innodb,
it will force the conflicting lock holder transaction (victim) to rollback.
This is a must in multi-master sychronous replication model to avoid cluster lock-up.
This high priority victim abort (aka "brute force" (BF) abort), is started
from innodb lock manager while holding the victim's transaction's (trx) mutex.
Depending on the execution state of the victim transaction, it may happen that the
BF abort will call for THD::awake() to wake up the victim transaction for the rollback.
Now, if BF abort requires THD::awake() to be called, then the applier thread executed
locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data

If, at the same time another DBMS super user issues KILL command to abort the same victim,
it will execute locking protocol of: victim THD::LOCK_thd_data  -> victim trx mutex.
These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking
deadlock may occur.

The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim
This flag is set both when BF is called for from innodb and by KILL command.
Either path of victim killing will bail out if victim's wsrep_killed is already
set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter
records the aborter THD's ID. This is needed to preserve the right to kill
the victim from different locations for the same aborter thread.
It is also good error logging, to see who is reponsible for the abort.

A new test case was added in galera.galera_bf_kill_debug.test for scenario where
wsrep applier thread and manual KILL command try to kill same idle victim
2020-06-26 09:56:23 +03:00
Marko Mäkelä
ca38b6e427 Merge 10.3 into 10.4 2020-05-26 11:54:55 +03:00
Marko Mäkelä
ecc7f305dd Merge 10.2 into 10.3 2020-05-25 19:41:58 +03:00
Oleksandr Byelkin
cf52dd174e MDEV-22545: my_vsnprintf behaves not as in C standard
Added parameter %T for string which should be visibly truncated.
2020-05-24 21:27:08 +02:00
Vladislav Vaintroub
403dc759d0 Update WolfSSL
Fix WolfSSL build:

- Do not build with TLSv1.0,it stopped working,at least with SChannel client
- Disable a test that depends on TLSv1.0
- define FP_MAX_BITS always, to fix 32bit builds.
- Increase MAX_AES_CTX_SIZE, to fix build on Linux
2020-05-08 11:51:03 +02:00
Marko Mäkelä
edd38b50f6 MDEV-7962 wsrep_on() takes 0.14% in OLTP RO
The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive conditions thd && WSREP_ON before
calling the function.

Global_read_lock: Use WSREP_NNULL(thd) instead of wsrep_on(thd)
because we not only know the definition of THD but also that
the pointer is not null.

wsrep_open(): Use WSREP(thd) instead of wsrep_on(thd).

InnoDB: Replace thd && wsrep_on(thd) with wsrep_on(thd), now that
the condition has been merged to the definition of the macro
wsrep_on().
2020-04-24 16:01:10 +03:00
Marko Mäkelä
2c39f69d34 MDEV-22203: WSREP_ON is unnecessarily expensive WITH_WSREP=OFF
If the server is compiled WITH_WSREP=OFF, we should avoid evaluating
conditions on a global variable that is constant.

WSREP_ON_: Renamed from WSREP_ON. Defined only WITH_WSREP=ON.

WSREP_ON: Defined as unlikely(WSREP_ON_).

wsrep_on(): Defined as WSREP_ON && wsrep_service->wsrep_on_func().

The reason why we have wsrep_on() at all is that the macro WSREP(thd)
depends on the definition of THD, and that is intentionally an opaque
data type for InnoDB. So, we cannot avoid invoking wsrep_on(), but
we can evaluate the less expensive condition WSREP_ON before calling
the function.
2020-04-24 15:25:39 +03:00
Oleksandr Byelkin
6918157e98 Merge branch '10.3' into 10.4 2020-01-21 23:15:02 +01:00
Oleksandr Byelkin
ade89fc898 Merge branch '10.2' into 10.3 2020-01-21 09:11:14 +01:00
Oleksandr Byelkin
3a1716a7e7 Merge branch '10.1' into 10.2 2020-01-20 16:15:05 +01:00
Oleksandr Byelkin
10eacd5ff7 Merge branch 'merge-perfschema-5.6' into 10.1 2020-01-19 13:11:45 +01:00
Oleksandr Byelkin
3aff3f3679 5.6.47 2020-01-19 12:52:07 +01:00
Daniele Sciascia
aab6cefe8d MDEV-20848 Fixes for MTR test galera_sr.GCF-1060 (#1421)
This patch contains two fixes:

* wsrep_handle_mdl_conflict(): handle the case where SR transaction
  is in aborting state. Previously, a BF-BF conflict was reported, and
  the process would abort.
* wsrep_thd_bf_abort(): do not restore thread vars after calling
  wsrep_bf_abort(). Thread vars are already restored in wsrep-lib if
  necessary. This also removes the assumption that the caller of
  wsrep_thd_bf_abort() is the given bf_thd, which is not the case.

Also in this patch:

* Remove unnecessary check for active victim transaction in
  wsrep_thd_bf_abort(): the exact same check is performed later in
  wsrep_bf_abort().
* Make wsrep_thd_bf_abort() and wsrep_log_thd() const-correct.
* Change signature of wsrep_abort_thd() to take THD pointers instead
  of void pointers.
2019-12-04 09:21:14 +02:00
Oleksandr Byelkin
55b2281a5d Merge branch '10.2' into 10.3 2019-10-31 10:58:06 +01:00
Jan Lindström
36a9694378 MDEV-18562 [ERROR] InnoDB: WSREP: referenced FK check fail: Lock wait index
Lock wait can happen on secondary index when doing FK checks for wsrep.
We should just return error to upper layer and applier will retry
operation when needed.
2019-10-30 10:14:56 +02:00
Sergei Golubchik
4af932e899 remove incorrect #ifdef 2019-08-26 23:33:42 +02:00