MDEV-7026: Race in InnoDB/XtraDB mutex implementation can stall or hang the server.

The bug was that a full memory barrier was missing in the code that ensures
that a waiter on an InnoDB mutex will not go to sleep unless it is guaranteed
to be woken up again by another thread currently holding the mutex. This made
possible a race where a thread could get stuck waiting for a mutex that is in
fact no longer locked. If that thread was also holding other critical locks,
this could stall the entire server. There is an error monitor thread that can
break the stall; it runs about once per second. But if the error monitor
thread itself got stuck or was not running, then the entire server could hang
indefinitely.
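
The handshake described above can be sketched in portable C11 as follows.
This is an illustration only, not the InnoDB code: the type sketch_mutex and
the functions waiter_may_sleep() and release_and_check_waiters() are
hypothetical names, and C11 fences stand in for whatever barrier primitives
the server actually uses. The point is that each side does a store followed
by a load of the other side's flag, and a full barrier between the two is
what guarantees that at least one side notices the other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags involved in the race. */
typedef struct {
    atomic_uchar lock_word; /* 1 while the mutex is held */
    atomic_uchar waiters;   /* 1 when a thread intends to sleep */
} sketch_mutex;

/* Waiter side: announce the intent to sleep, then re-check the lock.
   Returns true only if the lock still looks held, i.e. sleeping is safe
   because the holder will see waiters != 0 when it releases. */
bool waiter_may_sleep(sketch_mutex *m)
{
    atomic_store_explicit(&m->waiters, 1, memory_order_relaxed);
    /* Full barrier: the store above must not be reordered after the
       load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&m->lock_word, memory_order_relaxed) != 0;
}

/* Holder side: release the lock, then check for waiters.
   Returns true if a waiter must be signalled. */
bool release_and_check_waiters(sketch_mutex *m)
{
    atomic_store_explicit(&m->lock_word, 0, memory_order_release);
    /* Full barrier: the store above must not be reordered after the
       load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&m->waiters, memory_order_relaxed) != 0;
}
```

Without the two fences, both loads could return stale values: the waiter
would see the lock as held and go to sleep, while the holder would see no
waiters and skip the wakeup. That is the stuck-thread outcome described above.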

This was introduced on i386/amd64 platforms in 5.5.40 and 10.0.13 by an
incorrect patch that tried to fix a similar problem for PowerPC.

This commit reverts the incorrect PowerPC patch, and instead implements a fix
for PowerPC that does not change i386/amd64 behaviour, making PowerPC work
similarly to i386/amd64.
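
As a rough illustration of the acquire/release naming used in the diff, the
sketch below takes a lock word with an acquire-ordered exchange and resets it
with a release-ordered exchange. It is written with C11 atomics rather than
the GCC __sync builtins mentioned in the code comment, and the names
sketch_lock, sketch_test_and_set_acquire() and sketch_reset_release() are
hypothetical. It is not the definition of the os_atomic_* macros, and it
makes no claim about which extra barriers a given platform needs.

```c
#include <stdatomic.h>

/* Hypothetical one-byte lock word: 0 = free, 1 = held. */
typedef struct {
    atomic_uchar lock_word;
} sketch_lock;

/* Try to take the lock. Returns the previous value of the lock word:
   0 means the caller now holds the lock, 1 means it was already held. */
unsigned char sketch_test_and_set_acquire(sketch_lock *l)
{
    return atomic_exchange_explicit(&l->lock_word, 1,
                                    memory_order_acquire);
}

/* Release the lock with an exchange rather than a plain store,
   mirroring the "test-and-set with 0" approach in the diff. */
void sketch_reset_release(sketch_lock *l)
{
    (void) atomic_exchange_explicit(&l->lock_word, 0,
                                    memory_order_release);
}
```

A caller would loop or fall back to waiting while
sketch_test_and_set_acquire() returns 1, and call sketch_reset_release()
exactly once after each successful acquisition.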
Kristian Nielsen 2014-11-19 13:56:46 +01:00
commit 6ea41f1e84
10 changed files with 214 additions and 60 deletions

@@ -80,11 +80,11 @@ mutex_test_and_set(
 mutex_t* mutex) /*!< in: mutex */
 {
 #if defined(HAVE_ATOMIC_BUILTINS)
-return(os_atomic_test_and_set_byte(&mutex->lock_word, 1));
+return(os_atomic_test_and_set_byte_acquire(&mutex->lock_word, 1));
 #else
 ibool ret;
-ret = os_fast_mutex_trylock(&(mutex->os_fast_mutex));
+ret = os_fast_mutex_trylock_full_barrier(&(mutex->os_fast_mutex));
 if (ret == 0) {
 /* We check that os_fast_mutex_trylock does not leak
@@ -92,7 +92,6 @@ mutex_test_and_set(
 ut_a(mutex->lock_word == 0);
 mutex->lock_word = 1;
-os_wmb;
 }
 return((byte)ret);
@@ -109,11 +108,14 @@ mutex_reset_lock_word(
 mutex_t* mutex) /*!< in: mutex */
 {
 #if defined(HAVE_ATOMIC_BUILTINS)
-os_atomic_lock_release_byte(&mutex->lock_word);
+/* In theory __sync_lock_release should be used to release the lock.
+Unfortunately, it does not work properly alone. The workaround is
+that more conservative __sync_lock_test_and_set is used instead. */
+os_atomic_test_and_set_byte_release(&mutex->lock_word, 0);
 #else
 mutex->lock_word = 0;
-os_fast_mutex_unlock(&(mutex->os_fast_mutex));
+os_fast_mutex_unlock_full_barrier(&(mutex->os_fast_mutex));
 #endif
 }
@@ -145,7 +147,6 @@ mutex_get_waiters(
 ptr = &(mutex->waiters);
-os_rmb;
 return(*ptr); /* Here we assume that the read of a single
 word from memory is atomic */
 }
@@ -180,7 +181,6 @@ mutex_exit_func(
 to wake up possible hanging threads if
 they are missed in mutex_signal_object. */
-os_isync;
 if (mutex_get_waiters(mutex) != 0) {
 mutex_signal_object(mutex);