MDEV-24449 Corruption of system tablespace or last recovered page

This corresponds to 10.5 commit 39378e1366.

With a patched version of the test innodb.ibuf_not_empty (so that
it would trigger crash recovery after using the change buffer),
and patched code that would modify the os_thread_sleep() in
recv_apply_hashed_log_recs() to be 1ms as well as add a sleep of
the same duration to the end of recv_recover_page() when
recv_sys->n_addrs=0, we can demonstrate a race condition.

After disabling some debug checks in buf_all_freed_instance(),
buf_pool_invalidate_instance() and buf_validate(), we managed to
trigger an assertion failure in fseg_free_step(), on the XDES_FREE_BIT.
In other words, a trx_undo_seg_free() call during
trx_rollback_resurrected() was attempting a double-free of a page.
This was reproduced about once in 400 to 500 test runs. With the fix
applied, the test passed 2,000 runs.

recv_apply_hashed_log_recs(): Do not only wait for recv_sys->n_addrs
to reach 0, but also wait for buf_get_n_pending_read_ios() to reach 0,
to guarantee that buf_page_io_complete() will not be executing
ibuf_merge_or_delete_for_page().
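
The wait condition described above can be illustrated with a small
standalone model. This is only a sketch, not InnoDB code: RecoveryState,
complete_read() and run_recovery_wait() are hypothetical names, the two
counters merely stand in for recv_sys->n_addrs and
buf_get_n_pending_read_ios(), and the model assumes that the read
completion drops the first counter before its post-processing and the
second counter only afterwards.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins for recv_sys->n_addrs and the pending read count.
struct RecoveryState {
    std::atomic<unsigned> n_addrs{0};
    std::atomic<unsigned> pending_reads{0};
    std::atomic<bool> post_work_done{false};
};

// Models a read completion: the page is recovered first (n_addrs drops),
// then further work on the page follows (in the real code, the change
// buffer merge), and only then is the read no longer counted as pending.
void complete_read(RecoveryState& s) {
    s.n_addrs.fetch_sub(1);
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    s.post_work_done.store(true);
    s.pending_reads.fetch_sub(1);
}

// Returns true if the post-processing had finished by the time the wait
// loop ended. With also_wait_for_reads == false (the old condition) the
// result depends on timing; with true (the fixed condition) it is
// guaranteed to be true.
bool run_recovery_wait(bool also_wait_for_reads) {
    RecoveryState s;
    s.n_addrs = 1;
    s.pending_reads = 1;
    std::thread io(complete_read, std::ref(s));

    while (s.n_addrs.load() != 0
           || (also_wait_for_reads && s.pending_reads.load() != 0)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    const bool finished = s.post_work_done.load();
    io.join();
    return finished;
}
```

Calling run_recovery_wait(false) will usually return false in this
model, because the waiter proceeds while the completion is still
working on the page; run_recovery_wait(true) always returns true.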
Marko Mäkelä 2020-12-28 12:06:22 +02:00
parent 8e3e87d2fc
commit 5b9ee8d819

@@ -2501,7 +2501,7 @@ apply:
 	/* Wait until all the pages have been processed */
-	while (recv_sys->n_addrs != 0) {
+	while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) {
 		const bool abort = recv_sys->found_corrupt_log
 			|| recv_sys->found_corrupt_fs;