Commit graph

1185 commits

Author SHA1 Message Date
Marko Mäkelä
e0e096faaa MDEV-29982 Improve the InnoDB log overwrite error message
The InnoDB write-ahead log ib_logfile0 is of fixed size,
specified by innodb_log_file_size. If the tail of the log
manages to overwrite the head (latest checkpoint) of the log,
crash recovery will be broken.

Let us clarify the messages about this, including adding
a message on the completion of a log checkpoint that notes
that the dangerous situation is over.

To reproduce the dangerous scenario, we will introduce the
debug injection label ib_log_checkpoint_avoid_hard, which will
avoid log checkpoints even harder than the previous
ib_log_checkpoint_avoid.

log_t::overwrite_warned: The first known dangerous log sequence number.
Set in log_close() and cleared in log_write_checkpoint_info(),
which will output a "Crash recovery was broken" message.
2022-11-14 12:18:03 +02:00
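The bookkeeping described in e0e096faaa above can be sketched roughly as follows. This is a simplified model and not the InnoDB code: the struct log_model, its capacity check, and the message texts are assumptions for illustration. Only the member name overwrite_warned and the two functions it mirrors, log_close() and log_write_checkpoint_info(), come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

using lsn_t = std::uint64_t;

// Hypothetical, simplified model of the state described in the commit.
struct log_model
{
  lsn_t capacity = 0;             // usable size of the fixed-size log
  lsn_t last_checkpoint_lsn = 0;  // head of the log (latest checkpoint)
  lsn_t overwrite_warned = 0;     // first known dangerous LSN, 0 if none

  // Stands in for log_close(): called after the log was written up to lsn.
  void close(lsn_t lsn)
  {
    assert(lsn >= last_checkpoint_lsn);
    if (lsn - last_checkpoint_lsn > capacity && !overwrite_warned)
    {
      // Remember only the first dangerous LSN; warn once.
      overwrite_warned = lsn;
      std::fprintf(stderr, "log tail overwrote the checkpoint at LSN %llu\n",
                   static_cast<unsigned long long>(lsn));
    }
  }

  // Stands in for log_write_checkpoint_info(): a checkpoint completed.
  void checkpoint(lsn_t lsn)
  {
    last_checkpoint_lsn = lsn;
    if (overwrite_warned)
    {
      std::fprintf(stderr, "crash recovery was broken from LSN %llu "
                   "until this checkpoint\n",
                   static_cast<unsigned long long>(overwrite_warned));
      overwrite_warned = 0;  // the dangerous situation is over
    }
  }
};
```

In this model, a second call to close() past the capacity leaves overwrite_warned unchanged, so the value reported at the next checkpoint is the first dangerous LSN.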
Marko Mäkelä
a732d5e2ba Merge 10.4 into 10.5 2022-11-08 17:01:28 +02:00
Marko Mäkelä
8fb176c3c1 MDEV-27121 fixup: mariabackup.mdev-14447,full_crc32 2022-11-08 16:59:36 +02:00
Marko Mäkelä
93b4f84ab2 Merge 10.3 into 10.4 2022-11-08 16:04:01 +02:00
Marko Mäkelä
eabb3b35d5 MDEV-27121 fixup: mariabackup.mdev-14447 fault injection 2022-11-08 08:53:49 +02:00
Marko Mäkelä
65d0c57c1a Merge 10.3 into 10.4 2022-10-05 20:30:57 +03:00
Vlad Lesin
c0eda62aec MDEV-27927 row_sel_try_search_shortcut_for_mysql() does not latch a page, violating read view isolation
btr_search_guess_on_hash() would only acquire an index page latch if it
is invoked with ahi_latch=NULL. If it's invoked from
row_sel_try_search_shortcut_for_mysql() with ahi_latch!=NULL, a page
will not be latched, and row_search_mvcc() will get a pointer to the
record, which can be changed by some other transaction before the record
is stored in the result buffer by the row_sel_store_mysql_rec() call.

ahi_latch argument of btr_cur_search_to_nth_level_func() and
btr_pcur_open_with_no_init_func() is used only for
row_sel_try_search_shortcut_for_mysql().
btr_cur_search_to_nth_level_func(..., ahi_latch !=0, ...) is invoked
only from btr_pcur_open_with_no_init_func(..., ahi_latch !=0, ...),
which, in turn, is invoked only from
row_sel_try_search_shortcut_for_mysql().

I suppose that the separate case with ahi_latch!=0 was intentionally
implemented to protect the row_sel_store_mysql_rec() call in
row_search_mvcc() just after the row_sel_try_search_shortcut_for_mysql()
call. After the ahi_latch was moved from row_search_mvcc() to
row_sel_try_search_shortcut_for_mysql(), there is no need for it at all
if btr_search_guess_on_hash() latches a page unconditionally. And if
btr_search_guess_on_hash() latches the page, any access to the record in
row_sel_try_search_shortcut_for_mysql() after the
btr_pcur_open_with_no_init() call will be protected by the page latch.

The fix is to remove ahi_latch argument from
btr_pcur_open_with_no_init_func(), btr_cur_search_to_nth_level_func()
and btr_search_guess_on_hash().

There will be no test, because testing this would require freezing a
SELECT at the point between the row_sel_try_search_shortcut_for_mysql()
and row_sel_store_mysql_rec() calls in row_search_mvcc(), and changing
the record in some other transaction so that row_sel_store_mysql_rec()
stores the changed record in the result buffer. But we can't do this with
the fix, as the page will be latched in the btr_search_guess_on_hash() call.
2022-10-05 17:35:21 +03:00
Marko Mäkelä
0c0a569028 Merge 10.3 into 10.4 2022-09-20 12:38:25 +03:00
Marko Mäkelä
c22dff21a5 InnoDB cleanup: Replace UNIV_LINUX, UNIV_SOLARIS, UNIV_AIX
Let us use the normal platform-specific preprocessor symbols
__linux__, __sun__, _AIX instead of some homebrew ones.

The preprocessor symbol UNIV_HPUX must have lost its meaning
by f6deb00a56 (note: the symbol
UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).
2022-09-19 12:20:53 +03:00
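As a small illustration of the c22dff21a5 cleanup above, code can test the compiler-provided platform macros directly instead of project-specific ones. The function platform_name() is a hypothetical helper, not part of InnoDB.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: selects a label using the standard
// compiler-defined platform macros named in the commit message.
inline const char *platform_name()
{
#if defined __linux__
  return "Linux";
#elif defined __sun__
  return "Solaris";
#elif defined _AIX
  return "AIX";
#else
  return "other";
#endif
}
```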
Marko Mäkelä
9929301ecd Merge 10.4 into 10.5 2022-08-25 15:31:19 +03:00
Marko Mäkelä
851058a3e6 Merge 10.3 into 10.4 2022-08-25 15:17:20 +03:00
Marko Mäkelä
d1a80c42ee MDEV-29384 Hangs caused by innodb_adaptive_hash_index=ON
buf_defer_drop_ahi(): Remove. Ever since
commit c7f8cfc9e7 (MDEV-27700)
it is safe to invoke btr_search_drop_page_hash_index(block, true)
to remove an orphan adaptive hash index.

Any attempt to upgrade page latches is prone to deadlocks. Recently,
we observed a few hangs that involved nothing more than a small table
consisting of one clustered index page, one secondary index page and
some undo pages.
2022-08-25 15:14:38 +03:00
Marko Mäkelä
3b656ac8c1 Merge 10.4 into 10.5 2022-08-22 19:49:56 +03:00
Marko Mäkelä
b68ae6dc1d Merge 10.3 into 10.4 2022-08-22 16:22:09 +03:00
Thirunarayanan Balathandayuthapani
c7f8cfc9e7 MDEV-27700 ASAN: Heap_use_after_free in btr_search_drop_page_hash_index()
Reason:
=======
Race condition between btr_search_drop_hash_index() and
btr_search_lazy_free(). One thread resizes the buffer pool,
clears the ahi on all pages in the buffer pool, and frees the
index and table while removing the last reference. At the same time,
another thread accesses index->heap in btr_search_drop_hash_index().

Solution:
=========
Acquire the respective ahi latch before checking index->freed()

btr_search_drop_page_hash_index(): Added a new parameter to indicate
that ahi entries are to be dropped only if the index is marked as freed

btr_search_check_marked_free_index(): Acquire all ahi latches and
return true if the index was freed
2022-08-22 16:29:46 +05:30
Oleksandr Byelkin
af143474d8 Merge branch '10.4' into 10.5 2022-08-03 07:12:27 +02:00
Oleksandr Byelkin
48e35b8cf6 Merge branch '10.3' into 10.4 2022-08-02 14:15:39 +02:00
Daniel Black
182a6383cd MDEV-16605 Always include buf_madvise_do_dump in binaries
The "used" attribute seems to do this

ref: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
2022-08-02 12:29:11 +10:00
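A minimal sketch of the attribute mentioned in 182a6383cd above, assuming GCC or Clang. The function debug_helper() is a hypothetical stand-in for a function that nothing else in the translation unit calls. The "used" attribute asks the compiler to emit the function even when it appears unreferenced, so that it remains available, for example to a debugger.

```cpp
#include <cassert>

// Hypothetical stand-in for a function that is meant to be called only
// from outside the normal call graph. Without the attribute, an
// unreferenced static function may be omitted from the object file.
__attribute__((used)) static int debug_helper(int x)
{
  return x + 1;
}
```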
Marko Mäkelä
f09687094c Merge 10.4 into 10.5 2022-07-01 14:42:02 +03:00
Marko Mäkelä
392ee571c1 Merge 10.3 into 10.4 2022-07-01 13:10:36 +03:00
Marko Mäkelä
7c35ad16e3 MDEV-28389 fixup: Fix pre-GCC 10 -Wconversion
Before version 10, GCC would think that a right shift of an
unsigned char returns int. Let us explicitly cast that back,
to silence a bogus -Wconversion warning.
2022-07-01 13:02:43 +03:00
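The kind of cast described in 7c35ad16e3 above might look like the following sketch. The function high_nibble() is hypothetical. Under the usual integer promotions the shift is evaluated as int, so an explicit cast back to unsigned char states the intended narrowing and avoids the -Wconversion warning on compilers that emit one.

```cpp
#include <cassert>

// Hypothetical example: extract the upper four bits of a byte.
// b is promoted to int before the shift; the cast narrows the
// result back explicitly instead of relying on implicit conversion.
inline unsigned char high_nibble(unsigned char b)
{
  return static_cast<unsigned char>(b >> 4);
}
```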
Marko Mäkelä
773f1dad94 Merge 10.4 into 10.5 2022-06-27 16:17:02 +03:00
Marko Mäkelä
b922ae5fc9 Merge 10.3 into 10.4 2022-06-27 16:16:20 +03:00
Marko Mäkelä
a75ad73545 MDEV-28389 fixup: Fix compiler warnings
hex_to_ascii(): Add #if around the definition to avoid
clang -Wunused-function. Avoid GCC 5 -Wconversion with a cast.
2022-06-27 14:50:00 +03:00
Marko Mäkelä
ea847cbeaf Merge 10.4 into 10.5 2022-06-27 10:51:20 +03:00
Marko Mäkelä
01d757036f Merge 10.3 into 10.4 2022-06-27 10:14:37 +03:00
Marko Mäkelä
c86b1389de MDEV-28389: Simplify the InnoDB corrupted page output
buf_page_print(): Dump the buffer page 32 bytes (64 hexadecimal digits)
per line. In this way, the limitation in mtr
("Data too long for column 'line'") will not be triggered.

Also, do not bother decoding the page contents, because everything
is present in the hexadecimal output.

dict_index_find_on_id_low(): Merge to dict_index_get_if_in_cache_low().
The direct call in buf_page_print() was prone to crashing, in case the
table definition was concurrently evicted or dropped from the
data dictionary cache.
2022-06-27 09:49:49 +03:00
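The output format described in c86b1389de above, 32 bytes (64 hexadecimal digits) per line, can be sketched as follows. This is not the buf_page_print() implementation; hex_dump() is a hypothetical helper that only illustrates the line layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: formats size bytes as lowercase hexadecimal,
// 32 bytes (64 digits) per line, each line ending in '\n'.
inline std::string hex_dump(const unsigned char *data, std::size_t size)
{
  static const char digits[] = "0123456789abcdef";
  std::string out;
  out.reserve(size * 2 + size / 32 + 1);
  for (std::size_t i = 0; i < size; i++)
  {
    out += digits[data[i] >> 4];
    out += digits[data[i] & 15];
    // End the line after every 32nd byte and after the last byte.
    if (i % 32 == 31 || i + 1 == size)
      out += '\n';
  }
  return out;
}
```

With this layout, no output line exceeds 64 characters plus the newline, whatever the page size.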
Marko Mäkelä
4849d94fe6 MDEV-28828 SIGSEGV in buf_flush_LRU_list_batch
In commit 73fee39ea6 (MDEV-27985)
a regression was introduced that would cause bpage=nullptr to
be referenced.

buf_flush_LRU_list_batch(): Always terminate the loop upon
encountering a null pointer.
2022-06-14 09:14:24 +03:00
Vlad Lesin
3fabdc3ca8 MDEV-28473 field_ref_zero is not initialized in xtrabackup_prepare_func()
The solution is to initialize field_ref_zero in main_low() before
xtrabackup_backup_func() and xtrabackup_prepare_func() calls.
2022-05-11 17:20:31 +03:00
Sergei Golubchik
a70a1cf3f4 Merge branch '10.3' into 10.4 2022-05-08 23:03:08 +02:00
Oleksandr Byelkin
9614fde1aa Merge branch '10.2' into 10.3 2022-05-03 10:59:54 +02:00
Marko Mäkelä
f21a875600 MDEV-28415 ALTER TABLE on a large table hangs InnoDB
buf_flush_page(): Never wait for a page latch, even in checkpoint
flushing (flush_type == BUF_FLUSH_LIST), to prevent a hang of the
page cleaner threads when a large number of pages is latched.

In mysql/mysql-server@9542f3015b
it was claimed that such a hang only affects CREATE FULLTEXT INDEX.
Their fix was to retain buffer-fix but release exclusive latch
on non-leaf pages, and subsequently write to those pages while
they are not associated with the mini-transaction, which would
trip a debug assertion in the MariaDB version of
mtr_t::memo_modify_page() and cause potential corruption
when using the default MariaDB setting innodb_log_optimize_ddl=OFF.

This change essentially backports a small part of
commit 7cffb5f6e8 (MDEV-23399)
from MariaDB Server 10.5.7.
2022-04-27 07:57:04 +03:00
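The "never wait for a page latch" rule from f21a875600 above can be illustrated with a try-latch pattern. This is a sketch under assumptions: page_model and flush_page_nowait() are hypothetical, and an atomic flag stands in for the InnoDB page latch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical page with a flag standing in for an exclusive latch.
struct page_model
{
  std::atomic<bool> latched{false};
  bool dirty = true;
};

// Tries to flush the page without ever blocking on the latch.
// Returns false if the latch is held elsewhere, so that the caller
// can move on to other pages instead of hanging.
inline bool flush_page_nowait(page_model &page)
{
  if (page.latched.exchange(true, std::memory_order_acquire))
    return false;            // latch is busy: skip this page
  page.dirty = false;        // stands in for the actual page write
  page.latched.store(false, std::memory_order_release);
  return true;
}
```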
Marko Mäkelä
c009ce7dd0 MDEV-27094 Debug builds include useless InnoDB "disabled" options
This is a backport of commit 4489a89c71
in order to remove the test innodb.redo_log_during_checkpoint
that would cause trouble in the DBUG subsystem invoked by
safe_mutex_lock() via log_checkpoint(). Before
commit 7cffb5f6e8
these mutexes were of different type.

The following options were introduced in
commit 2e814d4702 (mariadb-10.2.2)
and have little use:

innodb_disable_resize_buffer_pool_debug had no effect even in
MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in
mysql/mysql-server@5c4094cf49
to work around a problem that was fixed in
mysql/mysql-server@2957ae4f99
(but the parameter was not removed).

innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug
are only used by the test innodb.redo_log_during_checkpoint
that will be removed as part of this commit.

innodb_dict_stats_disabled_debug is only used by that test,
and it is redundant because one could simply use
innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute
of the table in the test to achieve the same effect.
2022-04-22 12:48:40 +03:00
Marko Mäkelä
f84b5d782a Fix clang -Wunused-but-set-variable 2022-04-21 11:35:07 +03:00
Marko Mäkelä
5d8dcfd86c MDEV-25975: Merge 10.4 into 10.5 2022-04-06 10:30:49 +03:00
Marko Mäkelä
7d7bdd4aaa MDEV-28185 InnoDB generates redundant log checkpoints
The comparison on the checkpoint age (number of log bytes
written since the previous checkpoint) is inaccurate, because
the previous FILE_CHECKPOINT record could span two 512-byte
log blocks, which will cause the LSN to increase by the size of the
log block header and footer.

We will still generate a redundant checkpoint if the previous
checkpoint wrote some FILE_MODIFY records before the FILE_CHECKPOINT
record.
2022-03-29 19:42:10 +03:00
Marko Mäkelä
42609c240d Cleanup: Replace log_sys.n_pending_checkpoint_writes with a Boolean
Only one checkpoint may be in progress at a time.
The counter log_sys.n_pending_checkpoint_writes
was being protected by log_sys.mutex.
Let us replace it with the Boolean log_sys.checkpoint_pending.
2022-03-29 14:56:44 +03:00
Marko Mäkelä
c14f60a72f Fix g++-12 -O2 -Wstringop-overflow
buf_pool_t::watch_unset(): Reorder some code so that
no warning will be emitted in CMAKE_BUILD_TYPE=RelWithDebInfo.
It is unclear why invoking watch_is_sentinel() before
buf_fix_count() would make the warning disappear.
2022-03-29 12:59:38 +03:00
Marko Mäkelä
d62b0368ca Merge 10.4 into 10.5 2022-03-29 12:59:18 +03:00
Marko Mäkelä
ae6e214fd8 Merge 10.3 into 10.4 2022-03-29 11:13:18 +03:00
Marko Mäkelä
020e7d89eb Merge 10.2 into 10.3 2022-03-29 09:53:15 +03:00
Marko Mäkelä
303448bc91 MDEV-27931: buf_page_is_corrupted() wrongly claims corruption
In commit 437da7bc54 (MDEV-19534),
the default value of the global variable srv_checksum_algorithm
in innochecksum was changed from SRV_CHECKSUM_ALGORITHM_INNODB
to implied 0 (innodb_checksum_algorithm=crc32). As a result,
the function buf_page_is_corrupted() would by default invoke
buf_calc_page_crc32() in innochecksum, and crc32_inited would hold.

This would cause "innochecksum" to fail on a particular page.

The actual problem is older, introduced in 2011 in
mysql/mysql-server@17e497bdb7
(MySQL 5.6.3). It should affect the validation of pages of old
data files that were written with innodb_checksum_algorithm=innodb.
When using innodb_checksum_algorithm=crc32 (the default setting
since MariaDB Server 10.2), some valid pages would be rejected
only because exactly one of the two checksum fields accidentally
matches the innodb_checksum_algorithm=crc32 value.

buf_page_is_corrupted(): Simplify the logic of non-strict
checksum validation, by always invoking buf_calc_page_crc32().
Remove a bogus condition that if only one of the checksum fields
contains the value returned by buf_calc_page_crc32(), the page
is corrupted.
2022-03-28 13:36:36 +03:00
Marko Mäkelä
73fee39ea6 MDEV-27985 buf_flush_freed_pages() causes InnoDB to hang
buf_flush_freed_pages(): Assert that neither buf_pool.mutex
nor buf_pool.flush_list_mutex are held. Simplify the loops.
Return the tablespace and the number of pages written or punched.

buf_flush_LRU_list_batch(), buf_do_flush_list_batch():
Release buf_pool.mutex before invoking buf_flush_space().

buf_flush_list_space(): Acquire the mutexes only after invoking
buf_flush_freed_pages().

Reviewed by: Thirunarayanan Balathandayuthapani
2022-03-15 14:44:22 +02:00
Daniel Black
d78173828e MDEV-27900: aio handle partial reads/writes
As btrfs showed, a partial read of data in AIO/O_DIRECT circumstances can
really confuse MariaDB.

Filipe Manana (SuSE)[1] showed how database programmers can assume
O_DIRECT is all or nothing.

While a fix was made on the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we only get a partial read/write.

Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.

[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a

Also spell synchronously correctly in other files.
2022-03-12 09:47:53 +11:00
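The approach described in d78173828e above, completing a partial read synchronously, can be sketched with POSIX pread(). This is not the MariaDB code; finish_read() and its parameters are hypothetical. The sketch assumes an asynchronous request already delivered done bytes out of len.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads the remainder of a request synchronously.
// Returns the total number of bytes in buf, which is less than len only
// at end of file, or -1 with errno set if a retry fails.
inline ssize_t finish_read(int fd, void *buf, size_t len, off_t offset,
                           size_t done)
{
  char *p = static_cast<char*>(buf);
  while (done < len)
  {
    ssize_t n = pread(fd, p + done, len - done,
                      offset + static_cast<off_t>(done));
    if (n < 0)
    {
      if (errno == EINTR)
        continue;            // interrupted: retry the same range
      return -1;             // a concrete error for the caller to handle
    }
    if (n == 0)
      break;                 // end of file
    done += static_cast<size_t>(n);
  }
  return static_cast<ssize_t>(done);
}
```

If the partial result was the prelude to an error, the retry surfaces that error through errno, which matches the reasoning in the commit message.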
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Thirunarayanan Balathandayuthapani
28e166d643 MDEV-26784 [Warning] InnoDB: Difficult to find free blocks in the buffer pool
Problem:
=======
  InnoDB runs out of memory during recovery and fails to
flush the dirty LRU blocks. The reason is that the buffer pool
can run out before the LRU list length reaches the
BUF_LRU_OLD_MIN_LEN (256) threshold.

Fix:
====
During recovery, InnoDB should write out and evict all
dirty blocks.
2022-01-21 14:15:18 +05:30
Daniel Black
410c4edef3 MDEV-27467: innodb to enforce the minimum innodb_buffer_pool_size in SET GLOBAL
.. to be the same as startup.

In resolving MDEV-27461, BUF_LRU_MIN_LEN (256) is the minimum number of
pages for the innodb buffer pool size. Obviously we need more than just
flushing pages. Taking the 16k page size and its default minimum, an
extra 25% is needed on top of the flushing pages to make a workable buffer
pool.

The minimum innodb_buffer_pool_chunk_size (1M) restricts the minimum
otherwise we'd have a pool made up of different chunk sizes.

The resulting minimum innodb buffer pool sizes are:

Page size  Previous minimum (startup)  New minimum
       4k                          5M           2M
       8k                          5M           3M
      16k                          5M           5M
      32k                         24M          10M
      64k                         24M          20M

With this patch, SET GLOBAL innodb_buffer_pool_size minimums are
enforced.

The evident minimum system variable size for innodb_buffer_pool_size
is 2M; however, this is only settable when using the 4k page size. As
the order of the page_size and buffer_pool_size isn't fixed, we can't
hide this change.

Subsequent changes:
* innodb_buffer_pool_resize_with_chunks.test - raised the pool sizes used for
  resizing due to the new minimums. The chunk size also needed to be increased,
  as the test relies on pool_size < chunk_size to generate a warning.
* Removed srv_buf_pool_min_size and replaced its use with MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Removed srv_buf_pool_def_size and replaced constant definition in
  MYSQL_SYSVAR_LONGLONG(buffer_pool_size)
* Reordered ha_innodb to allow for direct use of MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Moved buf_pool_size_align into ha_innodb to allow access to MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* loose-innodb_disable_resize_buffer_pool_debug is needed in the
  innodb.restart.opt test so that under debug mode, resizing of the
  innodb buffer pool can occur.
2022-01-19 11:10:45 +11:00
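The new minimums listed in 410c4edef3 above are consistent with the following arithmetic: 256 pages, plus 25%, rounded up to a whole number of 1M chunks. The function below is a hypothetical reconstruction from the commit message and its table, and the actual code may compute the value differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reconstruction of the minimum buffer pool size in bytes:
// 256 pages (the minimum LRU length) plus 25%, rounded up to a multiple
// of the minimum chunk size (1 MiB).
inline std::uint64_t min_buffer_pool_size(std::uint64_t page_size)
{
  const std::uint64_t min_pages = 256;
  const std::uint64_t chunk = 1ULL << 20;
  const std::uint64_t needed = min_pages * page_size * 5 / 4;
  return (needed + chunk - 1) / chunk * chunk;
}
```

For a 4k page size this gives 1M for the 256 pages, 1.25M with the extra 25%, and 2M after rounding up to whole chunks.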
Marko Mäkelä
e44439ab73 MDEV-27499 Performance regression in log_checkpoint_margin()
In commit 4c3ad24413 (MDEV-27416)
an unnecessarily strict wait condition was introduced in the
function buf_flush_wait(). Most callers actually only care that
the pages have been flushed, not that a checkpoint has completed.

Only in the buf_flush_sync() call for log resizing, we might care
about the log checkpoint. But, in fact,
srv_prepare_to_delete_redo_log_file() is explicitly disabling
checkpoints. So, we can simply remove the unnecessary wait loop.

Thanks to Krunal Bauskar for reporting this performance regression
that we failed to repeat in our testing.
2022-01-18 12:57:15 +02:00