Commit graph

3664 commits

Author SHA1 Message Date
Marko Mäkelä
4ae105a37d Merge 10.4 into 10.5 2023-12-18 08:59:07 +02:00
Thirunarayanan Balathandayuthapani
59a984b4d8 MDEV-32725 innodb.import_update_stats accesses uninitialized ib_table->stat_n_rows
- InnoDB should write all zeros into a table and its indexes
statistics members when table is unreadable.
2023-12-15 15:43:19 +05:30
Sergei Golubchik
98a39b0c91 Merge branch '10.4' into 10.5 2023-12-02 01:02:50 +01:00
Marko Mäkelä
47fc64c19f MDEV-32833 InnoDB wrong error message
trx_t::commit_in_memory(): Empty the detailed_error string, so that
FOREIGN KEY error messages from an earlier transaction will not be
wrongly reused in ha_innobase::get_error_message().

Reviewed by: Thirunarayanan Balathandayuthapani
2023-11-29 10:52:25 +02:00
Marko Mäkelä
78c9a12c8f MDEV-32861 InnoDB hangs when running out of I/O slots
When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
and the server is run with the minimum parameters
innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
were observed.

tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
will be woken up when the cache previously was empty.

buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
batch so that os_aio_wait_until_no_pending_writes() has a chance of
returning. Add a Boolean parameter and pass wait_for_reads=false inside
buf_page_decrypt_after_read(), because those calls will be executed
inside a read completion callback, and therefore
os_aio_wait_until_no_pending_reads() would block indefinitely.
2023-11-22 16:54:41 +02:00
Marko Mäkelä
a3d0d5fc33 MDEV-26055: Improve adaptive flushing
This is a 10.5 backport from 10.6
commit 9593cccf28.

Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0
(not default) and innodb_adaptive_flushing=ON (default).
There is also the parameter innodb_adaptive_flushing_lwm
(default: 10 per cent of the log capacity). It should enable some
adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.
That is not being changed here.

This idea was first presented by Inaam Rana several years ago,
and I discussed it with Jean-François Gagné at FOSDEM 2023.

buf_flush_page_cleaner(): When we are not near the log capacity limit
(neither buf_flush_async_lsn nor buf_flush_sync_lsn are set),
also try to move clean blocks from the buf_pool.LRU list to buf_pool.free
or initiate writes (but not the eviction) of dirty blocks, until
the remaining I/O capacity has been consumed.

buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify
whether dirty least recently used pages (from buf_pool.LRU) should
be evicted immediately after they have been written out. Callers outside
buf_flush_page_cleaner() will pass evict=true, to retain the existing
behaviour.

buf_do_LRU_batch(): Add the parameter bool evict.
Return counts of evicted and flushed pages.

buf_flush_LRU(): Add the parameter bool evict.
Assume that the caller holds buf_pool.mutex and
will invoke buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list()
whose caller must hold buf_pool.mutex and invoke
buf_dblwr.flush_buffered_writes() afterwards.

buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have
buf_flush_wait_batch_end().

page_cleaner_flush_pages_recommendation(): Avoid some floating-point
arithmetics.

buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(),
buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict".

buf_free_from_unzip_LRU_list_batch(): Remove the parameter.
Only actual page writes will contribute towards the limit.

buf_LRU_free_page(): Evict freed pages of temporary tables.

buf_pool.done_free: Broadcast whenever a block is freed
(and buf_pool.try_LRU_scan is set).

buf_pool_t::io_buf_t::reserve(): Retry indefinitely.
During the test encryption.innochecksum we easily run out of
these buffers for PAGE_COMPRESSED or ENCRYPTED pages.

Tested by Matthias Leich and Axel Schwenke
2023-11-16 17:45:18 +02:00
Oleksandr Byelkin
6cfd2ba397 Merge branch '10.4' into 10.5 2023-11-08 12:59:00 +01:00
Kristian Nielsen
9fa718b1a1 Fix mariabackup InnoDB recovered binlog position on server upgrade
Before MariaDB 10.3.5, the binlog position was stored in the TRX_SYS page,
while after it is stored in rollback segments. There is code to read the
legacy position from TRX_SYS to handle upgrades. The problem was if the
legacy position happens to compare larger than the position found in
rollback segments; in this case, the old TRX_SYS position would incorrectly
be preferred over the newer position from rollback segments.

Fixed by always preferring a position from rollback segments over a legacy
position.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-11-03 09:13:51 +01:00
Kristian Nielsen
f8f5ed2280 Revert: MDEV-22351 InnoDB may recover wrong information after RESET MASTER
This commit can cause the wrong (old) binlog position to be recovered by
mariabackup --prepare. It implements that the value of the FIL_PAGE_LSN is
compared to determine which binlog position is the last one and should be
recoved. However, it is not guaranteed that the FIL_PAGE_LSN order matches the
commit order, as is assumed by the code. This is because the page LSN could be
modified by an unrelated update of the page after the commit.

In one example, the recovery first encountered this in trx_rseg_mem_restore():

  lsn=27282754  binlog position (./master-bin.000001, 472908)

and then later:

  lsn=27282699  binlog position (./master-bin.000001, 477164)

The last one 477164 is the correct position. However, because the LSN
encountered for the first one is higher, that position is recovered instead.
This results in too old binlog position, and a newly provisioned slave will
start replicating too early and get duplicate key error or similar.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-11-03 09:13:51 +01:00
Marko Mäkelä
bfab4ab000 MDEV-18867 fixup: Remove DBUG injection
In commit 75e82f71f1 the code to
rename internal tables for FULLTEXT INDEX that had been
created on Microsoft Windows using incompatible names
was removed. Let us also remove the related fault injection.
2023-11-02 15:27:52 +02:00
Marko Mäkelä
15ae97b1c2 MDEV-32578 row_merge_fts_doc_tokenize() handles parser plugin inconsistently
When mysql/mysql-server@0c954c2289
added a plugin interface for FULLTEXT INDEX tokenization to MySQL 5.7,
fts_tokenize_ctx::processed_len got a second meaning, which is only
partly implemented in row_merge_fts_doc_tokenize().

This inconsistency could cause a crash when using FULLTEXT...WITH PARSER.
A test case that would crash MySQL 8.0 when using an n-gram parser and
single-character words would fail to crash in MySQL 5.7, because the
buf_full condition in row_merge_fts_doc_tokenize() was not met.

This change is inspired by
mysql/mysql-server@38e9a0779a
that appeared in MySQL 5.7.44.
2023-10-27 13:13:49 +03:00
Thirunarayanan Balathandayuthapani
3da5d047b8 MDEV-31851 After crash recovery, undo tablespace fails to open
Problem:
========
- InnoDB fails to open undo tablespace when page0 is corrupted
and fails to throw error.

Solution:
=========
- InnoDB throws DB_CORRUPTION error when InnoDB encounters
page0 corruption of undo tablespace.

- InnoDB restores the page0 of undo tablespace from
doublewrite buffer if it encounters page corruption

- Moved Datafile::restore_from_doublewrite() to
recv_dblwr_t::restore_first_page(). So that undo
tablespace and system tablespace can use this function
instead of duplicating the code

srv_undo_tablespace_open(): Returns 0 if file doesn't exist
or ULINT_UNDEFINED if page0 is corrupted.
2023-10-17 18:41:21 +05:30
Daniel Black
fbd11d5f29 MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success
Review cleanups.
2023-10-13 09:48:57 +11:00
Daniel Black
c79ca7c7ad MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success
There are many filesystem related errors that can occur with
MariaBackup. These already outputed to stderr with a good description of
the error. Many of these are permission or resource (file descriptor)
limits where the assertion and resulting core crash doesn't offer
developers anything more than the log message. To the user, assertions
and core crashes come across as poor error handling.

As such we return an error and handle this all the way up the stack.
2023-10-12 21:37:27 +11:00
Marko Mäkelä
f9d471e2d5 Cleanup: Remove innobase_init_vc_templ()
This fixes up a merge of commit 4fb8f7d07a
with respect to commit ea37b14409.
2023-10-12 09:48:54 +03:00
Marko Mäkelä
6e9b421f77 MDEV-32364 Server crashes when starting server with high innodb_log_buffer_size
log_t::create(): Return whether the initialisation succeeded.
It may fail if too large an innodb_log_buffer_size is specified.
2023-10-06 14:16:01 +03:00
Daniel Black
ca66a2cbfa MDEV-18200 MariaBackup full backup failed with InnoDB: Failing assertion: success
There are many filesystem related errors that can occur with
MariaBackup. These already outputed to stderr with a good description of
the error. Many of these are permission or resource (file descriptor)
limits where the assertion and resulting core crash doesn't offer
developers anything more than the log message. To the user, assertions
and core crashes come across as poor error handling.

As such we return an error and handle this all the way up the stack.
2023-09-26 08:55:52 +10:00
Vlad Lesin
95730372bd MDEV-30165 X-lock on supremum for prepared transaction for RR
trx_t::set_skip_lock_inheritance() must be invoked at the very beginning
of lock_release_on_prepare(). Currently  trx_t::set_skip_lock_inheritance()
is invoked at the end of lock_release_on_prepare() when lock_sys and trx
are released, and there can be a case when locks on prepare are released,
but "not inherit gap locks" bit has not yet been set, and page split
inherits lock to supremum.

Also reset supremum bit and rebuild waiting queue when XA is prepared.

Reviewed by: Marko Mäkelä
2023-09-21 20:07:53 +03:00
Marko Mäkelä
d58f43f8b4 MDEV-21174 fixup: Remove unused ut_bit_set_nth()
This fixes up commit 56f6dab1d0
2023-09-19 18:02:56 +03:00
Oleksandr Byelkin
7564be1352 Merge branch '10.4' into 10.5 2023-07-26 16:02:57 +02:00
Oleksandr Byelkin
f52954ef42 Merge commit '10.4' into 10.5 2023-07-20 11:54:52 +02:00
Vlad Lesin
1bfd3cc457 MDEV-10962 Deadlock with 3 concurrent DELETEs by unique key
PROBLEM:
A deadlock was possible when a transaction tried to "upgrade" an already
held Record Lock to Next Key Lock.

SOLUTION:
This patch is based on observations that:
(1) a Next Key Lock is equivalent to Record Lock combined with Gap Lock
(2) a GAP Lock never has to wait for any other lock
In case we request a Next Key Lock, we check if we already own a Record
Lock of equal or stronger mode, and if so, then we change the requested
lock type to GAP Lock, which we either already have, or can be granted
immediately, as GAP locks don't conflict with any other lock types.
(We don't consider Insert Intention Locks a Gap Lock in above statements).

The reason of why we don't upgrage Record Lock to Next Key Lock is the
following.

Imagine a transaction which does something like this:

for each row {
    request lock in LOCK_X|LOCK_REC_NOT_GAP mode
    request lock in LOCK_S mode
}

If we upgraded lock from Record Lock to Next Key lock, there would be
created only two lock_t structs for each page, one for
LOCK_X|LOCK_REC_NOT_GAP mode and one for LOCK_S mode, and then used
their bitmaps to mark all records from the same page.

The situation would look like this:

request lock in LOCK_X|LOCK_REC_NOT_GAP mode on row 1:
// -> creates new lock_t for LOCK_X|LOCK_REC_NOT_GAP mode and sets bit for
// 1
request lock in LOCK_S mode on row 1:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 1,
// so it upgrades it to X
request lock in LOCK_X|LOCK_REC_NOT_GAP mode on row 2:
// -> creates a new lock_t for LOCK_X|LOCK_REC_NOT_GAP mode (because we
// don't have any after we've upgraded!) and sets bit for 2
request lock in LOCK_S mode on row 2:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 2,
// so it upgrades it to X
    ...etc...etc..

Each iteration of the loop creates a new lock_t struct, and in the end we
have a lot (one for each record!) of LOCK_X locks, each with single bit
set in the bitmap. Soon we run out of space for lock_t structs.

If we create LOCK_GAP instead of lock upgrading, the above scenario works
like the following:

// -> creates new lock_t for LOCK_X|LOCK_REC_NOT_GAP mode and sets bit for
// 1
request lock in LOCK_S mode on row 1:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 1,
// so it creates LOCK_S|LOCK_GAP only and sets bit for 1
request lock in LOCK_X|LOCK_REC_NOT_GAP mode on row 2:
// -> reuses the lock_t for LOCK_X|LOCK_REC_NOT_GAP by setting bit for 2
request lock in LOCK_S mode on row 2:
// -> notices that we already have LOCK_X|LOCK_REC_NOT_GAP on the row 2,
// so it reuses LOCK_S|LOCK_GAP setting bit for 2

In the end we have just two locks per page, one for each mode:
LOCK_X|LOCK_REC_NOT_GAP and LOCK_S|LOCK_GAP.
Another benefit of this solution is that it avoids not-entirely
const-correct, (and otherwise looking risky) "upgrading".

The fix was ported from
mysql/mysql-server@bfba840dfa
mysql/mysql-server@75cefdb1f7

Reviewed by: Marko Mäkelä
2023-07-06 15:06:10 +03:00
Thirunarayanan Balathandayuthapani
bd076d4dff MDEV-31442 page_cleaner thread aborts while releasing the tablespace
- InnoDB shouldn't acquire the tablespace when it is being stopped
or closed
2023-06-16 14:58:48 +05:30
Thirunarayanan Balathandayuthapani
841e905f20 MDEV-31442 page_cleaner thread aborts while releasing the tablespace
After further I/O on a tablespace has been stopped
(for example due to DROP TABLE or an operation that
rebuilds a table), page cleaner thread tries to
flush the pending writes for the tablespace and
releases the tablespace reference even though it was not
acquired.

fil_space_t::flush(): Don't release the tablespace when it is
being stopped and closed

Thanks to Marko Mäkelä for suggesting this patch.
2023-06-09 18:15:33 +05:30
Marko Mäkelä
c25b496724 MDEV-31382 SET GLOBAL innodb_undo_log_truncate=ON has no effect on logically empty undo logs
innodb_undo_log_truncate_update(): A callback function. If
SET GLOBAL innodb_undo_log_truncate=ON, invoke
srv_wake_purge_thread_if_not_active().

srv_wake_purge_thread_if_not_active(): If innodb_undo_log_truncate=ON,
always wake up the purge subsystem.

srv_do_purge(): If the history is empty, invoke
trx_purge_truncate_history() in order to free undo log pages.

trx_purge_truncate_history(): If head.trx_no==0, consider the
cached undo logs to be free.

trx_purge(): Remove the parameter "bool truncate" and let the
caller invoke trx_purge_truncate_history() directly.

Reviewed by: Vladislav Lesin
2023-06-08 09:18:21 +03:00
Marko Mäkelä
3e40f9a7f3 MDEV-31355 innodb_undo_log_truncate=ON fails to wait for purge of enough transaction history
purge_sys_t::sees(): Wrapper for view.sees().

trx_purge_truncate_history(): Invoke purge_sys.sees() instead of
comparing to head.trx_no, to determine if undo pages can be safely freed.

The test innodb.cursor-restore-locking was adjusted by Vladislav Lesin,
as was the the debug instrumentation in row_purge_del_mark().

Reviewed by: Vladislav Lesin
2023-06-08 09:17:52 +03:00
Vlad Lesin
b54e7b0cea MDEV-31185 rw_trx_hash_t::find() unpins pins too early
rw_trx_hash_t::find() acquires element->mutex, then unpins pins, used for
lf_hash element search. After that the "element" can be deallocated and
reused by some other thread.

If we take a look rw_trx_hash_t::insert()->lf_hash_insert()->lf_alloc_new()
calls, we will not find any element->mutex acquisition, as it was not
initialized yet before it's allocation. rw_trx_hash_t::insert() can reuse
the chunk, unpinned in rw_trx_hash_t::find().

The scenario is the following:

1. Thread 1 have just executed lf_hash_search() in
rw_trx_hash_t::find(), but have not acquired element->mutex yet.
2. Thread 2 have removed the element from hash table with
rw_trx_hash_t::erase() call.
3. Thread 1 acquired element->mutex and unpinned pin 2 pin with
lf_hash_search_unpin(pins) call.
4. Some thread purged memory of the element.
5. Thread 3 reused the memory for the element, filled element->id,
element->trx.
6. Thread 1 crashes with failed "DBUG_ASSERT(trx_id == trx->id)"
assertion.

Note that trx_t objects are also reused, see the code around trx_pools
for details.

The fix is to invoke "lf_hash_search_unpin(pins);" after element->trx is
stored in local variable in rw_trx_hash_t::find().

Reviewed by: Nikita Malyavin, Marko Mäkelä.
2023-05-19 15:50:20 +03:00
Marko Mäkelä
06d555a41a Merge bb-10.5-release into 10.5 2023-05-19 14:23:04 +03:00
Marko Mäkelä
e0084b9d31 MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671
trx_purge_truncate_history(): Only call trx_purge_truncate_rseg_history()
if the rollback segment is safe to process. This will avoid leaking undo
log pages that are not yet ready to be processed. This fixes a regression
that was introduced in
commit 0de3be8cfd (MDEV-30671).

trx_sys_t::any_active_transactions(): Separately count XA PREPARE
transactions.

srv_purge_should_exit(): Terminate slow shutdown if the history size
does not change and XA PREPARE transactions exist in the system.
This will avoid a hang of the test innodb.recovery_shutdown.

Tested by: Matthias Leich
2023-05-19 12:19:26 +03:00
Marko Mäkelä
477285c8ea MDEV-31253 Freed data pages are not always being scrubbed
fil_space_t::flush_freed(): Renamed from buf_flush_freed_pages();
this is a backport of aa45850687 from 10.6.
Invoke log_write_up_to() on last_freed_lsn, instead of avoiding
the operation when the log has not yet been written.
A more costly alternative would be that log_checkpoint() would invoke
this function on every affected tablespace.
2023-05-12 14:57:14 +03:00
Marko Mäkelä
50f3b7d164 MDEV-31124 Innodb_data_written miscounts doublewrites
When commit a5a2ef079c
implemented asynchronous doublewrite, the writes via
the doublewrite buffer started to be counted incorrectly,
without multiplying them by innodb_page_size.

srv_export_innodb_status(): Correctly count the
Innodb_data_written.

buf_dblwr_t: Remove submitted(), because it is close to written()
and only Innodb_data_written was interested in it. According to
its name, it should count completed and not submitted writes.

Tested by: Axel Schwenke
2023-04-25 12:17:06 +03:00
Oleksandr Byelkin
1d74927c58 Merge branch '10.4' into 10.5 2023-04-24 12:43:47 +02:00
Alexander Barkov
9f98a2acd7 MDEV-30968 mariadb-backup does not copy Aria logs if aria_log_dir_path is used
- `mariadb-backup --backup` was fixed to fetch the value of the
   @@aria_log_dir_path server variable and copy aria_log* files
   from @@aria_log_dir_path directory to the backup directory.
   Absolute and relative (to --datadir) paths are supported.

   Before this change aria_log* files were copied to the backup
   only if they were in the default location in @@datadir.

- `mariadb-backup --copy-back` now understands a new my.cnf and command line
   parameter --aria-log-dir-path.

  `mariadb-backup --copy-back` in the main loop in copy_back()
   (when copying back from the backup directory to --datadir)
   was fixed to ignore all aria_log* files.

   A new function copy_back_aria_logs() was added.
   It consists of a separate loop copying back aria_log* files from
   the backup directory to the directory specified in --aria-log-dir-path.
   Absolute and relative (to --datadir) paths are supported.
   If --aria-log-dir-path is not specified,
   aria_log* files are copied to --datadir by default.

- The function is_absolute_path() was fixed to understand MTR style
  paths on Windows with forward slashes, e.g.
   --aria-log-dir-path=D:/Buildbot/amd64-windows/build/mysql-test/var/...
2023-04-21 19:08:35 +04:00
Oleksandr Byelkin
ac5a534a4c Merge remote-tracking branch '10.4' into 10.5 2023-03-31 21:32:41 +02:00
Marko Mäkelä
402f36dd65 MDEV-30936 fixup
fil_space_t::~fil_space_t(): Invoke ut_free(name) because
doing so in the callers would trip MSAN_OPTIONS=poison_in_dtor=1
2023-03-28 15:10:32 +03:00
Marko Mäkelä
dfa90257f6 MDEV-30936 clang 15.0.7 -fsanitize=memory fails massively
handle_slave_io(), handle_slave_sql(), os_thread_exit():
Remove a redundant pthread_exit(nullptr) call, because it
would cause SIGSEGV.

mysql_print_status(): Add MEM_MAKE_DEFINED() to work around
some missing instrumentation around mallinfo2().

que_graph_free_stat_list(): Invoke que_node_get_next(node) before
que_graph_free_recursive(node). That is the logical and
MSAN_OPTIONS=poison_in_dtor=1 compatible way of freeing memory.

ins_node_t::~ins_node_t(): Invoke mem_heap_free(entry_sys_heap).

que_graph_free_recursive(): Rely on ins_node_t::~ins_node_t().

fts_t::~fts_t(): Invoke mem_heap_free(fts_heap).

fts_free(): Replace with direct calls to fts_t::~fts_t().

The failures in free_root() due to MSAN_OPTIONS=poison_in_dtor=1
will be covered in MDEV-30942.
2023-03-28 11:44:24 +03:00
Vlad Lesin
4c226c1850 MDEV-29050 mariabackup issues error messages during InnoDB tablespaces export on partial backup preparing
The solution is to suppress error messages for missing tablespaces if
mariabackup is launched with "--prepare --export" options.

"mariabackup --prepare --export" invokes itself with --mysqld parameter.
If the parameter is set, then it starts server to feed "FLUSH TABLES ...
FOR EXPORT;" queries for exported tablespaces. This is "normal" server
start, that's why new srv_operation value is introduced.

Reviewed by Marko Makela.
2023-03-27 20:15:10 +03:00
Vlad Lesin
7d6b3d4008 MDEV-30775 Performance regression in fil_space_t::try_to_close() introduced in MDEV-23855
fil_node_open_file_low() tries to close files from the top of
fil_system.space_list if the number of opened files is exceeded.

It invokes fil_space_t::try_to_close(), which iterates the list searching
for the first opened space. Then it just closes the space, leaving it in
the same position in fil_system.space_list.

On heavy files opening, like during 'SHOW TABLE STATUS ...' execution,
if the number of opened files limit is reached,
fil_space_t::try_to_close() iterates more and more closed spaces before
reaching any opened space for each fil_node_open_file_low() call. What
causes performance regression if the number of spaces is big enough.

The fix is to keep opened spaces at the top of fil_system.space_list,
and move closed files at the end of the list.

For this purpose fil_space_t::space_list_last_opened pointer is
introduced. It points to the last inserted opened space in
fil_space_t::space_list. When space is opened, it's inserted to the
position just after the pointer points to in fil_space_t::space_list to
preserve the logic, inroduced in MDEV-23855. Any closed space is added
to the end of fil_space_t::space_list.

As opened spaces are located at the top of fil_space_t::space_list,
fil_space_t::try_to_close() finds opened space faster.

There can be the case when opened and closed spaces are mixed in
fil_space_t::space_list if fil_system.freeze_space_list was set during
fil_node_open_file_low() execution. But this should not cause any error,
as fil_space_t::try_to_close() still iterates spaces in the list.

There is no need in any test case for the fix, as it does not change any
functionality, but just fixes performance regression.
2023-03-10 18:31:10 +03:00
Marko Mäkelä
c14a39431b MDEV-30753 Possible corruption due to trx_purge_free_segment()
Starting with commit 0de3be8cfd (MDEV-30671),
the field TRX_UNDO_NEEDS_PURGE lost its previous meaning.
The following scenario is possible:

(1) InnoDB is killed at a point of time corresponding to the durable
execution of some fseg_free_step_not_header() but not
trx_purge_remove_log_hdr().
(2) After restart, the affected pages are allocated for something else.
(3) Purge will attempt to access the newly reallocated pages when looking
for some old undo log records.

trx_purge_free_segment(): Invoke trx_purge_remove_log_hdr() as the first
thing, to be safe. If the server is killed, some pages will never be
freed. That is the lesser evil. Also, before each mtr.start(), invoke
log_free_check() to prevent ib_logfile0 overrun.
2023-02-28 15:39:23 +02:00
Marko Mäkelä
0de3be8cfd MDEV-30671 InnoDB undo log truncation fails to wait for purge of history
It is not safe to invoke trx_purge_free_segment() or execute
innodb_undo_log_truncate=ON before all undo log records in
the rollback segment has been processed.

A prominent failure that would occur due to premature freeing of
undo log pages is that trx_undo_get_undo_rec() would crash when
trying to copy an undo log record to fetch the previous version
of a record.

If trx_undo_get_undo_rec() was not invoked in the unlucky time frame,
then the symptom would be that some committed transaction history is
never removed. This would be detected by CHECK TABLE...EXTENDED that
was impleented in commit ab0190101b.
Such a garbage collection leak should be possible even when using
innodb_undo_log_truncate=OFF, just involving trx_purge_free_segment().

trx_rseg_t::needs_purge: Change the type from Boolean to a transaction
identifier, noting the most recent non-purged transaction, or 0 if
everything has been purged. On transaction start, we initialize this
to 1 more than the transaction start ID. On recovery, the field may be
adjusted to the transaction end ID (TRX_UNDO_TRX_NO) if it is larger.

The field TRX_UNDO_NEEDS_PURGE becomes write-only; only some debug
assertions that would validate the value. The field reflects the old
inaccurate Boolean field trx_rseg_t::needs_purge.

trx_undo_mem_create_at_db_start(), trx_undo_lists_init(),
trx_rseg_mem_restore(): Remove the parameter max_trx_id.
Instead, store the maximum in trx_rseg_t::needs_purge,
where trx_rseg_array_init() will find it.

trx_purge_free_segment(): Contiguously hold a lock on
trx_rseg_t to prevent any concurrent allocation of undo log.

trx_purge_truncate_rseg_history(): Only invoke trx_purge_free_segment()
if the rollback segment is empty and there are no pending transactions
associated with it.

trx_purge_truncate_history(): Only proceed with innodb_undo_log_truncate=ON
if trx_rseg_t::needs_purge indicates that all history has been purged.

Tested by: Matthias Leich
2023-02-24 14:24:44 +02:00
Marko Mäkelä
5300c0fb76 MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruption
This almost completely reverts
commit acd23da4c2 and
retains a safe optimization:

recv_sys_t::parse(): Remove any old redo log records for the
truncated tablespace, to free up memory earlier.
If recovery consists of multiple batches, then recv_sys_t::apply()
will must invoke recv_sys_t::trim() again to avoid wrongly
applying old log records to an already truncated undo tablespace.
2023-02-15 18:16:41 +02:00
Marko Mäkelä
c41c79650a Merge 10.4 into 10.5 2023-02-10 12:02:11 +02:00
Vicențiu Ciorbaru
08c852026d Apply clang-tidy to remove empty constructors / destructors
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .

Code style changes have been done on top. The result of this change
leads to the following improvements:

1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
  ~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.

2. Compiler can better understand the intent of the code, thus it leads
   to more optimization possibilities. Additionally it enabled detecting
   unused variables that had an empty default constructor but not marked
   so explicitly.

   Particular change required following this patch in sql/opt_range.cc

   result_keys, an unused template class Bitmap now correctly issues
   unused variable warnings.

   Setting Bitmap template class constructor to default allows the compiler
   to identify that there are no side-effects when instantiating the class.
   Previously the compiler could not issue the warning as it assumed Bitmap
   class (being a template) would not be performing a NO-OP for its default
   constructor. This prevented the "unused variable warning".
2023-02-09 16:09:08 +02:00
Marko Mäkelä
acd23da4c2 MDEV-30479 optimization: Invoke recv_sys_t::trim() earlier
recv_sys_t::parse(): Discard old page-level redo log when parsing
a TRIM_PAGES record.

recv_sys_t::apply(): trim() was invoked in parse() already.

recv_sys_t::truncated_undo_spaces[]: Only store the size, no LSN.
2023-02-06 20:29:42 +02:00
Thirunarayanan Balathandayuthapani
17858e03a7 MDEV-30179 mariabackup --backup fails with FATAL ERROR: ... failed
to copy datafile

- Mariabackup fails to copy the undo log tablespace when it undergoes
truncation. So Mariabackup should detect the redo log which does
undo tablespace truncation and also backup should read the minimum
file size of the tablespace and ignore the error while reading.

- Throw error when innodb undo tablespace read failed, but backup
doesn't find the redo log for undo tablespace truncation
2023-01-10 15:47:13 +05:30
Marko Mäkelä
8b9b4ab3f5 Merge 10.4 into 10.5 2023-01-03 17:08:42 +02:00
Marko Mäkelä
fb0808c450 Merge 10.3 into 10.4 2023-01-03 16:10:02 +02:00
Aleksey Midenkov
e056efdd6c MDEV-25004 Missing row in FTS_DOC_ID_INDEX during DELETE HISTORY
1. In case of system-versioned table add row_end into FTS_DOC_ID index
   in fts_create_common_tables() and innobase_create_key_defs().
   fts_n_uniq() returns 1 or 2 depending on whether the table is
   system-versioned.

   After this patch recreate of FTS_DOC_ID index is required for
   existing system-versioned tables. If you see this message in error
   log or server warnings: "InnoDB: Table db/t1 contains 2 indexes
   inside InnoDB, which is different from the number of indexes 1
   defined in the MariaDB" use this command to fix the table:

      ALTER TABLE db.t1 FORCE;

2. Fix duplicate history for secondary unique index like it was done
   in MDEV-23644 for clustered index (932ec586aa). In case of
   existing history row which conflicts with currently inseted row we
   check in row_ins_scan_sec_index_for_duplicate() whether that row
   was inserted as part of current transaction. In that case we
   indicate with DB_FOREIGN_DUPLICATE_KEY that new history row is not
   needed and should be silently skipped.

3. Some parts of MDEV-21138 (7410ff436e) reverted. Skipping of
   FTS_DOC_ID index for history rows made problems with purge
   system. Now this is fixed differently by p.2.

4. wait_all_purged.inc checks that we didn't affect non-history rows
   so they are deleted and purged correctly.

Additional FTS fixes

  fts_init_get_doc_id(): exclude history rows from max_doc_id
  calculation. fts_init_get_doc_id() callback is used only for crash
  recovery.

  fts_add_doc_by_id(): set max value for row_end field.

  fts_read_stopword(): stopwords table can be system-versioned too. We
  now read stopwords only for current data.

  row_insert_for_mysql(): exclude history rows from doc_id validation.

  row_merge_read_clustered_index(): exclude history_rows from doc_id
  processing.

  fts_load_user_stopword(): for versioned table retrieve row_end field
  and skip history rows. For non-versioned table we retrieve 'value'
  field twice (just for uniformity).

FTS tests for System Versioning now include maybe_versioning.inc which
adds 3 combinations:

'vers'     for debug build sets sysvers_force and
	   sysvers_hide. sysvers_force makes every created table
	   system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING
	   for SHOW CREATE.

	   Note: basic.test, stopword.test and versioning.test do not
	   require debug for 'vers' combination. This is controlled by
	   $modify_create_table in maybe_versioning.inc and these
	   tests run WITH SYSTEM VERSIONING explicitly which allows to
	   test 'vers' combination on non-debug builds.

'vers_trx' like 'vers' sets sysvers_force_trx and sysvers_hide. That
	   tests FTS with trx_id-based System Versioning.

'orig' 	   works like before: no System Versioning is added, no debug is
	   required.

Upgrade/downgrade test for System Versioning is done by
innodb_fts.versioning. It has 2 combinations:

'prepare' makes binaries in std_data (requires old server and OLD_BINDIR).
	  It tests upgrade/downgrade against old server as well.

'upgrade' tests upgrade against binaries in std_data.

Cleanups:

Removed innodb-fts-stopword.test as it duplicates stopword.test
2022-12-27 00:02:02 +03:00
Marko Mäkelä
b8f4b984f9 MDEV-24685 fixup: Remove srv_n_file_io_threads
The variable was not really being used for anything. The parameters
innodb_read_io_threads, innodb_write_io_threads have replaced
innodb_file_io_threads.
2022-12-16 17:08:56 +02:00
Marko Mäkelä
92ff7bb63f MDEV-30227 [ERROR] [FATAL] InnoDB: fdatasync() returned 9
fil_space_t::flush<false>(): If the CLOSING flag is set,
the file may already have been closed, resulting in EBADF
being returned by fdatasync(). In any case, the
thread that had set the flag should take care of invoking
os_file_flush_func().

The crash occurred during the execution of FLUSH TABLES...FOR EXPORT.

Tested by: Matthias Leich
2022-12-15 12:45:26 +02:00