Commit graph

198268 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
5919f7b675 MDEV-31264 Purge trying to access freed secondary index page
- InnoDB purge tries to access aborted secondary index and access
the freed secondary index root page.
2023-05-31 19:07:41 +05:30
Marko Mäkelä
e3b06156c6 MDEV-31347 fil_ibd_create() may hijack the file handle of an old file
fil_ibd_create(): Hold fil_system.mutex until fil_node_t::find_metadata()
has completed, so that node->handle cannot be closed by a concurrent
thread. This race condition was introduced
in commit 10dd290b4b (MDEV-17380).

Tested by: Matthias Leich
2023-05-31 15:25:07 +03:00
Marko Mäkelä
eb20e7c900 MDEV-31353 InnoDB recovery hangs after reporting corruption
recv_recover_page(): Remove some code which was added in
commit 0b47c126e3 with
no good reason and which would cause a hang after a corrupted
page was reported during crash recovery.

Tested by: Matthias Leich
2023-05-31 15:20:54 +03:00
Tuukka Pasanen
30fb72ca6e MDEV-31331: Fix cut'n'paste variable name in Debian pre-inst script
There is unwanted cut'n'paste variable name in Debian pre-inst
script which causes:

df: '': No such file or directory
/var/lib/dpkg/tmp.ci/preinst: line 215: [: : integer expression expected

Rename variable to correct one and make check that that directory
or symlink really exists. If it does not then fail with error
and message.
2023-05-31 10:26:19 +10:00
Marko Mäkelä
a6c0a27696 MDEV-31362 recv_sys_t::apply(bool): Assertion `!last_batch || recovered_lsn == scanned_lsn' failed
recv_sys_t::apply(): Remove a bogus debug assertion that had been
added in commit f2c17cc9d9 (MDEV-29911).

It is perfectly normal that when the server was killed in the middle of
writing multiple redo log blocks, the recovery would end such that
recv_sys.scanned_lsn will point to the end of the last complete 512-byte
log block, but recv_sys.recovered_lsn will be less than that.

Also, correct the function comment of recv_sys_t::parse().
2023-05-30 17:21:49 +03:00
Otto Kekalainen
ea66df2f45 Deb: Fix blocksize check to use $mysql_datadir/$datadir correctly
In commit f99a8918 this line was changed to not use awk, and new version
copied both to init file and preinst file but overlooking that they use
different variable names.

Also fix minor syntax issues to make Shellcheck happy.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2023-05-30 18:55:13 +10:00
Marko Mäkelä
ce547cfc05 MDEV-31350: Hang in innodb.recovery_memory
buf_flush_page_cleaner(): Whenever buf_pool.ran_out(), invoke
buf_pool.get_oldest_modification(0) so that all clean blocks
will be removed from buf_pool.flush_list and buf_flush_LRU_list_batch()
will be able to evict some pages.

This fixes a regression that was likely caused by
commit a55b951e60 (MDEV-26827).
2023-05-26 16:40:07 +03:00
Marko Mäkelä
7b72fc0a57 MDEV-22739 !cursor->index->is_committed() in row0ins.cc
row_ins_sec_index_entry_by_modify(): When noticing a corrupted secondary
index on which CREATE INDEX is not in progress, return DB_CORRUPTION
instead of intentionally crashing the server.

Tested by: Matthias Leich
2023-05-26 16:40:02 +03:00
Marko Mäkelä
e38c075aa0 MDEV-31346 trx_purge_add_undo_to_history() is not optimal
trx_undo_set_state_at_finish(): Merge to its only caller,
trx_purge_add_undo_to_history().

trx_purge_add_undo_to_history(): Evaluate the condition related to
TRX_UNDO_STATE only once.

Tested by: Matthias Leich
2023-05-26 16:39:46 +03:00
Marko Mäkelä
db8765500e MDEV-31343 Another server hang with innodb_undo_log_truncate=ON
trx_purge_truncate_history(): While waiting for a write-fixed block
to become available, simply wait for an exclusive latch on it.
Also, simplify the iteration: first check for oldest_modification>2
(to ignore clean pages or pages belonging to the temporary tablespace)
and then compare the tablespace identifier.

Before releasing buf_pool.flush_list_mutex we will buffer-fix the block
of interest. In that way, buf_page_t::can_relocate() will not hold on
the block and it must remain in the buffer pool until we have acquired
an exclusive latch on it. If the block is still dirty, we will register
it with the tablespace truncation mini-transaction; else, we will simply
release the latch and buffer-fix and move to the next block.

This also reverts commit c4d7939989
because that fix should no longer be necessary; the wait for an
exclusive block latch should allow buf_pool_t::release_freed_page()
on the same block to proceed.

Tested by: Axel Schwenke, Matthias Leich
2023-05-26 16:16:10 +03:00
Alexander Barkov
03a9366c73 Extra tests for MDEV-30483 After upgrade to 10.6 from Mysql 5.7 seeing "InnoDB: Column last_update in table mysql.innodb_table_stats is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL"
Adding tests demonstrating that columns:
- mysql.innodb_table_stats.last_update
- mysql.innodb_index_stats.last_update

contain sane values close to NOW() rathar than a garbage.

Tests cover these three underlying TIMESTAMP data formats:

- MariaDB Field_timestamp0 - UINT4 based
  Like in a MariaDB native installation running with mysql56_temporal_format=0

- MariaDB Field_timestampf - BINARY(4) based, with UNSIGNED_FLAG
  Like in a MariaDB native installation running with mysql56_temporal_format=1

- MySQL-alike Field_timestampf - BINARY(4) based, without UNSIGNED_FLAG
  Like with a MariaDB server running over a MySQL-5.6 directory
  (e.g. during a migragion).
2023-05-26 16:47:16 +04:00
Oleksandr Byelkin
08fedeb3d3 Merge branch '10.10' into bb-10.10-release 2023-05-25 12:02:34 +02:00
Oleksandr Byelkin
44c9008ba6 Merge branch '10.9' into bb-10.9-release 2023-05-25 11:35:05 +02:00
Marko Mäkelä
31be25349f Merge 10.6 into 10.9 2023-05-25 09:24:32 +03:00
Alexander Barkov
9edb1a5ce3 MDEV-30483 After upgrade to 10.6 from Mysql 5.7 seeing "InnoDB: Column last_update in table mysql.innodb_table_stats is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL"
Problem:

Field_timestampf implementations differ in MySQL and MariaDB:
- MariaDB sets the UNSIGNED_FLAG in Field::flags
- MySQL does not

The reference table structures
(defined in table_stats_schema and index_stats_schema)
expected the last_update column to have the DATA_UNSIGNED flag,
because MariaDB's Field_timestampf has the UNSIGNED_FLAG.
It worked fine on pure MariaDB installations.

However, if a MariaDB server starts over a MySQL-5.7 data directory during
a migration, the last_update column does not have DATA_UNSIGNED flag,
because MySQL's Field_timestampf does not have the UNSIGNED_FLAG.

This made InnoDB (after the migration from MySQL) complain into the server
error log about the unexpected data type.

The actual fix is done in storage/innobase/dict/dict0stats.cc:
It removes DATA_UNSIGNED from the prtype_mask member of the reference columns,
so now it does not require the underlying columns to have this flag.

The rest of the fix is needed for MTR tests.
The new data type plugin TYPE_MYSQL_TIMESTAMP implements a slightly modified
version of Field_timestampf, which removes the unsigned flag, so
it works like MySQL's Field_timestampf.

The MTR test ALTERs the data type of the columns
table_stats_schema.last_update and index_stats_schema.last_update
from TIMESTAMP to TYPE_MYSQL_TIMESTAMP, then makes InnoDB
verify the structure of the two statistics tables by creating
and populating an InnoDB table t1.

Without the fix made storage/innobase/dict/dict0stats.cc,
MTR complains about unexpected warnings in the server error log:

[ERROR] InnoDB: Column last_update in table mysql.innodb_table_stats is ...
[ERROR] InnoDB: Column last_update in table mysql.innodb_index_stats is ...

With the fix made storage/innobase/dict/dict0stats.cc these warnings
go away.
2023-05-25 05:25:39 +04:00
Monty
9b3084b7be Fixed typo in xtrabackup.c 2023-05-24 18:39:55 +03:00
Monty
d77d9e1f6f MENT-1703 Repeatable crash during backup after processing very large ibdata1
The crash happened in filename_to_spacename() when using it on a
filename that is not in the format of "./database/table.ibd".
According to Marko, it is possible the function is called with
the path to an undo file, which would cause a crash.

This patch fixes this by,  instead of crashing with unexpected filenames,
returning them 'as such', except for changing all '\' to '/'.
2023-05-24 15:00:26 +03:00
Monty
e9fe39d566 MDEV-7389 Request: log warnings into SQL_ERROR_LOG
Changes:
- Audit_null records and displays warning count
- sql_error_log prints warnings

Reviewer: Alexey Botchkov <holyfoot@askmonty.org>
2023-05-24 13:21:55 +03:00
Thirunarayanan Balathandayuthapani
7737f15f87 MDEV-31333 fsp_free_page() fails to move the extent from
FSP_FREE_FRAG to FSP_FREE list

- This issue was caused by commit 0b47c126e3.
In fsp_free_page(), InnoDB should set XDES_FREE_BIT of the page before
moving the extent from FSP_FREE_FRAG to FSP_FREE list.
2023-05-24 15:15:29 +05:30
Oleksandr Byelkin
b0723f6c24 Merge branch '10.9' into 10.10 2023-05-24 09:49:19 +02:00
Oleksandr Byelkin
be19f81ad5 Merge branch '10.6' into 10.9 2023-05-24 09:46:27 +02:00
Marko Mäkelä
2cd6a48e72 Merge 10.5 into 10.6 2023-05-24 08:38:35 +03:00
Marko Mäkelä
b220bb756b Merge bb-10.6-release into 10.6 2023-05-24 08:37:19 +03:00
Marko Mäkelä
98de15aba1 Merge bb-10.5-release into bb-10.6-release 2023-05-24 08:36:30 +03:00
Marko Mäkelä
383105dae1 Merge bb-10.5-release into 10.5 2023-05-24 08:28:20 +03:00
Marko Mäkelä
c5cf94b2dc MDEV-31234 fixup: Free some UNDO pages earlier
trx_purge_truncate_rseg_history(): Add a parameter to specify if
the entire rollback segment is safe to be freed. If not, we may
still be able to invoke trx_undo_truncate_start() and free some pages.
2023-05-24 08:25:26 +03:00
Marko Mäkelä
270eeeb523 Merge 10.5 into 10.6 2023-05-23 12:25:39 +03:00
Marko Mäkelä
9c35f9c9c1 MDEV-31234 fixup: Allow innodb_undo_log_truncate=ON after upgrade
trx_purge_truncate_history(): Relax a condition that would prevent
undo log truncation if the undo log tablespaces were "contaminated"
by the bug that commit e0084b9d31 fixed.
That is, trx_purge_truncate_rseg_history() would have invoked
flst_remove() on TRX_RSEG_HISTORY but not reduced TRX_RSEG_HISTORY_SIZE.

To avoid any regression with normal operation, we implement this
fixup during slow shutdown only. The condition on the history list
being empty is necessary: without it, in the test
innodb.undo_truncate_recover there may be much fewer than the
expected 90,000 calls to row_purge() before the truncation.
That is, we would truncate the undo tablespace before actually having
processed all undo log records in it.

To truncate such "contaminated" or "bloated" undo log tablespaces
(when using innodb_undo_tablespaces=2 or more)
you can execute the following SQL:

BEGIN;INSERT mysql.innodb_table_stats VALUES('','',DEFAULT,0,0,0);ROLLBACK;
SET GLOBAL innodb_undo_log_truncate=ON, innodb_fast_shutdown=0;
SHUTDOWN;

The first line creates a dummy InnoDB transaction, to ensure that there
will be some history to be purged during shutdown and that the undo
tablespaces will be truncated.
2023-05-23 12:20:27 +03:00
Monty
a7adfd4c52 Optimized version of safe_strcpy()
Note: We should replace most case of safe_strcpy() with strmake() to avoid
the not needed zerofill.
2023-05-23 10:02:33 +03:00
Monty
92d2ceac73 MDEV-28285 Unexpected result when combining DISTINCT, subselect and LIMIT
The problem was that when  JOIN_TAB::remove_duplicates() noticed there
can only be one possible row in the output, it adjusted limits but
didn't take into account any possible offset.

Fixed by not adjusting limit offset when setting one-row-limit.
2023-05-23 09:16:36 +03:00
Monty
cd37e49422 MDEV-31083 ASAN use-after-poison in myrg_attach_children
The reason for ASAN report was that the MERGE and MYISAM file
had different key definitions, which is not allowed.

Fixed by ensuring that the MERGE code is not copying more key stats
than what is in the MyISAM file.

Other things:
- Give an error if different MyISAM files has different number of
  key parts.
2023-05-23 09:16:36 +03:00
Monty
c7e04af8bc Update main.selectivity test and results 2023-05-23 09:16:36 +03:00
Monty
6a0314063d Make install.db read only in mtr
This ensures that no mtr test can change install.db after it's initial
creation as changing it while as another thread is coping it will lead to
failures in at least InnoDB and Aria recovery.

Fixed spider/bugfix.mdev_30370 that was wrongly used install.db
2023-05-23 09:16:36 +03:00
Monty
b0c285bb06 Remove warning of not freed memory if mysqld aborts
Fixes warning when doing:
./sql/mariadbd --socket=/tmp/xxxx/ddd
2023-05-22 18:05:31 +03:00
Monty
16258677b3 MDEV-6768 Wrong result with aggregate with join with no result set
When a query does implicit grouping and join operation produces an empty
result set, a NULL-complemented row combination is generated.
However, constant table fields still show non-NULL values.

What happens in the is that end_send_group() is called with a
const row but without any rows matching the WHERE clause.
This last part is shown by 'join->first_record' not being set.

This causes item->no_rows_in_result() to be called for all items to reset
all sum functions to their initial state. However fields are not set
to NULL.

The used fix is to produce NULL-complemented records for constant tables
as well. Also, reset the constant table's records back in case we're
in a subquery which may get re-executed.
An alternative fix would have item->no_rows_in_result() also work
with Item_field objects.

There is some other issues with the code:
- join->no_rows_in_result_called is used but never set.
- Tables that are used with group functions are not properly marked as
  maybe_null, which is required if the table rows should be regarded as
  null-complemented (not existing).
- The code that tries to detect if mixed_implicit_grouping should be set
  didn't take into account all usage of fields and sum functions.
- Item_func::restore_to_before_no_rows_in_result() called the wrong
  function.
- join->clear() does not use a table_map argument to clear_tables(),
  which caused it to ignore constant tables.
- unclear_tables() does not correctly restore status to what is
  was before clear_tables().

Main bug fix was to always use a table_map argument to clear_tables() and
always use join->clear() and clear_tables() together with unclear_tables().

Other fixes:
- Fixed Item_func::restore_to_before_no_rows_in_result()
- Set 'join->no_rows_in_result_called' when no_rows_in_result_set()
  is called.
- Removed not used argument from setup_end_select_func().
- More code comments
- Ensure that end_send_group() modifies the same fields as are in the
  result set.
- Changed return_zero_rows() to use pointers instead of references,
  similar to the rest of the code.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2023-05-22 17:15:46 +03:00
Marko Mäkelä
3bd10b57b2 Merge 10.5 into 10.6 2023-05-22 17:11:41 +03:00
Marko Mäkelä
a5ce335ac9 MDEV-29593 fixup: Avoid a leak if rseg.undo_cached is corrupted
trx_purge_truncate_rseg_history(): Avoid a leak similar to the one
that was fixed in MDEV-31324, in case a supposedly cached undo log
page is not found in the rseg.undo_cached list.
2023-05-22 17:10:25 +03:00
Daniel Black
c0adb05b30 MDEV-31268: Fedora mariadb-connector-c-config conflicts with MariaDB's MariaDB-common
The previous fix in MDEV-24629 had a version end of life date.

Thanks @pgnd on Zulip for noticing.
2023-05-22 16:54:42 +10:00
Marko Mäkelä
0796b7ad5e Merge 10.6 into 10.9 2023-05-22 09:13:51 +03:00
Marko Mäkelä
eb2e074494 Merge 10.5 into 10.6 2023-05-22 08:38:21 +03:00
Teemu Ollakka
f307160218 MDEV-29293 MariaDB stuck on starting commit state
This commit contains a merge from 10.5-MDEV-29293-squash
into 10.6.

Although the bug MDEV-29293 was not reproducible with 10.6,
the fix contains several improvements for wsrep KILL query and
BF abort handling, and addresses the following issues:

* MDEV-30307 KILL command issued inside a transaction is
  problematic for galera replication:
  This commit will remove KILL TOI replication, so Galera side
  transaction context is not lost during KILL.
* MDEV-21075 KILL QUERY maintains nodes data consistency but
  breaks GTID sequence: This is fixed as well as KILL does not
  use TOI, and thus does not change GTID state.
* MDEV-30372 Assertion in wsrep-lib state: This was caused by
  BF abort or KILL when local transaction was in the middle
  of group commit. This commit disables THD::killed handling
  during commit, so the problem is avoided.
* MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim
  in trx0trx.h:1065: The assertion happened when the victim was
  BF aborted via MDL while it was committing. This commit changes
  MDL BF aborts so that transactions which are committing cannot
  be BF aborted via MDL. The RQG grammar attached in the issue
  could not reproduce the crash anymore.

Original commit message from 10.5 fix:

    MDEV-29293 MariaDB stuck on starting commit state

    The problem seems to be a deadlock between KILL command execution
    and BF abort issued by an applier, where:
    * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
    * Applier has innodb side global lock mutex and victim trx mutex.
    * KILL is calling innobase_kill_query, and is blocked by innodb
      global lock mutex.
    * Applier is in wsrep_innobase_kill_one_trx and is blocked by
      victim's LOCK_thd_kill.

    The fix in this commit removes the TOI replication of KILL command
    and makes KILL execution less intrusive operation. Aborting the
    victim happens now by using awake_no_mutex() and ha_abort_transaction().
    If the KILL happens when the transaction is committing, the
    KILL operation is postponed to happen after the statement
    has completed in order to avoid KILL to interrupt commit
    processing.

    Notable changes in this commit:
    * wsrep client connections's error state may remain sticky after
      client connection is closed. This error message will then pop
      up for the next client session issuing first SQL statement.
      This problem raised with test galera.galera_bf_kill.
      The fix is to reset wsrep client error state, before a THD is
      reused for next connetion.
    * Release THD locks in wsrep_abort_transaction when locking
      innodb mutexes. This guarantees same locking order as with applier
      BF aborting.
    * BF abort from MDL was changed to do BF abort on server/wsrep-lib
      side first, and only then do the BF abort on InnoDB side. This
      removes the need to call back from InnoDB for BF aborts which originate
      from MDL and simplifies the locking.
    * Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
      The manipulation of the wsrep_aborter can be done solely on
      server side. Moreover, it is now debug only variable and
      could be excluded from optimized builds.
    * Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
      fine grained locking for SR BF abort which may require locking
      of victim LOCK_thd_kill. Added explicit call for
      wsrep_thd_kill_LOCK/UNLOCK where appropriate.
    * Wsrep-lib was updated to version which allows external
      locking for BF abort calls.

    Changes to MTR tests:
    * Disable galera_bf_abort_group_commit. This test is going to
      be removed (MDEV-30855).
    * Make galera_var_retry_autocommit result more readable by echoing
      cases and expectations into result. Only one expected result for
      reap to verify that server returns expected status for query.
    * Record galera_gcache_recover_manytrx as result file was incomplete.
      Trivial change.
    * Make galera_create_table_as_select more deterministic:
      Wait until CTAS execution has reached MDL wait for multi-master
      conflict case. Expected error from multi-master conflict is
      ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
      wsrep transaction when it is waiting for MDL, query gets interrupted
      instead of BF aborted. This should be addressed in separate task.
    * A new test galera_bf_abort_registering to check that registering trx gets
      BF aborted through MDL.
    * A new test galera_kill_group_commit to verify correct behavior
      when KILL is executed while the transaction is committing.

    Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
    Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-22 00:42:05 +02:00
Teemu Ollakka
3f59bbeeae MDEV-29293 MariaDB stuck on starting commit state
The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data.
* Applier has innodb side global lock mutex and victim trx mutex.
* KILL is calling innobase_kill_query, and is blocked by innodb
  global lock mutex.
* Applier is in wsrep_innobase_kill_one_trx and is blocked by
  victim's LOCK_thd_kill.

The fix in this commit removes the TOI replication of KILL command
and makes KILL execution less intrusive operation. Aborting the
victim happens now by using awake_no_mutex() and ha_abort_transaction().
If the KILL happens when the transaction is committing, the
KILL operation is postponed to happen after the statement
has completed in order to avoid KILL to interrupt commit
processing.

Notable changes in this commit:
* wsrep client connections's error state may remain sticky after
  client connection is closed. This error message will then pop
  up for the next client session issuing first SQL statement.
  This problem raised with test galera.galera_bf_kill.
  The fix is to reset wsrep client error state, before a THD is
  reused for next connetion.
* Release THD locks in wsrep_abort_transaction when locking
  innodb mutexes. This guarantees same locking order as with applier
  BF aborting.
* BF abort from MDL was changed to do BF abort on server/wsrep-lib
  side first, and only then do the BF abort on InnoDB side. This
  removes the need to call back from InnoDB for BF aborts which originate
  from MDL and simplifies the locking.
* Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h.
  The manipulation of the wsrep_aborter can be done solely on
  server side. Moreover, it is now debug only variable and
  could be excluded from optimized builds.
* Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
  fine grained locking for SR BF abort which may require locking
  of victim LOCK_thd_kill. Added explicit call for
  wsrep_thd_kill_LOCK/UNLOCK where appropriate.
* Wsrep-lib was updated to version which allows external
  locking for BF abort calls.

Changes to MTR tests:
* Disable galera_bf_abort_group_commit. This test is going to
  be removed (MDEV-30855).
* Record galera_gcache_recover_manytrx as result file was incomplete.
  Trivial change.
* Make galera_create_table_as_select more deterministic:
  Wait until CTAS execution has reached MDL wait for multi-master
  conflict case. Expected error from multi-master conflict is
  ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
  wsrep transaction when it is waiting for MDL, query gets interrupted
  instead of BF aborted. This should be addressed in separate task.
* A new test galera_kill_group_commit to verify correct behavior
  when KILL is executed while the transaction is committing.

Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi>
Co-authored-by: Jan Lindström <jan.lindstrom@galeracluster.com>
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-22 00:39:43 +02:00
Robin Newhouse
03d4fd3214 Backport GitLab CI to 10.5
Add .gitlab-ci.yml file to earliest supported branch to enable
automated building and testing for all MariaDB major branches.

Note to mergers:
GitLab CI is available for branches >= 10.6. This commit includes a
GitLab CI file identical to that in branches >= 10.6, except for the
MARIADB_MAJOR_VERSION variable which should reflect the branch version.
A modified CI will be included in branches 10.4 with PR !2418.
Also changed is the `allow_failure: true` for the MSAN build,
which should be merged up to later branches.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
2023-05-19 16:25:47 +01:00
Marko Mäkelä
d2420669bd MDEV-31309 Innodb_buffer_pool_read_requests is not updated correctly
srv_export_innodb_status(): Update
export_vars.innodb_buffer_pool_read_requests as it was done
before commit a55b951e60 (MDEV-26827).
If innodb_status_variables[] pointed to a sharded variable, it would
only access the first shard.
2023-05-19 15:38:48 +03:00
Marko Mäkelä
df524dc06f MDEV-31308 InnoDB monitor trx_rseg_history_len was accidentally disabled by default
innodb_counter_info[]: Revert a change that was accidentally made in
commit 204e7225dc
2023-05-19 15:29:26 +03:00
Marko Mäkelä
f2c17cc9d9 MDEV-29911 InnoDB recovery and mariadb-backup --prepare fail to report detailed progress
This is a 10.6 port of commit 2f9e264781
from MariaDB Server 10.9 that is missing some optimization due to a
more complex redo log format and recovery logic
(which was simplified in commit 685d958e38).

The progress reporting of InnoDB crash recovery was rather intermittent.
Nothing was reported during the single-threaded log record parsing, which
could consume minutes when parsing a large log. During log application,
there only was progress reporting in background threads that would be
invoked on data page read completion.

The progress reporting here will be detailed like this:

InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799
InnoDB: Read redo log up to LSN=1963895808
InnoDB: Multi-batch recovery needed at LSN 2534560930
InnoDB: Read redo log up to LSN=3312233472
InnoDB: Read redo log up to LSN=1599646720
InnoDB: Read redo log up to LSN=2160831488
InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages
InnoDB: To recover: LSN 2806789376/2806819840; 63507 pages
InnoDB: Read redo log up to LSN=3195776000
InnoDB: Read redo log up to LSN=3687099392
InnoDB: Read redo log up to LSN=4165315584
InnoDB: To recover: LSN 4374395699/4374440960; 241454 pages
InnoDB: To recover: LSN 4374395699/4374440960; 123701 pages
InnoDB: Read redo log up to LSN=4508724224
InnoDB: Read redo log up to LSN=5094550528
InnoDB: To recover: 205230 pages

The previous messages "Starting a batch to recover" or
"Starting a final batch to recover" will be replaced by
"To recover: ... pages" messages.

If a batch lasts longer than 15 seconds, then there will be
progress reports every 15 seconds, showing the number of remaining pages.
For the non-final batch, the "To recover:" message includes two end LSN:
that of the batch, and of the recovered log. This is the primary measure
of progress. The batch will end once the number of pages to recover
reaches 0.

If recovery is possible in a single batch, the output will look like this,
with a shorter "To recover:" message that counts only the remaining pages:

InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799
InnoDB: Read redo log up to LSN=1984539648
InnoDB: Read redo log up to LSN=2710875136
InnoDB: Read redo log up to LSN=3358895104
InnoDB: Read redo log up to LSN=3965299712
InnoDB: Read redo log up to LSN=4557417472
InnoDB: Read redo log up to LSN=5219527680
InnoDB: To recover: 450915 pages

We will also speed up recovery by improving the memory management and
implementing multi-threaded recovery of data pages that will not need
to be read into the buffer pool ("fake read"). Log application in the
"fake read" threads will be protected by an atomic being_recovered field
and exclusive buf_page_t::lock.

Recovery will reserve for data pages two thirds of the buffer pool,
or 256 pages, whichever is smaller. Previously, we could only use at most
one third of the buffer pool for buffered log records. This would typically
mean that with large buffer pools, recovery unnecessary consisted of
multiple batches.

If recovery runs out of memory, it will "roll back" or "rewind" the current
mini-transaction. The recv_sys.recovered_lsn and recv_sys.pages
will correspond to the "out of memory LSN", at the end of the previous
complete mini-transaction.

If recovery runs out of memory while executing the final recovery batch,
we can simply invoke recv_sys.apply(false) to make room, and resume
parsing.

If recovery runs out of memory before the final batch, we will
scan the redo log to the end and check for any missing or inconsistent
files. In this version of the patch, we will throw away any previously
buffered recv_sys.pages and rescan the log from the checkpoint onwards.

recv_sys_t::pages_it: A cached iterator to recv_sys.pages.

recv_sys_t::is_memory_exhausted(): Remove. We will have out-of-memory
handling deep inside recv_sys_t::parse().

recv_sys_t::rewind(), page_recv_t::recs_t::rewind():
Remove all log starting with a specific LSN.

IORequest::write_complete(), IORequest::read_complete():
Replaces fil_aio_callback().

read_io_callback(), write_io_callback(): Replaces io_callback().

IORequest::fake_read_complete(), fake_io_callback(), os_fake_read():
Process a "fake read" request for concurrent recovery.

recv_sys_t::apply_batch(): Choose a number of successive pages
for a recovery batch.

recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a
page whose recovery is not in progress. Log application threads
will not invoke this; they will only set being_recovered=-1 to indicate
that the entry is no longer needed.

recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries.

recv_sys_t::wait_for_pool(): Wait for some space to become available
in the buffer pool.

mlog_init_t::mark_ibuf_exist(): Avoid calls to
recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low().
Such calls would lead to double locking of recv_sys.mutex, which
depending on implementation could cause a deadlock. We will use
lower-level calls to look up index pages.

buf_LRU_block_remove_hashed(): Disable consistency checks for freed
ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage.
This fixes an occasional failure of the test
innodb.innodb_bulk_create_index_debug.

Tested by: Matthias Leich
2023-05-19 15:20:07 +03:00
Marko Mäkelä
2f9e264781 MDEV-29911 InnoDB recovery and mariadb-backup --prepare fail to report detailed progress
The progress reporting of InnoDB crash recovery was rather intermittent.
Nothing was reported during the single-threaded log record parsing, which
could consume minutes when parsing a large log. During log application,
there only was progress reporting in background threads that would be
invoked on data page read completion.

The progress reporting here will be detailed like this:

InnoDB: Starting crash recovery from checkpoint LSN=503549688
InnoDB: Parsed redo log up to LSN=1990840177; to recover: 124806 pages
InnoDB: Parsed redo log up to LSN=2729777071; to recover: 186123 pages
InnoDB: Parsed redo log up to LSN=3488599173; to recover: 248397 pages
InnoDB: Parsed redo log up to LSN=4177856618; to recover: 306469 pages
InnoDB: Multi-batch recovery needed at LSN 4189599815
InnoDB: End of log at LSN=4483551634
InnoDB: To recover: LSN 4189599815/4483551634; 307490 pages
InnoDB: To recover: LSN 4189599815/4483551634; 197159 pages
InnoDB: To recover: LSN 4189599815/4483551634; 67623 pages
InnoDB: Parsed redo log up to LSN=4353924218; to recover: 102083 pages
...
InnoDB: log sequence number 4483551634 ...

The previous messages "Starting a batch to recover" or
"Starting a final batch to recover" will be replaced by
"To recover: ... pages" messages.

If a batch lasts longer than 15 seconds, then there will be
progress reports every 15 seconds, showing the number of remaining pages.
For the non-final batch, the "To recover:" message includes two end LSN:
that of the batch, and of the recovered log. This is the primary measure
of progress. The batch will end once the number of pages to recover
reaches 0.

If recovery is possible in a single batch, the output will look like this,
with a shorter "To recover:" message that counts only the remaining pages:

InnoDB: Starting crash recovery from checkpoint LSN=503549688
InnoDB: Parsed redo log up to LSN=1998701027; to recover: 125560 pages
InnoDB: Parsed redo log up to LSN=2734136874; to recover: 186446 pages
InnoDB: Parsed redo log up to LSN=3499505504; to recover: 249378 pages
InnoDB: Parsed redo log up to LSN=4183247844; to recover: 306964 pages
InnoDB: End of log at LSN=4483551634
...
InnoDB: To recover: 331797 pages
...
InnoDB: log sequence number 4483551634 ...

We will also speed up recovery by improving the memory management and
implementing multi-threaded recovery of data pages that will not need
to be read into the buffer pool ("fake read"). Log application in the
"fake read" threads will be protected by an atomic being_recovered field
and exclusive buf_page_t::latch.

Recovery will reserve for data pages two thirds of the buffer pool,
or 256 pages, whichever is smaller. Previously, we could only use at most
one third of the buffer pool for buffered log records. This would typically
mean that with large buffer pools, recovery unnecessary consisted of
multiple batches.

If recovery runs out of memory, it will "roll back" or "rewind" the current
mini-transaction. The recv_sys.lsn and recv_sys.pages will correspond
to the "out of memory LSN", at the end of the previous complete
mini-transaction.

If recovery runs out of memory while executing the final recovery batch,
we can simply invoke recv_sys.apply(false) to make room, and resume
parsing.

If recovery runs out of memory before the final batch, we will scan
the redo log to the end (recv_sys.scanned_lsn) and check for any missing
or inconsistent files. If recv_init_crash_recovery_spaces() does not
report any potentially missing tablespaces, we can make use of the
already stored recv_sys.pages and only rewind to the "out of memory LSN".
Else, we must keep parsing and invoking recv_validate_tablespace()
until an error has been found or everything has been resolved, and
ultimatily rewind to to the checkpoint LSN.

recv_sys_t::pages_it: A cached iterator to recv_sys.pages

recv_sys_t::parse_mtr(): Remove an ATTRIBUTE_NOINLINE that would
prevent tail call optimization in recv_sys_t::parse_pmem().

recv_sys_t::parse(), recv_sys_t::parse_mtr(), recv_sys_t::parse_pmem():
Add template<bool store> parameter. Redo log record parsing
(store=false) is better specialized from store=true
(with bool if_exists) so that we can avoid some conditional branches
in frequently invoked low-level code.

recv_sys_t::is_memory_exhausted(): Remove. The special parse() status
GOT_OOM will report out-of-memory situation at the low level.

recv_sys_t::rewind(), page_recv_t::recs_t::rewind():
Remove all log starting with a specific LSN.

recv_scan_log(): Separate some code for only parsing, not storing log.
In rewound_lsn, remember the LSN at which last_phase=false recovery
ran out of memory. This is where the next call to recv_scan_log()
will resume storing the log. This replaces recv_sys.last_stored_lsn.

recv_sys_t::parse(): Evaluate the template parameter store in a few more
cases, to allow dead code to be eliminated at compile time.

recv_sys_t::scanned_lsn: The end of the log found by recv_scan_log().
The special value 1 means that recv_sys has been initialized but
no log has been parsed.

IORequest::write_complete(), IORequest::read_complete():
Replaces fil_aio_callback().

read_io_callback(), write_io_callback(): Replaces io_callback().

IORequest::fake_read_complete(), fake_io_callback(), os_fake_read():
Process a "fake read" request for concurrent recovery.

recv_sys_t::apply_batch(): Choose a number of successive pages
for a recovery batch.

recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a
page whose recovery is not in progress. Log application threads
will not invoke this; they will only set being_recovered=-1 to indicate
that the entry is no longer needed.

recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries.

recv_sys_t::wait_for_pool(): Wait for some space to become available
in the buffer pool.

mlog_init_t::mark_ibuf_exist(): Avoid calls to
recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low().
Such calls would lead to double locking of recv_sys.mutex, which
depending on implementation could cause a deadlock. We will use
lower-level calls to look up index pages.

buf_LRU_block_remove_hashed(): Disable consistency checks for freed
ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage.
This fixes an occasional failure of the test
innodb.innodb_bulk_create_index_debug.

Tested by: Matthias Leich
2023-05-19 15:15:38 +03:00
Marko Mäkelä
2ec42f793d Merge 10.6 into 10.9 2023-05-19 15:11:06 +03:00
Vlad Lesin
5422784792 MDEV-31256 fil_node_open_file() releases fil_system.mutex allowing other thread to open its file node
There is room between mutex_exit(&fil_system.mutex) and
mutex_enter(&fil_system.mutex) calls in fil_node_open_file(). During this
room another thread can open the node, and ut_ad(!node->is_open())
assertion in fil_node_open_file_low() can fail.

The fix is not to open node if it was already opened by another thread.
2023-05-19 14:52:47 +03:00
Marko Mäkelä
1fe830b56a Merge 10.5 into 10.6 2023-05-19 14:24:09 +03:00