Commit graph

966 commits

Author SHA1 Message Date
Marko Mäkelä
e5c4f4e590 Merge 10.3 into 10.4 2022-07-27 14:25:36 +03:00
Marko Mäkelä
0ee1082bd2 MDEV-28495 InnoDB corruption due to lack of file locking
Starting with commit da094188f6 (MDEV-24393),
MariaDB will no longer acquire advisory file locks on InnoDB data
files by default, because it would create a large number of
entries in Linux /proc/locks.

The motivation for acquiring the file locks is to prevent accidental
concurrent startup of multiple server processes on the same data files.
Such mistake still turns out to be relatively common, based on
corruption bug reports from the community.

To prevent corruption due to concurrent startup attempts, the
Aria storage engine would unconditionally acquire an advisory lock
on one of its log files.

Solution: InnoDB will always lock its system tablespace files.
(Ever since commit 685d958e38
the InnoDB log file will not necessarily be open while the
server is running, because it can be accessed via memory-mapped I/O.)

If more protection is desired, then the option --external-locking
can be used.

The mandatory advisory lock also fixes intermittent failures of
some crash recovery tests. It turns out that when the mtr test harness
kills and restarts the server, it will not actually ensure that the
old process has terminated before starting the new one.
2022-07-27 14:15:14 +03:00
Marko Mäkelä
99c8aed00d MDEV-28601 InnoDB history list length was reverted to 32 bits
srv_do_purge(): In commit edde1f6e0d
when the de-facto 32-bit trx_sys_t::history_size() was replaced with
32-bit trx_sys.rseg_history_len, some more variables were changed
from ulint (size_t) to uint32_t.

The history list length is the number of committed transactions whose
undo logs are waiting to be purged. Each TRX_RSEG_HISTORY list is
storing the number of entries in a 32-bit field and each transaction
will occupy at least one undo log page. It is thinkable that the
length of each TRX_RSEG_HISTORY list may approach the maximum
representable number. The number cannot be exceeded, because the
rollback segment header is allocated from the same tablespace as
the undo log header pages it is pointing to, and because the page
numbers of a tablespace are stored in 32 bits. In any case, it is
possible that the total number of unpurged committed transactions
cannot be represented in 32 but 39 bits (corresponding to
128 rollback segments and undo tablespaces).
2022-05-25 14:06:04 +03:00
Sergei Golubchik
23ddc3518f Merge branch '10.3' into 10.4 2022-05-18 01:25:30 +02:00
Marko Mäkelä
3e564d468d MDEV-28541 Unused counter Innodb_encryption_key_rotation_list_length
The counter srv_stats.key_rotation_list_length is never updated, and
therefore Innodb_encryption_key_rotation_list_length will always be 0.

The view INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION comes close
to reporting this information.
2022-05-16 13:45:17 +03:00
Marko Mäkelä
4e1bf2bb23 MDEV-28537 Unused or useless InnoDB counters num_index_pages_written, num_non_index_pages_written
The counters were added in commit 5e55d1ced5
and any code to update them was
inadvertently removed in commit 2e814d4702
when applying InnoDB changes from MySQL 5.7.

Let us remove these counters that never reported anything useful. If such
statistics are really needed in a special case, they can be obtained by
instrumenting the code by some means, such as eBPF or a source code patch.
2022-05-16 13:41:53 +03:00
Marko Mäkelä
d172df9913 MDEV-25975: Merge 10.3 into 10.4 2022-04-06 09:18:38 +03:00
Marko Mäkelä
e9735a8185 MDEV-25975 innodb_disallow_writes causes shutdown to hang
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.

During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:

sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.

sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.

The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.

Reviewed by: Jan Lindström
2022-04-06 08:06:49 +03:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Daniel Black
410c4edef3 MDEV-27467: innodb to enforce the minimum innodb_buffer_pool_size in SET GLOBAL
.. to be the same as startup.

In resolving MDEV-27461, BUF_LRU_MIN_LEN (256) is the minimum number of
pages for the innodb buffer pool size. Obviously we need more than just
flushing pages. Taking the 16k page size and its default minimum, an
extra 25% is needed on top of the flushing pages to make a workable buffer
pool.

The minimum innodb_buffer_pool_chunk_size (1M) restricts the minimum
otherwise we'd have a pool made up of different chunk sizes.

The resulting minimum innodb buffer pool sizes are:

Page Size, Previously minimum (startup), with change.
        4k                            5M           2M
        8k                            5M           3M
       16k                            5M           5M
       32k                           24M          10M
       64k                           24M          20M

With this patch, SET GLOBAL innodb_buffer_pool_size minimums are
enforced.

The evident minimum system variable size for innodb_buffer_pool_size
is 2M, however this is only setable if using 4k page size. As
the order of the page_size and buffer_pool_size aren't fixed, we can't
hide this change.

Subsequent changes:
* innodb_buffer_pool_resize_with_chunks.test - raised of pool resize due to new
  minimums. Chunk size also needed increase as the test was for
  pool_size < chunk_size to generate a warning.
* Removed srv_buf_pool_min_size and replaced use with MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Removed srv_buf_pool_def_size and replaced constant defination in
  MYSQL_SYSVAR_LONGLONG(buffer_pool_size)
* Reordered ha_innodb to allow for direct use of MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Moved buf_pool_size_align into ha_innodb to access to MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* loose-innodb_disable_resize_buffer_pool_debug is needed in the
  innodb.restart.opt test so that under debug mode, resizing of the
  innodb buffer pool can occur.
2022-01-19 11:10:45 +11:00
Julius Goryavsky
681b7784b6 Merge branch 10.3 into 10.4 2021-12-25 12:13:03 +01:00
Julius Goryavsky
3376668ca8 Merge branch 10.2 into 10.3 2021-12-23 14:14:04 +01:00
Marko Mäkelä
ef9517eb81 MDEV-27268 Failed InnoDB initialization leaves garbage files behind
create_log_files(): Check log_set_capacity() before modifying
or creating any log files.

innobase_start_or_create_for_mysql(): If create_log_files()
fails and we were initializing a new database, delete the
system tablespace files before exiting.
2021-12-15 14:17:55 +02:00
sjaakola
5c230b21bf MDEV-23328 Server hang due to Galera lock conflict resolution
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is

          wsrep_thd_LOCK()
          wsrep_kill_victim()
          lock_rec_other_has_conflicting()
          lock_clust_rec_read_check_and_lock()
          row_search_mvcc()
          ha_innobase::index_read()
          ha_innobase::rnd_pos()
          handler::ha_rnd_pos()
          handler::rnd_pos_by_record()
          handler::ha_rnd_pos_by_record()
          Rows_log_event::find_row()
          Update_rows_log_event::do_exec_row()
          Rows_log_event::do_apply_event()
          Log_event::apply_event()
          wsrep_apply_events()

and mutexes are taken in the order

          lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

          innobase_kill_query()
          kill_handlerton()
          plugin_foreach_with_mask()
          ha_kill_query()
          THD::awake()
          kill_one_thread()

        and mutexes are

          victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.

In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.

TOI replication is used, in this approach,  purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.

This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-10-29 09:52:52 +03:00
mkaruza
2f5ae0da71 MDEV-25883 Galera Cluster hangs while "DELETE FROM mysql.wsrep_cluster"
Using `innodb_thread_concurrency` will call `wsrep_thd_is_aborting` to
check WSREP thread state. This call should be protected by taking
`LOCK_thd_data` before entering function.
Applier and TOI threads should no be affected with usage of
`innodb_thread_concurrency` variable so returning before any checks.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-09-30 12:25:26 +03:00
Marko Mäkelä
d5bd704f4b Merge 10.3 into 10.4 2021-09-24 12:11:52 +03:00
Marko Mäkelä
4bfdba2e89 MDEV-26672 innodb_undo_log_truncate may reset transaction ID sequence
trx_rseg_header_create(): Add a parameter for the value that is
to be written to TRX_RSEG_MAX_TRX_ID. If we omit this write, then
the updated test innodb.undo_truncate will fail for the 4k, 8k, 16k
page sizes. This was broken ever since
commit 947efe17ed (MDEV-15158)
removed the writes of transaction identifiers to the TRX_SYS page.

srv_do_purge(): Truncate undo tablespaces also during slow shutdown
(innodb_fast_shutdown=0).

Thanks to Krunal Bauskar for noticing this problem.
2021-09-24 11:23:37 +03:00
Marko Mäkelä
baf0ef9a18 After-merge fix: Remove duplicated code
In the merge commit d3e4fae797
a message about innodb_force_recovery was accidentally duplicated.
2021-06-21 14:05:43 +03:00
Marko Mäkelä
d3e4fae797 Merge 10.3 into 10.4 2021-06-21 12:38:25 +03:00
Marko Mäkelä
e46f76c974 MDEV-15912: Remove traces of insert_undo
Let us simply refuse an upgrade from earlier versions if the
upgrade procedure was not followed. This simplifies the purge,
commit, and rollback of transactions.

Before upgrading to MariaDB 10.3 or later, a clean shutdown
of the server (with innodb_fast_shutdown=1 or 0) is necessary,
to ensure that any incomplete transactions are rolled back.
The undo log format was changed in MDEV-12288. There is only
one persistent undo log for each transaction.
2021-06-21 12:34:07 +03:00
Nikita Malyavin
509e4990af Merge branch bb-10.3-release into bb-10.4-release 2021-05-05 23:03:01 +03:00
Nikita Malyavin
a8a925dd22 Merge branch bb-10.2-release into bb-10.3-release 2021-05-04 14:49:31 +03:00
Nikita Malyavin
300253acf1 revive innodb_debug_sync
innodb_debug_sync was introduced in commit
b393e2cb0c and reverted in
commit fc58c17216 due to memory leak reported
by valgrind, see MDEV-21336.

The leak is now fixed by adding `rw_lock_free(&slot->debug_sync_lock)`
after background thread working loop is finished, and the patch is
reapplied, with respect to c++98 fixes by Marko.

The missing DEBUG_SYNC for MDEV-18546 in row0vers.cc is also reapplied.
2021-04-27 11:51:17 +03:00
Marko Mäkelä
5008171b05 Merge 10.3 into 10.4 2021-04-14 10:33:59 +03:00
Marko Mäkelä
450c017c2d Merge 10.2 into 10.3 2021-04-09 14:32:06 +03:00
Srinidhi Kaushik
5bc5ecce08 MDEV-24197: Add "innodb_force_recovery" for "mariabackup --prepare"
During the prepare phase of restoring backups, "mariabackup" does
not seem to allow (or recognize) the option "innodb_force_recovery"
for the embedded InnoDB server instance that it starts.

If page corruption observed during page recovery, the prepare step
fails. While this is indeed the correct behavior ideally, allowing
this option to be set in case of emergencies might be useful when
the current backup is the only copy available. Some error messages
during "--prepare" suggest to set "innodb_force_recovery" to 1:

  [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.

For backwards compatibility, "mariabackup --innobackupex --apply-log"
should also have this option.

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
2021-04-01 13:34:40 +03:00
Marko Mäkelä
7ae37ff74f Merge 10.3 into 10.4 2021-03-27 17:12:28 +02:00
Marko Mäkelä
3157fa182a Merge 10.2 into 10.3 2021-03-27 16:11:26 +02:00
Marko Mäkelä
0f8caadc96 MDEV-22653: Remove the useless parameter innodb_simulate_comp_failures
The debug parameter innodb_simulate_comp_failures injected compression
failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing
logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent
compressed page overflows. A much better check is already achieved by
defining UNIV_ZIP_COPY at the compilation time.
(Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)
2021-03-22 18:12:44 +02:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Marko Mäkelä
59eda73eff MDEV-24751: member call on fil_system.temp_space in innodb_shutdown()
innodb_shutdown(): Check that fil_system.temp_space is not null
before invoking a member function.

This regression was caused by the merge
commit fa1aef39eb
of MDEV-24340 (commit 1eb59c307d).
2021-02-01 13:17:17 +02:00
Marko Mäkelä
ea9cd97f85 MDEV-24536 innodb_idle_flush_pct has no effect
The parameter innodb_idle_flush_pct that was introduced in
MariaDB Server 10.1.2 by MDEV-6932 has no effect ever since
the InnoDB changes from MySQL 5.7.9 were applied in
commit 2e814d4702.

Let us declare the parameter as deprecated and having no effect.
2021-01-13 18:55:56 +02:00
Marko Mäkelä
0aa02567dd Merge 10.3 into 10.4 2020-12-23 14:52:59 +02:00
Marko Mäkelä
fa1aef39eb Merge 10.2 into 10.3 2020-12-23 14:25:45 +02:00
Marko Mäkelä
1eb59c307d MDEV-24340 Unique final message of InnoDB during shutdown
innobase_space_shutdown(): Remove. We want this step to be executed
before the message "InnoDB: Shutdown completed; log sequence number "
is output by innodb_shutdown(). It used to be executed after that step.

innodb_shutdown(): Duplicate the code that used to live in
innobase_space_shutdown().

innobase_init_abort(): Merge with innobase_space_shutdown().
2020-12-04 11:46:47 +02:00
Eugene Kosov
a50cb4867a MDEV-24334 make monitor_set_tbl global variable thread-safe
Atomic_relaxed<T>: add fetch_or() and fetch_and()

innodb_init(): rely on a zero-initialization of a global variable

monitor_set_tbl: make Atomic_relaxed<ulint> array and use proper operations
for setting bit, unsetting bit and reading bit

Reviewed by: Marko Mäkelä
2020-12-03 11:55:36 +03:00
Marko Mäkelä
2fa9f8c53a Merge 10.3 into 10.4 2020-08-20 11:01:47 +03:00
Marko Mäkelä
de0e7cd72a Merge 10.2 into 10.3 2020-08-20 09:12:16 +03:00
Marko Mäkelä
309302a3da MDEV-23475 InnoDB performance regression for write-heavy workloads
In commit fe39d02f51 (MDEV-20638)
we removed some wake-up signaling of the master thread that should
have been there, to ensure a steady log checkpointing workload.

Common sense suggests that the commit omitted some necessary calls
to srv_inc_activity_count(). But, an attempt to add the call to
trx_flush_log_if_needed_low() as well as to reinstate the function
innobase_active_small() did not restore the performance for the
case where sync_binlog=1 is set.

Therefore, we will revert the entire commit in MariaDB Server 10.2.
In MariaDB Server 10.5, adding a srv_inc_activity_count() call to
trx_flush_log_if_needed_low() did restore the performance, so we
will not revert MDEV-20638 across all versions.
2020-08-19 11:18:56 +03:00
Marko Mäkelä
9216114ce7 Merge 10.3 into 10.4 2020-07-31 18:09:08 +03:00
Marko Mäkelä
66ec3a770f Merge 10.2 into 10.3 2020-07-31 13:51:28 +03:00
sjaakola
95132ade6d MDEV-20928 mtr test galera.galera_var_innodb_disallow_writes test failure
The sporadic test hangs happen because of mutex dealock between innodb
background threads and two test connection executions.
The test sets variable innodb_disallow_writes, which blocks all writes
to filesyste. The test logic is to execute an INSERT, which should hang
because of filesytstem writes are blocked, and through another session
verify by SELECT that this hanging happens. The SELECT session will then
release innodb_disallow_writes blocking.

However, filesystem write  blocking affects also innodb background threads
and they may hang while keeping some other resources locked.
As an example, in one test hang situation, buffer pool access was blocked.
And, if buffer pool is blocked, the test connections will be blocked as well,
and the SELECT session will not be able to continue to release the
innodb_disallow_writes.

The fix in this commit is refactoring of the test logic.
The test will now set first innodb_disallow_writes blocking, and then record
a hash of data directory's filesystem contents. This works as checksum of the
state of data on the datadirectory.

Then some SQL load is tried on both nodes, these sessions will be blocking
due to frozen file system state. The test will have a short sleep to allow
innodb background threads to loop and possibly encounter innodb_disallow_writes
blocking as well.

After the sleep, the test will record file system checksun for the second time,
and then release the innodb_disallow-writes blocking.

Finally, the two checksums are compared, they should be identical to verify that
nothing was written on datadirectory during the test execution.

The checksum is implemented by md5sum hash over all files found in datadirectory
by find command. all these file hashes are hashed together by one more md5sum.

The test therefore depends on md5sum and find. find may work differently with some
OS distributions, e.g. freebsd may be problematic.
2020-07-24 12:05:39 +03:00
Thirunarayanan Balathandayuthapani
fe39d02f51 MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low()
- Due to commit fe95cb2e40 (MDEV-16125),
InnoDB master thread does not need to call srv_resume_thread()
and therefore there is no need to wake up the thread.
Due to the above patch, InnoDB should remove the following dead code.

srv_check_activity(): Makes the parameter as in,out and returns the
recent activity value

innobase_active_small(): Removed

srv_active_wake_master_thread(): Removed

srv_wake_master_thread(): Removed

srv_active_wake_master_thread_low(): Removed

Simplify srv_master_thread() and remove switch cases, added the assert.

Replace srv_wake_master_thread() with srv_inc_activity_count()

INNOBASE_WAKE_INTERVAL: Removed
2020-07-23 16:23:20 +05:30
Thirunarayanan Balathandayuthapani
3a8943ae73 MDEV-17481 mariadb service won't shutdown when it's running and the OS datetime updated backwards
__pthread_cond_timedwait() in page cleaner hangs if os time moved
backwards.Workaround could be waking up the page cleaner thread in
logs_empty_and_mark_files_at_shutdown(). But there is possibility that
server could hang when server is running. So InnoDB should wake up page
cleaner thread periodically in srv_master_do_idle_tasks().
2020-07-22 18:02:52 +05:30
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Marko Mäkelä
acc58fd835 Merge 10.2 into 10.3 2020-07-20 15:11:59 +03:00
Marko Mäkelä
ca9276e37e Merge 10.1 into 10.2 2020-07-20 14:53:24 +03:00
Marko Mäkelä
57ec42bc32 MDEV-23190 InnoDB data file extension is not crash-safe
When InnoDB is extending a data file, it is updating the FSP_SIZE
field in the first page of the data file.

In commit 8451e09073 (MDEV-11556)
we removed a work-around for this bug and made recovery stricter,
by making it track changes to FSP_SIZE via redo log records, and
extend the data files before any changes are being applied to them.

It turns out that the function fsp_fill_free_list() is not crash-safe
with respect to this when it is initializing the change buffer bitmap
page (page 1, or generally, N*innodb_page_size+1). It uses a separate
mini-transaction that is committed (and will be written to the redo
log file) before the mini-transaction that actually extended the data
file. Hence, recovery can observe a reference to a page that is
beyond the current end of the data file.

fsp_fill_free_list(): Initialize the change buffer bitmap page in
the same mini-transaction.

The rest of the changes are fixing a bug that the use of the separate
mini-transaction was attempting to work around. Namely, we must ensure
that no other thread will access the change buffer bitmap page before
our mini-transaction has been committed and all page latches have been
released.

That is, for read-ahead as well as neighbour flushing, we must avoid
accessing pages that might not yet be durably part of the tablespace.

fil_space_t::committed_size: The size of the tablespace
as persisted by mtr_commit().

fil_space_t::max_page_number_for_io(): Limit the highest page
number for I/O batches to committed_size.

MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.

mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.

mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK,
copy space->size to space->committed_size. In this way, read-ahead
or flushing will never be invoked on pages that do not yet exist
according to FSP_SIZE.
2020-07-20 14:48:56 +03:00