3,652 commits

Author SHA1 Message Date
Marko Mäkelä
5ae5453291 MDEV-25919 fixup: MSAN and Valgrind errors related to statistics
dict_table_close(): Fix a race condition around dict_stats_deinit().
This was not observed; it should have been caught by an assertion.

dict_stats_deinit(): Slightly simplify the code.

ha_innobase::info_low(): If the table is unreadable,
initialize some dummy statistics.
2021-09-04 19:08:14 +03:00
Marko Mäkelä
45a05fda27 MDEV-25919: Replace dict_table_t::stats_bg_flag with MDL
The purpose of dict_table_t::stats_bg_flag was to prevent
race conditions between DDL operations and a background thread
that updates persistent statistics for InnoDB tables.

Now that the parent commit has started to acquire a
shared meta-data lock (MDL) on the InnoDB persistent statistics tables
in background tasks that access them, we may easily acquire MDL
on the table for which the statistics are being updated. This will by
design prevent race conditions with any DDL operations on that table,
and the stats_bg_flag may be removed.

dict_stats_process_entry_from_recalc_pool(): Complete rewrite.
During the processing, retain the entry in recalc_pool, so
that dict_stats_recalc_pool_del() will be able to request
deletion of the entry, or delete the entry if its caller is
holding MDL_EXCLUSIVE while we are waiting for MDL.

recalc_pool: In addition to the table ID, store a state for
inter-thread communication, so that dict_stats_recalc_pool_del()
can wait until all processing is finished.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:55 +03:00
Marko Mäkelä
c5fd9aa562 MDEV-25919: Lock tables before acquiring dict_sys.latch
In commit 1bd681c8b3 (MDEV-25506 part 3)
we introduced a "fake instant timeout" when a transaction would wait
for a table or record lock while holding dict_sys.latch. This prevented
a deadlock of the server but could cause bogus errors for operations
on the InnoDB persistent statistics tables.

A better fix is to ensure that whenever a transaction is being
executed in the InnoDB internal SQL parser (which will for now
require dict_sys.latch to be held), it will already have acquired
all locks that could be required for the execution. So, we will
acquire the following locks upfront, before acquiring dict_sys.latch:

(1) MDL on the affected user table (acquired by the SQL layer)
(2) If applicable (not for RENAME TABLE): InnoDB table lock
(3) If persistent statistics are going to be modified:
(3.a) MDL_SHARED on mysql.innodb_table_stats, mysql.innodb_index_stats
(3.b) exclusive table locks on the statistics tables
(4) Exclusive table locks on the InnoDB data dictionary tables
(not needed in ANALYZE TABLE and the like)

Note: Acquiring exclusive locks on the statistics tables may cause
more locking conflicts between concurrent DDL operations.
Notably, RENAME TABLE will lock the statistics tables
even if no persistent statistics are enabled for the table.

DROP DATABASE will only acquire locks on statistics tables if
persistent statistics are enabled for the tables on which the
SQL layer is invoking ha_innobase::delete_table().
For any "garbage collection" in innodb_drop_database(), a timeout
while acquiring locks on the statistics tables will result in any
statistics not being deleted for any tables that the SQL layer
did not know about.

If innodb_defragment=ON, information may be written to the statistics
tables even for tables for which InnoDB persistent statistics are
disabled. But, DROP TABLE will no longer attempt to delete that
information if persistent statistics are not enabled for the table.

This change should also fix the hangs related to InnoDB persistent
statistics and STATS_AUTO_RECALC (MDEV-15020) as well as
a bug that running ALTER TABLE on the statistics tables
concurrently with running ALTER TABLE on InnoDB tables could
cause trouble.

lock_rec_enqueue_waiting(), lock_table_enqueue_waiting():
Do not issue a fake instant timeout error when the transaction
is holding dict_sys.latch. Instead, assert that the dict_sys.latch
is never being held here.

lock_sys_tables(): A new function to acquire exclusive locks on all
dictionary tables, in case DROP TABLE or similar operation is
being executed. Locking non-hard-coded tables is optional to avoid
a crash in row_merge_drop_temp_indexes(). The SYS_VIRTUAL table was
introduced in MySQL 5.7 and MariaDB Server 10.2. Normally, we require
all these dictionary tables to exist before executing any DDL, but
the function row_merge_drop_temp_indexes() is an exception.
When upgrading from MariaDB Server 10.1 or MySQL 5.6 or earlier,
the table SYS_VIRTUAL would not exist at this point.

ha_innobase::commit_inplace_alter_table(): Invoke
log_write_up_to() while not holding dict_sys.latch.

dict_sys_t::remove(), dict_table_close(): No longer try to
drop index stubs that were left behind by aborted online ADD INDEX.
Such indexes should be dropped from the InnoDB data dictionary by
row_merge_drop_indexes() as part of the failed DDL operation.
Stubs for aborted indexes may only be left behind in the
data dictionary cache.

dict_stats_fetch_from_ps(): Use a normal read-only transaction.

ha_innobase::delete_table(), ha_innobase::truncate(), fts_lock_table():
While waiting for purge to stop using the table,
do not hold dict_sys.latch.

ha_innobase::delete_table(): Implement a work-around for the rollback
of ALTER TABLE...ADD PARTITION. MDL_EXCLUSIVE would not be held if
ALTER TABLE hits lock_wait_timeout while trying to upgrade the MDL
due to a conflicting LOCK TABLES, such as in the first ALTER TABLE
in the test case of Bug#53676 in parts.partition_special_innodb.
Therefore, we must explicitly stop purge, because it would not be
stopped by MDL.

dict_stats_func(), btr_defragment_chunk(): Allocate a THD so that
we can acquire MDL on the InnoDB persistent statistics tables.

mysqltest_embedded: Invoke ha_pre_shutdown() before free_used_memory()
in order to avoid ASAN heap-use-after-free related to acquire_thd().

trx_t::dict_operation_lock_mode: Changed the type to bool.

row_mysql_lock_data_dictionary(), row_mysql_unlock_data_dictionary():
Implemented as macros.

rollback_inplace_alter_table(): Apply an infinite timeout to lock waits.

innodb_thd_increment_pending_ops(): Wrapper for
thd_increment_pending_ops(). Never attempt async operation for
InnoDB background threads, such as the trx_t::commit() in
dict_stats_process_entry_from_recalc_pool().

lock_sys_t::cancel(trx_t*): Make dictionary transactions immune to KILL.

lock_wait(): Make dictionary transactions immune to KILL, and to
lock wait timeout when waiting for locks on dictionary tables.

parts.partition_special_innodb: Use lock_wait_timeout=0 to instantly
get ER_LOCK_WAIT_TIMEOUT.

main.mdl: Filter out MDL on InnoDB persistent statistics tables

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:44 +03:00
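The lock ordering described in c5fd9aa562 can be illustrated with a minimal C++ sketch. Everything in it is hypothetical: `Trx`, `lock_table()` and `ddl_with_stats()` are stand-ins invented for illustration, not the InnoDB functions. The MDL steps (1) and (3.a) are left out because they belong to the SQL layer, and the list of dictionary tables is only an example. The point is that every lock request that might wait happens before the latch is taken, and a wait under the latch becomes an assertion failure rather than a fake timeout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical transaction object; not the real trx_t.
struct Trx {
  bool dict_latch_held = false;      // models "holding dict_sys.latch"
  std::vector<std::string> locks;    // table locks acquired so far, in order
};

// Illustrative subset of dictionary table names (assumption, not the full list).
static const char* const kDictTables[] = {
    "SYS_TABLES", "SYS_COLUMNS", "SYS_INDEXES", "SYS_FIELDS"};

// A lock request that may have to wait. It must never run under the latch.
void lock_table(Trx& trx, const std::string& name) {
  assert(!trx.dict_latch_held);
  trx.locks.push_back(name);
}

// Acquire everything upfront, then take the latch for the dictionary work.
void ddl_with_stats(Trx& trx, const std::string& user_table,
                    bool persistent_stats) {
  lock_table(trx, user_table);                      // step (2)
  if (persistent_stats) {                           // step (3.b)
    lock_table(trx, "mysql.innodb_table_stats");
    lock_table(trx, "mysql.innodb_index_stats");
  }
  for (const char* name : kDictTables) {            // step (4)
    lock_table(trx, name);
  }
  trx.dict_latch_held = true;    // stands in for dict_sys.lock()
  // ... dictionary changes would run here; no lock wait is possible ...
  trx.dict_latch_held = false;   // stands in for dict_sys.unlock()
}
```

Calling `lock_table()` between the two latch assignments would trip the assertion, which mirrors the new debug check in the lock wait functions.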
Marko Mäkelä
094de71742 MDEV-25919 preparation: Various cleanup
que_eval_sql(): Remove the parameter lock_dict. The only caller
with lock_dict=true was dict_stats_exec_sql(), which will now
explicitly invoke dict_sys.lock() and dict_sys.unlock() by itself.

row_import_cleanup(): Do not unnecessarily lock the dictionary.
Concurrent access to the table during ALTER TABLE...IMPORT TABLESPACE
is prevented by MDL and the fact that there cannot exist any
undo log or change buffer records that would refer to the table
or tablespace.

row_import_for_mysql(): Do not unnecessarily lock the dictionary
while accessing fil_system. Thanks to MDL_EXCLUSIVE that was acquired
by the SQL layer, only one IMPORT may be in effect for the table name.

row_quiesce_set_state(): Do not unnecessarily lock the dictionary.
The dict_table_t::quiesce state is documented to be protected by
all index latches, which we are acquiring.

dict_table_close(): Introduce a simpler variant with fewer parameters.

dict_table_close(): Reduce the number of calls.
We can simply invoke dict_table_t::release() on startup or
in DDL operations, or when the table is inaccessible.
In none of these cases is there any need to invalidate the
InnoDB persistent statistics.

pars_info_t::graph_owns_us: Remove (unused).

pars_info_free(): Define inline.

fts_delete(), trx_t::evict_table(), row_prebuilt_free(),
row_rename_table_for_mysql(): Simplify.

row_mysql_lock_data_dictionary(): Remove some references;
use dict_sys.lock() and dict_sys.unlock() instead.

row_mysql_lock_table(): Remove. Use lock_table_for_trx() instead.

ha_innobase::check_if_supported_inplace_alter(),
row_create_table_for_mysql(): Simply assert dict_sys.sys_tables_exist().
Since commit 49e2c8f0a6 and
commit 1bd681c8b3, srv_start()
actually guarantees that the system tables will exist,
or the server is in read-only mode, or startup will fail.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:20 +03:00
Marko Mäkelä
6a2cd6f4b4 MDEV-19505 Do not hold mutex while calling que_graph_free()
sym_tab_free_private(): Do not call dict_table_close(), but
simply invoke dict_table_t::release(), which we can do without
locking the whole dictionary cache. (Note: On user tables it
may still be necessary to invoke dict_table_close(), so that
InnoDB persistent statistics will be deinitialized as expected.)

fts_check_corrupt(), row_fts_merge_insert(): Invoke
aux_table->release() to simplify the code. This is never a user table.

fts_que_graph_free(), fts_que_graph_free_check_lock(): Replaced with
que_graph_free().

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:54:06 +03:00
Marko Mäkelä
82b7c561b7 MDEV-24258 Merge dict_sys.mutex into dict_sys.latch
In the parent commit, dict_sys.latch could theoretically have been
replaced with a mutex. But, we can do better and merge dict_sys.mutex
into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock()
will be replaced with dict_sys.lock().

The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex
will be removed along with dict_sys.mutex. The dict_sys.latch
will remain instrumented as dict_operation_lock.

Some use of dict_sys.lock() will be replaced with dict_sys.freeze(),
which we will reintroduce for the new shared mode. Most notably,
concurrent table lookups are possible as long as the tables are present
in the dict_sys cache. In particular, this will allow more concurrency
among InnoDB purge workers.

Because dict_sys.mutex will no longer 'throttle' the threads that purge
InnoDB transaction history, a performance degradation may be observed
unless innodb_purge_threads=1.

The table cache eviction policy will become FIFO-like,
similar to what happened to fil_system.LRU
in commit 45ed9dd957.
The name of the list dict_sys.table_LRU will become somewhat misleading;
that list contains tables that may be evicted, even though the
eviction policy no longer is least-recently-used but first-in-first-out.
(Note: Tables can never be evicted as long as locks exist on them or
the tables are in use by some thread.)

As demonstrated by the test perfschema.sxlock_func, there
will be less contention on dict_sys.latch, because some previous
use of exclusive latches will be replaced with shared latches.

fts_parse_sql_no_dict_lock(): Replaced with pars_sql().

fts_get_table_name_prefix(): Merged to fts_optimize_create().

dict_stats_update_transient_for_index(): Deduplicated some code.

ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination
of dict_sys.latch and table->stats_mutex_lock() to cover the
changes of BG_STAT_SHOULD_QUIT, because the flag is being read
in dict_stats_update_persistent() while not holding dict_sys.latch.

row_discard_tablespace_for_mysql(): Protect stats_bg_flag by
exclusive dict_sys.latch, like most other code does.

row_quiesce_table_has_fts_index(): Remove unnecessary mutex
acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL.

row_import::set_root_by_heuristic(): Remove unnecessary mutex
acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL.

row_ins_sec_index_entry_low(): Replace a call
to dict_set_corrupted_index_cache_only(). Reads of index->type
were not really protected by dict_sys.mutex, and writes
(flagging an index corrupted) should be extremely rare.

dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary,
do not lock it exclusively.

dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx.
We can simply invoke dict_sys.unlock() and dict_sys.lock() directly.

dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is
only held in shared mode, not exclusive mode. Only acquire it in
exclusive mode if the table needs to be loaded to the cache.

dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU
would require holding an exclusive latch, which we want to avoid
for performance reasons.

dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU,
to compensate for the removal of dict_sys_t::acquire(). This function
is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS.

dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false,
try to acquire dict_sys.latch in shared mode. Only acquire the latch in
exclusive mode if the table is not found in the cache.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:51:35 +03:00
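The lookup pattern that 82b7c561b7 describes for dict_table_open_on_id() (shared latch for a cache hit, exclusive latch only on a miss) might look roughly like the following C++17 sketch. `DictCache`, `Table` and `open_on_id()` are hypothetical names, `std::shared_mutex` stands in for dict_sys.latch, and "loading" a table is reduced to constructing an object; none of this is the InnoDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical cached table object.
struct Table {
  std::uint64_t id;
};

class DictCache {
 public:
  std::shared_ptr<Table> open_on_id(std::uint64_t id) {
    {
      // Fast path: concurrent readers share the latch ("freeze").
      std::shared_lock<std::shared_mutex> freeze(latch_);
      auto it = tables_.find(id);
      if (it != tables_.end()) return it->second;
    }
    // Slow path: the table is not cached, so take the latch exclusively.
    std::unique_lock<std::shared_mutex> lock(latch_);
    // Re-check, because another thread may have loaded the table
    // between releasing the shared latch and acquiring the exclusive one.
    auto it = tables_.find(id);
    if (it != tables_.end()) return it->second;
    ++loads_;
    auto table = std::make_shared<Table>(Table{id});
    tables_.emplace(id, table);
    return table;
  }

  // Number of times the slow path actually loaded a table.
  unsigned loads() {
    std::shared_lock<std::shared_mutex> freeze(latch_);
    return loads_;
  }

 private:
  std::shared_mutex latch_;
  std::unordered_map<std::uint64_t, std::shared_ptr<Table>> tables_;
  unsigned loads_ = 0;
};
```

The re-check under the exclusive latch is what keeps two concurrent misses from loading the same table twice.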
Marko Mäkelä
2e08b6d78c MDEV-24258 preparation: Remove dict_sys.freeze() and unfreeze()
This will essentially make dict_sys.latch a mutex
(it is only acquired in exclusive mode).

The subsequent commit will merge dict_sys.mutex into dict_sys.latch
and reintroduce dict_sys.freeze() for those cases where we currently
acquire only dict_sys.latch but not dict_sys.mutex. The case where
both are acquired will be mapped to dict_sys.lock().

i_s_sys_tables_fill_table_stats(): Invoke dict_sys.prevent_eviction()
and the new function dict_sys.allow_eviction() to avoid table eviction
while a row in INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS is being
produced.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-08-31 13:48:10 +03:00
Marko Mäkelä
49f95c4065 Merge 10.5 into 10.6 2021-08-23 11:21:33 +03:00
Marko Mäkelä
2c9f2a4c8c Merge 10.4 into 10.5 2021-08-23 11:10:59 +03:00
Marko Mäkelä
2b66cd2493 Merge 10.3 into 10.4 2021-08-23 10:44:06 +03:00
Marko Mäkelä
cfbdb5d210 Merge 10.2 into 10.3 2021-08-23 10:14:01 +03:00
Marko Mäkelä
ca89489716 MDEV-26383 fixup: Consistently protect freed_indexes with autoinc_mutex
To avoid potential race conditions between concurrent access to
dict_table_t::freed_indexes, let us consistently use
dict_table_t::autoinc_mutex.

dict_table_remove_from_cache_low(): To avoid extensive hold time
of table->autoinc_mutex, unconditionally free the FTS data structures.
2021-08-23 10:06:21 +03:00
Thirunarayanan Balathandayuthapani
08e5a3d2e3 MDEV-26383 ASAN heap-use-after-free failure in btr_search_lazy_free
Problem:
=======
The last AHI page for two indexes of a dropped table is being
freed at the same time by two threads. One thread frees the
table heap and other thread tries to access table heap again.
It leads to asan failure in btr_search_lazy_free().

Solution:
========
InnoDB uses autoinc_mutex to avoid the race condition
in btr_search_lazy_free()
2021-08-21 12:38:10 +05:30
Marko Mäkelä
f3fcf5f45c Merge 10.5 to 10.6 2021-08-19 12:25:00 +03:00
Marko Mäkelä
4a25957274 Merge 10.4 into 10.5 2021-08-18 18:22:35 +03:00
Marko Mäkelä
f84e28c119 Merge 10.3 into 10.4 2021-08-18 16:51:52 +03:00
Marko Mäkelä
cd65845a0e Merge 10.2 into 10.3
MDEV-18734 FIXME: vcol.partition triggers ASAN heap-use-after-free
2021-08-18 12:26:58 +03:00
Eugene Kosov
890f2ad769 MDEV-20931 ALTER...IMPORT can crash the server
Main idea: don't log-and-crash but propagate the error to the upper layers of the stack
to handle it and show it to the user.
2021-08-17 20:28:42 +06:00
Marko Mäkelä
4cd063b9e4 MDEV-26376 pars_info_bind_id() unnecessarily copies strings
pars_info_bind_id(): Remove the parameter copy_name. It was always
being passed as constant TRUE or true. It turns out that copying
the string is completely unnecessary. In all calls except the one
in fts_get_select_columns_str() and fts_doc_fetch_by_doc_id(),
the parameter is being passed as a compile-time constant, and therefore
the pointer cannot become stale. In that special call, the string
that is being passed is allocated from the same memory heap that
pars_info_bind_id() would have been using.

pars_info_add_id(): Remove (unused declaration).
2021-08-16 12:10:20 +03:00
Oleksandr Byelkin
7ae6ef5236 Merge branch '10.5' into 10.6 2021-08-03 11:21:22 +02:00
Oleksandr Byelkin
850b2ba15d Merge branch '10.4' into 10.5 2021-08-02 16:53:37 +02:00
Marko Mäkelä
89cc633853 MDEV-13564 fixup: Remove unused function fts_check_corrupt()
The call to the function fts_check_corrupt() was removed
in commit 09af00cbde already.
2021-08-02 16:39:08 +03:00
Oleksandr Byelkin
ae6bdc6769 Merge branch '10.4' into 10.5 2021-07-31 23:19:51 +02:00
Oleksandr Byelkin
7841a7eb09 Merge branch '10.3' into 10.4 2021-07-31 22:59:58 +02:00
Marko Mäkelä
e305493b1c MDEV-21175 follow-up: Remove redundant locking; rely on MDL
Before entering DML or DDL execution in the storage engine, the SQL layer
will have acquired metadata lock (MDL) on the current table name as well
as the names of FOREIGN KEY (grand)child tables (that is,
tables whose REFERENCES clauses point to the current table).
The MDL prevents any metadata changes to these tables, such as
RENAME, TRUNCATE, DROP, ALTER.

While the MDL on the current table prevents dict_table_t::foreign_set
from being modified, it does not prevent the table metadata that the
stored pointers are pointing to from being modified.

The MDL on the child tables will prevent both dict_table_t::referenced_set
as well as the pointed child table metadata from being modified.

wsrep_row_upd_index_is_foreign(): Do not unnecessarily acquire the
data dictionary latch if Galera replication is not enabled.

ha_innobase::can_switch_engines(): Rely on MDL. We are not dereferencing
any pointers stored in the sets.

row_mysql_freeze_data_dictionary(), row_mysql_unfreeze_data_dictionary():
Remove.

row_update_for_mysql(): Call init_fts_doc_id_for_ref() only once.

In ALTER TABLE...IMPORT TABLESPACE and FLUSH TABLES...FOR EXPORT
the SQL layer is protecting the current table with MDL. We do not
need InnoDB latches.
2021-07-29 16:38:24 +03:00
Marko Mäkelä
15363a4f1b Cleanup: Remove pars_stored_procedure_call()
The InnoDB internal SQL parser never supported this syntax.
2021-07-29 15:37:35 +03:00
Marko Mäkelä
f50eb0d398 Merge 10.2 into 10.3 2021-07-27 10:47:17 +03:00
Marko Mäkelä
afe00bb7cc MDEV-25998 fixup: Avoid a hang
btr_scrub_start_space(): Avoid an unnecessary tablespace lookup
and related acquisition of fil_system->mutex. In MariaDB Server 10.3
we would get deadlocks between that mutex and a crypt_data mutex.

The fix was developed by Thirunarayanan Balathandayuthapani.
2021-07-27 10:44:01 +03:00
Marko Mäkelä
cf1fc59856 MDEV-25594: Improve debug checks
trx_t::will_lock: Changed the type to bool.

trx_t::is_autocommit_non_locking(): Replaces
trx_is_autocommit_non_locking().

trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions).

assert_trx_nonlocking_or_in_list(): Remove.
Replaced with at least as strict checks in each place.

check_trx_state(): Moved to a static function; partially replaced with
individual debug assertions implementing equivalent or stricter checks.

This is a backport of commit 7b51d11cca
from 10.5.
2021-07-27 08:52:01 +03:00
Marko Mäkelä
b50ea90063 Merge 10.2 into 10.3 2021-07-22 18:57:54 +03:00
Marko Mäkelä
742b3a0d39 MDEV-26205 Merge new release of InnoDB 5.7.35 to 10.2 2021-07-22 18:07:37 +03:00
Jakub Łopuszański
c4295b9be9 Bug #32460315 ONLINE RESIZING BUFFER POOL CAN CRASH CONCURRENT BP LOOKUP
This patch changes it so that we do not free old BP `page_hash`, but rather modify its parameters, during resize.

RB: 26084
Reviewed-by: Marcin Babij <marcin.babij@oracle.com>
Reviewed-by: Yasufumi Kinoshita <yasufumi.kinoshita@oracle.com>

mysql/mysql-server@ea3adc6a11
2021-07-22 18:05:23 +03:00
Marko Mäkelä
124dc0d85b MDEV-25361 fixup: Fix integer type mismatch
InnoDB tablespace identifiers and page numbers are 32-bit numbers.
Let us use a 32-bit type for them in innochecksum.

The changes in commit 1918bdf32c
broke the build on 32-bit Windows.

Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
2021-07-22 17:53:43 +03:00
Marko Mäkelä
641f09398f Merge 10.5 into 10.6 2021-07-22 10:11:08 +03:00
Marko Mäkelä
82d5994520 MDEV-26110: Do not rely on alignment on static allocation
It is implementation-defined whether alignment requirements
that are larger than std::max_align_t (typically 8 or 16 bytes)
will be honored by the compiler and linker.

It turns out that on IBM AIX, both alignas() and MY_ALIGNED()
only guarantee alignment up to 16 bytes.

For some data structures, specifying alignment to the CPU
cache line size (typically 64 or 128 bytes) is a mere performance
optimization, and we do not really care whether the requested
alignment is guaranteed.

But, for the correct operation of direct I/O, we do require that
the buffers be aligned at a block size boundary.

field_ref_zero: Define as a pointer, not an array.
For innochecksum, we can make this point to unaligned memory;
for anything else, we will allocate an aligned buffer from the heap.
This buffer will be used for overwriting freed data pages when
innodb_immediate_scrub_data_uncompressed=ON. And exactly that code
hit an assertion failure on AIX, in the test innodb.innodb_scrub.

log_sys.checkpoint_buf: Define as a pointer to aligned memory
that is allocated from heap.

log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf
instead of trying to allocate an aligned buffer from the stack.
2021-07-22 10:05:13 +03:00
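The approach taken in 82d5994520 for direct I/O buffers (a pointer to heap memory with an explicitly requested alignment, instead of an over-aligned static array) can be sketched in C++17 as follows. `AlignedBuf` and `is_aligned()` are hypothetical helpers, and the aligned form of `operator new` is used here only as one portable way to request the alignment; the commit itself does not say which allocator InnoDB uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// True if p is a multiple of align (align must be a power of two).
inline bool is_aligned(const void* p, std::size_t align) {
  return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Hypothetical owner of a zero-filled buffer aligned to a block size.
class AlignedBuf {
 public:
  AlignedBuf(std::size_t size, std::size_t align)
      : align_(align),
        size_(size),
        ptr_(static_cast<unsigned char*>(
            ::operator new(size, std::align_val_t(align)))) {
    std::memset(ptr_, 0, size_);
  }
  ~AlignedBuf() { ::operator delete(ptr_, std::align_val_t(align_)); }

  AlignedBuf(const AlignedBuf&) = delete;
  AlignedBuf& operator=(const AlignedBuf&) = delete;

  unsigned char* data() { return ptr_; }
  std::size_t size() const { return size_; }

 private:
  std::size_t align_;
  std::size_t size_;
  unsigned char* ptr_;
};
```

Unlike `alignas()` on a static object, the alignment here is requested from the allocator at run time, so it does not depend on what the compiler and linker honor for static storage.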
Marko Mäkelä
ed0a7b1b3f MDEV-24626 fixup: Remove useless code
fil_ibd_create(): Remove code that should have been removed in
commit 86dc7b4d4c already.
We no longer wrote an initialized page to the file, but we would
still allocate a page image in memory and write it.

xb_space_create_file(): Remove an unnecessary page write.
(This is a functional change for Mariabackup.)
2021-07-20 17:35:03 +03:00
Vladislav Vaintroub
e7f4daf88c merge 10.5 to 10.6 2021-07-16 22:12:09 +02:00
Vladislav Vaintroub
fc2ec25733 MDEV-26166 replace log_write_up_to(LSN_MAX,...) with log_buffer_flush_to_disk()
Also, remove comparison lsn > flush/write lsn, prior to calling
log_write_up_to. The checks and early returns are part of this function.
2021-07-16 18:44:58 +02:00
Marko Mäkelä
b797f217a3 Merge 10.5 into 10.6 2021-07-03 14:54:46 +03:00
Marko Mäkelä
bd5a6403ca MDEV-26033: Race condition between buf_pool.page_hash and resize()
The replacement of buf_pool.page_hash with a different type of
hash table in commit 5155a300fa (MDEV-22871)
introduced a race condition with buffer pool resizing.

We have an execution trace where buf_pool.page_hash.array is changed
to point to something else while page_hash_latch::read_lock() is
executing. The same should also affect page_hash_latch::write_lock().

We fix the race condition by never resizing (and reallocating) the
buf_pool.page_hash. We assume that resizing the buffer pool is
a rare operation. Yes, there might be a performance regression if a
server is first started up with a tiny buffer pool, which is later
enlarged. In that case, the tiny buf_pool.page_hash.array could cause
increased use of the hash bucket lists. That problem can be worked
around by initially starting up the server with a larger buffer pool
and then shrinking that, until changing to a larger size again.

buf_pool_t::resize_hash(): Remove.

buf_pool_t::page_hash_table::lock(): Do not attempt to deal with
hash table resizing. If we really wanted that in a safe manner,
we would probably have to introduce a global rw-lock around the
operation, or at the very least, poll buf_pool.resizing, both of
which would be detrimental to performance.
2021-07-03 13:58:38 +03:00
Marko Mäkelä
ed6b230744 MDEV-25919 preparation: Remove trx_t::internal
With commit 1bd681c8b3 (MDEV-25506)
it no longer is necessary to run DDL and DML operations in
separate transactions. Let us remove the flag trx_t::internal.
Dictionary transactions will be distinguished by trx_t::dict_operation.
2021-07-01 17:51:55 +03:00
Marko Mäkelä
0a67b15a9d Cleanup: Remove pointer indirection for trx_t::xid
The trx_t::xid is always allocated, so we might as well allocate it
directly in the trx_t object to improve the locality of reference.
2021-07-01 16:38:24 +03:00
Marko Mäkelä
8c5c3a4594 MDEV-26067 innodb_lock_wait_timeout values above 100,000,000 are useless
The practical maximum value of the parameter innodb_lock_wait_timeout
is 100,000,000. Any value larger than that specifies an infinite timeout.

Therefore, we should make 100,000,000 the maximum value of the parameter.
2021-07-01 10:31:08 +03:00
Marko Mäkelä
30edd5549d MDEV-26029: Sparse files are inefficient on thinly provisioned storage
The MariaDB implementation of page_compressed tables for InnoDB used
sparse files. In the worst case, in the data file, every data page
will consist of some data followed by a hole. This may be extremely
inefficient in some file systems.

If the underlying storage device is thinly provisioned (can compress
data on the fly), it would be good to write regular files (with sequences
of NUL bytes at the end of each page_compressed block) and let the
storage device take care of compressing the data.

For reads, sparse file regions and regions containing NUL bytes will be
indistinguishable.

my_test_if_disable_punch_hole(): A new predicate for detecting thinly
provisioned storage. (Not implemented yet.)

innodb_atomic_writes: Correct the comment.

buf_flush_page(): Support all values of fil_node_t::punch_hole.
On a thinly provisioned storage device, we will always write
NUL-padded innodb_page_size bytes also for page_compressed tables.

buf_flush_freed_pages(): Remove a redundant condition.

fil_space_t::atomic_write_supported: Remove. (This was duplicating
fil_node_t::atomic_write.)

fil_space_t::punch_hole: Remove. (Duplicated fil_node_t::punch_hole.)

fil_node_t: Remove magic_n, and consolidate flags into bitfields.
For punch_hole we introduce a third value that indicates a
thinly provisioned storage device.

fil_node_t::find_metadata(): Detect all attributes of the file.
2021-06-29 15:18:22 +03:00
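For the thinly provisioned case in 30edd5549d, the write path pads each page_compressed block with NUL bytes up to the full page size instead of leaving a hole. A minimal C++ sketch of that padding, with a hypothetical helper `pad_to_page()` that is not part of InnoDB:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return a page_size-byte image: the compressed payload followed by NULs.
// A reader cannot tell these trailing NULs apart from a punched hole.
std::vector<unsigned char> pad_to_page(
    const std::vector<unsigned char>& payload, std::size_t page_size) {
  assert(payload.size() <= page_size);
  std::vector<unsigned char> page(page_size, 0);
  std::copy(payload.begin(), payload.end(), page.begin());
  return page;
}
```

The storage device is then free to compress the run of NUL bytes itself, which is the behavior the commit message relies on.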
Marko Mäkelä
891a927e80 Merge 10.5 into 10.6 2021-06-26 11:53:28 +03:00
Marko Mäkelä
aa95c42360 Cleanup: Remove unused mtr_block_dirtied 2021-06-26 11:17:05 +03:00
Marko Mäkelä
759deaa0a2 MDEV-26010 fixup: Use acquire/release memory order
In commit 5f22511e35 we depend on
Total Store Ordering. For correct operation on ISAs that implement
weaker memory ordering, we must explicitly use release/acquire stores
and loads on buf_page_t::oldest_modification_ to prevent a race condition
when buf_page_t::list does not happen to be on the same cache line.

buf_page_t::clear_oldest_modification(): Assert that the block is
not in buf_pool.flush_list, and use std::memory_order_release.

buf_page_t::oldest_modification_acquire(): Read oldest_modification_
with std::memory_order_acquire. In this way, if the return value is 0,
the caller may safely assume that it will not observe the buf_page_t
as being in buf_pool.flush_list, even if it is not holding
buf_pool.flush_list_mutex.

buf_flush_relocate_on_flush_list(), buf_LRU_free_page():
Invoke buf_page_t::oldest_modification_acquire().
2021-06-26 11:16:40 +03:00
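The ordering rule from 759deaa0a2 and its parent fix 5f22511e35 (unlink the block first, then clear the field with a release store, and read it with an acquire load) can be modelled with a small C++ sketch. `Page` is a hypothetical stand-in for buf_page_t, and the plain `in_flush_list` flag stands in for the list pointers in buf_page_t::list; the real code additionally relies on buf_pool.flush_list_mutex, which is not modelled here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of a buffer page; not the real buf_page_t.
struct Page {
  std::atomic<std::uint64_t> oldest_modification_{0};
  bool in_flush_list = false;  // non-atomic, like the list pointers

  // Clear the field only after the page has been unlinked.
  void clear_oldest_modification() {
    assert(!in_flush_list);
    oldest_modification_.store(0, std::memory_order_release);
  }

  // If this returns 0, the earlier unlink is guaranteed to be visible.
  std::uint64_t oldest_modification_acquire() const {
    return oldest_modification_.load(std::memory_order_acquire);
  }
};

// Hypothetical helper: mark the page dirty and linked.
void insert_into_flush_list(Page& page, std::uint64_t lsn) {
  assert(lsn != 0);
  page.in_flush_list = true;
  page.oldest_modification_.store(lsn, std::memory_order_release);
}

// Unlink first, then publish "clean" with a release store.
void delete_from_flush_list(Page& page) {
  page.in_flush_list = false;
  page.clear_oldest_modification();
}

// A thread that holds no list mutex may reuse the list node only
// after it has observed 0 through the acquire load.
bool can_reuse_list_node(const Page& page) {
  return page.oldest_modification_acquire() == 0;
}
```

On an ISA with Total Store Ordering, relaxed accesses would happen to behave the same way; the explicit release/acquire pair is what makes the guarantee hold on weaker memory models.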
Marko Mäkelä
a8350cfb5e Merge 10.5 into 10.6 2021-06-24 21:56:44 +03:00
Marko Mäkelä
5f22511e35 MDEV-26010: Assertion lsn > 2 failed in buf_pool_t::get_oldest_modification
In commit 22b62edaed (MDEV-25113)
we introduced a race condition. buf_LRU_free_page() would read
buf_page_t::oldest_modification() as 0 and assume that
buf_page_t::list can be used (for attaching the block to the
buf_pool.free list). In the observed race condition,
buf_pool_t::delete_from_flush_list() had cleared the field,
and buf_pool_t::delete_from_flush_list_low() was executing
concurrently with buf_LRU_block_free_non_file_page(),
which resulted in buf_pool.flush_list.end becoming corrupted.

buf_pool_t::delete_from_flush_list(), buf_flush_relocate_on_flush_list():
First remove the block from buf_pool.flush_list, and only then
invoke buf_page_t::clear_oldest_modification(), to ensure that
reading oldest_modification()==0 really implies that the block
no longer is in buf_pool.flush_list.
2021-06-24 21:55:10 +03:00
Marko Mäkelä
b4c9cd201b Merge 10.5 into 10.6 2021-06-24 12:39:34 +03:00