after 6b685ea7b0 one can no longer violate the locking protocol
by invoking thd_get_ha_data() on some other thread without
protecting that with a mutex
row_purge_step(): Process all available purge_node_t::undo_recs.
row_purge_end(): Replaced with purge_node_t::end().
TODO: Do we need a "query graph node" at all for purge?
purge_sys_t::low_limit_no(): Adjust a comment. Actually, this
is protected after all.
TrxUndoRsegsIterator::set_next(): Reduce the critical section
of purge_sys.rseg->latch. Some purge_sys fields are accessed
only by the purge coordinator task.
ReadViewBase::snapshot(): In case m_low_limit_no==m_low_limit_id
and m_ids would include everything between that and m_up_limit_id,
set all fields to m_up_limit_id and clear m_ids, to speed up
changes_visible() and append().
rw_trx_hash_t::debug_iterator(): Add an assertion.
btr_page_reorganize_low(): Do not invoke lock_move_reorganize_page()
on a dummy index during change buffer merge. The ibuf.index page
latch that we are holding may block a DDL operation that is waiting
in ibuf_delete_for_discarded_space() while holding exclusive
lock_sys.latch. ibuf_insert_low() would refuse to buffer a change
if any locks exist for the index page.
btr_search_guess_on_hash() would only acquire an index page latch if it
is invoked with ahi_latch=NULL. If it's invoked from
row_sel_try_search_shortcut_for_mysql() with ahi_latch!=NULL, a page
will not be latched, and row_search_mvcc() will get a pointer to the
record, which can be changed by some other transaction before the record
was stored in result buffer with row_sel_store_mysql_rec() call.
ahi_latch argument of btr_cur_search_to_nth_level_func() and
btr_pcur_open_with_no_init_func() is used only for
row_sel_try_search_shortcut_for_mysql().
btr_cur_search_to_nth_level_func(..., ahi_latch !=0, ...) is invoked
only from btr_pcur_open_with_no_init_func(..., ahi_latch !=0, ...),
which, in turns, is invoked only from
row_sel_try_search_shortcut_for_mysql().
I suppose that separate case with ahi_latch!=0 was intentionally
implemented to protect row_sel_store_mysql_rec() call in
row_search_mvcc() just after row_sel_try_search_shortcut_for_mysql()
call. After the ahi_latch was moved from row_seach_mvcc() to
row_sel_try_search_shortcut_for_mysql(), there is no need in it at all
if btr_search_guess_on_hash() latches a page unconditionally. And if
btr_search_guess_on_hash() latched the page, any access to the record in
row_sel_try_search_shortcut_for_mysql() after btr_pcur_open_with_no_init()
call will be protected with the page latch.
The fix is to remove ahi_latch argument from
btr_pcur_open_with_no_init_func(), btr_cur_search_to_nth_level_func()
and btr_search_guess_on_hash().
There will not be test, as to test it we need to freeze some SELECT
execution in the point between row_sel_try_search_shortcut_for_mysql()
and row_sel_store_mysql_rec() calls in row_search_mvcc(), and to change
the record in some other transaction to let row_sel_store_mysql_rec() to
store changed record in result buffer. Buf we can't do this with the
fix, as the page will be latched in btr_search_guess_on_hash() call.
dict_load_foreigns(): Remove the constant parameter uncommitted=false.
The parameter only had to be added to dict_load_foreign().
Spotted by Alexey Midenkov
row_purge_get_partial(): Replaces trx_undo_rec_get_partial_row().
Also copy the purge_node_t::ref to the purge_node_t::row.
In this way, the clustered index key fields will always be
available, even if thanks to
commit d384ead0f0 (MDEV-14799)
they would no longer be repeated in the remaining part of the
undo log record.
trx->mysql_thd can be zeroed-out between thd_get_thread_id() and
thd_query_safe() calls in fill_trx_row(). trx_disconnect_prepared() zeroes out
trx->mysql_thd. And this can cause null pointer dereferencing in
fill_trx_row().
fill_trx_row() is invoked from fetch_data_into_cache() under trx_sys.mutex.
Bug fix is in reseting trx_t::mysql_thd in trx_disconnect_prepared() under
trx_sys.mutex lock too.
MTR test case can't be created for the fix, as we need to wait for
trx_t::mysql_thd reseting in fill_trx_row() after trx_t::mysql_thd was
checked for null while trx_sys.mutex is held. But trx_t::mysql_thd must be
reset in trx_disconnect_prepared() under trx_sys.mutex. There will be deadlock.
thd_get_ha_data() can be used without a lock, but only from the
current thd thread, when calling from anoher thread it *must*
be protected by thd->LOCK_thd_data
* fix group commit code to take thd->LOCK_thd_data
* remove innobase_close_connection() from the innodb background thread,
it's not needed after 87775402cd and was failing the assert with
current_thd==0
This is a 10.5 version of 9b750dcbd8, fix for
MDEV-23536 Race condition between KILL and transaction commit
InnoDB needs to remove trx from thd before destroying it (trx), otherwise
a concurrent KILL might get a pointer from thd to a destroyed trx.
ha_close_connection() should allow engines to clear ha_data in
hton->on close_connection(). To prevent the engine from being unloaded
while hton->close_connection() is running, we remove the lock from
ha_data and unlock the plugin manually.
This reverts commit bdc5548cad
that introduced a work-around to ha_innobase::delete_table()
for avoiding failures when trying to remove table partitions.
This work-around (of not removing statistics in case of a locking
conflict) would occasionally cause a failure of the test
parts.part_supported_sql_func_innodb:
mysqltest: In included file "./suite/parts/inc/partition_supported_sql_funcs.inc":
included from ./suite/parts/inc/part_supported_sql_funcs_main.inc at line 91:
included from /buildbot/amd64-ubuntu-2004-msan/build/mysql-test/suite/parts/t/part_supported_sql_func_innodb.test at line 44:
At line 234: query 'alter table t66
reorganize partition s1 into
(partition p0 values less than ($valsqlfunc),
partition p1 values less than maxvalue)' failed: ER_DUP_KEY (1022): Can't write; duplicate key in table 'mysql.innodb_table_stats'
ha_innobase::delete_table(): If locking the InnoDB persistent statistics
tables mysql.innodb_table_stats or mysql.innodb_index_stats fails for
a table partition, proceed to drop the partition. On DROP TABLE of a
partitioned table, each partition is being dropped in a separate
InnoDB DDL transaction. The only practical way to create an illusion of
atomicity is to avoid failures.
There are separate flags DBUG_OFF for disabling the DBUG facility
and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility.
Let us allow debug builds without DEBUG_SYNC.
Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to
define ENABLED_DEBUG_SYNC.
In commit 28325b0863
a compile-time option was introduced to disable the macros
DBUG_ENTER and DBUG_RETURN or DBUG_VOID_RETURN.
The parameter name WITH_DBUG_TRACE would hint that it also
covers DBUG_PRINT statements. Let us do that: WITH_DBUG_TRACE=OFF
shall disable DBUG_PRINT() as well.
A few InnoDB recovery tests used to check that some output from
DBUG_PRINT("ib_log", ...) is present. We can live without those checks.
Reviewed by: Vladislav Vaintroub
fts_sync_commit() fails to release the auxiliary table handle
when it encounters error. This issue is caused by
commit 1fd7d3a9adac50de37e40e92188077e3515de505(MDEV-25581).
fts_cache_clear() releases the auxiliary table handles.
MDEV-25581's patch clear the cache only if fts_sync_commit was
successful.
row_log_table_apply_update(): Free the pcur.old_rec_buf before returning.
It may be allocated by btr_pcur_store_position() inside
btr_blob_log_check_t::check() and btr_store_big_rec_extern_fields().
This memory leak was introduced in
commit 2e814d4702 (MariaDB Server 10.2.2)
via mysql/mysql-server@ce0a1e85e2
(MySQL 5.7.5).
The test is unstable because 'UPDATE t SET b = 100' latches a page and
waits for 'upd_cont' signal in lock_trx_handle_wait_enter sync point, then
purge requests RW_X_LATCH on the same page, and then 'SELECT * FROM t
WHERE a = 10 FOR UPDATE' requests RW_S_LATCH, waiting for RW_X_LATCH
requested by purge. 'UPDATE t SET b = 100' can't release page latch as
it waits for upd_cont signal, which must be emitted after 'SELECT * FROM
t WHERE a = 10 FOR UPDATE' acquired RW_S_LATCH. So we have a deadlock,
which is resolved by finishing the debug sync point wait by timeout, and
the 'UPDATE t SET b = 100' releases it's record locks rolling back the
transaction, and 'SELECT * FROM t WHERE a = 10 FOR UPDATE' is finished
successfully instead of finishing by lock wait timeout.
The fix is to forbid purging during the test by opening read view in a
separate connection before the first insert into the table.
Besides, 'lock_wait_end' syncpoint is not needed, as it enough to wait
the end of the SELECT execution to let the UPDATE to continue.
The reason why mysql/mysql-server@8020cfac20
split the files was some unit tests that never existed in the
MariaDB Server code base. The storage/innobase/unittest/ works just fine
with this file.
This is reverting part of 2e814d4702
which applied InnoDB changes from MySQL 5.7.9.
The futex system calls were introduced in Linux 2.6.0,
which was released in December 2003. It should be safe to assume
that the system calls are always available on the Linux kernels
that MariaDB Server 10.3 would run on.
Let us use the normal platform-specific preprocessor symbols
__linux__, __sun__, _AIX instead of some homebrew ones.
The preprocessor symbol UNIV_HPUX must have lost its meaning
by f6deb00a56 (note: the symbol
UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).
log_phys_t::apply(): When parsing an INSERT_HEAP_DYNAMIC record,
allow ll==rlen to hold for the last part. A secondary index record
may inherit all preceding bytes from the infimum pseudo-record.
For INSERT_HEAP_REDUNDANT, some header bytes will always be present
because the header will never be copied from the page infimum.
We will tolerate ll==rlen also in that case to be consistent with
the parsing of INSERT_HEAP_DYNAMIC.
btr_lift_page_up(): If the leaf page only contains a hidden metadata
record for MDEV-11369 instant ADD COLUMN, convert the table to the
canonical format like we are supposed to do whenever the table
becomes empty.
lock_place_prdt_page_lock(): Do not place locks on temporary tables.
Temporary tables can only be accessed from one connection, so
it does not make any sense to acquire any transactional locks on them.
In commit 8f8ba75855 (MDEV-27234)
the data dictionary recovery was changed to use READ COMMITTED
so that table-rebuild operations (OPTIMIZE TABLE, TRUNCATE TABLE,
some forms of ALTER TABLE) would be recovered correctly.
However, for operations that avoid a table rebuild thanks to
being able to instantly ADD, DROP or reorder columns, recovery
must use the READ UNCOMMITTED isolation level so that changes to
the hidden metadata record can be rolled back.
We will detect instant operations by detecting uncommitted changes
to SYS_COLUMNS in case there is no uncommitted change of SYS_TABLES.ID
for the table. In any table-rebuilding DDL operation, the SYS_TABLES.ID
(and likely also the table name) will be updated.
As part of rolling back the instant ALTER TABLE operation, after the
operation on the hidden metadata record has been rolled back, a rollback
of an INSERT into SYS_COLUMNS in row_undo_ins_remove_clust_rec() will
invoke trx_t::evict_table() to discard the READ UNCOMMITTED definition
of the table. After that, subsequent recovery steps will load and use
the correct table definition.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
Additional fixes for 10.6:
fts_sync_commit(): Release cache->lock also on rollback.
fts_sync_write_words(): Avoid a crash if an error occurs,
by stopping at the first error.
fts_add_doc_by_id(): Sync the doc id only after adding the doc id
to the cache.