1. In case of system-versioned table add row_end into FTS_DOC_ID index
in fts_create_common_tables() and innobase_create_key_defs().
fts_n_uniq() returns 1 or 2 depending on whether the table is
system-versioned.
After this patch the FTS_DOC_ID index must be recreated for
existing system-versioned tables. If you see this message in the error
log or in server warnings: "InnoDB: Table db/t1 contains 2 indexes
inside InnoDB, which is different from the number of indexes 1
defined in the MariaDB", use this command to fix the table:
ALTER TABLE db.t1 FORCE;
2. Fix duplicate history for secondary unique index like it was done
in MDEV-23644 for clustered index (932ec586aa). If an
existing history row conflicts with the currently inserted row, we
check in row_ins_scan_sec_index_for_duplicate() whether that row
was inserted as part of the current transaction. In that case we
indicate with DB_FOREIGN_DUPLICATE_KEY that a new history row is not
needed and should be silently skipped.
3. Some parts of MDEV-21138 (7410ff436e) are reverted. Skipping the
FTS_DOC_ID index for history rows caused problems with the purge
system. Now this is fixed differently by p.2.
4. wait_all_purged.inc checks that we didn't affect non-history rows
so they are deleted and purged correctly.
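Points 1 and 2 above can be illustrated with a simplified Python model. This is only a sketch, not the InnoDB code: the function names echo the ones mentioned above, but the parameters and the string return codes are hypothetical, and the fallback to an ordinary duplicate key error is an assumption.

```python
def fts_n_uniq(versioned: bool) -> int:
    """Unique fields of the FTS_DOC_ID index.

    (FTS_DOC_ID) for a plain table, (FTS_DOC_ID, row_end) for a
    system-versioned one.
    """
    return 2 if versioned else 1


def scan_sec_index_for_duplicate(conflict_is_history: bool,
                                 conflict_trx_id: int,
                                 current_trx_id: int) -> str:
    """Decide what to report for a row that conflicts with the inserted row.

    All three parameters are hypothetical stand-ins for record state.
    """
    if conflict_is_history and conflict_trx_id == current_trx_id:
        # The conflicting history row was written by this transaction:
        # the new history row is redundant and should be silently skipped.
        return "DB_FOREIGN_DUPLICATE_KEY"
    # Assumption: any other conflict is an ordinary duplicate key error.
    return "DB_DUPLICATE_KEY"
```

In this model only a history row written by the current transaction yields the "skip" code; a history row from another transaction is still reported as a duplicate.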
Additional FTS fixes
fts_init_get_doc_id(): exclude history rows from max_doc_id
calculation. fts_init_get_doc_id() callback is used only for crash
recovery.
fts_add_doc_by_id(): set max value for row_end field.
fts_read_stopword(): stopwords table can be system-versioned too. We
now read stopwords only for current data.
row_insert_for_mysql(): exclude history rows from doc_id validation.
row_merge_read_clustered_index(): exclude history rows from doc_id
processing.
fts_load_user_stopword(): for versioned table retrieve row_end field
and skip history rows. For non-versioned table we retrieve 'value'
field twice (just for uniformity).
FTS tests for System Versioning now include maybe_versioning.inc which
adds 3 combinations:
'vers' for debug build sets sysvers_force and
sysvers_hide. sysvers_force makes every created table
system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING
for SHOW CREATE.
Note: basic.test, stopword.test and versioning.test do not
require debug for the 'vers' combination. This is controlled by
$modify_create_table in maybe_versioning.inc; these
tests specify WITH SYSTEM VERSIONING explicitly, which allows
testing the 'vers' combination on non-debug builds.
'vers_trx', like 'vers', sets sysvers_force_trx and sysvers_hide. It
tests FTS with trx_id-based System Versioning.
'orig' works like before: no System Versioning is added, no debug is
required.
Upgrade/downgrade test for System Versioning is done by
innodb_fts.versioning. It has 2 combinations:
'prepare' makes binaries in std_data (requires old server and OLD_BINDIR).
It tests upgrade/downgrade against old server as well.
'upgrade' tests upgrade against binaries in std_data.
Cleanups:
Removed innodb-fts-stopword.test as it duplicates stopword.test
Works like vers_force but forces trx_id-based system-versioned tables
if the storage supports it (currently InnoDB-only). Otherwise creates
timestamp-based system-versioned table.
Before the fix, a locking unique search at the RR isolation level
requested a next-key lock only if a record was delete-marked.
There can be several delete-marked records for the same unique key,
which is why InnoDB scans the records until either a non-delete-marked record
is reached or all delete-marked records with the same unique key are
scanned.
For a range scan, next-key locks are used in RR to protect the scanned range
from insertion of new records by other transactions. This is why
next-key locks are used for delete-marked records in unique searches.
If a record is not delete-marked, the requested lock type was "not-gap".
When a record is not delete-marked during a lock request by trx 1, and
some other transaction holds a conflicting lock, trx 1 creates a waiting
not-gap lock on the record and suspends. While trx 1 is suspended, the
record can be delete-marked. When the lock is granted on commit or rollback
of the conflicting transaction, its type is still "not-gap". So we have
a "not-gap" lock on a delete-marked record in RR. This lets some other
transaction insert a record with the same unique key while trx 1 is
not committed, which can cause an isolation level violation.
The fix is to set next-key locks for both delete-marked and
non-delete-marked records for unique search in RR.
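The scenario can be modelled in a few lines of Python. This is a toy model of the rule described above, not the lock manager itself; the function names and the `with_fix` switch are hypothetical.

```python
def lock_mode_at_request(delete_marked: bool, with_fix: bool) -> str:
    """Lock mode requested for a record by a locking unique search in RR."""
    if with_fix:
        # After the fix: next-key lock regardless of the delete mark.
        return "next-key"
    # Before the fix: next-key only for delete-marked records.
    return "next-key" if delete_marked else "not-gap"


def allows_insert_of_same_key(lock_mode: str, delete_marked_now: bool) -> bool:
    """A not-gap lock on a delete-marked record does not protect the gap."""
    return delete_marked_now and lock_mode == "not-gap"


# trx 1 requests the lock while the record is NOT delete-marked and waits.
# The mode is chosen at request time and does not change when the lock is
# granted, even if the record was delete-marked in the meantime.
mode_before_fix = lock_mode_at_request(delete_marked=False, with_fix=False)
mode_after_fix = lock_mode_at_request(delete_marked=False, with_fix=True)
```

With the old rule the granted lock stays "not-gap" and `allows_insert_of_same_key(mode_before_fix, True)` is true; with the fix the lock is always "next-key", so the insert is blocked.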
mysql_discard_or_import_tablespace(): On successful
ALTER TABLE...DISCARD TABLESPACE, evict the table handle from the
table definition cache, so that ha_innobase::close() will be invoked,
like InnoDB expects to be the case. This will avoid an assertion failure
ut_a(table->get_ref_count() == 0) during IMPORT TABLESPACE.
ha_innobase::open(): Do not issue any ER_TABLESPACE_DISCARDED warning.
Member functions for DML will do that.
ha_innobase::truncate(), ha_innobase::check_if_supported_inplace_alter():
Issue ER_TABLESPACE_DISCARDED warnings, to compensate for the removal of
the warning in ha_innobase::open().
row_quiesce_write_indexes(): Only write information about committed
indexes. The ALTER TABLE t NOWAIT ADD INDEX(c) in the nondeterministic
test case will most of the time fail due to a metadata lock (MDL) timeout
and leave behind an uncommitted index.
Reviewed by: Sergei Golubchik
- InnoDB AHI accesses the table definition while a concurrent
instant ALTER of a column is changing it, which leads to an ASAN
failure. Instant ALTER should acquire the clustered index search
latch in exclusive mode before changing the table cache definition.
- Removed the default parameter for the function
btr_search_drop_page_hash_index()
- Addressed the -DWITH_INNODB_AHI=0 compilation failure
by passing two parameters from all callers of
btr_search_drop_page_hash_index()
Fixing a few problems revealed by UBSAN in type_float.test
- multiplication overflow in dtoa.c
- uninitialized Field::geom_type (and Field::srid as well)
- Wrong call-back function types used in combination with SHOW_FUNC.
Changes in the mysql_show_var_func data type definition were not
properly addressed all around the code by the following commits:
b4ff64568c18feb62fee0ee879ff8a
Adding a helper SHOW_FUNC_ENTRY() function and replacing
all mysql_show_var_func declarations using SHOW_FUNC
with SHOW_FUNC_ENTRY, to catch mysql_show_var_func type mismatches
at compilation time in the future.
Per fsp0types.h, SDI is on tablespace flags position 14 where MariaDB
stores its pagesize. Flag at position 13, also in MariaDB pagesize
flags, is a MySQL encryption flag.
These are checked only if fsp_flags_is_valid fails, so valid MariaDB
page sizes don't become errors.
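The order of the checks can be sketched as follows. This is an illustrative Python model only: the constant names, the zero-based interpretation of "position", the returned strings, and passing the result of fsp_flags_is_valid in as a boolean are all assumptions made for the example.

```python
# Bit positions as described above (assumed to be zero-based).
MYSQL_ENCRYPTION_BIT = 13
MYSQL_SDI_BIT = 14


def describe_tablespace_flags(flags: int, flags_valid: bool) -> str:
    """Classify tablespace flags; flags_valid stands in for fsp_flags_is_valid."""
    if flags_valid:
        # Valid MariaDB flags are never inspected for MySQL-only bits,
        # because those positions overlap the MariaDB page size field.
        return "valid MariaDB flags"
    if flags & (1 << MYSQL_SDI_BIT):
        return "unsupported: MySQL SDI"
    if flags & (1 << MYSQL_ENCRYPTION_BIT):
        return "unsupported: MySQL encryption"
    return "invalid flags"
```

Checking validity first is the design point: the same bits are legitimate in a MariaDB page size value, so they are only interpreted as MySQL features once the flags have already been rejected.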
The error message "Cannot reset LSNs in table" was rather specific and
not always true, so it is replaced with a more generic error.
ALTER TABLE tbl IMPORT TABLESPACE now reports Unsupported on MySQL
tablespace (rather than index corrupted) along with a server error
message.
MySQL InnoDB errors are reported as UNSUPPORTED rather than CORRUPTED
to avoid user anxiety.
Reviewer: Marko Mäkelä
This is a backport of commit 8b6a308e46
from MariaDB Server 10.6.11. No attempt was made to reproduce the hang
in a version of MariaDB Server earlier than 10.6.
In each caller of fseg_n_reserved_pages() except ibuf_init_at_db_start()
which is a special case for ibuf.index at database startup, we must hold
an index latch that prevents concurrent allocation or freeing of index
pages.
Any operation that allocates or frees pages that belong to an index tree
must first acquire an index latch in non-shared mode, and while
holding that, acquire an index root page latch in non-shared mode.
btr_get_size(), btr_get_size_and_reserved(): Assert that a strong enough
index latch is being held.
dict_stats_update_transient_for_index(),
dict_stats_analyze_index(): Acquire a strong enough index latch.
These operations had followed the same order of acquiring latches in
every InnoDB version since the very beginning
(commit c533308a15).
The hang was introduced in
commit 2e814d4702 which imported
mysql/mysql-server@ac74632293
which failed to strengthen the locking requirements of the function
btr_get_size().
spatial_index_info: Replaces index_tuple_info_t. Always take
a memory heap as a parameter to the member functions.
Remove pointer indirection for m_dtuple_vec.
spatial_index_info::add(): Duplicate any PRIMARY KEY fields that would
point to within ext->buf because that buffer will be allocated in
a shorter-lifetime memory heap.
Every operation that is going to write redo log is supposed to
invoke log_free_check() before acquiring any latches. If there
is a risk of log buffer overrun, a log checkpoint would be
triggered by that call.
ibuf_merge_space(), ibuf_merge_in_background(),
ibuf_delete_for_discarded_space(): Invoke log_free_check()
when the current thread is not holding any page latches.
Unfortunately, in lower-level code called from ibuf_insert()
or ibuf_merge_or_delete_for_page(), some page latches may be
held and a call to log_free_check() could hang.
ibuf_set_bitmap_for_bulk_load(): Use the caller's mini-transaction.
The caller should have invoked log_free_check() while not holding
any page latches.
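The rule about calling log_free_check() only while no page latches are held can be pictured with a small Python model. It is a sketch under stated assumptions: the `RedoLog` class, its `capacity` and `margin` fields, and the way a checkpoint simply frees the log are hypothetical simplifications.

```python
class RedoLog:
    """Toy redo log with a fixed capacity (hypothetical model)."""

    def __init__(self, capacity: int, margin: int):
        self.capacity = capacity
        self.margin = margin
        self.used = 0
        self.checkpoints = 0

    def log_free_check(self, page_latches_held: int) -> None:
        """Make room in the log; must be called before acquiring latches."""
        if page_latches_held:
            # In the real server this situation could hang; the model
            # reports it as an error instead.
            raise RuntimeError("log_free_check() called while holding page latches")
        if self.capacity - self.used < self.margin:
            # Risk of overrun: trigger a checkpoint, which frees the log.
            self.used = 0
            self.checkpoints += 1

    def write(self, nbytes: int) -> None:
        self.used += nbytes


log = RedoLog(capacity=100, margin=30)
log.log_free_check(page_latches_held=0)  # enough room, nothing happens
log.write(80)
log.log_free_check(page_latches_held=0)  # only 20 left: checkpoint
```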
Something appears to be broken in the DBUG subsystem.
Let us remove frequent calls to it from the InnoDB internal SQL interpreter
that is used in the purge of transaction history.
The DBUG_PRINT in que_eval_sql() can remain for now, because those
operations are much less frequent.
See also commits aa8a31da and 64678c for a Bug #22990029 fix.
In this scenario INSERT chose to check if delete unmarking is available for
a just deleted record. To build an update vector, it needed to calculate
the vcols as well. Since this INSERT was not IGNORE-flagged, recalculation
failed.
Solution: temporarily set abort_on_warning=true while calculating the
column for delete-unmarked insert.
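The "temporarily set, then restore" pattern can be sketched in Python with a context manager. This only illustrates the save/restore idea; the `Session` class and the helper name are hypothetical and do not correspond to server code.

```python
from contextlib import contextmanager


class Session:
    """Hypothetical stand-in for the connection state (THD)."""

    def __init__(self):
        self.abort_on_warning = False


@contextmanager
def abort_on_warning_set(session: Session, value: bool = True):
    """Set abort_on_warning for the duration of a block, then restore it."""
    saved = session.abort_on_warning
    session.abort_on_warning = value
    try:
        yield session
    finally:
        # Restore the previous value even if the computation fails.
        session.abort_on_warning = saved


thd = Session()
with abort_on_warning_set(thd):
    inside = thd.abort_on_warning  # flag is set while the column is computed
after = thd.abort_on_warning       # previous value is back afterwards
```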
As of now, InnoDB does not store a trx_id for each record in a secondary index.
The idea behind this is the following: store only a per-page max_trx_id, and
delete-mark the records when they are deleted or updated.
When a read starts, it remembers the lowest ID of the currently active
transactions. InnoDB refers to it as trx->read_view->m_up_limit_id.
See also ReadView::open.
When the page is fetched, its max_trx_id is compared to m_up_limit_id.
If the value is lower, and the secondary index record is not delete-marked,
then the page is safe to read as is. Otherwise, a clustered index access
could be needed. See the page_get_max_trx_id call in row_search_mvcc, and the
corresponding switch (row_search_idx_cond_check(...)) below.
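The visibility shortcut described above boils down to one comparison. A minimal Python sketch, with a hypothetical function name and plain integers standing in for the page header field and the read view:

```python
def sec_rec_readable_as_is(page_max_trx_id: int,
                           up_limit_id: int,
                           delete_marked: bool) -> bool:
    """True if a secondary index record can be returned without
    consulting the clustered index.

    page_max_trx_id: per-page maximum transaction ID
    up_limit_id:     lowest ID of transactions active when the read started
                     (m_up_limit_id in the text above)
    """
    return page_max_trx_id < up_limit_id and not delete_marked
```

If the function returns False, the record may still be visible, but that can only be decided by looking up the clustered index record.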
Virtual columns are required to be updated if the record was
delete-marked. The motivation behind it is documented in
Row_sel_get_clust_rec_for_mysql::operator() near the
row_sel_sec_rec_is_for_clust_rec call.
This was basically a description of why virtual column computation can
normally happen during SELECT and, generally, during a vcol index access.
Sometimes stats tables are updated by InnoDB. This starts a new
transaction, and it can happen that it has not finished by the time of
SELECT execution, forcing recomputation of virtual columns. If the result was
something that normally outputs a warning, like division by zero, then
the warning could be output in a racy manner.
The solution is to suppress the warnings when a column is computed
for the described purpose.
An ignore_warnings argument is added to innobase_get_computed_value().
Currently, it is only true for a call from
row_sel_sec_rec_is_for_clust_rec.
btr_search_guess_on_hash() would only acquire an index page latch if it
is invoked with ahi_latch=NULL. If it's invoked from
row_sel_try_search_shortcut_for_mysql() with ahi_latch!=NULL, a page
will not be latched, and row_search_mvcc() will get a pointer to the
record, which can be changed by some other transaction before the record
was stored in result buffer with row_sel_store_mysql_rec() call.
ahi_latch argument of btr_cur_search_to_nth_level_func() and
btr_pcur_open_with_no_init_func() is used only for
row_sel_try_search_shortcut_for_mysql().
btr_cur_search_to_nth_level_func(..., ahi_latch !=0, ...) is invoked
only from btr_pcur_open_with_no_init_func(..., ahi_latch !=0, ...),
which, in turn, is invoked only from
row_sel_try_search_shortcut_for_mysql().
I suppose that the separate case with ahi_latch!=0 was intentionally
implemented to protect the row_sel_store_mysql_rec() call in
row_search_mvcc() just after the row_sel_try_search_shortcut_for_mysql()
call. After the ahi_latch was moved from row_search_mvcc() to
row_sel_try_search_shortcut_for_mysql(), there is no need for it at all
if btr_search_guess_on_hash() latches a page unconditionally. And if
btr_search_guess_on_hash() latched the page, any access to the record in
row_sel_try_search_shortcut_for_mysql() after the btr_pcur_open_with_no_init()
call will be protected by the page latch.
The fix is to remove ahi_latch argument from
btr_pcur_open_with_no_init_func(), btr_cur_search_to_nth_level_func()
and btr_search_guess_on_hash().
There will be no test, because to test it we would need to freeze some SELECT
execution at the point between the row_sel_try_search_shortcut_for_mysql()
and row_sel_store_mysql_rec() calls in row_search_mvcc(), and to change
the record in some other transaction so that row_sel_store_mysql_rec()
stores the changed record in the result buffer. But we can't do this with the
fix, as the page will be latched in the btr_search_guess_on_hash() call.
row_purge_get_partial(): Replaces trx_undo_rec_get_partial_row().
Also copy the purge_node_t::ref to the purge_node_t::row.
In this way, the clustered index key fields will always be
available, even if thanks to
commit d384ead0f0 (MDEV-14799)
they would no longer be repeated in the remaining part of the
undo log record.
trx->mysql_thd can be zeroed-out between thd_get_thread_id() and
thd_query_safe() calls in fill_trx_row(). trx_disconnect_prepared() zeroes out
trx->mysql_thd. And this can cause null pointer dereferencing in
fill_trx_row().
fill_trx_row() is invoked from fetch_data_into_cache() under trx_sys.mutex.
The fix is to reset trx_t::mysql_thd in trx_disconnect_prepared() under
the trx_sys.mutex lock too.
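The idea of the fix, that the reader and the writer of mysql_thd use the same mutex, can be sketched in Python. This is a simplified model: the function names follow the text above, but the `Trx` class, the dictionary standing in for a THD, and the use of `threading.Lock` for trx_sys.mutex are assumptions of the example.

```python
import threading

trx_sys_mutex = threading.Lock()  # stands in for trx_sys.mutex


class Trx:
    def __init__(self, mysql_thd):
        self.mysql_thd = mysql_thd  # hypothetical: a dict or None


def fill_trx_row(trx: Trx):
    """Caller must hold trx_sys_mutex."""
    if trx.mysql_thd is None:
        return None
    # Two separate reads of trx.mysql_thd. They are safe only because the
    # writer below cannot reset the pointer between them.
    thread_id = trx.mysql_thd["id"]
    query = trx.mysql_thd["query"]
    return (thread_id, query)


def fetch_data_into_cache(trx: Trx):
    with trx_sys_mutex:
        return fill_trx_row(trx)


def trx_disconnect_prepared(trx: Trx) -> None:
    # The fix: reset the pointer under the same mutex as the reader.
    with trx_sys_mutex:
        trx.mysql_thd = None
```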
An MTR test case can't be created for the fix, as we would need to wait for
trx_t::mysql_thd to be reset in fill_trx_row() after trx_t::mysql_thd was
checked for null while trx_sys.mutex is held. But trx_t::mysql_thd must be
reset in trx_disconnect_prepared() under trx_sys.mutex, so there would be
a deadlock.
row_log_table_apply_update(): Free the pcur.old_rec_buf before returning.
It may be allocated by btr_pcur_store_position() inside
btr_blob_log_check_t::check() and btr_store_big_rec_extern_fields().
This memory leak was introduced in
commit 2e814d4702 (MariaDB Server 10.2.2)
via mysql/mysql-server@ce0a1e85e2
(MySQL 5.7.5).
The reason why mysql/mysql-server@8020cfac20
split the files was some unit tests that never existed in the
MariaDB Server code base. The storage/innobase/unittest/ works just fine
with this file.
This is reverting part of 2e814d4702
which applied InnoDB changes from MySQL 5.7.9.
The futex system calls were introduced in Linux 2.6.0,
which was released in December 2003. It should be safe to assume
that the system calls are always available on the Linux kernels
that MariaDB Server 10.3 would run on.
Let us use the normal platform-specific preprocessor symbols
__linux__, __sun__, _AIX instead of some homebrew ones.
The preprocessor symbol UNIV_HPUX must have lost its meaning
by f6deb00a56 (note: the symbol
UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).
btr_lift_page_up(): If the leaf page only contains a hidden metadata
record for MDEV-11369 instant ADD COLUMN, convert the table to the
canonical format like we are supposed to do whenever the table
becomes empty.
lock_place_prdt_page_lock(): Do not place locks on temporary tables.
Temporary tables can only be accessed from one connection, so
it does not make any sense to acquire any transactional locks on them.
- During shutdown, InnoDB fts fails to update synced doc id
when there is only one doc id about to sync. While starting
the server, InnoDB fetches the already synced doc id from
config table. In the subsequent sync operation, InnoDB fails
with DB_DUPLICATE_KEY error.
dict_table_rename_in_cache(), dict_table_get_highest_foreign_id():
Reserve sufficient space for the fkid[] buffer, and ensure that the
fkid[] will be NUL-terminated.
The fkid[] must accommodate both the database name (which is already
encoded in my_charset_filename) and the constraint name
(which must be converted to my_charset_filename) so that we can check
if it is in the format databasename/tablename_ibfk_1 (all encoded in
my_charset_filename).
trx_undo_page_report_rename(): Use the correct maximum length of
a table name. Both the database name and the table name can be up to
NAME_CHAR_LEN (64 characters) times 5 bytes per character in the
my_charset_filename encoding. They are not encoded in UTF-8!
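The buffer size arithmetic implied above can be worked out explicitly. A small Python sketch; the constant and function names are chosen for the example, and the combined "database/table" length is only an illustration that excludes any terminating NUL byte.

```python
NAME_CHAR_LEN = 64          # maximum characters in a database or table name
MAX_BYTES_PER_CHAR = 5      # worst case in my_charset_filename, per the text

# Worst-case encoded length of a single name, in bytes.
MAX_NAME_BYTES = NAME_CHAR_LEN * MAX_BYTES_PER_CHAR


def max_qualified_name_bytes() -> int:
    """Worst-case length of 'database/table' in bytes, without a NUL."""
    return MAX_NAME_BYTES + 1 + MAX_NAME_BYTES


print(MAX_NAME_BYTES)              # 320
print(max_qualified_name_bytes())  # 641
```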
fil_op_write_log(): Reserve the correct amount of log buffer for
a rename operation. The file name will be appended by
mlog_catenate_string().
rename_file_ext(): Reserve a large enough buffer for the file names.
buf_defer_drop_ahi(): Remove. Ever since
commit c7f8cfc9e7 (MDEV-27700)
it is safe to invoke btr_search_drop_page_hash_index(block, true)
to remove an orphan adaptive hash index.
Any attempt to upgrade page latches is prone to deadlocks. Recently,
we observed a few hangs that involved nothing more than a small table
consisting of one clustered index page, one secondary index page and
some undo pages.
- An InnoDB FTS table is initially added to the LRU table cache
while the table is being created. Later, the table is marked
as non-evictable when it is added to the FTS optimizer
list. Before the table is marked as non-evictable, the master
thread can try to evict it.
- Race condition between fsp_get_available_space_in_free_extents()
and fsp_try_extend_data_file() while accessing space.free_limit.
Before calling fsp_get_available_space_in_free_extents(), take
shared lock on space->latch.
Reason:
=======
Race condition between btr_search_drop_hash_index() and
btr_search_lazy_free(). One thread resizes the buffer pool,
clears the AHI on all pages in the buffer pool, and frees the
index and table while removing the last reference. At the same time,
another thread accesses index->heap in btr_search_drop_hash_index().
Solution:
=========
Acquire the respective ahi latch before checking index->freed()
btr_search_drop_page_hash_index(): Added a new parameter to indicate
that AHI entries are to be dropped only if the index is marked as freed
btr_search_check_marked_free_index(): Acquire all ahi latches and
return true if the index was freed
- While creating a new segment, InnoDB reserves an extent
before allocating the inode or the page, even though
pages are available in the fragment segment. With this patch,
the extent is reserved only when InnoDB has run out of fragment
pages in the tablespace.
- query->intersection fails to get freed if the query exceeds
innodb_ft_result_cache_limit
- errors from init_ftfuncs were not propagated by delete command
This is taken from percona/percona-server@ef2c0bcb9a