- query->intersection fails to get freed if the query exceeds
innodb_ft_result_cache_limit
- errors from init_ftfuncs were not propogated by delete command
This is taken from percona/percona-server@ef2c0bcb9a
Starting with commit da094188f6 (MDEV-24393),
MariaDB will no longer acquire advisory file locks on InnoDB data
files by default, because it would create a large number of
entries in Linux /proc/locks.
The motivation for acquiring the file locks is to prevent accidental
concurrent startup of multiple server processes on the same data files.
Such mistake still turns out to be relatively common, based on
corruption bug reports from the community.
To prevent corruption due to concurrent startup attempts, the
Aria storage engine would unconditionally acquire an advisory lock
on one of its log files.
Solution: InnoDB will always lock its system tablespace files.
(Ever since commit 685d958e38
the InnoDB log file will not necessarily be open while the
server is running, because it can be accessed via memory-mapped I/O.)
If more protection is desired, then the option --external-locking
can be used.
The mandatory advisory lock also fixes intermittent failures of
some crash recovery tests. It turns out that when the mtr test harness
kills and restarts the server, it will not actually ensure that the
old process has terminated before starting the new one.
- Import tablespace re-evicts and reload the table definition. During that
time, innodb has to load the table even though the secondary fts index
marked as corrupted
- InnoDB should ignore the single word followed by apostrophe while
tokenising the document. Example is that if the input string is O'brien
then right now, InnoDB seperates into two tokens as O, brien. But
after this patch, InnoDB can ignore the token 'O' and consider
only 'brien'.
dict_load_foreigns(): Use a correctly sized buffer for the maximum-length
SYS_FOREIGN.ID. In case of overflow, do not crash the server but instead
return DB_CORRUPTION.
- InnoDB mistakenly identifies the non-unique FTS_DOC_ID index as
FTS_DOC_ID_INDEX while loading the table. dict_load_indexes()
should check whether the index is unique before assigning
fts_doc_id_index
Before version 10, GCC would think that a right shift of an
unsigned char returns int. Let us explicitly cast that back,
to silence a bogus -Wconversion warning.
- Redundant InnoDB table fails to set the flags2 while loading
the table. It leads to "Upgrade index name failure" during alter
operation. InnoDB should set the flags2 to FTS_AUX_HEX_NAME when
fts is being loaded
prepare_inplace_alter_table_dict(): If the table will not be rebuilt,
preserve all of the original ROW_FORMAT, including the compressed
page size flags related to ROW_FORMAT=COMPRESSED.
buf_page_print(): Dump the buffer page 32 bytes (64 hexadecimal digits)
per line. In this way, the limitation in mtr
("Data too long for column 'line'") will not be triggered.
Also, do not bother decoding the page contents, because everything
is present in the hexadecimal output.
dict_index_find_on_id_low(): Merge to dict_index_get_if_in_cache_low().
The direct call in buf_page_print() was prone to crashing, in case the
table definition was concurrently evicted or dropped from the
data dictionary cache.
PageConverter::update_header(): Remove an unnecessary write.
The field that was originally called FIL_PAGE_FILE_FLUSH_LSN only
made sense for the first page of the system tablespace
(initially, for the first page of each file of the system tablespace).
It never had any meaning for .ibd files, and it lost its original
meaning in MariaDB Server 10.8.1 when
commit b07920b634 (MDEV-27199)
removed the ability to start without ib_logfile0.
If the most significant 32 bits of the LSN are nonzero, this
unnecessary write would write the wrong encryption key identifier
to the page. The first page of any file is never encrypted,
so normally those bytes should be 0 for any .ibd file.
MariaDB never supported this form of preemption via high-priority
transactions. This error code shold not have been added in the
first place, in commit 2e814d4702.
- InnoDB fails to create a fts cache while loading the innodb fts
table which is stored in system tablespace. InnoDB should create
the fts cache while loading FTS_DOC_ID column from system column.
The counter srv_stats.key_rotation_list_length is never updated, and
therefore Innodb_encryption_key_rotation_list_length will always be 0.
The view INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION comes close
to reporting this information.
The counters were added in commit 5e55d1ced5
and any code to update them was
inadvertently removed in commit 2e814d4702
when applying InnoDB changes from MySQL 5.7.
Let us remove these counters that never reported anything useful. If such
statistics are really needed in a special case, they can be obtained by
instrumenting the code by some means, such as eBPF or a source code patch.
row_ins_sec_index_entry_low(): If a separate mini-transaction is
needed to adjust the minimum bounding rectangle (MBR) in the parent
page, we must disable redo logging if the table is a temporary table.
For temporary tables, no log is supposed to be written, because
the temporary tablespace will be reinitialized on server restart.
rtr_update_mbr_field(): Plug a memory leak.
lock_validate() accumulates page ids under locked lock_sys->mutex, then
releases the latch, and invokes lock_rec_block_validate() for each page.
Some other thread has ability to add/remove locks and change pages
between releasing the latch in lock_validate() and acquiring it in
lock_rec_validate_page().
lock_rec_validate_page() can invoke lock_rec_queue_validate() for
non-locked supremum, what can cause ut_ad(page_rec_is_leaf(rec)) failure
in lock_rec_queue_validate().
The fix is to invoke lock_rec_queue_validate() only for locked records
in lock_rec_validate_page().
The error message in lock_rec_block_validate() is not necessary as
BUF_GET_POSSIBLY_FREED mode is used to get block from buffer pool, and
this is not error if a block was evicted.
The test case would require new debug sync point. I think it's not
necessary as the fixed code is debug-only.
btr_insert_into_right_sibling(): Inherit any gap lock from the
left sibling to the right sibling before inserting the record
to the right sibling and updating the node pointer(s).
lock_update_node_pointer(): Update locks in case a node pointer
will move.
Based on mysql/mysql-server@c7d93c274f
buf_flush_page(): Never wait for a page latch, even in checkpoint
flushing (flush_type == BUF_FLUSH_LIST), to prevent a hang of the
page cleaner threads when a large number of pages is latched.
In mysql/mysql-server@9542f3015b
it was claimed that such a hang only affects CREATE FULLTEXT INDEX.
Their fix was to retain buffer-fix but release exclusive latch
on non-leaf pages, and subsequently write to those pages while
they are not associated with the mini-transaction, which would
trip a debug assertion in the MariaDB version of
mtr_t::memo_modify_page() and cause potential corruption
when using the default MariaDB setting innodb_log_optimize_ddl=OFF.
This change essentially backports a small part of
commit 7cffb5f6e8 (MDEV-23399)
from MariaDB Server 10.5.7.
The only purpose of ibuf_bitmap_mutex is to prevent a deadlock between
two concurrent invocations of ibuf_update_free_bits_for_two_pages_low()
on the same pair of bitmap pages, but in opposite order.
The mutex is unnecessarily serializing the execution of the function
even when it is being invoked on totally different tablespaces.
To avoid deadlocks, it suffices to ensure that the two page latches
are being acquired in a deterministic (sorted) order.
The following condition has to added:
1) InnoDB fails to include the offset of the node pointer field
in non-leaf record for redundant row format.
2) If the Fixed length field does have only prefix length then
calculate the field maximum size as prefix length.
- Added the test case to test (2) and to check maximum number of
fields can exist in the index.
When we enable writes after Galera SST srv_n_fil_crypt_threads needs
to be set temporally to 0 (as was done when writes were disabled)
to make sure that encryption threads will be really started based
on old value of encryption threads.
Fix provided by Marko Mäkelä.
Starting with 10.3, an assertion would fail on the rollback of
a recovered incomplete transaction if a table definition violates
a FOREIGN KEY constraint.
DICT_ERR_IGNORE_RECOVER_LOCK: Include also DICT_ERR_IGNORE_FK_NOKEY
so that trx_resurrect_table_locks() will be able to load
table definitions and resurrect IX locks. Previously, if the
FOREIGN KEY constraints of a table were incomplete, the table
would fail to load until rollback, and in 10.3 or later an assertion
would fail that the rollback was not protected by a table IX lock.
Thanks to commit 9de2e60d74 there
will be no problems to enforce subsequent FOREIGN KEY operations
even though a table with invalid REFERENCES clause was loaded.
This failure was caused by MDEV-25975, which removed the parameter
innodb_disallow_writes.
Added a check for wsrep_sst_disable_writes to the function
ibuf_merge_in_background().
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.
During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:
sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.
sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.
The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.
Reviewed by: Jan Lindström
mariabackup does not load dictionary during backup, but it loads
tablespaces, that is why fil_system.max_assigned_id is not set with
dict_check_tablespaces_and_store_max_id(). There is no sense to issue the
warning during backup.
In commit 437da7bc54 (MDEV-19534),
the default value of the global variable srv_checksum_algorithm
in innochecksum was changed from SRV_CHECKSUM_ALGORITHM_INNODB
to implied 0 (innodb_checksum_algorithm=crc32). As a result,
the function buf_page_is_corrupted() would by default invoke
buf_calc_page_crc32() in innochecksum, and crc32_inited would hold.
This would cause "innochecksum" to fail on a particular page.
The actual problem is older, introduced in 2011 in
mysql/mysql-server@17e497bdb7
(MySQL 5.6.3). It should affect the validation of pages of old
data files that were written with innodb_checksum_algorithm=innodb.
When using innodb_checksum_algorithm=crc32 (the default setting
since MariaDB Server 10.2), some valid pages would be rejected
only because exactly one of the two checksum fields accidentally
matches the innodb_checksum_algorithm=crc32 value.
buf_page_is_corrupted(): Simplify the logic of non-strict
checksum validation, by always invoking buf_calc_page_crc32().
Remove a bogus condition that if only one of the checksum fields
contains the value returned by buf_calc_page_crc32(), the page
is corrupted.
The first step for deprecating innodb_autoinc_lock_mode(see MDEV-27844) is:
- to switch statement binlog format to ROW if binlog format is MIXED and
the statement changes autoincremented fields
- issue warnings if innodb_autoinc_lock_mode == 2 and binlog format is
STATEMENT
MDEV-27025 allows to insert records before the record on which DELETE is
locked, as a result the DELETE misses those records, what causes serious ACID
violation.
Revert MDEV-27025, MDEV-27550. The test which shows the scenario of ACID
violation is added.
The virtual member function handler::reset_auto_increment(ulonglong)
is only ever invoked by the default implementation of the virtual
member function handler::truncate().
Because ha_innobase::truncate() overrides handler::truncate() without
ever invoking handler::truncate(), some InnoDB member functions are
never called.
ha_innobase::innobase_reset_autoinc(), ha_innobase::reset_auto_increment():
Removed (unreachable code).
ha_innobase::delete_all_rows(): Removed. The default implementation
handler::delete_all_rows() works just as fine.
- InnoDB FTS DDL decrements the FTS_DOC_ID when there is a
deleted marked record involved. FTS_DOC_ID must never be
reused. The purpose of FTS_DOC_ID is to be a unique row
identifier that will be changed whenever a fulltext indexed
column is updated.
btr_cur_optimistic_insert(): Disregard DEBUG_DBUG injection to
invoke btr_page_reorganize() if the page (and the table) is empty.
Otherwise, an assertion would fail in btr_page_reorganize_low()
because PAGE_MAX_TRX_ID is 0 in an empty secondary index leaf page.
In commit c7d0448797 (MDEV-15132)
MariaDB Server 10.3 stopped writing the latest transaction identifier
to the TRX_SYS page. Instead, the transaction identifier will be
recovered from undo log pages.
Unfortunately, before commit 3926673ce7
and mysql/mysql-server@dc29792ff2
(MySQL 5.1.48 or MariaDB 5.1.48) InnoDB did not always initialize all
data fields, but some garbage could be left behind in unused parts
of data pages.
In undo log pages that are essentially free, but added to a list for
reuse (TRX_UNDO_CACHED) the TRX_UNDO_TRX_NO fields could contain garbage,
instead of 0. As long as such undo pages are being reused and never
marked completely free, the garbage contents may remain forever.
In fact, the function trx_undo_header_create() and the record
MLOG_UNDO_HDR_CREATE will only initialize TRX_UNDO_TRX_ID, but leave
TRX_UNDO_TRX_NO uninitialized.
trx_undo_mem_create_at_db_start(): Only read the TRX_UNDO_TRX_NO
fields of TRX_UNDO_CACHED pages if the TRX_UNDO_PAGE_TYPE is 0,
that is, the page was updated by MariaDB Server 10.3. Earlier versions
would always write the TRX_UNDO_PAGE_TYPE as 1 or 2.
trx_undo_header_create(): Zero out the TRX_UNDO_TRX_NO field.
Strictly speaking, this will change the semantics of the
MLOG_UNDO_HDR_CREATE record, but it should not do any harm to
overwrite a potentially garbage field with zeroes.
Note: This fix will only help future upgrades straight from
MariaDB Server 10.2 or MySQL 5.6 or earlier. If such an upgrade has
already been made, then an earlier server startup could have
fast-forwarded the transaction ID sequence to a large value.
If this large value cannot be represented in 48 bits (the size of
the DB_TRX_ID column in clustered index records), then various
strange things can happen.