As noted in commit 0b66d3f70d,
MariaDB does not support CREATE TABLESPACE for InnoDB.
Hence, some code that was added in
commit fec844aca8
and originally in
mysql/mysql-server@c71dd213bd
is unused in MariaDB and should be removed.
trx_undo_left(): Return 0 in case of an overflow, instead of
returning a negative number interpreted as a large positive number.
Also, add debug assertions to check that the pointer is within
the page area. This should allow us to catch bugs like
MDEV-24096 more easily in the future.
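As a minimal illustration of the overflow handling (assumed page-size and
reservation constants and a stand-in signature, not the actual MariaDB code):

  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Assumed constants for this sketch; the real values come from the
  // InnoDB page and undo-log headers.
  constexpr size_t PAGE_SIZE = 16384;
  constexpr size_t RESERVED_AT_END = 10;

  // Free space left on an undo page for the given write position.
  size_t undo_left(const uint8_t* page, const uint8_t* ptr)
  {
    // Debug assertions: the pointer must be within the page area.
    assert(ptr >= page);
    assert(ptr < page + PAGE_SIZE);
    const size_t used = size_t(ptr - page);
    // On overflow, return 0 instead of letting the unsigned
    // subtraction wrap around to a large positive number.
    return used + RESERVED_AT_END >= PAGE_SIZE
        ? 0 : PAGE_SIZE - RESERVED_AT_END - used;
  }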
The function ibuf_merge_or_delete_for_page() was always being
invoked with update_ibuf_bitmap=true ever since
commit cd623508df
fixed up something after MDEV-9566.
Furthermore, the parameter page_size is never passed as a
null pointer, and therefore it is better to make it a reference to
a constant object.
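The shape of the change, as a hypothetical before/after sketch (the names
and parameter types below are illustrative stand-ins, not the real InnoDB
declarations):

  struct page_size_t;   // stand-in for the real class

  // Before: a nullable pointer plus a flag that every caller set to true.
  void merge_or_delete_old(void* block, const page_size_t* page_size,
                           bool update_ibuf_bitmap);

  // After: the always-true flag is gone, and the never-null pointer
  // becomes a reference to a constant object.
  void merge_or_delete_new(void* block, const page_size_t& page_size);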
The instrumentation that was added in
commit 90635c6fb5 (MDEV-7620)
was effectively reverted in MariaDB Server 10.2.2, in
commit 2e814d4702
(which stopped reporting the statistics) and
commit fec844aca8
(which stopped updating the statistics).
Let us remove the orphan data members to reduce the memory footprint.
This follows up commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
Implement a different fix for
"MDEV-19232: Floating point precision / value comparison problem"
Instead of truncating decimal values after every division,
truncate them for comparison purposes.
This reverts commit 62d73df6b2 but keeps the test.
This patch removes dict_index_t::stats_latch. Table/index statistics are
now protected with dict_sys->mutex. That way, statistics computation can
happen in parallel in several threads, and dict_sys->mutex will be locked
only for a short period of time.
This patch is joint work with Marko Mäkelä.
dict_index_t::lock: make it mutable, which allows passing a const pointer
when only the lock is touched in an object
btr_height_get(), btr_get_size(): make the index argument const
for better type safety
btr_estimate_number_of_different_key_vals(): now returns computed values
instead of setting fields in dict_index_t directly
remove everything related to dict_index_t::stats_latch
dict_stats_index_set_n_diff(): now returns computed values instead
of setting fields in dict_index_t directly
dict_stats_analyze_index(): now returns computed values instead
of setting fields in dict_index_t directly (a sketch of this
pattern follows below)
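A hypothetical sketch of the return-the-values pattern (the struct and
field names are stand-ins, not necessarily the real ones):

  #include <cstdint>
  #include <vector>

  // Stand-in result type: everything the analysis computes is carried
  // back to the caller instead of being written into dict_index_t
  // under a per-index stats latch.
  struct index_stats_sketch
  {
    std::vector<uint64_t> n_diff_key_vals;  // distinct values per prefix
    uint64_t index_size = 1;                // pages in the index
    uint64_t n_leaf_pages = 1;              // leaf pages in the index
  };

  index_stats_sketch analyze_index_sketch(/* const dict_index_t& index */)
  {
    index_stats_sketch result;
    // ... scan the index without holding dict_sys->mutex ...
    return result;
  }

  // The caller then publishes the results under dict_sys->mutex, so the
  // mutex is held only for the brief copy, not for the whole analysis:
  //   auto stats = analyze_index_sketch(...);
  //   mutex_enter(&dict_sys->mutex);
  //   /* copy stats into the dict_index_t fields */
  //   mutex_exit(&dict_sys->mutex);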
Reviewed by: Marko Mäkelä
Let us introduce a dummy variable innodb_max_purge_lag_wait
for waiting that the InnoDB history list length is below
the user-specified limit. Specifically,
SET GLOBAL innodb_max_purge_lag_wait=0;
should wait for all history to be purged. This could be useful
when upgrading from an older version to MariaDB 10.3 or later,
to avoid hitting MDEV-15912.
Note: the history cannot be purged if there exist transactions
that may see old versions.
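A minimal sketch of what the update function of such a dummy variable
could do, with stand-in names for the InnoDB internals:

  #include <atomic>
  #include <chrono>
  #include <thread>

  // Stand-ins; the real code would read the history list length from
  // the transaction system and wake the purge coordinator.
  std::atomic<unsigned long long> history_list_length{0};
  void wake_purge_threads() {}

  // SET GLOBAL innodb_max_purge_lag_wait=N blocks the issuing thread
  // until the history list length has dropped to N (0 = fully purged).
  void max_purge_lag_wait_update(unsigned long long limit)
  {
    while (history_list_length.load(std::memory_order_relaxed) > limit)
    {
      wake_purge_threads();
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  }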
Reviewed by: Vladislav Vaintroub
InnoDB frees the block lock during buffer pool shrinking while another
thread has yet to release the block lock. While shrinking the
buffer pool, InnoDB allows the page to be freed unless it is buffer
fixed. In some cases, InnoDB releases the latch after unfixing the
block.
Fix:
====
- InnoDB should unfix the block after releasing the latch.
- Add more assertions to check the buffer fix while accessing the page.
- Introduced a block_hint structure to store the buf_block_t pointer
and allow accessing the buf_block_t pointer only by passing a
functor. It returns the original buf_block_t* pointer if it is valid,
or nullptr if the pointer has become stale (see the sketch after this list).
- Replace buf_block_is_uncompressed() with
buf_pool_t::is_block_pointer()
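A rough sketch of the block_hint idea (type and member names are
illustrative, and the buffer-pool validation is stubbed out):

  struct buf_block_t;   // opaque stand-in

  class block_hint_sketch
  {
    buf_block_t* m_block = nullptr;
  public:
    void store(buf_block_t* block) { m_block = block; }
    void clear() { m_block = nullptr; }

    // The stored pointer can only be used through a functor.  The
    // functor receives the original buf_block_t* if the hint is still
    // valid, or nullptr if the pointer has become stale.
    template <typename F>
    auto run_with_hint(F&& f)
    {
      return f(is_still_valid() ? m_block : nullptr);
    }
  private:
    bool is_still_valid() const
    {
      // The real code would re-validate the pointer against the buffer
      // pool (e.g. via buf_pool_t::is_block_pointer()) and buffer-fix
      // the block under the proper latches; stubbed out here.
      return m_block != nullptr;
    }
  };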
This change is motivated by a change in mysql-5.7.32:
mysql/mysql-server@46e60de444
Bug #31036301 ASSERTION FAILURE: SYNC0RW.IC:429:LOCK->LOCK_WORD
Problem:
1. The server terminates abnormally when phrase search doesn't
filter out doc_ids correctly. This problem has been fixed in bug
2. Wrong query result: It's a regression from the bug #22709692 fix.
That fix optimizes a full-text search query with a LIMIT clause:
when the FTS expression involves only union operations, we fetch only
the number of doc_ids specified by the LIMIT clause.
Fulltext phrase search is not a union operation, but we consider
phrase search with a plugin parser a union operation.
In a phrase search with a LIMIT clause, we fetch limited doc_ids for
each token, and if any of the selected doc_ids does not contain all
tokens in the correct order, then we do not include that row_id in the
result set.
Therefore, phrase search returns fewer rows than the number of
qualifying rows in the table.
Fix:
Added a condition so that phrase search with a plugin parser is not
treated as a union operation.
RB: 24925
Reviewed by : Annamalai Gurusami <annamalai.gurusami@oracle.com>
This is a cherry-pick of
mysql/mysql-server@5549920b7a
without a test case, because the test case depends on an n-gram
tokenizer that will be missing from MariaDB until MDEV-10267 is added.
Problem:
In full-text phrase search, we filter out rows that do not contain
all the tokens in the phrase.
If we do not filter out a doc_id that does not appear in all of the
tokens' doc_id lists, then we hit an assertion.
Fix:
If any token's last doc_id is equal to the i-th doc_id of the first
token's doc_id list, then filter out the rest of the higher doc_ids.
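The rule, sketched on plain sorted doc_id vectors (the real code walks
InnoDB's FTS result lists; the function and parameter names are
illustrative):

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  // A doc_id can only satisfy the phrase if every token still has
  // entries at or beyond it, so anything above the smallest "last
  // doc_id" among the tokens can be dropped from the candidate list.
  void filter_phrase_candidates(
      std::vector<uint64_t>& first_token_docs,
      const std::vector<std::vector<uint64_t>>& other_token_docs)
  {
    uint64_t max_usable = UINT64_MAX;
    for (const auto& docs : other_token_docs)
      if (!docs.empty())
        max_usable = std::min<uint64_t>(max_usable, docs.back());

    // Filter out the rest of the higher doc_ids.
    first_token_docs.erase(
        std::upper_bound(first_token_docs.begin(), first_token_docs.end(),
                         max_usable),
        first_token_docs.end());
  }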
RB: 24909
Reviewed by : Annamalai Gurusami <annamalai.gurusami@oracle.com>
This is a cherry-pick of
mysql/mysql-server@5aa075277d
but without a test case, because the test case depends on an n-gram
tokenizer that will be missing from MariaDB until MDEV-10267 is added.
This issue is caused by MDEV-22456 (ad6171b91c). The fix involves the
backported version of the 10.4 patch MDEV-22778 (5f2628d1ee) and a few
parts of MDEV-17441 (e9a5f288f2).
dict_table_t::stats_latch_created: Removed
dict_table_t::stats_latch: make it a value member and always lock it,
for simplicity, even for a statistics-clone table.
zip_pad_info_t::mutex_created: Removed
zip_pad_info_t::mutex: make it a value member instead of a pointer
os0once.h: Removed
dict_table_remove_from_cache_low(): Ensure that fts_free() is always
called, even if dict_mem_table_free() is deferred until
btr_search_lazy_free().
InnoDB would always create zip_pad_info_t::mutex and
dict_table_t::autoinc_mutex, even for tables that are not in
ROW_FORMAT=COMPRESSED and do not include any AUTO_INCREMENT column.
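The simplification, sketched with std::mutex as a stand-in for the
InnoDB mutex types:

  #include <mutex>

  // Old pattern (os0once.h): a pointer plus a "created" flag, with the
  // mutex allocated lazily on first use.
  struct zip_pad_info_old_sketch
  {
    std::mutex* mutex = nullptr;
    bool mutex_created = false;
    // ...
  };

  // New pattern: a plain value member that always exists with the
  // object, so no creation flag or lazy-initialization machinery.
  struct zip_pad_info_new_sketch
  {
    std::mutex mutex;
    // ...
  };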
MariaDB 10.2.2 inherited from MySQL 5.7 a perceived optimization
of ALTER TABLE, which skips the writing of redo log records.
In MDEV-16809 we introduced a parameter that allows the redo log to
be written, so that Mariabackup would not be impacted, but we kept
the MySQL 5.7 behaviour enabled by default (innodb_log_optimize_ddl=ON).
As noted in MDEV-19747 (Deprecate and ignore innodb_log_optimize_ddl,
implemented in MariaDB 10.5.1), omitting the redo log writes can
actually reduce performance, because we will have to wait for the data
pages to be written out. When the redo log file is configured to be
large enough, it actually can be much faster to write the redo log and
avoid the extra page flushing.
When the redo log writes are omitted (innodb_log_optimize_ddl=ON),
Mariabackup may also have to perform a lot of extra work and re-copy the
entire data file if it is possible that any log was omitted during
the backup.
Starting with MariaDB 10.5.1, the parameter innodb_log_optimize_ddl
is deprecated and ignored. We hereby deprecate (but will not ignore)
the parameter in earlier versions as well.
The maximum InnoDB key length is 3500, which is hardcoded in
ha_innobase::max_supported_key_length(). The maximum number of InnoDB
indexes is configured with the MAX_INDEXES macro (see also the MAX_KEY
definition).
The same is currently implemented for the BLACKHOLE storage engine.
Cherry picked from percona-server 0d90d81c3c507a6b1476246a405504f6e4ef9d4d
Original lp bug 1733049
Reviewed-by: daniel@mariadb.org
The problem:
When an incremental backup is taken, delta files are created for InnoDB
tables which are marked as new tables during InnoDB DDL tracking. When
such a tablespace is opened during prepare in
xb_delta_open_matching_space(), it is "created", i.e.
xb_space_create_file() is invoked instead of opening the existing file,
even if a tablespace with the same name exists in the base backup
directory.
xb_space_create_file() writes the page 0 header of the tablespace.
This header does not contain crypt data, as mariabackup does not have
any information about crypt data in the delta file metadata for
tablespaces.
After the delta file is applied, the recovery process is started. As the
sequence of recovery for different pages is not defined, it can happen
that the crypt data redo log record is applied only after some other
page has already been read for recovery. When a page is read for
recovery, it is decrypted using the crypt data stored in the tablespace
header on page 0; if there is no crypt data, the page is not decrypted
and does not pass the corruption check.
This causes an error during incremental backup --prepare for encrypted
tablespaces.
The error is not stable, because the crypt data redo log record updates
the crypt data on page 0, and recovery for different pages can be
executed in an undefined order.
The fix:
When a delta file is created, the corresponding write filter copies only
the pages whose LSN is greater than some incremental LSN. When a new
file is created during an incremental backup, the LSN of all its pages
must be greater than the incremental LSN, so there is no need to create
a delta for such a table; we can just copy it completely.
The fix is to take the whole file that was tracked during the
incremental backup by the InnoDB DDL tracker, and copy it to the base
directory during --prepare instead of applying a delta.
There is also a DBUG_EXECUTE_IF() in the InnoDB code to avoid writing
the redo log record for the crypt data update on page 0, to make the
test case stable.
Note:
The issue is not reproducible in 10.5, as optimized DDL is deprecated
in 10.5. But the fix is still useful, because it allows decreasing the
amount of data copied during backup, as a delta file contains some
extra info. The test case should be removed for 10.5, as it will
always pass.
fts_query_t::nested_sub_exp: Keep track of nested
fts_ast_visit_sub_exp() calls.
fts_ast_visit_sub_exp(): Return DB_OUT_OF_MEMORY if the
maximum recursion depth is exceeded.
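A sketch of the depth guard (the limit and all names below are
assumptions for illustration, not the actual values or symbols):

  #include <cstddef>

  enum err_sketch { SUCCESS_SKETCH, OUT_OF_MEMORY_SKETCH };

  struct ast_node_sketch { ast_node_sketch* sub_exp = nullptr; };

  constexpr size_t MAX_NESTED_SUB_EXP = 31;   // assumed maximum depth

  err_sketch visit_sub_exp(const ast_node_sketch* node, size_t depth)
  {
    if (depth > MAX_NESTED_SUB_EXP)
      return OUT_OF_MEMORY_SKETCH;    // refuse to recurse any deeper
    if (node && node->sub_exp)
      return visit_sub_exp(node->sub_exp, depth + 1);  // track nesting
    return SUCCESS_SKETCH;
  }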
This is motivated by a change in MySQL 5.6.50:
mysql/mysql-server@e2a46b4834
Bug #29929684 USING MANY NESTED ARGUMENTS WITH BOOLEAN FTS CAN LEAD
TO TERMINATE SERVER
In row_undo_ins_remove_clust_rec() and similar places,
an assertion !node->trx->dict_operation_lock_mode could fail,
because an online ALTER is not allowed to run at the same time
as DDL operations on the table are being rolled back.
This race condition would be fixed by always acquiring an InnoDB
table lock in ha_innobase::prepare_inplace_alter_table() or
prepare_inplace_alter_table_dict(), or by ensuring that recovered
transactions are protected by MDL that would block concurrent DDL
until the rollback has been completed.
This reverts commit 1509363970
and commit 22c4a7512f.
The function innodb_show_mutex_status() is the only ultimate caller of
LatchCounter::iterate() via MutexMonitor::iterate(). Because the call
is not protected by LatchCounter::m_mutex, any mutex_create() or
mutex_free() that is invoked concurrently during the execution could
cause bad things such as a crash to happen.
The most likely way for this to happen is buffer pool resizing,
which could cause buf_block_t::mutex (which existed before MDEV-15053)
to be created or freed. We could also register InnoDB mutexes in
TrxFactory::init() if trx_pools needs to grow.
The view INFORMATION_SCHEMA.INNODB_MUTEXES is not affected, because it
only displays information about rw-locks, not mutexes.
This commit intentionally touches also MutexMonitor::iterate()
and the only code that interfaces with LatchCounter::iterate()
to make it clearer for future readers that the scattered code
that is obfuscated by templates belongs together.
This is based on
mysql/mysql-server@273a93396f
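The locking rule that the change enforces, sketched with stand-in types
(std::mutex and a plain vector stand in for the InnoDB machinery):

  #include <algorithm>
  #include <functional>
  #include <mutex>
  #include <vector>

  struct counter_sketch { unsigned long long spins = 0, waits = 0; };

  class latch_counter_sketch
  {
    std::mutex m_mutex;                      // protects m_counters
    std::vector<counter_sketch*> m_counters;
  public:
    void register_latch(counter_sketch* c)
    {
      std::lock_guard<std::mutex> g(m_mutex);
      m_counters.push_back(c);
    }
    void deregister_latch(counter_sketch* c)
    {
      std::lock_guard<std::mutex> g(m_mutex);
      auto it = std::find(m_counters.begin(), m_counters.end(), c);
      if (it != m_counters.end()) m_counters.erase(it);
    }
    // Iteration must hold the same mutex, so that a concurrent
    // mutex_create()/mutex_free() cannot invalidate the list.
    void iterate(const std::function<void(const counter_sketch&)>& f)
    {
      std::lock_guard<std::mutex> g(m_mutex);
      for (const counter_sketch* c : m_counters) f(*c);
    }
  };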
This issue is caused by the commit af40a2b43e.
In btr_search_update_hash_on_insert(), btr_search_sys->hash_tables
is being accessed without taking the proper AHI latch. During buffer pool
resizing, btr_get_search_table() is being accessed, and this leads to a
segmentation fault.
Reviewed-by: Marko Mäkelä
Problem:
=======
InnoDB allows a virtual index to be the referenced index in foreign key
relations. While dropping the virtual column, inplace ALTER allows the
table to be closed and reopened using the table name in order to
update dict_table_t::v_cols. While loading the table, it does not allow
any error to be ignored. InnoDB cannot find the referenced virtual
index and fails to load the table during the inplace ALTER.
Solution:
=========
During inplace alter, ignore the foreign key error while loading
the table.
Reviewed-by: Marko Mäkelä
Also add a new member Saved_Size in the Global structure.
modified: storage/connect/global.h
modified: storage/connect/plugutil.cpp
modified: storage/connect/user_connect.cc
modified: storage/connect/jsonudf.cpp
- Add session variables json_all_path and default_depth
modified: storage/connect/ha_connect.cc
modified: storage/connect/mongo.cpp
modified: storage/connect/tabjson.cpp
modified: storage/connect/tabxml.cpp
- ADD column options JPATH and XPATH
Work as FIELD_FORMAT but are more readable
modified: storage/connect/ha_connect.cc
modified: storage/connect/ha_connect.h
modified: storage/connect/mysql-test/connect/r/json_java_2.result
modified: storage/connect/mysql-test/connect/r/json_java_3.result
modified: storage/connect/mysql-test/connect/r/json_mongo_c.result
- Handle negative numbers in the option list
modified: storage/connect/ha_connect.cc
- Fix a JSON parse issue that could crash the server.
This was because it could use THROW outside of the TRY block.
Also handle all errors by THROW.
It is now done by a new class JSON.
modified: storage/connect/json.cpp
modified: storage/connect/json.h
- Add a new UDF function jfile_translate.
It translates a JSON file to pretty = 0.
It is fast because it does not do a real parse of the file.
modified: storage/connect/jsonudf.cpp
modified: storage/connect/jsonudf.h
- Add new options JSIZE and STRINGIFY to JSON tables.
STRINGIFY makes objects or arrays be returned as their
JSON representation instead of as their concatenated values.
JSIZE allows specifying the LRECL (was 256); it defaults to 1024.
Also fix a bug about locating the sub-table by its path.
modified: storage/connect/tabjson.cpp
modified: storage/connect/tabjson.h
- row_search_mvcc() should return DB_INTERRUPTED when it is killed.
- Add a syncpoint for the ICP check.
- Add test coverage for killed-during-ICP-check scenario
Backport of MDEV-22761 fixes for ICP from 10.4 commits:
* a6f956488c
* c03885cd9c
XtraDB was fixed in deb3b9a174
Reviewer: Daniel Black
Sometimes blockdev --getss returns 4096.
In that case, ROW_FORMAT=COMPRESSED tables might violate
that 4096-byte alignment.
This patch disables O_DIRECT for COMPRESSED tables.
OS_DATA_FILE_NO_O_DIRECT: new possible value for os_file_create() argument
fil_node_open_file(): do not use O_DIRECT for
ROW_FORMAT=COMPRESSED tables
AIO::is_linux_native_aio_supported(): the minimal alignment in the
general case is 4096, not 512.
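The alignment constraint, as a small compile-time sketch:

  #include <cstddef>

  // With O_DIRECT, the I/O size and file offset must be multiples of
  // the logical sector size reported by the device.
  constexpr bool ok_for_o_direct(size_t io_size, size_t sector_size)
  {
    return io_size % sector_size == 0;
  }

  static_assert(ok_for_o_direct(16384, 4096),
                "a regular 16K page is fine on a 4096-byte sector");
  static_assert(!ok_for_o_direct(2048, 4096),
                "a 2K ROW_FORMAT=COMPRESSED page violates the alignment");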
The problem is that the dropping of the FTS table and the sync of the
FTS table happen concurrently during FTS optimize thread shutdown.
fts_optimize_remove_table() is executed by a user thread that
performs DDL, and the fts_optimize_wq may be removed if
fts_optimize_shutdown() has started executing.
fts_optimize_remove_table() does not remove the table from the queue,
and this leads to the above scenario. While removing the table from
fts_optimize_wq, if the table cannot be removed, then wait until
fts_optimize_thread shuts down.
Reviewed-by: Marko Mäkelä
The marking of deletion of a row in the FTS index happens twice in a
self-referential foreign key relation. So, while performing the
referential checks of a foreign key, InnoDB can avoid updating the
FTS index if the foreign key has a self-referential relationship.
Reviewed-by: Marko Mäkelä
This regression for debug builds was introduced by
MDEV-23101 (commit 224c950462).
Due to MDEV-16664, the parameter
innodb_lock_schedule_algorithm=VATS
is not enabled by default.
The purpose of the added assertions was to enforce the invariant that
Galera replication cannot be enabled together with VATS due to MDEV-12837.
However, upon closer inspection, it is obvious that the variable 'lock'
may be assigned the null pointer if no match is found in the
previous->hash list.
lock_grant_and_move_on_page(), lock_grant_and_move_on_rec():
Assert !lock->trx->is_wsrep() only after ensuring that lock
is not a null pointer.
In MDEV-21452, SAFE_MUTEX flagged an ordering problem that involved
trx_t::mutex, LOCK_global_system_variables, and LOCK_commit_ordered
when running
./mtr --no-reorder\
binlog.binlog_checksum,mix binlog.binlog_commit_wait,mix
Because LOCK_commit_ordered is acquired by replication code before
innobase_commit_ordered() is invoked, and because LOCK_commit_ordered
should be below LOCK_global_system_variables in the global latching
order, it turns out that we must avoid acquiring
LOCK_global_system_variables in any low-level code.
It also turns out that lock_rec_lock() acquires lock_sys_t::mutex
and then carries on to call lock_rec_enqueue_waiting(), which may
invoke THDVAR() via thd_lock_wait_timeout(). This call is problematic
if THDVAR() had never been invoked in that thread earlier.
innobase_trx_init(): Let us invoke THDVAR() at the start of an InnoDB
transaction so that future invocations of THDVAR() will avoid
LOCK_global_system_variables acquisition on the same THD. Because
the first call to intern_sys_var_ptr() will initialize all session
variables by not passing the offset to sync_dynamic_session_variables(),
this will indeed make any future THDVAR() invocation mutex-free.
There are some THDVAR() calls in other code (related to indexed virtual
columns, fulltext indexes, and DDL operations). No SAFE_MUTEX warning
was known for those, but there does not appear to be any replication
test coverage for indexed virtual columns or fulltext indexes. DDL should
be covered, and perhaps DDL code paths were already invoking THDVAR()
while not holding any InnoDB mutex.
Side note: MySQL should avoid this type of deadlock since
mysql/mysql-server@4d275c8995.
MariaDB never defined alloc_and_copy_thd_dynamic_variables(),
because we prefer to avoid overhead during connection creation.
An important part of the deadlock could be the current handling of
SET GLOBAL binlog_checksum=NONE; and similar assignments.
In binlog_checksum_update(), we would hold LOCK_global_system_variables
while potentially acquiring LOCK_commit_ordered in MYSQL_BIN_LOG::open().
Even if that code was changed later to release
LOCK_global_system_variables during the write to mysql_bin_log,
it could be a good idea for performance to avoid invoking the
expensive code path of THDVAR() while holding any InnoDB mutexes,
such as lock_sys.mutex in lock_rec_enqueue_waiting().
Thanks to Andrei Elkin for debugging the SAFE_MUTEX issue, and to
Sergei Golubchik for the suggestion to invoke THDVAR() early.
modified: storage/connect/ha_connect.cc
- Allow JSON columns to be "binary"
by setting their type as VARBINARY(132)
and their names to begin with Jbin_
modified: storage/connect/json.h
modified: storage/connect/jsonudf.cpp
modified: storage/connect/tabjson.cpp
modified: storage/connect/value.cpp
modified: storage/connect/value.h
- CHARSET BINARY cannot be used for text columns
modified: storage/connect/mysql-test/connect/r/updelx.result
modified: storage/connect/mysql-test/connect/t/updelx.test
The setting innodb_lock_schedule_algorithm=VATS that was introduced
in MDEV-11039 (commit 021212b525)
causes conflicting exclusive locks to be incorrectly granted to
two transactions. Specifically, in lock_rec_insert_by_trx_age()
the predicate !lock_rec_has_to_wait_in_queue(in_lock) would hold even
though an active transaction is already holding an exclusive lock.
This was observed between two DELETE of the same clustered index record.
The HASH_DELETE invocation in lock_rec_enqueue_waiting() may be related.
Due to lack of progress in diagnosing the problem, we will deprecate the
option and issue a warning that using it may corrupt data. The unsafe
option was enabled between
commit 0c15d1a6ff (MariaDB 10.2.3)
and the parent of
commit 1cc1d0429d (MariaDB 10.2.17, 10.3.9).
The variable connect_work_size is now ulong, or ulonglong on 64-bit machines.
modified: storage/connect/ha_connect.cc
modified: storage/connect/user_connect.cc
All variables handling sizes that were uint are now size_t.
The variable connect_work_size is now ulong (was uint).
Also make JSON functions allocate larger memory (M=9, was 7).
modified: storage/connect/global.h
modified: storage/connect/ha_connect.cc
modified: storage/connect/json.cpp
modified: storage/connect/jsonudf.cpp
modified: storage/connect/plgdbutl.cpp
modified: storage/connect/plugutil.cpp
modified: storage/connect/user_connect.cc
- Fix uninitialised variable (pretty) in Json_File.
Make Jbin_file accept the same arguments as Json_File ones.
modified: storage/connect/jsonudf.cpp
- Change the Level option to Depth (the word currently used)
(Level is still accepted)
modified: storage/connect/mongo.cpp
modified: storage/connect/tabjson.cpp
modified: storage/connect/tabxml.cpp
- Suppress 2nd argument default value for MYSQLtoPLG function
modified: storage/connect/myutil.h
- Allow REST tables to be created without specifying a file_name
modified: storage/connect/tabrest.cpp
In fts_optimize_remove_table(), InnoDB tries to access the
fts_optimize_wq after shutting down the fts optimize thread.
This issue is caused by the commit a41d429765.
The fix is to check the FTS optimize thread shutdown state
before checking fts_optimize_wq.
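A sketch of the intended check order, with stand-in names for the state
and the work queue (not the actual InnoDB symbols):

  // Stand-ins for the FTS optimize state and work queue.
  enum fts_state_sketch { FTS_RUNNING, FTS_SHUTTING_DOWN, FTS_STOPPED };

  struct work_queue_sketch;

  // Consult the shutdown state first: once shutdown has started, the
  // work queue must not be touched, because it may already be freed.
  void remove_table_sketch(fts_state_sketch state, work_queue_sketch* wq)
  {
    if (state != FTS_RUNNING)
      return;               // optimize thread is stopping or stopped
    if (!wq)
      return;               // optimize thread was never started
    // ... enqueue the "remove table" message and wait for it ...
  }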