Starting with commit baf276e6d4 (MDEV-19229)
the parameter innodb_undo_tablespaces can be increased from its
previous default value 0 while allowing an upgrade from old databases.
We will change the default setting to innodb_undo_tablespaces=3
so that the space occupied by possible bursts of undo log records
can be reclaimed after SET GLOBAL innodb_undo_log_truncate=ON.
We will not enable innodb_undo_log_truncate by default, because it
causes some observable performance degradation.
Special thanks to Thirunarayanan Balathandayuthapani for diagnosing
and fixing a number of bugs related to this new default setting.
Tested by: Matthias Leich, Axel Schwenke, Vladislav Vaintroub
(with both values of innodb_undo_log_truncate)
node->is_delete was incorrectly set to NO_DELETE for a set of operations.
In general we shouldn't rely on sql_command and look for more abstract ways
to control the behavior.
trg_event_map seems to be a suitable way. To mind replica nodes, it is ORed
with slave_fk_event_map, which stores trg_event_map when replica has
triggers disabled.
InnoDB tables that lack a primary key (and any UNIQUE INDEX whose
all columns are NOT NULL) will use an internally generated index,
called GEN_CLUST_INDEX(DB_ROW_ID) in the InnoDB data dictionary,
and hidden from the SQL layer.
The 48-bit (6-byte) DB_ROW_ID is being assigned from a
global sequence that is persisted in the DICT_HDR page.
There is absolutely no reason for the DB_ROW_ID to be globally
unique across all InnoDB tables.
A downgrade to earlier versions will be prevented by the file format
change related to removing the InnoDB change buffer (MDEV-29694).
DICT_HDR_ROW_ID, dict_sys_t::row_id: Remove.
dict_table_t::row_id: The per-table sequence of DB_ROW_ID.
commit_try_rebuild(): Copy dict_table_t::row_id from the old table.
btr_cur_instant_init(), row_import_cleanup(): If needed, perform
the equivalent of SELECT MAX(DB_ROW_ID) to initialize
dict_table_t::row_id.
row_ins(): If needed, obtain DB_ROW_ID from dict_table_t::row_id.
Should it exceed the maximum 48-bit value, return DB_OUT_OF_FILE_SPACE
to prevent further inserts into the table.
dict_load_table_one(): Move a condition to btr_cur_instant_init_low()
so that dict_table_t::row_id will be restored also for
ROW_FORMAT=COMPRESSED tables.
Tested by: Matthias Leich
The purpose of the change buffer was to reduce random disk access,
which could be useful on rotational storage, but maybe less so on
solid-state storage.
When we wished to
(1) insert a record into a non-unique secondary index,
(2) delete-mark a secondary index record,
(3) delete a secondary index record as part of purge (but not ROLLBACK),
and the B-tree leaf page where the record belongs to is not in the buffer
pool, we inserted a record into the change buffer B-tree, indexed by
the page identifier. When the page was eventually read into the buffer
pool, we looked up the change buffer B-tree for any modifications to the
page, applied these upon the completion of the read operation. This
was called the insert buffer merge.
We remove the change buffer, because it has been the source of
various hard-to-reproduce corruption bugs, including those fixed in
commit 5b9ee8d819 and
commit 165564d3c3 but not limited to them.
A downgrade will fail with a clear message starting with
commit db14eb16f9 (MDEV-30106).
buf_page_t::state: Merge IBUF_EXIST to UNFIXED and
WRITE_FIX_IBUF to WRITE_FIX.
buf_pool_t::watch[]: Remove.
trx_t: Move isolation_level, check_foreigns, check_unique_secondary,
bulk_insert into the same bit-field. The only purpose of
trx_t::check_unique_secondary is to enable bulk insert into an
empty table. It no longer enables insert buffering for UNIQUE INDEX.
btr_cur_t::thr: Remove. This field was originally needed for change
buffering. Later, its use was extended to cover SPATIAL INDEX.
Much of the time, rtr_info::thr holds this field. When it does not,
we will add parameters to SPATIAL INDEX specific functions.
ibuf_upgrade_needed(): Check if the change buffer needs to be updated.
ibuf_upgrade(): Merge and upgrade the change buffer after all redo log
has been applied. Free any pages consumed by the change buffer, and
zero out the change buffer root page to mark the upgrade completed,
and to prevent a downgrade to an earlier version.
dict_load_tablespaces(): Renamed from
dict_check_tablespaces_and_store_max_id(). This needs to be invoked
before ibuf_upgrade().
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer
allocate any change buffer pages.
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
row_search_index_entry(), btr_lift_page_up(): Add a parameter thr
for the SPATIAL INDEX case.
rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert().
rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert().
Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0
change buffer format that predates the MySQL 4.1 introduction of
the option innodb_file_per_table was removed in MySQL 5.6.5
as part of mysql/mysql-server@69b6241a79
and MariaDB 10.0.11 as part of 1d0f70c2f8.
In the tests innodb.log_upgrade and innodb.log_corruption, we create
valid (upgraded) change buffer pages.
Tested by: Matthias Leich
We introduce the following settable Boolean global variables:
innodb_log_file_write_through: Whether writes to ib_logfile0 are
write-through (disabling any caching, as in O_SYNC or O_DSYNC).
innodb_data_file_write_through: Whether writes to any InnoDB data files
(including the temporary tablespace) are write-through.
innodb_data_file_buffering: Whether the file system cache is enabled
for InnoDB data files.
All these parameters are OFF by default, that is, the file system cache
will be disabled, but any hardware caching is enabled, that is,
explicit calls to fsync(), fdatasync() or similar functions are needed.
On systems that support FUA it may make sense to enable write-through,
to avoid extra system calls.
If the deprecated read-only start-up parameter is set to one of the
following values, then the values of the 4 Boolean flags (the above 3
plus innodb_log_file_buffering) will be set as follows:
O_DSYNC:
innodb_log_file_write_through=ON, innodb_data_file_write_through=ON,
innodb_data_file_buffering=OFF, and
(if supported) innodb_log_file_buffering=OFF.
fsync, littlesync, nosync, or (Microsoft Windows specific) normal:
innodb_log_file_write_through=OFF, innodb_data_file_write_through=OFF,
and innodb_data_file_buffering=ON.
Note: fsync() or fdatasync() will only be disabled if the separate
parameter debug_no_sync (in the code, my_disable_sync) is set.
In mariadb-backup, the parameter innodb_flush_method will be ignored.
The Boolean parameters can be modified by SET GLOBAL while the
server is running. This will require reopening the ib_logfile0
or all currently open InnoDB data files.
We will open files straight in O_DSYNC or O_SYNC mode when applicable.
Data files we will try to open straight in O_DIRECT mode when the
page size is at least 4096 bytes. For atomically creating data files,
we will invoke os_file_set_nocache() to enable O_DIRECT afterwards,
because O_DIRECT is not supported on some file systems. We will also
continue to invoke os_file_set_nocache() on ib_logfile0 when
innodb_log_file_buffering=OFF can be fulfilled.
For reopening the ib_logfile0, we use the same logic that was developed
for online log resizing and reused for updates of
innodb_log_file_buffering.
Reopening all data files is implemented in the new function
fil_space_t::reopen_all().
Reviewed by: Vladislav Vaintroub
Tested by: Matthias Leich
Before commit 6112853cda in MySQL 4.1.1
introduced the parameter innodb_file_per_table, all InnoDB data was
written to the InnoDB system tablespace (often named ibdata1).
A serious design problem is that once the system tablespace has grown to
some size, it cannot shrink even if the data inside it has been deleted.
There are also other design problems, such as the server hang MDEV-29930
that should only be possible when using innodb_file_per_table=0 and
innodb_undo_tablespaces=0 (storing both tables and undo logs in the
InnoDB system tablespace).
The parameter innodb_change_buffering was deprecated
in commit b5852ffbee.
Starting with commit baf276e6d4
(MDEV-19229) the number of innodb_undo_tablespaces can be increased,
so that the undo logs can be moved out of the system tablespace
of an existing installation.
If all these things (tables, undo logs, and the change buffer) are
removed from the InnoDB system tablespace, the only variable-size
data structure inside it is the InnoDB data dictionary.
DDL operations on .ibd files was optimized in
commit 86dc7b4d4c (MDEV-24626).
That should have removed any thinkable performance advantage of
using innodb_file_per_table=0.
Since there should be no benefit of setting innodb_file_per_table=0,
the parameter should be deprecated. Starting with MySQL 5.6 and
MariaDB Server 10.0, the default value is innodb_file_per_table=1.
clang15 finally errors on old prototype definations.
Its also a lot fussier about variables that aren't used
as is the case a number of time with loop counters that
aren't examined.
RocksDB was complaining that its get_range function was
declared without the array length in ha_rocksdb.h. While
a constant is used rather than trying to import the
Rdb_key_def::INDEX_NUMBER_SIZE header (was causing a lot of
errors on the defination of other orders). If the constant
does change can be assured that the same compile warnings will
tell us of the error.
The ha_rocksdb::index_read_map_impl DBUG_EXECUTE_IF was similar
to the existing endless functions used in replication tests.
Its rather moot point as the rocksdb.force_shutdown test that
uses myrocks_busy_loop_on_row_read is currently disabled.
The MDEV-25004 test innodb_fts.versioning is omitted because ever since
commit 685d958e38 InnoDB would not allow
writes to a database where the redo log file ib_logfile0 is missing.
to copy datafile
- Mariabackup fails to copy the undo log tablespace when it undergoes
truncation. So Mariabackup should detect the redo log which does
undo tablespace truncation and also backup should read the minimum
file size of the tablespace and ignore the error while reading.
- Throw error when innodb undo tablespace read failed, but backup
doesn't find the redo log for undo tablespace truncation
1. In case of system-versioned table add row_end into FTS_DOC_ID index
in fts_create_common_tables() and innobase_create_key_defs().
fts_n_uniq() returns 1 or 2 depending on whether the table is
system-versioned.
After this patch recreate of FTS_DOC_ID index is required for
existing system-versioned tables. If you see this message in error
log or server warnings: "InnoDB: Table db/t1 contains 2 indexes
inside InnoDB, which is different from the number of indexes 1
defined in the MariaDB" use this command to fix the table:
ALTER TABLE db.t1 FORCE;
2. Fix duplicate history for secondary unique index like it was done
in MDEV-23644 for clustered index (932ec586aa). In case of
existing history row which conflicts with currently inseted row we
check in row_ins_scan_sec_index_for_duplicate() whether that row
was inserted as part of current transaction. In that case we
indicate with DB_FOREIGN_DUPLICATE_KEY that new history row is not
needed and should be silently skipped.
3. Some parts of MDEV-21138 (7410ff436e) reverted. Skipping of
FTS_DOC_ID index for history rows made problems with purge
system. Now this is fixed differently by p.2.
4. wait_all_purged.inc checks that we didn't affect non-history rows
so they are deleted and purged correctly.
Additional FTS fixes
fts_init_get_doc_id(): exclude history rows from max_doc_id
calculation. fts_init_get_doc_id() callback is used only for crash
recovery.
fts_add_doc_by_id(): set max value for row_end field.
fts_read_stopword(): stopwords table can be system-versioned too. We
now read stopwords only for current data.
row_insert_for_mysql(): exclude history rows from doc_id validation.
row_merge_read_clustered_index(): exclude history_rows from doc_id
processing.
fts_load_user_stopword(): for versioned table retrieve row_end field
and skip history rows. For non-versioned table we retrieve 'value'
field twice (just for uniformity).
FTS tests for System Versioning now include maybe_versioning.inc which
adds 3 combinations:
'vers' for debug build sets sysvers_force and
sysvers_hide. sysvers_force makes every created table
system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING
for SHOW CREATE.
Note: basic.test, stopword.test and versioning.test do not
require debug for 'vers' combination. This is controlled by
$modify_create_table in maybe_versioning.inc and these
tests run WITH SYSTEM VERSIONING explicitly which allows to
test 'vers' combination on non-debug builds.
'vers_trx' like 'vers' sets sysvers_force_trx and sysvers_hide. That
tests FTS with trx_id-based System Versioning.
'orig' works like before: no System Versioning is added, no debug is
required.
Upgrade/downgrade test for System Versioning is done by
innodb_fts.versioning. It has 2 combinations:
'prepare' makes binaries in std_data (requires old server and OLD_BINDIR).
It tests upgrade/downgrade against old server as well.
'upgrade' tests upgrade against binaries in std_data.
Cleanups:
Removed innodb-fts-stopword.test as it duplicates stopword.test
Works like vers_force but forces trx_id-based system-versioned tables
if the storage supports it (currently InnoDB-only). Otherwise creates
timestamp-based system-versioned table.
The conn_kind, which stands for "connection kind", is no longer useful
because the HandlerSocket support is deleted and Spider now has only
one connection kind, SPIDER_CONN_KIND_MYSQL. Remove conn_kind and
related code.
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
Produced using the following command
unifdef -UITEM_FUNC_TIMESTAMPDIFF_ARE_PUBLIC -m storage/spider/spd_*
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: nayuta.yanagisawa@mariadb.com
ha_spider::create(): Pass the correct length of the argument of the
CHARSET attribute. The macro STRING_WITH_LEN() is intended to be used
with NUL terminated string constants only. Here, it would incorrectly
pass sizeof(char*)-1 as the length.
The incorrect type handler caused an incorrect result_type() for
Item_cache_row (STRING_RESULT rather than ROW_RESULT). By updating the
constructor of Item_cache_row with the correct type handler, it fixes
this problem.
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Sergei Golubchik <serg@mariadb.com>
When trying to create a spider table with banned charsets including
utf32, utf16, ucs2 and utf16le[1], spider should emit an error
immediately, rather than wait until a separate statement that
establishes a connection (e.g. SELECT). This also applies to ALTER
TABLE statement that changes charsets.
[1] https://mariadb.com/kb/en/server-system-variables/#character_set_client
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
spider_rewrite plugin is not in 11.0. spider initialization code used
"11.0" for "in the distant future", but suddenly it's now
spider tests didn't expect anything beyond 10.x
Before the fix next-key lock was requested only if a record was
delete-marked for locking unique search in RR isolation level.
There can be several delete-marked records for the same unique key,
that's why InnoDB scans the records until eighter non-delete-marked record
is reached or all delete-marked records with the same unique key are
scanned.
For range scan next-key locks are used for RR to protect scanned range from
inserting new records by other transactions. And this is the reason of why
next-key locks are used for delete-marked records for unique searches.
If a record is not delete-marked, the requested lock type was "not-gap".
When a record is not delete-marked during lock request by trx 1, and
some other transaction holds conflicting lock, trx 1 creates waiting
not-gap lock on the record and suspends. During trx 1 suspending the
record can be delete-marked. And when the lock is granted on conflicting
transaction commit or rollback, its type is still "not-gap". So we have
"not-gap" lock on delete-marked record for RR. And this let some other
transaction to insert some record with the same unique key when trx 1 is
not committed, what can cause isolation level violation.
The fix is to set next-key locks for both delete-marked and
non-delete-marked records for unique search in RR.
When trying to create a spider table with banned charsets including
utf32, utf16, ucs2 and utf16le[1], spider should emit an error
immediately, rather than wait until a separate statement that
establishes a connection (e.g. SELECT). This also applies to ALTER
TABLE statement that changes charsets.
[1] https://mariadb.com/kb/en/server-system-variables/#character_set_client
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
The variable was not really being used for anything. The parameters
innodb_read_io_threads, innodb_write_io_threads have replaced
innodb_file_io_threads.