If the user "opts in" (as in the parent
commit 92b2a911e5),
we can optimize multiple INSERT statements to use table-level locking
and undo logging.
There will be a change of behavior:
CREATE TABLE t(a PRIMARY KEY) ENGINE=InnoDB;
SET foreign_key_checks=0, unique_checks=0;
BEGIN; INSERT INTO t SET a=1; INSERT INTO t SET a=1; COMMIT;
will end up with an empty table, because in case of an error,
the entire transaction will be rolled back, instead of rolling
back the failing statement. Previously, the second INSERT statement
would have been logged row by row, and only that second statement
would have been rolled back, leaving the first INSERT intact.
lock_table_x_unlock(), trx_mod_table_time_t::WAS_BULK: Remove.
Because we cannot really support statement rollback in this
optimized mode, we will not optimize the locking. The exclusive
table lock will be held until the end of the transaction.
In MDEV-515, we enabled an optimization where an insert into an
empty table will use table-level locking and undo logging.
This may break applications that expect row-level locking.
The SQL statements created by the mysqldump utility will include the
following:
SET unique_checks=0, foreign_key_checks=0;
We will use these flags to enable the table-level locked and logged
insert. Unless the parameters are set, INSERT will be executed in
the old way, with row-level undo logging and implicit record locks.
fil_op_replay_rename(): Remove.
fil_rename_tablespace_check(): Remove a parameter is_discarded=false.
recv_sys_t::parse(): Instead of applying FILE_RENAME operations,
buffer the operations in renamed_spaces.
recv_sys_t::apply(): In the last_batch, apply renamed_spaces.
Updating the debian/control file will automatically update the dependencies
in all CI environments that directly read the debian/control file, such
as Salsa-CI and buildbot.mariadb.org to some degree.
(https://github.com/MariaDB/mariadb.org-tools/issues/43)
On Debian/Ubuntu releases that don't have liburing-dev available,
automatically downgrade to libaio-dev (just like libcurl4->3 is done).
This ensures the debian/control file is always up-to-date and works for
latest Debian and Ubuntu releases, while the backwards compatibility mods
are maintained in autobake-deb.sh separately, and can be dropped from there
once support for certain platforms end.
Debian/Ubuntu availability visible at:
- https://packages.debian.org/search?searchon=names&keywords=liburing-dev
- https://packages.ubuntu.com/search?searchon=names&keywords=liburing-dev
Also modify debian/rules to force a build without libaio. Use YES instead
of ON to make the flag more logical (=turning libaio check "off").
Stop running Salsa-CI for Debian Stretch-backports, as it does not have
liburing-dev available nor is the old-old Debian stable a relevant platform
for MariaDB 10.6 to test against anymore. Since the Stretch-backports build
can no longer be made, neither can the MySQL 5.7 on Bionic upgrade test be
run, as it depended on the Stretch binary.
This commit does not modify the .travis.yml file, as Travis-CI does not
have new enough Ubuntu releases available yet. Also Travis-CI.org is
practically dead now as build times have been shrunk to near zero.
The scope of this change is also Debian/Ubuntu only. No RPM or Windows or
Mac changes are included in this commit.
This commit does not update the external libmariadb or ColumnStore
CI pipelines, as those are maintained in different repositories.
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.
aio_uring: class which wraps io_uring stuff
aio_uring::bind()/unbind(): optional optimization
aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into it's implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.
For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.
aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).
This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
n_page_gets is a global counter that is updated on each page access.
This also means it is updated pretty often and with a multi-core machine
it easily boils up to be the hottest counter as also reported by perf.
Using existing distributed counter framework help ease the contention
and improve the performance.
Patch also tend to increase the slot of the distributed counter from original
64 to 128 given that is new normal for next-generation machines.
The original idea and patch came from Daniel Black which is now ported
to 10.6 with some improvement and adjustment.
- This is caused by merge commit a26e7a3726.
InnoDB fails to fetch the next index field when there is a externally
stored column length check involved.
In commit 118e258aaa (part of MDEV-23855)
we inadvertently broke crash recovery, reintroducing MDEV-11556.
fil_system_t::extend_to_recv_size(): Extend all open tablespace files
to the recovered size.
recv_sys_t::apply(): Invoke fil_system.extend_to_recv_size() at the
start of each batch. In this way, any fil_space_t::recv_size
changes that were parsed after the file was opened will be applied.
page_apply_insert_redundant(): Replace a too strict condition
hdr_c > pextra_size. It turns out that page_cur_insert_rec_low()
is not even computing the extra_size of cur->rec when it is trying
to reuse header bytes of the preceding record.
MDEV-25105 (commit 7a4fbb55b0)
in MariaDB 10.6 will refuse the innodb_checksum_algorithm
values none, innodb, strict_none, strict_innodb.
We will issue a deprecation warning if innodb_checksum_algorithm
is set to any of these non-default unsafe values.
innodb_checksum_algorithm=crc32 was made the default in
MySQL 5.7 and MariaDB Server 10.2, and given that older versions
of the server have reached their end of life, there is no valid
reason to use anything else than innodb_checksum_algorithm=crc32
or innodb_checksum_algorithm=strict_crc32 in MariaDB 10.3.
Reviewed by: Sergei Golubchik
Historically, InnoDB supported a buggy page checksum algorithm that did not
compute a checksum over the full page. Later, well before MySQL 4.1
introduced .ibd files and the innodb_file_per_table option, the algorithm
was corrected and the first 4 bytes of each page were redefined to be
a checksum.
The original checksum was so slow that an option to disable page checksum
was introduced for benchmarketing purposes.
The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
extension, which includes instructions for faster computation of CRC-32C.
In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
implemented to make of that. As that option was changed to be the default
in MySQL 5.7, a bug was found on big-endian platforms and some work-around
code was added to weaken that checksum further. MariaDB disables that
work-around by default since MDEV-17958.
Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
and ARM and also for IA-32/AMD64, making use of carry-less multiplication
where available.
Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
It should have removed any need to use innodb_checksum_algorithm=none.
The setting innodb_checksum_algorithm=crc32 is the default in
MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
It is even faster and more secure.
The default settings in MariaDB do allow old data files to be read,
no matter if a worse checksum algorithm had been used.
(Unfortunately, before innodb_checksum_algorithm=full_crc32,
the data files did not identify which checksum algorithm is being used.)
The non-default settings innodb_checksum_algorithm=strict_crc32 or
innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
checksums. The incompatibility with old data files is why they are
not the default.
The newest server not to support innodb_checksum_algorithm=crc32
were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
A valid reason for using innodb_checksum_algorithm=innodb could have
been the ability to downgrade. If it is really needed, data files
can be converted with an older version of the innochecksum utility.
Because there is no good reason to allow data files to be written
with insecure checksums, we will reject those option values:
innodb_checksum_algorithm=none
innodb_checksum_algorithm=innodb
innodb_checksum_algorithm=strict_none
innodb_checksum_algorithm=strict_innodb
Furthermore, the following innochecksum options will be removed,
because only strict crc32 will be supported:
innochecksum --strict-check=crc32
innochecksum -C crc32
innochecksum --write=crc32
innochecksum -w crc32
If a user wishes to convert a data file to use a different checksum
(so that it might be used with the no-longer-supported
MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
nor system tablespace format changes that were made in MariaDB 10.3),
then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
MySQL 5.7 can be used.
Reviewed by: Thirunarayanan Balathandayuthapani
- Currently page cleaner thread will stop flushing if
dirty_pct < innodb_max_dirty_pages_pct_lwm.
- If the server is not performing any activity then said resources/time
could be used to flush the pending dirty pages and keep buffer pool
clean for the next burst of the cycle. This flushing is called idle flushing.
- flushing logic underwent a complete revamp in 10.5.7/8
and as part of the revamp idle flushing logic got removed.
- New proposed logic of idle flushing is based on updated logic of the
page cleaner that will enable idle flushing if
- buf page cleaner is idle
- there are dirty pages (< innodb_max_dirty_pages_pct_lwm)
- server is not performing any activity
Logic will kickstart the idle flushing bounded by innodb_io_capacity.
(Thanks to Marko Makela for reviewing the patch and idea
right from the its inception).
It was possibile for a user to create an interlocked state which may go on
for a significant period of time. There is a tight loop in the FTWRL code
path that tries to repeatedly acquire a read lock. As the weight of FTWRL
lock is the smallest among others, it's always selected by the deadlock
detector, but can never be killed.
Imaging the following sequence:
connection_0 connection_1
GET_LOCK("l1", 0);
LOCK TABLES t WRITE;
FLUSH TABLES WITH READ LOCK;
GET_LOCK("l1", 1000);
The GET_LOCK statement in connection_1 triggers the deadlock detector,
which tries to select the lock in FTWRL, since its weight is 0. However,
since a loop in Global_read_lock::lock_global_read_lock() tries to always
win, it tries to acquire lock again. Which invokes the deadlock detector,
and that cycle continues until GET_LOCK in connection_1 times out.
This patch resolves the live-locking by introducing a dynamic bonus to the
deadlock weight associated with every lock. Each lock gets a bonus weight
each time it's selected by the deadlock detector. In case of a live-lock
situation, those locks that cannot be killed, get additional weight each
iteration. Eventually their weight becomes so high that the deadlock
detector shifts its attention to other lock, until it find the one that
can be killed.
The problem was that the CONNECT engine is trying to open the .frm file
during drop_table(), which the code did not take into account.
Fixed by adding the HA_REUSES_FILE_NAMES table flag to CONNECT.
Other things:
- Fixed a wrong test of HA_REUSE_FILE_NAMES of in mysql_alter_table()
(Comment was correct, no the code)
- Added a test in the connect engine that if the .frm it tries to use in
delete is not made for connect, it will generate an error instead of
crash.
InnoDB set the space in dict_table_t as NULL when table
is discarded. So InnoDB shouldn't use the space present
in table to detect whether the given tablespace is
temporary tablespace.
- Reduce Build-Depends
150bf990c6
Dependencies chrpath, dh-apparmor and libarchive-dev are not needed.
Fixes buildbot sid failures that error on:
Unmet build dependencies: chrpath dh-apparmor libarchive-dev
- Salsa-CI: Remove mysql-5.7 upgrade in Sid test as package was removed
6f55ac620c
Also clean away extra Salsa-CI markup not needed anymore.
- Autopkgtest: Simplify autopkgtest 'smoke' to be easier to debug
836907989a
- Autopkgtest: Skip main.failed_auth_unixsocket on armhf and i386
74601f8b31
failed in dtuple_convert_big_rec
In dtuple_convert_big_rec(), InnoDB fails to consider the
instant metadata blob while choosing the variable length
field.
The maximum number of concurrently waiting transactions is one less
than the maximum number of concurrent transactions.
A 45-bit cumulative counter of lock waits will support more than
one million lock waits per second for a year.
Let us add the status variable innodb_buffer_pool_pages_LRU_freed
to monitor the number of pages that were freed by a buffer pool LRU
eviction scan, without flushing.
Also, let us simplify the monitor interface:
MONITOR_LRU_BATCH_FLUSH_COUNT, MONITOR_LRU_BATCH_FLUSH_PAGES,
MONITOR_LRU_BATCH_EVICT_COUNT, MONITOR_LRU_BATCH_EVICT_PAGES:
Remove.
MONITOR_LRU_BATCH_FLUSH_TOTAL_PAGE: Track buf_lru_flush_page_count
(innodb_buffer_pool_pages_LRU_flushed).
MONITOR_LRU_BATCH_EVICT_TOTAL_PAGE: Track buf_lru_freed_page_count
(buffer_pool_pages_LRU_freed).
Reviewed by: Vladislav Vaintroub
optimize_schema_tables_memory_usage() crashed when its argument included
TABLE struct that was not fully initialized.
To prevent such a crash, we check if a table is an information schema table at
the beginning of each iteration.
Closes#1768
1. Add FORMAT_MESSAGE_MAX_WIDTH_MASK flag to remove trailing \r\n from
the error message returned by FormatMessageA().
2. Also, add FORMAT_MESSAGE_IGNORE_INSERTS to avoid potential problems
with system messages that have argument placeholders.
Quote from FormatMessage docs on MSDN:
If this function is called without FORMAT_MESSAGE_IGNORE_INSERTS, the
Arguments parameter must contain enough parameters to satisfy all insertion
sequences in the message string, and they must be of the correct type.
Therefore, do not use untrusted or unknown message strings with inserts
enabled because they can contain more insertion sequences than Arguments
provides, or those that may be of the wrong type. In particular, it is
unsafe to take an arbitrary system error code returned from an API and use
FORMAT_MESSAGE_FROM_SYSTEM without FORMAT_MESSAGE_IGNORE_INSERTS.
not be dropped if the DEFINER is custom. Revert changes
to MDEV-23102 tests as they were designed to catch
this corner case.
The explanation for this corner case is that users
historically used to tweak the mysql.user table and
probably still do even though mysql.user is now a view.
Thus, if the DEFINER of the view is not default, i.e.
root@localhost or mariadb.sys@localhost, we should avoid
dropping the view during upgrade process to not discard
potential custom changes.
if a query used no fields from an I_S table, we were creating a temp
table with one, first, field (as a table cannot have zero fields),
with its length truncated to 1.
Now - force also this dummy field to be a normal field, not a BLOB