Many InnoDB rw-locks unnecessarily depend on the complex
InnoDB rw_lock_t implementation that support the SX lock mode
as well as recursive acquisition of X or SX locks.
One of them is the bunch of adaptive hash index search latches,
instrumented as btr_search_latch in PERFORMANCE_SCHEMA.
Let us introduce a simpler lock for those in order to
reduce overhead.
srw_lock: A simple read-write lock that does not support recursion.
On Microsoft Windows, this wraps SRWLOCK, only adding
runtime overhead if PERFORMANCE_SCHEMA is enabled.
On Linux (all architectures), this is implemented with
std::atomic<uint32_t> and the futex system call.
On other platforms, we will wrap mysql_rwlock_t with
zero runtime overhead.
The PERFORMANCE_SCHEMA instrumentation differs
from InnoDB rw_lock_t in that we will only invoke
PSI_RWLOCK_CALL(start_rwlock_wrwait) or
PSI_RWLOCK_CALL(start_rwlock_rdwait)
if there is an actual conflict.
We always defined PFS_SKIP_BUFFER_MUTEX_RWLOCK, that is,
the latches of the buffer pool blocks were never instrumented
in PERFORMANCE_SCHEMA.
For some reason, the debug_latch (which enforce proper usage of
buffer-fixing in debug builds) was instrumented.
The test seems to deterministically fail on RelWithDebInfo builds
due to a timeout in wait_condition.inc.
According to Matthias Leich (the original author of the test),
the failure rate would reduce if we disabled the purge of
transaction history by setting innodb_force_recovery=2.
For now, let us run this stress test on debug builds only.
When MDEV-19544 (commit 1a6f470464)
simplified the initialization of the local variable
set_also_gap_locks, an inadvertent change was included.
Essentially, all code branches that are executed when
set_also_gap_locks hold must also ensure that
trx->isolation_level > TRX_ISO_READ_COMMITTED holds.
This was being violated in a few code paths.
It turns out that there is an even simpler fix: Remove the test
of thd_is_select() completely. In that way, the first part of
UPDATE or DELETE should work exactly like SELECT...FOR UPDATE.
thd_is_select(): Remove.
Add a new privilege "SLAVE MONITOR" which will grant user the permission
to execute "SHOW SLAVE STATUS" and "SHOW RELAYLOG EVENTS" commands.
SHOW SLAVE STATUS requires either SLAVE MONITOR/SUPER
SHOW RELAYLOG EVENTS requires SLAVE MONITOR privilege.
Due to a premature cleanup of the unit that specified a recursive CTE
used in the second operand of union the server fell into an infinite
loop in the reported test case. In other cases this premature cleanup
could cause other problems.
The bug is the result of a not quite correct fix for MDEV-17024. The
unit that specifies a recursive CTE has to be cleaned only after the
cleanup of the last external reference to this CTE. It means that
cleanups of the unit triggered not by the cleanup of a external
reference to the CTE must be blocked.
Usage of local table chains in selects to get external references to
recursive CTEs was not correct either because of possible merges of
some selects.
Also fixed a minor bug in st_select_lex::set_explain_type() that caused
typing 'RECURSIVE UNION' instead of 'UNION' in EXPLAIN output for external
references to a recursive CTE.
A follow-up fix, for original fix for MDEV-21577, which did not handle well
temporary tables.
OPTIMIZE and REPAIR TABLE statements can take a list of tables as argument,
and some of the tables may be temporary. Proper handling of temporary tables
is to skip them and continue working on the real tables. The bad version, skipped all tables,
if a single temporary table was in the argument list. And this resulted so that FK parent
tables were not scnanned for the remaining real tables. Some mtr tests, using OPTIMIZE or REPAIR
for temporary tables caused regressions bacause of this, e.g. galera.galera_optimize_analyze_multi
The fix in this PR opens temporary and real tables first, and in the table handling loop skips
temporary tables, FK parent scanning is done only for real tables.
The test has new scenario for OPTIMIZE and REPAIR issued for two tables of which one is temporary table.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This PR fixes same issue as MDEV-21577 for TRUNCATE TABLE.
MDEV-21577 fixed TOI replication for OPTIMIZE, REPAIR and ALTER TABLE
operating on FK child table. It was later found out that also TRUNCATE
has similar problem and needs a fix.
The actual fix is to do FK parent table lookup before TRUNCATE TOI
isolation and append found FK parent table names in certification key
list for the write set.
PR contains also new test scenario in galera_ddl_fk_conflict test where
FK child has two FK parent tables and there are two DML transactions operating
on both parent tables.
For development convenience, new TO isolation macro was added:
WSREP_TO_ISOLATION_BEGIN_IF and WSREP_TO_ISOLATION_BEGIN_ALTER macro was changed
to skip the goto statement.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
The InnoDB internal tables SYS_TABLESPACES and SYS_DATAFILES as well as the
INFORMATION_SCHEMA views INNODB_SYS_TABLESPACES and INNODB_SYS_DATAFILES
were introduced in MySQL 5.6 for no good reason in
mysql/mysql-server/commit/e9255a22ef16d612a8076bc0b34002bc5a784627
when the InnoDB support for the DATA DIRECTORY attribute was introduced.
The file system should be the authoritative source of information on files.
Storing information about file system paths in the file system (symlinks,
or even the .isl files that were unfortunately chosen as the solution) is
sufficient. If information is additionally stored in some hidden tables
inside the InnoDB system tablespace, everything unnecessarily becomes
more complicated, because more copies of data mean more opportunity
for the copies to be out of sync, and because modifying the data in
the system tablespace in the desired way might not be possible at all
without modifying the InnoDB source code. So, the copy in the system
tablespace basically is a redundant, non-authoritative source of
information.
We will stop creating or accessing the system tables SYS_TABLESPACES
and SYS_DATAFILES.
We will also remove the view
INFORMATION_SCHEMA.INNODB_SYS_DATAFILES along with SYS_DATAFILES.
The view
INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES will be repurposed
to directly reflect fil_system.space_list. The column
PAGE_SIZE, which would always contain the value of
the GLOBAL read-only variable innodb_page_size, is
removed. The column ZIP_PAGE_SIZE, which would actually
contain the physical page size of a page, is renamed to
PAGE_SIZE. Finally, a new column FILENAME is added, as a
replacement of SYS_DATAFILES.PATH.
This will also
address MDEV-21801 (files that were created before upgrading
to MySQL 5.6 or MariaDB 10.0 or later were never registered
in SYS_TABLESPACES or SYS_DATAFILES) and
MDEV-21801 (information about the system tablespace is not stored
in SYS_TABLESPACES or SYS_DATAFILES).
Let us introduce the parameter innodb_read_only_compressed
that is ON by default, making any ROW_FORMAT=COMPRESSED tables
read-only.
I developed the ROW_FORMAT=COMPRESSED format based on
Heikki Tuuri's rough design between 2005 and 2008. It might
have been a good idea back then, but no proper benchmarks were
ever run to validate the design or the implementation.
The format has been more or less obsolete for years.
It limits innodb_page_size to 16384 bytes (the default),
and instant ALTER TABLE is not supported.
This is the first step towards deprecating and removing
write support for ROW_FORMAT=COMPRESSED tables.
- Patch is solving generating report on warning
To repeat the error run single worker:
```
./mtr --mysqld=--lock-wait-timeout=-xx 1st 1st --force --parallel 1
```
or `N` workers with `N+1` tests with failures and `force`
```
./mtr --mysqld=--lock-wait-timeout=-xx 1st 1st grant5 --force --parallel 2
```
- Patch is doing cosmetic fix of `current_test` log file which holds the old log value of test `CURRENT TEST:..` in `mark_log()` in case of `unknown option` and as such
the logic which is using it's content doesn't output valid log content and doesn't generate valid `$test->{'comment'}` message.asdf
- Closing the socket/handler after the removing the handler from IO for
consistency
Reviewed by: serg@mariadb.com
This bug could manifest itself for a query with WHERE condition containing
top level OR formula such that each conjunct contained a single-range
condition supported by the same index. One of these range conditions must
be fully covered by another range condition that is used later in the OR
formula. Additionally at least one of these condition should be ANDed with
a sargable range condition supported by a different index.
There were several attempts to fix related problems for OR conditions after
the backport of range optimizer code from MySQL (commit
0e19f3e36f). Unfortunately the first of these
fixes contained typo remained unnoticed until recently. This typo bug led
to rejection of valid range accesses. This patch fixed this typo bug.
The fix revealed another two bugs: one in a constructor for SEL_ARG,
the other in the function tree_or(). Both are fixed in this patch.
Part#1: Revert the patch that caused it:
commit 291be49474
Author: Igor Babaev <igor@askmonty.org>
Date: Thu Sep 24 22:02:00 2020 -0700
MDEV-23811: With large number of indexes optimizer chooses an inefficient plan
MDEV-23855 broke the handling of innodb_flush_sync=OFF.
That parameter is supposed to limit the page write rate
in case the log capacity is being exceeded and log checkpoints
are needed.
With this fix, the following should pass:
./mtr --mysqld=--loose-innodb-flush-sync=0
One of our best regression tests for page flushing is
encryption.innochecksum. With innodb_page_size=16k and
innodb_flush_sync=OFF it would likely hang without this fix.
log_sys.last_checkpoint_lsn: Declare as Atomic_relaxed<lsn_t>
so that we are allowed to read the value while not holding
log_sys.mutex.
buf_flush_wait_flushed(): Let the page cleaner perform the flushing
also if innodb_flush_sync=OFF. After the page cleaner has
completed, perform a checkpoint if it is needed, because
buf_flush_sync_for_checkpoint() will not be run if
innodb_flush_sync=OFF.
buf_flush_ahead(): Simplify the condition. We do not really care
whether buf_flush_page_cleaner() is running.
buf_flush_page_cleaner(): Evaluate innodb_flush_sync at the low
level. If innodb_flush_sync=OFF, rate-limit the batches to
innodb_io_capacity_max pages per second.
Reviewed by: Vladislav Vaintroub
The parser of CREATE USER accepts ACCOUNT LOCK before PASSWORD
EXPIRE but not the other way around.
This just changes the SHOW CREATE USER to output a sql syntax that
is valid.
Thanks to Robert Bindar for analysis.