During a Sysbench oltp_point_select workload with 1 table and 400
concurrent connections, a bottleneck on dict_table_t::lock_mutex was
observed in ha_innobase::info_low().
dict_table_t::lock_latch: Replaces lock_mutex.
In ha_innobase::info_low() and several other places, we will acquire
a shared dict_table_t::lock_latch or we may elide the latch if
hardware memory transactions are available.
innobase_build_v_templ(): Remove the parameter "bool locked", and
require the caller to hold exclusive dict_table_t::lock_latch
(instead of holding an exclusive dict_sys.latch).
Tested by: Vladislav Vaintroub
Reviewed by: Vladislav Vaintroub
The log_event_old.cc is included by mysqlbinlog.cc.
With -DWITHOUT_SERVER the include path for the wsrep
include headers isn't there.
As these aren't needed by the mariadb-binlog, move
these to under a ifndef MYSQL_CLIENT preprocessor.
Caused by MDEV-18590
Just like the spider/bugfix suite.
One caveat is that my_2_3.cnf needs something under mysqld.2.3 group,
otherwise mtr will fail with something like:
There is no group named 'mysqld.2.3' that can be used to resolve
'port' for ...
This will allow new tests under the spider suite to use what is
needed. It also somehow fixes issues of running a test followed by
spider.slave_trx_isolation.
Executing an INSERT statement in PS mode having positional parameter
bound with an array could result in incorrect number of inserted rows
in case there is a BEFORE INSERT trigger that executes yet another
INSERT statement to put a copy of row being inserted into some table.
The reason for incorrect number of inserted rows is that a data structure
used for binding positional argument with its actual values is stored
in THD (this is thd->bulk_param) and reused on processing every INSERT
statement. It leads to consuming actual values bound with top-level
INSERT statement by other INSERT statements used by triggers' body.
To fix the issue, reset the thd->bulk_param temporary to the value nullptr
before invoking triggers and restore its value on finishing its execution.
Having Item_func_not items in item trees breaks assumptions during the
optimization phase about transformation possibilities in fix_fields().
Remove Item_func_not by extending normalization during parsing.
Reviewed by Oleksandr Byelkin (sanja@mariadb.com)
New error codes can only be added in the latest major version.
Adding ER_KILL_DENIED_HIGH_PRIORITY would shift by one all
error codes that were added in MariaDB Server 10.6 or later.
This amends commit 1001dae186
Suggested by: Sergei Golubchik
ha_innobase::info_low(): For HA_STATUS_VARIABLE without
HA_STATUS_VARIABLE_EXTRA, let us avoid unnecessary and costly updates
of the data_free statistics, which are only needed for SHOW TABLE STATUS.
This optimization had been enabled in
commit 247ecb7597 but not utilized until now.
The variable `sbindir` is never set for cmake. This adds borked paths to
`galera_recovery`, though it dit not break as the systemd unit changes
the dir to make the relative path work anyway.
Let's fix this nevertheless...
RAND() and UUID() are treated differently with respect to subquery
materialization both should be marked as uncacheable, forcing materialization.
Altered Create_func_uuid(_short)::create_builder().
Added comment in header about UNCACHEABLE_RAND meaning also unmergeable.
- Few of test case should make sure that InnoDB does hit
the debug sync point during startup of the server.
InnoDB can remove the double quotes of debug point
in restart parameters.
log_slow_innodb.test sets "log_slow_verbosity_innodb_expected_matches"
in include/log_slow_grep.inc:
- Fix a typo: take value from log_slow_*VERBOSITY*_innodb_expected_matches
- Add a missing "--source include/log_grep.inc"
- two-line search patterns do not work, search for each line individually.
The fixes in b8a6719889 have not disabled
semi-consistent read for innodb_snapshot_isolation=ON mode, they just allowed
read uncommitted version of a record, that's why the test for MDEV-26643 worked
well.
The semi-consistent read should be disabled on upper level in
row_search_mvcc() for READ COMMITTED isolation level.
Reviewed by Marko Mäkelä.
- InnoDB tries to write FILE_CHECKPOINT marker during
early recovery when log file size is insufficient.
While updating the log checkpoint at the end of the recovery,
InnoDB must already have written out all pending changes
to the persistent files. To complete the checkpoint, InnoDB
has to write some log records for the checkpoint and to
update the checkpoint header. If the server gets killed
before updating the checkpoint header then it would lead
the logfile to be unrecoverable.
- This patch avoids FILE_CHECKPOINT marker during early
recovery and narrows down the window of opportunity to
make the log file unrecoverable.
There were erroneous calls for charpos() in key_hashnr() and key_buf_cmp().
These functions are never called with prefix segments.
The charpos() calls were wrong. Before the change BNHL joins
- could return wrong result sets, as reported in MDEV-34417
- were extremely slow for multi-byte character sets, because
the hash was calculated on string prefixes, which increased
the amount of collisions drastically.
This patch fixes the wrong result set as reported in MDEV-34417,
as well as (partially) the performance problem reported in MDEV-34352.
A user reported that MariaDB server got a signal 6 but still accepted new
connections and did not crash. I have not been able to find a way to
repeat this or find the cause of issue. However to make it easier to
notice that the server is unstable, I added code to disable new
connections when the handle_fatal_signal() handler has been called.
The problem is seen on CI, where TEMP pointed to directory outside of
the usual vardir, when testing mysql_install_db.exe
A likely cause for this error is that TEMP was periodically cleaned up
by some automation running on the host, perhaps by buildbot itself.
To fix, mysql_install_db.exe will now use datadir as --tmpdir
for the bootstrap run. This will minimize chances to run into any
environment problems.
This commit introduces a reset of password errors counter on any alter user
command for the altered user. This is done so as to not require a
complete privilege system reload.
- The deadlock counter was moved from
Deadlock::find_cycle into Deadlock::report, because
the find_cycle method is called multiple times during deadlock
detection flow, which means it shouldn't have such side effects.
But report() can, which called only once for
a victim transaction.
- Also the deadlock_detect.test and *.result test case
has been extended to handle the fix.
Problem was that there was two non-conflicting local idle
transactions in node_1 that both inserted a key to primary key.
Then two transactions from other nodes inserted also
a key to primary key so that insert from node_2 conflicted
one of the local transactions in node_1 so that there would
be duplicate key if both are committed. For this insert
from other node tries to acquire S-lock for this record
and because this insert is high priority brute force (BF)
transaction it will kill idle local transaction.
Concurrently, second insert from node_3 conflicts the second
idle insert transaction in node_1. Again, it tries to acquire
S-lock for this record and kills idle local transaction.
At this point we have two non-conflicting high priority
transactions holding S-lock on different records in node_1.
For example like this: rec s-lock-node2-rec s-lock-node3-rec rec.
Because these high priority BF-transactions do not wait
each other insert from node3 that has later seqno compared
to insert from node2 can continue. It will try to acquire
insert intention for record it tries to insert (to avoid
duplicate key to be inserted by local transaction). Hower,
it will note that there is conflicting S-lock in same gap
between records. This will lead deadlock error as we have
defined that BF-transactions may not wait for record lock
but we can't kill conflicting BF-transaction because
it has lower seqno and it should commit first.
BF-transactions are executed concurrently because their
values to primary key are different i.e. they do not
conflict.
Galera certification will make sure that inserts from
other nodes i.e these high priority BF-transactions
can't insert duplicate keys. Local transactions naturally
can but they will be killed when BF-transaction
acquires required record locks.
Therefore, we can allow situation where there is conflicting
S-lock and insert intention lock regardless of their seqno
order and let both continue with no wait. This will lead
to situation where we need to allow BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called
because this function is also called from
lock_rec_queue_validate and because lock is waiting
there would be assertion in ut_a(lock->is_gap()
|| lock_rec_has_to_wait_in_queue(cell, lock));
lock_wait_wsrep_kill
Add debug sync points for BF-transactions killing
local transaction.
wsrep_assert_no_bf_bf_wait
Print also requested lock information
lock_rec_has_to_wait
Add function to handle wsrep transaction lock wait
cases.
lock_rec_has_to_wait_wsrep
New function to handle wsrep transaction lock wait
exceptions.
lock_rec_has_to_wait_in_queue
Remove wsrep exception, in this function all
conflicting locks need to wait in queue.
Conflicts between BF and local transactions
are handled in lock_wait.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Changed error code for Galera unkillable threads to
be ER_KILL_DENIED_HIGH_PRIORITY giving message
This is a high priority thread/query and cannot be killed
without the compromising consistency of the cluster
also a warning is produced
Thread %lld is [wsrep applier|high priority] and cannot be killed
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
In an I/O bound concurrent INSERT test conducted by Mark Callaghan,
spin loops on dict_index_t::lock turn out to be beneficial.
This is a mixed bag; enabling the spin loops will improve throughput
and latency on some workloads and degrade in others.
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
Performance tested by: Axel Schwenke
srw_mutex_impl<spinloop>::wait_and_lock(): Invoke srw_pause() and
reload the lock word on each loop. Thanks to Mark Callaghan for
suggesting this.
ssux_lock_impl<spinloop>::rd_wait(): Actually implement a spin loop
on the rw-lock component without blocking on the mutex component.
If there is a conflict with wr_lock(), wait for writer.lock to be
released without actually acquiring it.
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
When MariaDB is built with PERFORMANCE_SCHEMA support enabled
and with futex-based rw-locks (not srw_lock_), we were unnecessarily
releasing and reacquiring lock.writer in srw_lock_impl::psi_wr_lock()
and ssux_lock::psi_wr_lock().
If there is a conflict with rd_lock(), let us hold the lock.writer
and execute u_wr_upgrade() to wait for rd_unlock().
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
This problem was earlier fixed by this commit:
> commit 08c7ab404f
> Author: Aleksey Midenkov <midenok@gmail.com>
> Date: Mon Apr 18 12:44:27 2022 +0300
>
> MDEV-24176 Server crashes after insert in the table with virtual
> column generated using date_format() and if()
Adding an mtr test only.
The U lock mode of the sux_lock that was introduced in
commit 03ca6495df (MDEV-24142)
is unnecessarily complex.
Internally, sux_lock comprises two parts, each with their own wait queue
inside the operating system kernel: a mutex and a rw-lock.
We can map the operations as follows:
x_lock(): (X,X)
u_lock(): (X,_)
s_lock(): (_,S)
The Update lock mode, which is mutually exclusive with itself and with
X (exclusive) locks but not with shared (S) locks, was unnecessarily
acquiring a shared lock on the second component. The mutual exclusion
is guaranteed by the first component.
We might simplify the #ifdef SUX_LOCK_GENERIC case further by omitting
srw_mutex_impl::lock, because it is kind-of duplicating the mutex
that we will use for having a wait queue. However, the predicate
buf_page_t::can_relocate() would depend on the predicate
is_locked_or_waiting(), which is not available for pthread_mutex_t.
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
After MDEV-4013, the maximum length of replication passwords was extended to
96 ASCII characters. After a restart, however, slaves only read the first 41
characters of MASTER_PASSWORD from the master.info file. This lead to slaves
unable to reconnect to the master after a restart.
After a slave restart, if a master.info file is detected, use the full
allowable length of the password rather than 41 characters.
Reviewed By:
============
Sergei Golubchik <serg@mariadb.com>
In the error case of THD::register_slave(), there is undefined
behavior of Slave_info si because it is allocated via malloc()
(my_malloc), and cleaned up via delete().
This patch makes these consistent by switching si's cleanup
to use my_free.
As all MariaDB Server errors now have a dedicated web page, the
perror utility is extended to include a link to the KB page of
the corresponding error code.
All new code of the whole pull request, including one or several
files that are either new files or modified ones, are contributed
under the BSD-new license. I am contributing on behalf of my
employer Amazon Web Services, Inc.
Adding a new statement into scripts/sys_schema/before_setup.sql:
ALTER DATABASE sys CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci;
to fix db.opt in case:
- the database `sys` was altered to unexpected CHARACTER SET or COLLATE values
- or db.opt was erroneously removed
to make sure that sys objects are always recreated using utf8mb3_general_ci.
Review followup: RANGE_OPT_PARAM statement_should_be_aborted()
checks for thd->is_fatal_error and thd->is_error(). The first is
redundant when the second is present.
The optimizer deals with Rowid Filters this way:
1. First, range optimizer is invoked. It saves information
about all potential range accesses.
2. A query plan is chosen. Suppose, it uses a Rowid Filter on
index $IDX.
3. JOIN::make_range_rowid_filters() calls the range optimizer
again to create a quick select on index $IDX which will be used
to populate the rowid filter.
The problem: KILL command catches the query in step #3. Quick
Select is not created which causes a crash.
Fixed by checking if query was killed. Note: the problem also
affects 10.6, even if error handling for
SQL_SELECT::test_quick_select is different there.
(Variant for 10.6: return error code from SQL_SELECT::test_quick_select)
The optimizer deals with Rowid Filters this way:
1. First, range optimizer is invoked. It saves information
about all potential range accesses.
2. A query plan is chosen. Suppose, it uses a Rowid Filter on
index $IDX.
3. JOIN::make_range_rowid_filters() calls the range optimizer
again to create a quick select on index $IDX which will be used
to populate the rowid filter.
The problem: KILL command catches the query in step #3. Quick
Select is not created which causes a crash.
Fixed by checking if query was killed.
Rowid Filter cannot be used with reverse-ordered scans, for the
same reason as IndexConditionPushdown cannot be.
test_if_skip_sort_order() already has logic to disable ICP when
setting up a reverse-ordered scan. Added logic to also disable
Rowid Filter in this case, factored out the code into
prepare_for_reverse_ordered_access(), and added a comment describing
the cause of this limitation.
To make this possible, it was also necessary to enhance the mariadb
client with the option --print-query-on-error.
This option can also be very useful when running a batch of queries
through the mariadb client and one wants to find out where things goes
wrong.
TODO: It would be good to enhance mariadb_upgrade to not call the mariadb
client for executing queries but instead do this internally. This
would have made this patch much easier!
Reviewed by: Sergei Golubchik <serg@mariadb.com>
- During recovery, InnoDB may fail to shrink the undo tablespaces
when there are no pages to recover while applying the redo log.
This issue exists only when innodb_undo_truncate is enabled.
trx_lists_init_at_db_start() could've applied the redo logs
for undo tablespace page0.