In 10.0 there was an assert to ensure that there were semi
sync clients before removing one, but it was removed in 10.1.
This patch adds the assertion back.
The performance regression seen while loading BP is caused by the
deadlock fix given in MDEV-33543. The area of impact is wider but is
more visible when BP is being loaded initially via DMLs. Specifically
the response time could be impacted in DML doing pessimistic operation
on index(split/merge) and the leaf pages are not found in buffer pool.
It is more likely to occur with small BP size.
The origin of the issue dates back to MDEV-30400 that introduced
btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for
leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the
previous page fixed in BP without latching. When the page is not in BP,
we try to acquire and wait for S latch violating the latching order.
This deadlock was analyzed in MDEV-33543 and fixed by using the already
present wait logic in buf_page_get_gen() instead of waiting for latch.
The wait logic is inferior to usual S latch wait and is simply a
repeated sleep 100 of micro-sec (The actual sleep time could be more
depending on platforms). The bug was seen with "change-buffering" code
path and the idea was that this path should be less exercised. The
judgement was not correct and the path is actually quite frequent and
does impact performance when pages are not in BP and being loaded by
DML expanding/shrinking large data.
Fix: While trying to get a page with RW_NO_LATCH and we are attempting
"out of order" latch, return from buf_page_get_gen immediately instead
of waiting and follow the ordered latching path.
Valgrind looks as the assertions as examining uninitalized values.
As the assertions are tested in other Debug builds we know
it isn't all invalid.
Account for Valgrind by removing the assertion under
the WITH_VALGRIND=1 compulation.
The feedback plugin server_uid variable and the calculate_server_uid()
function is moved from feedback/utils.cc to sql/mysqld.cc
server_uid is added as a global variable (shown in 'show variables') and
is written to the error log on server startup together with server version
and server commit id.
We have an issue if a user have the following in a configuration file:
log_slow_filter="" # Log everything to slow query log
log_queries_not_using_indexes=ON
This set log_slow_filter to 'not_using_index' which disables
slow_query_logging of most queries.
In effect, on should never use log_slow_filter="" in config files but
instead use log_slow_filter=ALL.
Fixed by changing log_slow_filter="" that comes either from a
configuration file or from the command line, when starting to the server,
to log_slow_filter=ALL.
A warning will be printed when this happens.
Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
(backported from 10.6)
Item_func_hex::fix_length_and_dec() evaluated a too short data type
for signed numeric arguments, which resulted in a 'Data too long for column'
error on CREATE..SELECT.
Fixing the code to take into account that a short negative
numer can produce a long HEX value: -1 -> 'FFFFFFFFFFFFFFFF'
Also fixing Item_func_hex::val_str_ascii_from_val_real().
Without this change, MTR test with HEX with negative float point arguments
failed on some platforms (aarch64, ppc64le, s390-x).
InnoDB transactions may be reused after committed:
- when taken from the transaction pool
- during a DDL operation execution
In this case wsrep flag on trx object is cleared, which may cause wrong
execution logic afterwards (wsrep-related hooks are not run).
Make trx->wsrep flag initialize from THD object only once on InnoDB transaction
start and don't change it throughout the transaction's lifetime.
The flag is reset at commit time as before.
Unconditionally set wsrep=OFF for THD objects that represent InnoDB background
threads.
Make Wsrep_schema::store_view() operate in its own transaction.
Fix streaming replication transactions' fragments rollback to not switch
THD->wsrep value during transaction's execution
(use THD->wsrep_ignore_table as a workaround).
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
During a Sysbench oltp_point_select workload with 1 table and 400
concurrent connections, a bottleneck on dict_table_t::lock_mutex was
observed in ha_innobase::info_low().
dict_table_t::lock_latch: Replaces lock_mutex.
In ha_innobase::info_low() and several other places, we will acquire
a shared dict_table_t::lock_latch or we may elide the latch if
hardware memory transactions are available.
innobase_build_v_templ(): Remove the parameter "bool locked", and
require the caller to hold exclusive dict_table_t::lock_latch
(instead of holding an exclusive dict_sys.latch).
Tested by: Vladislav Vaintroub
Reviewed by: Vladislav Vaintroub
PFS_atomic class contains wrappers around my_atomic_* operations, which
are macros to GNU atomic operations (__atomic_*). Due to different
implementations of compilers, clang may encounter errors when compiling
on x86_32 architecture.
The following functions are replaced with C++ std::atomic type in
performance schema code base:
- PFS_atomic::store_*()
-> my_atomic_store*
-> __atomic_store_n()
=> std::atomic<T>::store()
- PFS_atomic::load_*()
-> my_atomic_load*
-> __atomic_load_n()
=> std::atomic<T>::load()
- PFS_atomic::add_*()
-> my_atomic_add*
-> __atomic_fetch_add()
=> std::atomic<T>::fetch_add()
- PFS_atomic::cas_*()
-> my_atomic_cas*
-> __atomic_compare_exchange_n()
=> std::atomic<T>::compare_exchange_strong()
and PFS_atomic class could be dropped completely.
Note that in the wrapper memory order passed to original GNU atomic
extensions are hard-coded as `__ATOMIC_SEQ_CST`, which is equivalent to
`std::memory_order_seq_cst` in C++, and is the default parameter for
std::atomic_* functions.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services.
CMake WSREP=ON has some implications for client
executables so still present this as an option
when compiling WITHOUT_SERVER. In this case
default to ON for maximium compatibility of
the build client executables and libraries.
The log_event_old.cc is included by mysqlbinlog.cc.
With -DWITHOUT_SERVER the include path for the wsrep
include headers isn't there.
As these aren't needed by the mariadb-binlog, move
these to under a ifndef MYSQL_CLIENT preprocessor.
Caused by MDEV-18590
Just like the spider/bugfix suite.
One caveat is that my_2_3.cnf needs something under mysqld.2.3 group,
otherwise mtr will fail with something like:
There is no group named 'mysqld.2.3' that can be used to resolve
'port' for ...
This will allow new tests under the spider suite to use what is
needed. It also somehow fixes issues of running a test followed by
spider.slave_trx_isolation.
Executing an INSERT statement in PS mode having positional parameter
bound with an array could result in incorrect number of inserted rows
in case there is a BEFORE INSERT trigger that executes yet another
INSERT statement to put a copy of row being inserted into some table.
The reason for incorrect number of inserted rows is that a data structure
used for binding positional argument with its actual values is stored
in THD (this is thd->bulk_param) and reused on processing every INSERT
statement. It leads to consuming actual values bound with top-level
INSERT statement by other INSERT statements used by triggers' body.
To fix the issue, reset the thd->bulk_param temporary to the value nullptr
before invoking triggers and restore its value on finishing its execution.
Having Item_func_not items in item trees breaks assumptions during the
optimization phase about transformation possibilities in fix_fields().
Remove Item_func_not by extending normalization during parsing.
Reviewed by Oleksandr Byelkin (sanja@mariadb.com)
New error codes can only be added in the latest major version.
Adding ER_KILL_DENIED_HIGH_PRIORITY would shift by one all
error codes that were added in MariaDB Server 10.6 or later.
This amends commit 1001dae186
Suggested by: Sergei Golubchik
ha_innobase::info_low(): For HA_STATUS_VARIABLE without
HA_STATUS_VARIABLE_EXTRA, let us avoid unnecessary and costly updates
of the data_free statistics, which are only needed for SHOW TABLE STATUS.
This optimization had been enabled in
commit 247ecb7597 but not utilized until now.
The variable `sbindir` is never set for cmake. This adds borked paths to
`galera_recovery`, though it dit not break as the systemd unit changes
the dir to make the relative path work anyway.
Let's fix this nevertheless...
RAND() and UUID() are treated differently with respect to subquery
materialization both should be marked as uncacheable, forcing materialization.
Altered Create_func_uuid(_short)::create_builder().
Added comment in header about UNCACHEABLE_RAND meaning also unmergeable.
- Few of test case should make sure that InnoDB does hit
the debug sync point during startup of the server.
InnoDB can remove the double quotes of debug point
in restart parameters.
log_slow_innodb.test sets "log_slow_verbosity_innodb_expected_matches"
in include/log_slow_grep.inc:
- Fix a typo: take value from log_slow_*VERBOSITY*_innodb_expected_matches
- Add a missing "--source include/log_grep.inc"
- two-line search patterns do not work, search for each line individually.
The fixes in b8a6719889 have not disabled
semi-consistent read for innodb_snapshot_isolation=ON mode, they just allowed
read uncommitted version of a record, that's why the test for MDEV-26643 worked
well.
The semi-consistent read should be disabled on upper level in
row_search_mvcc() for READ COMMITTED isolation level.
Reviewed by Marko Mäkelä.
- InnoDB tries to write FILE_CHECKPOINT marker during
early recovery when log file size is insufficient.
While updating the log checkpoint at the end of the recovery,
InnoDB must already have written out all pending changes
to the persistent files. To complete the checkpoint, InnoDB
has to write some log records for the checkpoint and to
update the checkpoint header. If the server gets killed
before updating the checkpoint header then it would lead
the logfile to be unrecoverable.
- This patch avoids FILE_CHECKPOINT marker during early
recovery and narrows down the window of opportunity to
make the log file unrecoverable.
There were erroneous calls for charpos() in key_hashnr() and key_buf_cmp().
These functions are never called with prefix segments.
The charpos() calls were wrong. Before the change BNHL joins
- could return wrong result sets, as reported in MDEV-34417
- were extremely slow for multi-byte character sets, because
the hash was calculated on string prefixes, which increased
the amount of collisions drastically.
This patch fixes the wrong result set as reported in MDEV-34417,
as well as (partially) the performance problem reported in MDEV-34352.
A user reported that MariaDB server got a signal 6 but still accepted new
connections and did not crash. I have not been able to find a way to
repeat this or find the cause of issue. However to make it easier to
notice that the server is unstable, I added code to disable new
connections when the handle_fatal_signal() handler has been called.
The problem is seen on CI, where TEMP pointed to directory outside of
the usual vardir, when testing mysql_install_db.exe
A likely cause for this error is that TEMP was periodically cleaned up
by some automation running on the host, perhaps by buildbot itself.
To fix, mysql_install_db.exe will now use datadir as --tmpdir
for the bootstrap run. This will minimize chances to run into any
environment problems.
This commit introduces a reset of password errors counter on any alter user
command for the altered user. This is done so as to not require a
complete privilege system reload.
- The deadlock counter was moved from
Deadlock::find_cycle into Deadlock::report, because
the find_cycle method is called multiple times during deadlock
detection flow, which means it shouldn't have such side effects.
But report() can, which called only once for
a victim transaction.
- Also the deadlock_detect.test and *.result test case
has been extended to handle the fix.
Problem was that there was two non-conflicting local idle
transactions in node_1 that both inserted a key to primary key.
Then two transactions from other nodes inserted also
a key to primary key so that insert from node_2 conflicted
one of the local transactions in node_1 so that there would
be duplicate key if both are committed. For this insert
from other node tries to acquire S-lock for this record
and because this insert is high priority brute force (BF)
transaction it will kill idle local transaction.
Concurrently, second insert from node_3 conflicts the second
idle insert transaction in node_1. Again, it tries to acquire
S-lock for this record and kills idle local transaction.
At this point we have two non-conflicting high priority
transactions holding S-lock on different records in node_1.
For example like this: rec s-lock-node2-rec s-lock-node3-rec rec.
Because these high priority BF-transactions do not wait
each other insert from node3 that has later seqno compared
to insert from node2 can continue. It will try to acquire
insert intention for record it tries to insert (to avoid
duplicate key to be inserted by local transaction). Hower,
it will note that there is conflicting S-lock in same gap
between records. This will lead deadlock error as we have
defined that BF-transactions may not wait for record lock
but we can't kill conflicting BF-transaction because
it has lower seqno and it should commit first.
BF-transactions are executed concurrently because their
values to primary key are different i.e. they do not
conflict.
Galera certification will make sure that inserts from
other nodes i.e these high priority BF-transactions
can't insert duplicate keys. Local transactions naturally
can but they will be killed when BF-transaction
acquires required record locks.
Therefore, we can allow situation where there is conflicting
S-lock and insert intention lock regardless of their seqno
order and let both continue with no wait. This will lead
to situation where we need to allow BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called
because this function is also called from
lock_rec_queue_validate and because lock is waiting
there would be assertion in ut_a(lock->is_gap()
|| lock_rec_has_to_wait_in_queue(cell, lock));
lock_wait_wsrep_kill
Add debug sync points for BF-transactions killing
local transaction.
wsrep_assert_no_bf_bf_wait
Print also requested lock information
lock_rec_has_to_wait
Add function to handle wsrep transaction lock wait
cases.
lock_rec_has_to_wait_wsrep
New function to handle wsrep transaction lock wait
exceptions.
lock_rec_has_to_wait_in_queue
Remove wsrep exception, in this function all
conflicting locks need to wait in queue.
Conflicts between BF and local transactions
are handled in lock_wait.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Changed error code for Galera unkillable threads to
be ER_KILL_DENIED_HIGH_PRIORITY giving message
This is a high priority thread/query and cannot be killed
without the compromising consistency of the cluster
also a warning is produced
Thread %lld is [wsrep applier|high priority] and cannot be killed
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
In an I/O bound concurrent INSERT test conducted by Mark Callaghan,
spin loops on dict_index_t::lock turn out to be beneficial.
This is a mixed bag; enabling the spin loops will improve throughput
and latency on some workloads and degrade in others.
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
Performance tested by: Axel Schwenke
srw_mutex_impl<spinloop>::wait_and_lock(): Invoke srw_pause() and
reload the lock word on each loop. Thanks to Mark Callaghan for
suggesting this.
ssux_lock_impl<spinloop>::rd_wait(): Actually implement a spin loop
on the rw-lock component without blocking on the mutex component.
If there is a conflict with wr_lock(), wait for writer.lock to be
released without actually acquiring it.
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
When MariaDB is built with PERFORMANCE_SCHEMA support enabled
and with futex-based rw-locks (not srw_lock_), we were unnecessarily
releasing and reacquiring lock.writer in srw_lock_impl::psi_wr_lock()
and ssux_lock::psi_wr_lock().
If there is a conflict with rd_lock(), let us hold the lock.writer
and execute u_wr_upgrade() to wait for rd_unlock().
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich