Problem was assertion assuming we always hold
THD::LOCK_thd_data mutex that is not true.
In most cases this is true but function is
also used from InnoDB lock manager and
there we can't take THD::LOCK_thd_data to
obey mutex ordering. Removed assertion as
wsrep transaction state can't change even
that case.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
in the $case=2 - it's wrong to kill after the first binlog EOF,
because that might happen between INSERT(4) and INSERT(5).
So, wait for the slave to acknowledge INSERT(5) before killing
the master, that is, both connection threads must pass
repl_semisync_master.wait_after_sync()
The slave IO thread sets MYSQL_SET_CHARSET_DIR. The code for this option
however is not thread-safe in sql-common/client.c. The value set is
temporarily written to mysys global variable `charsets-dir` and can be seen
by other threads running in parallel, which can result in use-after-free
error.
Problem was visible as random failures of test cases in suite multi_source
with Valgrind or MSAN.
Work-around by not setting this option for slave connect, it is redundant
anyway as it is just setting the default value.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The root cause of the failure is a bug in the Linux network stack:
https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u
If the slave does a connect(2) at the exact same time that kill -9 of the
master process closes the listening socket, the FIN or RST packet is lost in
the kernel, and the slave ends up timing out waiting for the initial
communication from the server. This timeout defaults to
--slave-net-timeout=120, which causes include/master_gtid_wait.inc to time
out first and fail the test.
Work-around this problem by reducing the --slave-net-timeout for this test
case. If this problem turns up in other tests, we can consider reducing the
default value for all tests.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
As of version 3.2.0, OpenSSL updated the error message in new versions
("https://github.com/openssl/openssl/commit/81b741f68984"). Update the
tests and result files such that they are compatible with both original
and new error messages.
All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services,
Inc.
mtr_t::memmove(): Revert to the parent of
commit a032f14b34
where there was supposed to be an equivalent change
that would avoid hitting a warning in some old version of GCC
when this change was part of another 10.6 based developmet branch.
For some reason, this change is not equivalent but will cause
massive amounts of backup failures in the stress tests
run by Matthias Leich, caught by
commit 4179f93d28 in 10.6.
ibuf_remove_free_page(): Correct the calculation of root_savepoint().
The first entry acquired by ibuf_tree_root_get() will be ibuf.index.lock
and not the change buffer root page.
Thanks to Matthias Leich for finding this bug in RQG.
Unfortunately, this code is very difficult to cover
in our regression test suite.
The libpmem dependency that had been added in
commit 3daef523af (MDEV-17084)
did not achieve any measurable performance improvement when
comparing the same PMEM device with and without "mount -o dax"
using the Linux ext4 file system.
Because Red Hat has deprecated libpmem, let us remove the code
altogether.
Note: This is a 10.6 version of
commit 3f9f5ca48e
which will retain PMEM support in MariaDB Server 10.11.
When the change buffer records for a page span across multiple change
buffer leaf pages or the starting record is at the beginning of a page
with a left sibling, ibuf_delete_recs deletes only the records in first
page and fails to move to subsequent pages.
Subsequently a slow shutdown hangs trying to delete those left over
records.
Fix-A: Position the cursor to an user record in B-tree and exit only
when all records are exhausted.
Fix-B: Make sure we call ibuf_delete_recs during slow shutdown for
pages with IBUF entries to cleanup any previously left over records.
If replicating an event in ROW format, and InnoDB detects a deadlock
while searching for a row, the row event will error and rollback in
InnoDB and indicate that the binlog cache also needs to be cleared,
i.e. by marking thd->transaction_rollback_request. In the normal
case, this will trigger an error in Rows_log_event::do_apply_event()
and cause a rollback. During the Rows_log_event::do_apply_event()
cleanup of a successful event application, there is a DBUG_ASSERT in
log_event_server.cc::rows_event_stmt_cleanup(), which sets the
expectation that thd->transaction_rollback_request cannot be set
because the general rollback (i.e. not the InnoDB rollback) should
have happened already. However, if the replica is configured to skip
deadlock errors, the rows event logic will clear the error and
continue on, as if no error happened. This results in
thd->transaction_rollback_request being set while in
rows_event_stmt_cleanup(), thereby triggering the assertion.
This patch fixes this in the following ways:
1) The assertion is invalid, and thereby removed.
2) The rollback case is forced in rows_event_stmt_cleanup() if
transaction_rollback_request is set.
Note the differing behavior between transactions which are skipped
due to deadlock errors and other errors. When a transaction is
skipped due to an ignored deadlock error, the entire transaction is
rolled back and skipped (though note MDEV-33930 which allows
statements in the same transaction after the deadlock-inducing one
to commit). When a transaction is skipped due to ignoring a
different error, only the erroring statements are rolled-back and
skipped - the rest of the transaction will execute as normal. The
effect of this can be seen in the test results. The added test case
to rpl_skip_error.test shows that only statements which are ignored
due to non-deadlock errors are ignored in larger transactions. A
diff between rpl_temporary_error2_skip_all.result and
rpl_temporary_error2.result shows that all statements in the errored
transaction are rolled back (diff pasted below):
: diff rpl_temporary_error2.result rpl_temporary_error2_skip_all.result
49c49
< 2 1
---
> 2 NULL
51c51
< 4 1
---
> 4 NULL
53c53
< * There will be two rows in t2 due to the retry.
---
> * There will be one row in t2 because the ignored deadlock does not retry.
57d56
< 1
59c58
< 1
---
> 0
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
On Windows systems, occurrences of ERROR_SHARING_VIOLATION due to
conflicting share modes between processes accessing the same file can
result in CreateFile failures.
mysys' my_open() already incorporates a workaround by implementing
wait/retry logic on Windows.
But this does not help if files are opened using shell redirection like
mysqltest traditionally did it, i.e via
--echo exec "some text" > output_file
In such cases, it is cmd.exe, that opens the output_file, and it
won't do any sharing-violation retries.
This commit addresses the issue by introducing a new built-in command,
'write_line', in mysqltest. This new command serves as a brief alternative
to 'write_file', with a single line output, that also resolves variables
like "exec" would.
Internally, this command will use my_open(), and therefore retry-on-error
logic.
Hopefully this will eliminate the very sporadic "can't open file because
it is used by another process" error on CI.
We have quite a few assertions
ut_a(m_prebuilt->trx == thd_to_trx(ha_thd()));
in low-level functions.
These had better be debug assertions for performance reasons.
It should suffice to check that condition in the less frequently invoked
ha_innobase::change_active_index().
convert_search_mode_to_innobase(): Return whether the mode is
unsupported, and optionally update ha_innobase::m_last_match_mode.
ha_innobase::index_read(): Only branch on find_flag once, and
simplify the error handling after invoking row_search_mvcc().
ha_innobase::rnd_pos(): Remove an assertion that is duplicating one
in ha_innobase::index_read(), which we are calling unconditionally.
ha_innobase::records_in_range(): Check only once whether
min_key, max_key are null pointers.
row_sel_convert_mysql_key_to_innobase(): Declare all parameters
except the conversion buffer pointer (buf) to be nonnull.
Reviewed by: Debarun Banerjee
rtr_pcur_getnext_from_path(): Remove a bogus assertion
that may cause a data races with buf_LRU_block_free_non_file_page().
If my_latch_mode == BTR_MODIFY_LEAF, we would have released all page
latches and buffer-fixes by invoking mtr->rollback_to_savepoint(1).
After this point, the btr_cur->page_cur.block is no longer valid and
must not be accessed.
Before 03ca6495df this assertion had
been disabled, because the preprocessor symbol UNIV_RTR_DEBUG
had never been enabled (except when explicitly specified in
CMAKE_CXX_FLAGS).
Reviewed by: Debarun Banerjee
Issue:
------
The actual order of acquisition of the IBUF pessimistic insert mutex
(SYNC_IBUF_PESS_INSERT_MUTEX) and IBUF header page latch
(SYNC_IBUF_HEADER) w.r.t space latch (SYNC_FSP) differs from the order
defined in sync0types.h. It was not discovered earlier as the path to
ibuf_remove_free_page was not covered by the mtr test. Ideal order and
one defined in sync0types.h is as follows.
SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP
In ibuf_remove_free_page, we acquire space latch earlier and we have
the order as follows resulting in the assert with innodb_sync_debug=on.
SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX
Fix:
---
We do maintain this order in other places and there doesn't seem to be
any real issue here. To reduce impact in GA versions, we avoid doing
extensive changes in mutex ordering to match the current
SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check
for IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.
Previous solution, that would entirely switch timer off, turned out
to be deadlock prone.
This patch fixed previous attempt to switch between long/short interval
periods in MDEV-24295. Now, initial state of the timer is fixed (it is ON).
Also, avoid switching timer to longer periods if there is any activity in
the pool.
Test was waiting INSERT-clause to make rollback but
wait_condition was too tight. State could be
Freeing items or Rollback. Fixed wait_condition
to expect one of them.
create_partitioning_metadata() should only mark transaction r/w
if it actually did anything (that is, the table is partitioned).
otherwise it's a no-op, called even for temporary tables and
it shouldn't do anything at all
Ideally our methods and functions should do one thing, do that well,
and do only that. add_table_to_list does far more than adding a
table to a list, so this commit factors the TABLE_LIST creation out
to a new TABLE_LIST constructor. It then uses placement new()
to create it in the correct memory area (result of thd->calloc).
Benefits of this approach:
1. add_table_to_list now returns as early as possible on an error
2. fewer side-effects incurred on creating the TABLE_LIST object
3. TABLE_LIST won't be calloc'd if copy_to_db fails
4. local declarations moved closer to their respective first uses
5. improved code readability and logical flow
Also factored a couple of other functions to keep the happy path
more to the left, which makes them easier to follow at a glance.
Synopsis: If SELECT returned answer from Query Cache it is not really executed.
The reason for firing of assertion
DBUG_ASSERT((mem_root->flags & ROOT_FLAG_READ_ONLY) == 0);
is that in case the query_cache is on and the same query run by different
stored routines the following use case can take place:
First, lets say that bodies of routines used by the test case are the same
and contains the only query 'SELECT * FROM t1';
call p1() -- a result set is stored in query cache for further use.
call p2() -- the same query is run against the table t1, that result in
not running the actual query but using its cached result.
On finishing execution of this routine, its memory root is
marked for read only since every SP instruction that this
routine contains has been executed.
INSERT INT t1 VALUE (1); -- force following invalidation of query cache
call p2() -- query the table t1 will result in assertion failure since its
execution would require allocation on the memory root that
has been already marked as read only memory root
The root cause of firing the assertion is that memory root of the stored
routine 'p2' was marked as read only although actual execution of the query
contained inside hadn't been performed.
To fix the issue, mark a SP instruction as not yet run in case its execution
doesn't result in real query processing and a result set got from query cache
instead.
Note that, this issue relates server built in debug mode AND with the protect
statement memory root feature turned on. It doesn't affect server built
in release mode.
This way, if manager thread somehow starts and stops again quickly before
main thread wakes up to check if it started correctly, we will not hang.
Patch suggested by Monty as follow-up to
7f498fbab8
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
matched_rec::rec_buf[], matched_rec::bufp: Remove.
matched_rec::block: Make this a pointer to something that
is allocated by buf_block_alloc(). In this way, the only
case where buf_block_t is constructed outside buf_pool
is ALTER TABLE...IMPORT TABLESPACE.
rtr_info::heap: Remove. This was only used for allocating matched_rec,
which now is smaller.
mtr_t::memmove(): Simplify some code to avoid GCC 9.4.0 -Wconversion
in the 10.6 branch as a result of these changes.
Reviewed by: Debarun Banerjee
It was updated for 10.6+ in MDEV-7317. Because a lower version spider
node may connect to a higher version data node, we need to change this
for 10.4 and 10.5 as well.
network timeouts might be rather large and feedback plugin
waits forever for the sender thread to exit.
an alternative could've been to use GNU-specific pthread_timedjoin_np(),
where _np mean "not portable".
followup for 81f75ca83a
improve over take 2. It's technically possible, though unlikely,
to see THD after it already reset the info to NULL, but has not
changed the command to COM_SLEEP yet (see THD::mark_connection_idle()).
Let's wait for "Sleep", not for NULL.
- Add additional MTRs for more coverage on invalid options
- Updating a few error messages to be more informative
- Use the exit code from handle_options() when there is an error processing
user options
All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.