Applied SR transaction on the child table was not BF aborted by TOI running
on the parent table for several reasons:
Although SR correctly collected FK-referenced keys to parent, TOI in Galera
disregards common certification index and simply sets itself to depend on
the latest certified write set seqno.
Since this write set was the fragment of SR transaction, TOI was allowed to
run in parallel with SR presuming it would BF abort the latter.
At the same time, DML transactions in the server don't grab MDL locks on
FK-referenced tables, thus parent table wasn't protected by an MDL lock from
SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction.
In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not
enough to trigger MDL conflict in Galera.
InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to
the fact that it was believed MDL locking should always produce conflicts correctly.
The fix brings conflict resolution rules similar to MDL-level checks to InnoDB,
thus accounting for the problematic case.
Apart from that, wsrep_thd_is_SR() is patched to return true only for executing
SR transactions. It should be safe as any other SR state is either the same as
for any single write set (thus making the two logically equivalent), or it reflects
an SR transaction as being aborting or prepared, which is handled separately in
BF-aborting logic, and for regular execution path it should not matter at all.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
my_b_encr_write(): Initialize also block_length, and at the same time
last_block_length, so that all 128 bits can be initialized with fewer
writes. This fixes an error that was caught in the test
encryption.tempfiles_encrypted.
test_my_safe_print_str(): Skip a test that would attempt to
display uninitialized data in the test unit.stacktrace.
Previously, our CI did not build unit tests with MemorySanitizer.
handle_delayed_insert(): Remove a redundant call to pthread_exit(0),
which would for some reason cause MemorySanitizer in clang-19 to
report a stack overflow in a RelWithDebInfo build. This fixes a
failure of several tests.
Reviewed by: Vladislav Vaintroub
The 10.5->10.6 merge commit 3bc98a4ec4 casts the arg to an int16
pointer in set_extraction_flag_processor(). This matched the previous
commit c76eabfb5e where set_extraction_flag was changed to have int16 arg
instead of int.
The commit a5e4c34991 for MDEV-29363 added a call to
set_extraction_flag_processor on IMMUTABLE_FL (MARKER_IMMUTABLE in 10.6).
The subsequent 10.5->10.6 merge f071b7620b did not cast the flag
to int16 when merging this change.
The result is big-endian processors cleared the immutable
flag rather than set the flag, resulting in MDEV-29363
being unfixed on big-endian processors.
(Variant 4, with @@optimizer_adjust_secondary_key_costs, reuse in two
places, and conditions are replaced with equivalent simpler forms in two more)
In best_access_path(), ReuseRangeEstimateForRef-3, the check
for whether
"all used key_part_i used key_part_i=const"
was incorrect: it may produced a "NO" answer for cases when we
had:
key_part1= const // some key parts are usable
key_part2= value_not_in_join_prefix //present but unusable
key_part3= non_const_value // unusable due to gap in key parts.
This caused the optimizer to fail to apply ReuseRangeEstimateForRef
heuristics. The consequence is poor query plan choice when the index
in question has very skewed data distribution.
The fix is enabled if its @@optimizer_adjust_secondary_key_costs flag
is set.
The memory leak happened on second execution of a prepared statement
that runs UPDATE statement with correlated subquery in right hand side of
the SET clause. In this case, invocation of the method
table->stat_records()
could return the zero value that results in going into the 'if' branch
that handles impossible where condition. The issue is that this condition
branch missed saving of leaf tables that has to be performed as first
condition optimization activity. Later the PS statement memory root
is marked as read only on finishing first time execution of the prepared
statement. Next time the same statement is executed it hits the assertion
on attempt to allocate a memory on the PS memory root marked as read only.
This memory allocation takes place by the sequence of the following
invocations:
Prepared_statement::execute
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_update::execute_inner
Sql_cmd_update::update_single_table
st_select_lex::save_leaf_tables
List<TABLE_LIST>::push_back
To fix the issue, add the flag SELECT_LEX::leaf_tables_saved to control
whether the method SELECT_LEX::save_leaf_tables() has to be called or
it has been already invoked and no more invocation required.
Similar issue could take place on running the DELETE statement with
the LIMIT clause in PS/SP mode. The reason of memory leak is the same as for
UPDATE case and be fixed in the same way.
MySQL-Connector-Net casts SEQ_IN_INDEX to uint and will
raise an exception if the type is a System.Int64.
As we don't support a huge number of multi-columns in
an index reducing to a uint is sufficient to represent
all values and maintain compatibility with MySQL-Connector-Net.
This matches the type (uint) returned by MySQL-8.3 and 8.0.
Reviewer: Alexander Barkov <bar@mariadb.com>
It's possible that MDL conflict handling code is called more
than once for a transaction when:
- it holds more than one conflicting MDL lock
- reschedule_waiters() is executed,
which results in repeated attempts to BF-abort already aborted
transaction.
In such situations, it might be that BF-aborting logic sees
a partially rolled back transaction and erroneously decides
on future actions for such a transaction.
The specific situation tested and fixed is when a SR transaction
applied in the node gets BF-aborted by a started TOI operation.
It's then caught with the server transaction already rolled back,
but with no MDL locks yet released. This caused wrong state
detection for such a transaction during repeated MDL conflict
handling code execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
(Variant 2b: call greedy_search() twice, correct handling for limited
search_depth)
Modify the join optimizer to specifically try to produce join orders that
can short-cut their execution for ORDER BY..LIMIT clause.
The optimization is controlled by @@optimizer_join_limit_pref_ratio.
Default value 0 means don't construct short-cutting join orders.
Other value means construct short-cutting join order, and prefer it only
if it promises speedup of more than #value times.
In Optimizer Trace, look for these names:
* join_limit_shortcut_is_applicable
* join_limit_shortcut_plan_search
* join_limit_shortcut_choice
Replication of MyISAM and Aria DML is experimental and best
effort only. Earlier change make INSERT SELECT on both
MyISAM and Aria to replicate using TOI and STATEMENT
replication. Replication should happen only if user
has set needed wsrep_mode setting.
Note: This commit contains additional changes compared
to those already made for the 10.5 branch.
+ small refactoring after main fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
It's possible that MDL conflict handling code is called more
than once for a transaction when:
- it holds more than one conflicting MDL lock
- reschedule_waiters() is executed,
which results in repeated attempts to BF-abort already aborted
transaction.
In such situations, it might be that BF-aborting logic sees
a partially rolled back transaction and erroneously decides
on future actions for such a transaction.
The specific situation tested and fixed is when a SR transaction
applied in the node gets BF-aborted by a started TOI operation.
It's then caught with the server transaction already rolled back,
but with no MDL locks yet released. This caused wrong state
detection for such a transaction during repeated MDL conflict
handling code execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
For TOI events specifically we have a situation where in case of the
same error different nodes may generate different messages. This may
be for two reasons:
- different locale setting between the current client session and
server default (we can reasonably require server locales to be
identical on all nodes, but user can change message locale for the
session)
- non-deterministic course of STATEMENT execution e.g. for ALTER TABLE
On the other hand we may reasonably expect TOI event failures since
they are executed after replication, so we must ensure that voting is
consistent. For that purpose error codes should be sufficiently unique
and deterministic for TOI event failures as DDLs normally deal with
a single object, so we can merely use MySQL error codes to vote on.
Notice that this problem does not happen with regular transactional
writesets, since the originator node will always vote success and
replica nodes are assumed to have the same global locale setting.
As such different error messages indicate different errors even if
the error code is the same (e.g. ER_DUP_KEY can happen on different
rows tables).
Use only MySQL error code (without the error message) for error voting
in case of TOI event failure.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
When handling fatal signal, shut down Galera networking
before printing out stack trace and writing core file.
This is to achieve fail-silent semantics on crashes which may
keep the process running for a long time, but not fully responding
e.g. due to core dumping or symbol resolving.
Also suppress all Galera/wsrep logging to avoid logging from
background threads to garble crash information from signal handler.
Notice that for fully fail-silent crash, Galera 26.4.19 is needed.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was that wsrep_schema tables were not marked as
category information. Fix allows access to wsrep_schema
tables even when node is detached.
This is 10.4-10.9 version of fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was that we did not found that table was partitioned
and then we should find what is actual underlaying storage
engine.
We should not use RSU for !InnoDB tables.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
In commit fa8a46eb68 (MDEV-33613)
the parameter innodb_lru_flush_size ceased to have any effect.
Let us declare the parameter as deprecated and additionally as
MARIADB_REMOVED_OPTION, so that there will be a warning written
to the error log in case the option is specified in the command line.
Let us also do the same for the parameter
innodb_purge_rseg_truncate_frequency
that was deprecated&ignored earlier in MDEV-32050.
Reviewed by: Debarun Banerjee
Before doing mark_start_commit(), check that there is no pending deadlock
kill. If there is a pending kill, we won't commit (we will abort, roll back,
and retry). Then we should not mark the commit as started, since that could
potentially make the following GCO start too early, before we completed the
commit after the retry.
This condition could trigger in some corner cases, where InnoDB would take
temporarily table/row locks that are released again immediately, not held
until the transaction commits. This happens with dict_stats updates and
possibly auto-increment locks.
Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock
kill to be scheduled in the background. But since the blocking locks are
held only temporarily, they can be released before the background kill
happens. This way, the kill can be delayed until after mark_start_commit()
has been called. Thus we need to check the synchronous indication
rgi->killed_for_retry, not just the asynchroneous thd->killed.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Discovered this while working on MDEV-34720: test_if_cheaper_ordering()
uses rec_per_key, while the original estimate for the access method
is produced in best_access_path() by using actual_rec_per_key().
Make test_if_cheaper_ordering() also use actual_rec_per_key().
Also make several getter function "const" to make this compile.
Also adjusted the testcase to handle this (the change backported from
11.0)
MDEV-33582 (3541bd63f0) changed the "Could not write packet"
error message in net_serv.cc to use the function
sql_print_warning(), instead of my_printf_error(). The flags
argument was not removed in this change though, so the old
flags were printed in place of the file descriptor, and all
other args are presenting for the wrong field (and length is
never showed).
This patch removes flags as a parameter to sql_print_warning().
Running an UPDATE statement in PS mode and having positional
parameter(s) bound with an array of actual values (that is
prepared to be run in bulk mode) results in incorrect behaviour
in presence of on update trigger that also executes an UPDATE
statement. The same is true for handling a DELETE statement in
presence of on delete trigger. Typically, the visible effect of
such incorrect behaviour is expressed in a wrong number of
updated/deleted rows of a target table. Additionally, in case UPDATE
statement, a number of modified rows and a state message returned
by a statement contains wrong information about a number of modified rows.
The reason for incorrect number of updated/deleted rows is that
a data structure used for binding positional argument with its
actual values is stored in THD (this is thd->bulk_param) and reused
on processing every INSERT/UPDATE/DELETE statement. It leads to
consuming actual values bound with top-level UPDATE/DELETE statement
by other DML statements used by triggers' body.
To fix the issue, reset the thd->bulk_param temporary to the value
nullptr before invoking triggers and restore its value on finishing
its execution.
The second part of the problem relating with wrong value of affected
rows reported by Connector/C API is caused by the fact that diagnostics
area is reused by an original DML statement and a statement invoked
by a trigger. This fact should be take into account on finalizing a
state of diagnostics area on completion running of a statement.
Important remark: in case the macros DBUG_OFF is on, call of the method
Diagnostics_area::reset_diagnostics_area()
results in reset of the data members
m_affected_rows, m_statement_warn_count.
Values of these data members of the class Diagnostics_area are used on
sending OK and EOF messages. In case DML statement is executed in PS bulk
mode such resetting results in sending wrong result values to a client
for affected rows in case the DML statement fires a triggers. So, reset
these data members only in case the current statement being processed
is not run in bulk mode.
int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)
CREATE TABLE [SELECT|REPLACE SELECT] is CTAS and idea was that
we force ROW format. However, it was not correctly enforced
and keys were appended before wsrep transaction was started.
At THD::decide_logging_format we should force used stmt binlog
format to ROW in CTAS case and produce a warning if used
binlog format was not ROW.
At ha_innodb::update_row we should not append keys similarly
as in ha_innodb::write_row if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging on ::write_row, ::update_row and ::delete_row
if wsrep key append fails.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
A mixture of a multi-byte *TEXT column and a short binary column
produced a too large column.
For example, COALESCE(tinytext_utf8mb4, short_varbinary)
produced a BLOB column instead of an expected TINYBLOB.
- Adding a virtual method Type_all_attributes::character_octet_length(),
returning max_length by default.
- Overriding Item_field::character_octet_length() to extract
the octet length from the underlying Field.
- Overriding Item_ref::character_octet_length() to extract
the octet length from the references Item (e.g. as VIEW fields).
- Fixing Type_numeric_attributes::find_max_octet_length() to
take the octet length using the new method character_octet_length()
instead of accessing max_length directly.
fprintf() on Windows, when used on unbuffered FILE*, writes bytewise.
This can make crash handler messages harder to read, if they are mixed up
with other error log output.
Fixed , on Windows, by using a small buffer for formatting, and fwrite
instead of fprintf, if buffer is large enough for message.
Replication of MyISAM and Aria DML is experimental and best
effort only. Earlier change make INSERT SELECT on both
MyISAM and Aria to replicate using TOI and STATEMENT
replication. Replication should happen only if user
has set needed wsrep_mode setting.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
New runtime type diagnostic (MDEV-34490) has detected that classes
Item_func_eq, Item_default_value and Item_date_literal_for_invalid_dates
incorrectly return an instance of its ancestor classes when being cloned.
This commit fixes that.
Additionally, it fixes a bug at Item_func_case_simple::do_build_clone()
which led to an endless loop of cloning functions calls.
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>