Partial revert of this commit:
commit 6b685ea7b0
Author: Sergei Golubchik <serg@mariadb.org>
Date: Wed Sep 28 18:55:15 2022 +0200
Don't hold LOCK_thd_data over run_commit_ordered(). Holding the mutex
is unnecessary and will deadlock if any code in a commit_ordered
handlerton call tries to take the mutex to change THD local data.
Instead, set the current_thd for the duration of the call to keep
asserts happy around LOCK_thd_data.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
- Regression introduced in 957cb7b7ba
- Patch 4abb8216a0 changed `mysql.user` to `mysql.global_priv` in
`add_anonymous.inc`; update `delete_anonymous.inc` accordingly.
- Added test case with `--skip-name-resolve`
- Add test case with anonymous user
- Disable this test on Windows, which assigns the current user to the
anonymous user.
Reviewed by: <serg@mariadb.com>
There were several reasons why the test failed:
- Constructors for Field_double and Field_float changed an argument
to the constructor instead of the correct class variable.
- gcc 7.5.0 produced wrong code when inlining the Field_double constructor
into the Field_test_double constructor.
Fixed by changing the correct class variable and making the constructors
non-inline to work around the gcc bug.
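The first failure mode can be illustrated with a minimal sketch. The struct
names, the limit and the sentinel value below are hypothetical and are not the
actual Field_double code; they only show how an adjustment applied to a
constructor argument never reaches the class member.

```cpp
#include <cassert>

// Hypothetical limit and sentinel, chosen only for this illustration.
constexpr unsigned MAX_DECIMALS_SKETCH = 31;
constexpr unsigned NOT_FIXED_SKETCH = 39;

// Buggy pattern: the adjustment lands in the argument, not in the member.
struct BuggyFieldSketch {
  unsigned dec;
  explicit BuggyFieldSketch(unsigned dec_arg) : dec(dec_arg) {
    if (dec_arg >= MAX_DECIMALS_SKETCH)
      dec_arg = NOT_FIXED_SKETCH;  // modifies the local copy only
  }
};

// Fixed pattern: the member itself is adjusted. The constructor is defined
// outside the class body, so it is not implicitly inline.
struct FixedFieldSketch {
  unsigned dec;
  explicit FixedFieldSketch(unsigned dec_arg);
};

FixedFieldSketch::FixedFieldSketch(unsigned dec_arg) : dec(dec_arg) {
  if (dec >= MAX_DECIMALS_SKETCH)
    dec = NOT_FIXED_SKETCH;
  // Postcondition: the member is either in range or holds the sentinel.
  assert(dec < MAX_DECIMALS_SKETCH || dec == NOT_FIXED_SKETCH);
}
```

With an out-of-range argument the buggy variant keeps the out-of-range value
in the member, while the fixed variant stores the sentinel.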
When resolving a column from the HAVING clause, a new Item_field
object may be created inside Item_ref::fix_fields().
But the object is created with an empty name resolution context,
which then leads to debug assertion failure during
Item_field::fix_fields().
The solution is to pass the correct name resolution context
when creating the Item_field object.
Reviewer: Oleksandr Byelkin (sanja@mariadb.com)
On creation of a VIEW that depends on a stored routine an instance of
the class Item_func_sp is allocated on a memory root of SP statement.
It happens since mysql_make_view() calls the method
THD::activate_stmt_arena_if_needed()
before parsing definition of the view.
On the other hand, when sp_head's rcontext is created, an instance of
the class Field referenced by the data member
Item_func_sp::result_field
is allocated on the Item_func_sp's Query_arena (call arena) that is set up
inside the method
Item_sp::execute_impl
just before calling the method
sp_head::execute_function()
On return from the method sp_head::execute_function(), all items allocated
on the Item_func_sp's Query_arena are released and its memory root is freed
(see the implementation of the method Item_sp::execute_impl). As a consequence,
the pointer
Item_func_sp::result_field
refers to deallocated memory. Later, when the method
sp_head::execute
cleans up the items allocated for the SP instruction just executed, the method
Item_func_sp::cleanup is invoked and tries to delete the object referenced
by the data member Item_func_sp::result_field, which points to already
deallocated memory. This results in abnormal server termination.
To fix the issue, the currently active arena must not be switched to
a statement arena inside the function mysql_make_view() when it is invoked
indirectly by the method sp_head::rcontext_create. This is implemented by
introducing the new Query_arena state STMT_SP_QUERY_ARGUMENTS, which is set
when an explicit Query_arena is created for placing SP arguments and other
caller-side items used during SP execution. The method
THD::activate_stmt_arena_if_needed() then checks the Query_arena's state and
returns immediately without switching to the statement's arena.
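The early return described above can be sketched as follows. This is a
simplified stand-in, not the server code: ArenaSketch and SessionSketch are
hypothetical types, and only the state name STMT_SP_QUERY_ARGUMENTS is taken
from the description.

```cpp
#include <cassert>

// Simplified stand-in for the arena states; only STMT_SP_QUERY_ARGUMENTS
// is named in the description above.
enum class ArenaState { STMT_CONVENTIONAL_EXECUTION, STMT_SP_QUERY_ARGUMENTS };

struct ArenaSketch {
  ArenaState state;
};

struct SessionSketch {
  ArenaSketch *active;     // arena currently used for allocations
  ArenaSketch *statement;  // the statement's own arena

  // Returns the previously active arena if a switch happened, else nullptr.
  ArenaSketch *activate_stmt_arena_if_needed() {
    assert(active && statement);
    // An explicit arena for SP arguments must stay active: do not switch.
    if (active->state == ArenaState::STMT_SP_QUERY_ARGUMENTS)
      return nullptr;
    if (active == statement)
      return nullptr;  // already on the statement arena
    ArenaSketch *backup = active;
    active = statement;
    return backup;
  }
};
```

In this sketch the caller can tell from the nullptr return that no switch took
place and therefore nothing has to be restored afterwards.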
The SQL thread and a user connection executing SHOW SLAVE STATUS
have a race condition on Last_SQL_Errno: when a slave that previously
errored and stopped is started again, SHOW SLAVE STATUS can show that
the SQL thread is running while the previous error is still
displayed.
The fix is to clear the last error when the SQL thread starts
before setting the status of
Slave_SQL_Running.
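The ordering can be sketched as below. The struct and the observer callback
are hypothetical; the callback stands in for a concurrent SHOW SLAVE STATUS
that may run between any two steps of the startup sequence.

```cpp
#include <cassert>
#include <functional>

struct SlaveStatusSketch {
  bool sql_running = false;
  int last_sql_errno = 0;
};

// 'observe' models a status query arriving between startup steps.
inline void start_sql_thread(
    SlaveStatusSketch &st,
    const std::function<void(const SlaveStatusSketch &)> &observe) {
  assert(!st.sql_running);  // the thread must be stopped before a start
  st.last_sql_errno = 0;    // step 1: clear the error of the previous run
  observe(st);
  st.sql_running = true;    // step 2: only now report the thread as running
  observe(st);
}
```

Because the error is cleared first, no observation point can see a running
thread together with the stale error. Swapping the two steps would open
exactly that window.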
Thanks to Kristian Nielsen for his work diagnosing the problem!
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
Remove TLSv1.1 from the default tls_version system variable.
Output a warning if TLSv1.0 or TLSv1.1 are selected.
Thanks Tingyao Nian for the feature request.
The MariaDB async replication SQL thread was stopped on any failure
to apply replication events, and the error message logged for the failure
was: "Node has dropped from cluster". The assumption was that an event applying
failure is always due to the node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped on the first
applying failure.
To support the optimistic parallel replication retry logic, this commit
skips the replication slave abort if the node remains in the cluster
(wsrep_ready==ON) and replication is configured for optimistic or aggressive
retry logic.
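The decision can be sketched as a small predicate. The function name and the
enum are hypothetical; the enum is a simplified stand-in for the
slave_parallel_mode setting.

```cpp
#include <cassert>

// Simplified stand-in for the slave_parallel_mode setting.
enum class ParallelMode { NONE, MINIMAL, CONSERVATIVE, OPTIMISTIC, AGGRESSIVE };

// Returns true when an apply failure should stop the slave SQL thread.
inline bool should_abort_slave(bool wsrep_ready, ParallelMode mode) {
  const bool retry_capable =
      mode == ParallelMode::OPTIMISTIC || mode == ParallelMode::AGGRESSIVE;
  // Keep the slave running only if the node is still in the cluster and
  // the configured mode is able to retry the failed event group.
  return !(wsrep_ready && retry_capable);
}
```

A node that has really dropped out (wsrep_ready is OFF) still aborts in every
mode, which preserves the old behaviour for that case.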
During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that if wsrep_notify_cmd was set, the script was called
with the new status "joined" and tried to connect to the server
to update some table, but the server was not initialized yet and
was not listening for connections. So the server waited for the
script to finish, the script waited for the mariadb client to connect,
and the client could not connect because the server was not listening.
The fix is to call the script only when Galera has already formed a
view or when the node is synced or a donor.
This fix also enables the following test cases:
* galera.MW-284
* galera.galera_binlog_checksum
* galera_var_notify_ssl_ipv6
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that parallel replication of temporary tables using
statement-based binlogging could overlap the COMMIT in one thread with a DML
or DROP TEMPORARY TABLE in another thread using the same temporary table.
Temporary tables are not safe for concurrent access, so this caused
reference to freed memory and possibly other nastiness.
The fix is to disable the optimisation with overlapping commits of one
transaction with the start of a later transaction, when temporary tables are
in use. Then the following event groups will be blocked from starting until
the one using temporary tables is completed.
This also fixes occasional test failures of rpl.rpl_parallel_temptable seen
in Buildbot.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
recalculate long unique hash in Write_rows_log_event
and Update_rows_log_event.
normally generated columns (stored and indexed virtual)
are deterministic and their values don't need to be recalculated
on the slave as they're already present in the row image.
but the long unique hash function was changed in MDEV-27653,
so a row event from the old master will have the old hash,
but a table created on the new slave will need a new hash.
Creating the test file for the case-insensitivity check gives only a basic
warning on failure, and the next thing a user might see is an abort.
ProtectHome and other systemd settings prevent system services from
accessing user data. Unfortunately, some of our users do put things
on /home due to space or other reasons.
Rather than enumerate the systemd options in a very clunky, fragile
way, we put an error associated with the "Can't create test file" and
hope the user can work it out from there.
Thanks to Sergei for the %M tip.
Fixed a memory leak that took place on executing a prepared statement or
a stored routine that queries a view, where the view is constructed
on an information schema table. For example,
let's consider the following definition of the view 'v1':
CREATE VIEW v1 AS SELECT table_name FROM information_schema.views
ORDER BY table_name;
Querying this view in PS mode results in an assert hit:
PREPARE stmt FROM "SELECT * FROM v1";
EXECUTE stmt;
EXECUTE stmt; (*)
Running the statement marked with (*) leads to a crash if the
server is built with the mode that controls allocation of memory from
the SP/PS memory root on the second and following executions of PS/SP.
The reason for the memory leak is that the memory allocated while
processing the FRM file for the view is requested from the PS/SP memory
root, meaning that this memory is released only when the stored routine
is evicted from the SP cache or the prepared statement is deallocated,
which typically happens on termination of a user session.
To fix the issue, switch to a memory root specially created for
allocation of short-lived objects that are requested on parsing FRM.
If a table accessed by a PS/SP is dropped after the first execution of the
PS/SP and a view is created with the same name as the table just dropped, then
the second execution of the PS/SP leads to allocation of memory on the SP/PS
memory root already marked as read only on the first execution.
For example, the following test case:
CREATE TABLE t1 (a INT);
PREPARE stmt FROM "INSERT INTO t1 VALUES (1)";
EXECUTE stmt;
DROP TABLE t1;
CREATE VIEW t1 AS SELECT 1;
--error ER_NON_INSERTABLE_TABLE
EXECUTE stmt; # (*)
DROP VIEW t1;
will hit an assert on running the statement 'EXECUTE stmt' marked with (*),
when memory is allocated on parsing the view.
Memory allocation is requested inside the function mysql_make_view
when a view definition is being parsed. In order to avoid an assertion
failure, the call of the function mysql_make_view() must be moved after
the invocation of the function check_and_update_table_version().
This results in re-preparing the whole PS statement or the current
SP instruction, which frees the currently allocated items and resets the
read_only flag for the memory root.
Moved the call of the function check_and_update_table_version() to just
before the place where the function extend_table_list() is invoked,
in order to avoid allocation of memory on a PS/SP memory root
marked as read only. This happens because the function
extend_table_list() invokes sp_add_used_routine() to add a trigger
created for the table in the time frame between executions of the statement
EXECUTE `stmt_id`.
For example, the following test case
create table t1 (a int);
prepare stmt from "insert into t1 (a) value (1)";
execute stmt;
create trigger t1_bi before insert on t1 for each row
set @message= new.a;
execute stmt; # (*)
adds the trigger t1_bi to the list of used routines, which involves
allocation of memory on the PS memory root that has already been marked
as read only on the first run of the statement 'execute stmt'.
As a result, executing the statement marked with (*) hits an
assert.
To fix the issue, call the function check_and_update_table_version()
before the invocation of extend_table_list() to force re-compilation of
the PS/SP, which resets the read-only flag of its memory root.
This patch adds support for controlling memory allocation
done by SP/PS that could happen on the second and following executions.
As soon as an SP or PS has been executed the first time, its memory root
is marked as read only, since no further memory allocation should
be performed on it. If such an allocation takes place, it triggers
an assert for the invariant that no new memory allocation
takes place once the SP/PS has been marked as read only.
The feature for control of memory allocation made on behalf of SP/PS
is turned on when both the debug build is on and the cmake option
-DWITH_PROTECT_STATEMENT_MEMROOT is set.
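The invariant can be sketched with a toy allocator. MemRootSketch and the
macro PROTECT_STATEMENT_MEMROOT_SKETCH are hypothetical names: the macro
stands in for the cmake option, and the class is not the server's MEM_ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical compile-time switch standing in for the cmake option.
#ifndef PROTECT_STATEMENT_MEMROOT_SKETCH
#define PROTECT_STATEMENT_MEMROOT_SKETCH 1
#endif

class MemRootSketch {
 public:
  MemRootSketch() = default;
  MemRootSketch(const MemRootSketch &) = delete;
  MemRootSketch &operator=(const MemRootSketch &) = delete;
  ~MemRootSketch() {
    for (void *p : blocks_) std::free(p);
  }

  // Called after the first execution of the statement or routine.
  void mark_read_only() { read_only_ = true; }
  // Called when the statement is re-prepared.
  void reset_read_only() { read_only_ = false; }
  bool is_read_only() const { return read_only_; }

  void *alloc(std::size_t size) {
#if PROTECT_STATEMENT_MEMROOT_SKETCH
    // Debug-only check: no allocation once the root is read only.
    assert(!read_only_ && "allocation on a read-only statement memory root");
#endif
    void *p = std::malloc(size);
    if (p) blocks_.push_back(p);
    return p;
  }

 private:
  bool read_only_ = false;
  std::vector<void *> blocks_;
};
```

Since assert() compiles away in non-debug builds, the check is active only
when both the debug build and the switch are enabled, matching the two
conditions listed above.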
The reason for introducing the new cmake option
-DWITH_PROTECT_STATEMENT_MEMROOT
to control memory allocation on the second and following executions of
SP/PS is that in the current server implementation there are too many
places where such memory allocation takes place. As soon as all such
incorrect allocations are fixed, the cmake option
-DWITH_PROTECT_STATEMENT_MEMROOT
can be removed and control of memory allocation made on the second and
following executions can be turned on for debug builds only. Until
every incorrect memory allocation is fixed, it makes sense to guard
the checking of memory allocation on read-only memory with an extra cmake
option; otherwise we would get a lot of failing tests on buildbot.
Moreover, fixing all incorrect memory allocations could take a fairly
long time, so to introduce the feature without having
to wait until all places throughout the source code are fixed, it makes
sense to add the new cmake option.
Summary:
This patch enables possible index optimization when
the WHERE clause has an IN condition of the form:
signed_or_unsigned_column IN (signed_or_unsigned_constant,
signed_or_unsigned_constant
[,signed_or_unsigned_constant]*)
when the IN list constants are of different signedness, e.g.:
WHERE signed_column IN (signed_constant, unsigned_constant ...)
WHERE unsigned_column IN (signed_constant, unsigned_constant ...)
Details:
In a condition like:
WHERE unsigned_predicant IN (1, LONGLONG_MAX + 1)
comparison handlers for individual (predicant,value) pairs are
calculated as follows:
* unsigned_predicant and 1 produce &type_handler_newdecimal
* unsigned_predicant and (LONGLONG_MAX + 1) produce &type_handler_slonglong
The old code decided that it could not use bisection because
the two pairs had different comparison handlers.
As a result, bisection was not allowed, and, in case of
an indexed integer column predicant the index on the column was not used.
The new code catches special cases like:
signed_predicant IN (signed_constant, unsigned_constant)
unsigned_predicant IN (signed_constant, unsigned_constant)
It enables bisection using in_longlong, which supports a mixture
of predicant and values of different signedness.
When the predicant is an indexed column, this change
automatically enables index range optimization.
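Why a single integer comparator is enough for bisection can be shown with a
minimal sketch. TaggedInt, cmp_mixed and the helpers are hypothetical names
and do not reproduce in_longlong; they only show that signed and unsigned
64-bit values can share one total order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A 64-bit integer tagged with its signedness.
struct TaggedInt {
  bool is_unsigned;
  std::int64_t s;   // valid when !is_unsigned
  std::uint64_t u;  // valid when is_unsigned
};

inline TaggedInt make_signed(std::int64_t v) { return {false, v, 0}; }
inline TaggedInt make_unsigned(std::uint64_t v) { return {true, 0, v}; }

// Three-way comparison that is correct for any mix of signedness.
inline int cmp_mixed(const TaggedInt &a, const TaggedInt &b) {
  if (!a.is_unsigned && !b.is_unsigned)
    return a.s < b.s ? -1 : (a.s > b.s ? 1 : 0);
  if (a.is_unsigned && b.is_unsigned)
    return a.u < b.u ? -1 : (a.u > b.u ? 1 : 0);
  if (!a.is_unsigned) {      // a is signed, b is unsigned
    if (a.s < 0) return -1;  // negative is below every unsigned value
    const std::uint64_t au = static_cast<std::uint64_t>(a.s);
    return au < b.u ? -1 : (au > b.u ? 1 : 0);
  }
  return -cmp_mixed(b, a);   // a is unsigned, b is signed
}

inline bool less_mixed(const TaggedInt &a, const TaggedInt &b) {
  return cmp_mixed(a, b) < 0;
}

// Sort the IN list once so that lookups can use bisection.
inline void sort_values(std::vector<TaggedInt> &values) {
  std::sort(values.begin(), values.end(), less_mixed);
}

// Binary search for 'needle' in a list prepared by sort_values().
inline bool contains(const std::vector<TaggedInt> &sorted,
                     const TaggedInt &needle) {
  auto it = std::lower_bound(sorted.begin(), sorted.end(), needle, less_mixed);
  return it != sorted.end() && cmp_mixed(*it, needle) == 0;
}
```

With such a comparator, a list like (1, LONGLONG_MAX + 1) can be sorted and
searched without falling back to a different comparison type per value.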
Thanks to Vicențiu Ciorbaru for proposing the idea and for preparing MTR tests.
For the clang compiler, the flag -Wno-unused-but-set-variable
was set based on the compiler version. This approach could result in
false positive detection of the compiler option, since
only the first three groups of digits in the compiler version were taken
into account, which could lead to inaccuracy in determining the supported
compiler features.
The correct way to detect options supported by a compiler is to use
the macro MY_CHECK_CXX_COMPILER_FLAG and to check the result
variable with the prefix have_CXX__
So, to check whether the compiler supports the option
-Wno-unused-but-set-variable
the macro
MY_CHECK_CXX_COMPILER_FLAG(-Wno-unused-but-set-variable)
should be called and the result variable
have_CXX__Wno_unused_but_set_variable
tested for its assigned value.
When the SQL driver thread goes to wait for room in the parallel slave
worker queue, there was a race where a kill at the right moment could
be ignored and the wait would proceed uninterrupted by the kill.
Fix by moving the THD::check_killed() to occur _after_ doing ENTER_COND().
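The same rule can be sketched with standard C++ primitives. This is an
analogue under stated assumptions, not the server's ENTER_COND() mechanism:
WorkerQueueSketch and its members are hypothetical. The point is that the
kill flag is tested only after the waiter holds the mutex that the killer
also takes, so a kill can no longer slip in between the check and the wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

struct WorkerQueueSketch {
  std::mutex mtx;
  std::condition_variable cond;
  bool killed = false;
  int free_slots = 0;

  // Returns true if a slot was taken, false if the wait was killed.
  bool wait_for_room() {
    std::unique_lock<std::mutex> lock(mtx);
    // The kill flag is checked while holding the mutex, on every wakeup.
    while (!killed && free_slots == 0)
      cond.wait(lock);
    if (killed)
      return false;
    assert(free_slots > 0);
    --free_slots;
    return true;
  }

  // The killer sets the flag under the same mutex, then wakes the waiter.
  void request_kill() {
    {
      std::lock_guard<std::mutex> guard(mtx);
      killed = true;
    }
    cond.notify_all();
  }

  void add_room() {
    {
      std::lock_guard<std::mutex> guard(mtx);
      ++free_slots;
    }
    cond.notify_one();
  }
};
```

If the flag were tested before taking the mutex, a kill arriving right after
the test would set the flag and signal before the waiter started waiting, and
the wakeup would be lost.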
This bug was seen as sporadic failure of the testcase rpl.rpl_parallel
(rpl.rpl_parallel_gco_wait_kill since 10.5), with "Slave stopped with
wrong error code".
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:
MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Remove the exception that InnoDB does not report auto-increment lock waits
to parallel replication.
There was an assumption that these waits could not cause conflicts with
in-order parallel replication and thus need not be reported. However, this
assumption is wrong and it is possible to get conflicts that lead to hangs
for the duration of --innodb-lock-wait-timeout. This can be seen with three
transactions:
1. T1 is waiting for T3 on an autoinc lock
2. T2 is waiting for T1 to commit
3. T3 is waiting on a normal row lock held by T2
Here, T3 needs to be deadlock-killed when T1 waits for it.
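The three waits above form a cycle only if the auto-increment wait is
reported. A minimal wait-for graph sketch makes this visible; WaitGraph and
in_cycle are hypothetical helpers and not InnoDB's deadlock detector.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// waiter -> transactions it is waiting for
using WaitGraph = std::map<int, std::vector<int>>;

// Depth-first search: can 'target' be reached from 'from'?
inline bool reaches(const WaitGraph &g, int from, int target,
                    std::set<int> &seen) {
  if (!seen.insert(from).second)
    return false;  // already explored from this node
  auto it = g.find(from);
  if (it == g.end())
    return false;  // 'from' waits for nobody
  for (int next : it->second) {
    if (next == target || reaches(g, next, target, seen))
      return true;
  }
  return false;
}

// True if transaction 'trx' is part of a wait cycle.
inline bool in_cycle(const WaitGraph &g, int trx) {
  std::set<int> seen;
  return reaches(g, trx, trx, seen);
}
```

With the edges T1 -> T3, T2 -> T1 and T3 -> T2 the cycle is found. Without
the T1 -> T3 edge (the unreported auto-increment wait) no cycle is visible,
so the waits only end at the lock wait timeout.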
Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB lock code changes.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Field_varstring::get_copy_func() did not take into account
that functions do_varstring1[_mb], do_varstring2[_mb] do not support
compressed data.
Changing the return value of Field_varstring::get_copy_func()
to `do_field_string` if there is compression and truncation
at the same time. This fixes the problem, so now it works as follows:
- val_str() uncompresses the data
- The prefix is then calculated on the uncompressed data
Additionally, introducing two new copying functions
- do_varstring1_no_truncation()
- do_varstring2_no_truncation()
Using new copying functions in cases when:
- a Field_varstring with length_bytes==1 is changing to a longer
Field_varstring with length_bytes==1
- a Field_varstring with length_bytes==2 is changing to a longer
Field_varstring with length_bytes==2
In these cases we care about neither compression nor
multi-byte prefixes: the entire data gets fully copied
from the source column to the target column as is.
This is a kind of new optimization, but it was also needed
to preserve existing MTR test results.
This is also related to
MDEV-31348 Assertion `last_key_entry >= end_pos' failed in virtual bool
JOIN_CACHE_HASHED::put_record()
Valgrind exposed a problem with the join_cache for hash joins:
==25636== Conditional jump or move depends on uninitialised value(s)
==25636== at 0xA8FF4E: JOIN_CACHE_HASHED::init_hash_table()
(sql_join_cache.cc:2901)
The reason for this was that avg_record_length contained a random value
if one had used SET optimizer_switch='optimize_join_buffer_size=off'.
This causes either 'random size' memory to be allocated (up to
join_buffer_size) which can increase memory usage or, if avg_record_length
is less than the row size, memory overwrites in thd->mem_root, which is
bad.
Fixed by setting avg_record_length in JOIN_CACHE_HASHED::init()
before it's used.
There is no test case for MDEV-31893 as valgrind of join_cache_notasan
checks that.
I added a test case for MDEV-31348.
Revert the old work-around for buggy fdatasync() on Linux ext3. This bug was
fixed in Linux more than 10 years ago, at least as far back as kernel
version 3.0.
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
There were two related problems:
(1) A Galera node that is defined as a slave to an async MariaDB
master might do SST (state snapshot transfer) at restart and,
as part of that, copy the mysql.gtid_slave_pos table.
The problem is that updates to that table are not replicated
in the cluster. Therefore, the table is copied from a donor that
is not a slave, and the joiner loses the GTID position it was at
and starts executing events from the wrong position in the binlog.
This incorrect position could break replication and
cause the node to be dropped, requiring user action.
(2) The slave SQL thread might start executing events before
Galera is ready (wsrep_ready=ON), which could also
cause the node to be dropped from the cluster.
In this fix we enable replication of the mysql.gtid_slave_pos
table in the cluster. In this way all nodes in the cluster
know the GTID slave position, and even after SST the joiner
knows the correct GTID position to start from.
Furthermore, we wait for Galera to be ready before the slave
SQL thread executes any events, to prevent too-early
execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
when validating vcol's (default, check, etc) in ALTER TABLE
vcol_info->flags are modified in place. This means that if ALTER TABLE
fails for any reason we need to restore them to their original values.
(mroonga was freeing the memory on ::reset() but not on ::close())
After MDEV-21580 the truncation of SORT_FIELD::length
set_if_smaller(sortorder->length, thd->variables.max_sort_length)
became conditional:
if (is_variable_sized())
set_if_smaller(length, thd->variables.max_sort_length)
To provide correct functioning of is_variable_sized() SORT_FIELD::type
must be set properly. This commit adds the necessary initialization
of SORT_FIELD::type to JOIN_TAB::remove_duplicates() as it is done
in filesort's sortlength() function.
A DBUG_ASSERT is added to sortlength() as a safeguard against
a possible uint32 overflow.
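Why the type must be initialised can be seen in a minimal sketch of the
conditional truncation quoted above. SortFieldSketch is a hypothetical,
reduced stand-in for SORT_FIELD, and std::min plays the role of
set_if_smaller().

```cpp
#include <algorithm>
#include <cassert>

struct SortFieldSketch {
  enum Type { FIXED_SIZE, VARIABLE_SIZE };
  Type type;
  unsigned length;

  bool is_variable_sized() const { return type == VARIABLE_SIZE; }

  // Truncate the length only for variable-sized sort fields.
  void cap_length(unsigned max_sort_length) {
    if (is_variable_sized())
      length = std::min(length, max_sort_length);
  }
};
```

If the type is left unset, the outcome of is_variable_sized() is arbitrary,
and a variable-sized field may keep a length above max_sort_length.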
make TRANSACTIONAL table option behave similarly to other engine-defined
table options. If the engine doesn't support it:
* if specified explicitly in CREATE or ALTER - it's ER_UNKNOWN_OPTION
* an error or a warning depending on sql_mode IGNORE_BAD_TABLE_OPTIONS
* in ALTER TABLE from the engine that supports it to the engine that
doesn't - silently preserved (no warning)
* it is commented out in SHOW CREATE unless IGNORE_BAD_TABLE_OPTIONS
* invoke check_expression() for all vcol_info's in
mysql_prepare_create_table() to check for FK CASCADE
* also check for SET NULL and SET DEFAULT
* to check against existing FKs when a vcol is added in ALTER TABLE,
old FKs must be added to the new_key_list just like other indexes are
* check columns recursively, if vcol1 references vcol2,
flags of vcol2 must be taken into account
* remove check_table_name_processor(), put that logic under
check_vcol_func_processor() to avoid walking the tree twice
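The recursive check from the list above (vcol1 references vcol2, so the flags
of vcol2 count for vcol1) can be sketched as follows. ColumnSketch,
TableSketch and effective_flags are hypothetical names; the flag values are
arbitrary bits chosen for the illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct ColumnSketch {
  unsigned flags;                 // flags from the column's own expression
  std::vector<std::string> refs;  // columns referenced by the expression
};

using TableSketch = std::map<std::string, ColumnSketch>;

// Collect the flags of 'name' and of every column it references,
// directly or indirectly. 'visiting' guards against revisiting a column.
inline unsigned effective_flags(const TableSketch &table,
                                const std::string &name,
                                std::set<std::string> &visiting) {
  auto it = table.find(name);
  if (it == table.end() || !visiting.insert(name).second)
    return 0;
  unsigned flags = it->second.flags;
  for (const std::string &ref : it->second.refs)
    flags |= effective_flags(table, ref, visiting);
  return flags;
}

inline unsigned effective_flags(const TableSketch &table,
                                const std::string &name) {
  std::set<std::string> visiting;
  return effective_flags(table, name, visiting);
}
```

A check that looks only at the flags of vcol1 itself would miss a restriction
that comes in through vcol2.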
mark old keys in the ALTER TABLE with the `old` flag, not with
the `key_create_info.check_for_duplicate_indexes`.
This allows marking old foreign keys too.