table: rows are counted twice
Analysis: When the table we are trying to insert into and the SELECT table
are same for INSERT ... SELECT, rows from the SELECT table are copied into
internal temporary table and then to the INSERT table. We only want to
count the rows when we start inserting into the table.
Fix: Reset the counter to 1 before starting to copy from internal temporary
table to select table and then increment the counter.
The problem happened because Item_default_value did not overload
properly the val_xxx_result() family methods.
This change backports the patch for:
MDEV-24958 Server crashes in my_strtod / Value_source::Converter_strntod::Converter_strntod with DEFAULT(blob)
which earlier fixed the problem in 10.3.
failed in Diagnostics_area::set_ok_status in my_ok from
mysql_sql_stmt_prepare
Analysis: Before PREPARE is executed, binlog_format is STATEMENT.
This PREPARE had SET STATEMENT which sets binlog_format to ROW. Now after
PREPARE is done we reset the binlog_format (back to STATEMENT). But we have
temporary table, it doesn't let changing binlog_format=ROW to
binlog_format=STATEMENT and gives error which goes unreported. This
unreported error eventually causes assertion failure.
Fix: Change return type for LEX::restore_set_statement_var() to bool and
make it return error state.
If when extracting a range condition for an index from the WHERE condition
Range Optimizer sees that the range condition covers the whole index then
such condition should be discarded because it cannot be used in any range
scan. In some cases Range Optimizer really does it, but there remained some
conditions for which it was not done. As a result the optimizer could
produce index merge plans with the full index scan for one of the indexes
participating in the index merge.
This could be observed in one of the test cases from index_merge1.inc
where a plan with index_merge_sort_union was produced and in the test case
reported for this bug where a plan with index_merge_sort_intersect was
produced. In both cases one of two index scans participating in index merge
ran over the whole index.
The patch slightly changes the original above mentioned test case from
index_merge1.inc to be able to produce an intended plan employing
index_merge_sort_union. The original query was left to show that index
merge is not used for it anymore.
It should be noted that for the plan with index_merge_sort_intersect could
be chosen for execution only due to a defect in the InnoDB code that
returns wrong estimates for the cardinality of big ranges.
This bug led to serious problems in 10.4+ where the optimization using
Rowid filters is employed (see mdev-26446).
Approved by Sergey Petrunia <sergey@mariadb.com>
The old code erroneously used default_charset_info to compare field names.
default_charset_info can point to any arbitrary collation,
including ucs2*, utf16*, utf32*, including those that do not
support strcasecmp().
my_charset_utf8mb4_unicode_ci, which is used in this scenario:
CREATE TABLE t1 ENGINE=InnoDB WITH SYSTEM VERSIONING AS SELECT 0;
does not support strcasecmp().
Fixing the code to use Lex_ident::streq(), which uses
system_charset_info instead of default_charset_info.
MDEV-25803 excluded some cases from key sort upon alter table. That
particularly depends on ALTER_ADD_INDEX flag. Creating a column of
SERIAL data type missed that flag. Though equivalent operation
alter table t1 add x bigint unsigned not null auto_increment unique;
has ALTER_ADD_INDEX flag.
Repeating execution of a query containing the clause IN with string literals
in environment where the server variable in_predicate_conversion_threshold
is set results in server abnormal termination in case the query is run
as a Prepared Statement and conversion of charsets for string values in the
query are required.
The reason for server abnormal termination is that instances of the class
Item_string created on transforming the IN clause into subquery were created
on runtime memory root that is deallocated on finishing execution of Prepared
statement. On the other hand, references to Items placed on deallocated memory
root still exist in objects of the class table_value_constr. Subsequent running
of the same prepared statement leads to dereferencing of pointers to already
deallocated memory that could lead to undefined behaviour.
To fix the issue the values being pushed into a values list for TVC are created
by cloning their original items. This way the cloned items are allocate on
the PS memroot and as consequences no dangling pointer does more exist.
Consider the following use case:
MariaDB [test]> CREATE TABLE t1 (field1 BIGINT DEFAULT -1);
MariaDB [test]> CREATE VIEW v1 AS SELECT DISTINCT field1 FROM t1;
Repeated execution of the following query as a Prepared Statement
MariaDB [test]> PREPARE stmt FROM 'SELECT * FROM v1 WHERE field1 <=> NULL';
MariaDB [test]> EXECUTE stmt;
results in a crash for a server built with DEBUG.
MariaDB [test]> EXECUTE stmt;
ERROR 2013 (HY000): Lost connection to MySQL server during query
Assertion failed: (!result), function convert_const_to_int, file item_cmpfunc.cc, line 476.
Abort trap: 6 (core dumped)
The crash inside the function convert_const_to_int() happens by the reason
that the value -1 is stored in an instance of the class Field_longlong
on restoring its original value in the statement
result= field->store(orig_field_val, TRUE);
that leads to assigning the value 1 to the variable 'result' with subsequent
crash in the DBUG_ASSERT statement following it
DBUG_ASSERT(!result);
The main matter here is why this assertion failure happens on the second
execution of the prepared statement and doens't on the first one.
On first handling of the statement
'EXECUTE stmt;'
a temporary table is created for serving the query involving the view 'v1'.
The table is created by the function create_tmp_table() in the following
calls trace: (trace #1)
JOIN::prepare (at sql_select.cc:725)
st_select_lex::handle_derived
LEX::handle_list_of_derived
TABLE_LIST::handle_derived
mysql_handle_single_derived
mysql_derived_prepare
select_union::create_result_table
create_tmp_table
Note, that the data member TABLE::status of a TABLE instance returned by the
function create_tmp_table() has the value 0.
Later the function setup_table_map() is called on the TABLE instance just
created for the sake of the temporary table (calls trace #2 is below):
JOIN::prepare (at sql_select.cc:737)
setup_tables_and_check_access
setup_tables
setup_table_map
where the data member TABLE::status is set to the value STATUS_NO_RECORD.
After that when execution of the method JOIN::prepare reaches calling of
the function setup_without_group() the following calls trace is invoked
JOIN::prepare
setup_without_group
setup_conds
Item_func::fix_fields
Item_func_equal::fix_length_and_dec
Item_bool_rowready_func2::fix_length_and_dec
Item_func::setup_args_and_comparator
Item_func::convert_const_compared_to_int_field
convert_const_to_int
There is the following code snippet in the function convert_const_to_int()
at the line item_cmpfunc.cc:448
bool save_field_value= (field_item->const_item() ||
!(field->table->status & STATUS_NO_RECORD));
Since field->table->status has bits STATUS_NO_RECORD set the variable
save_field_value is false and therefore neither the method
Field_longlong::val_int() nor the method Field_longlong::store is called
on the Field instance that has the numeric value -1.
That is the reason why first execution of the Prepared Statement for the query
'SELECT * FROM v1 WHERE field1 <=> NULL'
is successful.
On second running of the statement 'EXECUTE stmt' a new temporary tables
is also created by running the calls trace #1 but the trace #2 is not executed
by the reason that data member SELECT_LEX::first_cond_optimization has been set
to false on first execution of the prepared statemet (in the method
JOIN::optimize_inner()). As a consequence, the data member TABLE::status for
a temporary table just created doesn't have the flags STATUS_NO_RECORD set and
therefore on re-execution of the prepared statement the methods
Field_longlong::val_int() and Field_longlong::store() are called for the field
having the value -1 and the DBUG_ASSERT(!result) is fired.
To fix the issue the data member TABLE::status has to be assigned the value
STATUS_NO_RECORD in every place where the macros empty_record() is called
to emptify a record for just instantiated TABLE object created on behalf
the new temporary table.
Followup to fix for MDEV-25858: When test_if_skip_sort_order() decides
to use an index to satisfy ORDER BY ... LIMIT clause, it should
disable "Range Checked for Each Record" optimization.
Do this in all cases.
it's not printed, not cleaned up without perfschema,
so isn't supposed to be written into either
this fixes "Memory not freed" warnings when early command line
options produce warnings in non-perfschema builds
CMake Error in wsrep-lib/CMakeLists.txt:
The custom command generating
/Users/name/build/mariadb-server/sql/lex_token.h
is attached to multiple targets:
GenServerSource
sql
but none of these is a common dependency of the other(s). This is not
allowed by the Xcode "new build system".
This bug was introduced by commit be00e279c6
The commit was applied for the task MDEV-6480 that allowed to remove top
level disjuncts from WHERE conditions if the range optimizer evaluated them
as always equal to FALSE/NULL.
If such disjuncts are removed the WHERE condition may become an AND formula
and if this formula contains multiple equalities the field JOIN::item_equal
must be updated to refer to these equalities. The above mentioned commit
forgot to do this and it could cause crashes for some queries.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
close_connections() in mysqld.cc sends a signal to all threads.
But InnoDB is too busy purging, doesn't react immediately.
close_connections() waits 20 seconds, which isn't enough in this
particular case, and then unlinks all threads from
the list and forcibly closes their vio connection.
InnoDB background threads have no vio connection to close, but
they're unlinked all the same. So when later they finally notice
the shutdown request and try to unlink themselves, they fail to
assert that they're still linked.
Fix: don't assert_linked, as another thread can unlink this THD anytime
The bug occurs where the float token containing a dot with an 'e'
notation was dropped from the request completely.
This causes a manner of invalid SQL statements like:
select id 1.e, char 10.e(id 2.e), concat 3.e('a'12356.e,'b'1.e,'c'1.1234e)1.e, 12 1.e*2 1.e, 12 1.e/2 1.e, 12 1.e|2 1.e, 12 1.e^2 1.e, 12 1.e%2 1.e, 12 1.e&2 from test;
To be parsed correctly as if it was:
select id, char(id), concat('a','b','c'), 12*2, 12/2, 12|2, 12^2, 12%2, 12&2 from test.test;
This correct parsing occurs when e is followed by any of:
( ) . , | & % * ^ /
This bug led to reporting bogus messages "No database selected" for DELETE
statements if they used subqueries in their WHERE conditions and these
subqueries contained references to CTEs.
The bug happened because the grammar rule for DELETE statement did not
call the function LEX::check_cte_dependencies_and_resolve_references() and
as a result of it references to CTEs were not identified as such.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This bug concerned only CREATE TABLE statements of the form
CREATE TABLE <table name> AS <with clause> <union>.
For such a statement not all references to CTE used in <union> were resolved.
As a result a bogus message was reported for the first unresolved reference.
This happened because for such statements the function resolving references
to CTEs LEX::check_cte_dependencies_and_resolve_references() was called
prematurely in the parser.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Remove section that was trying to rename default-character-set to character-set-server
This seems to be an old workaround for some upgrade warning, which did not
work for some time already, because the ini filename was not initialized.
This bug affected queries with two or more references to a CTE referring
another CTE if the definition of the latter contained an invocation of
a stored function that used a base table. The bug could lead to a bogus
error message or to an assertion failure.
For any non-first reference to CTE cte1 With_element::clone_parsed_spec()
is called that parses the specification of cte1 to construct the unit
structure for this usage of cte1. If cte1 refers to another CTE cte2
outside of the specification of cte1 then With_element::clone_parsed_spec()
has to be called for cte2 as well. This call is made by the function
LEX::resolve_references_to_cte() within the invocation of the function
With_element::clone_parsed_spec() for cte1.
When the specification of a CTE is parsed all table references encountered
in it must be added to the global list of table references for the query.
As the specification for the non-first usage of a CTE is parsed at a
recursive call of the parser the function With_element::clone_parsed_spec()
invoked at this recursive call should takes care of appending the list of
table references encountered in the specification of this CTE cte1 to the
list of table references created for the query. And it should do it after
the call of LEX::resolve_references_to_cte() that resolves references to
CTEs defined outside of the specification of cte1 because this call may
invoke the parser again for specifications of other CTEs and the table
references from their specifications must ultimately appear in the global
list of table references of the query.
The code of With_element::clone_parsed_spec() misplaced the call of
LEX::resolve_references_to_cte(). As a result LEX::query_tables_last used
for the query that was supposed to point to the field 'next_global' of the
last element in the global list of table references actually pointed to
'next_global' of the previous element.
The above inconsistency certainly caused serious problems when table
references used in the stored functions invoked in cloned specifications
of CTEs were added to the global list of table references.
When transaction creates or drops temporary tables and afterward its statement
faces an error even the transactional table statement's cached ROW
format events get involved into binlog and are visible after the transaction's commit.
Fixed with proper analysis of whether the errored-out statement needs
to be rolled back in binlog.
For instance a fact of already cached CREATE or DROP for temporary
tables by previous statements alone
does not cause to retain the being errored-out statement events in the
cache.
Conversely, if the statement creates or drops a temporary table
itself it can't be rolled back - this rule remains.
versioning_fields flag indicates that any columns were specified WITH
SYSTEM VERSIONING. In that case we imply WITH SYSTEM VERSIONING for
the whole table and WITHOUT SYSTEM VERSIONING for the other columns.
Replaced HA_ADMIN_NOT_IMPLEMENTED error code by HA_ADMIN_OK. Now CHECK
TABLE does not fail by unsupported check_misplaced_rows(). Admin
message is not needed as well.
Test case is the same as for MDEV-21011 (a7cf0db3d8), the result have
been changed.
There is a case when implicit primary key may be changed when removing
NOT NULL from the part of unique key. In that case we update
modified_primary_key which is then used to not skip key sorting.
According to is_candidate_key() there is no other cases when primary
kay may be changed implicitly.
mysql_prepare_create_table() does my_qsort(sort_keys) on key
info. This sorting is indeterministic: a table is created with one
order and inplace alter may overwrite frm with another order. Since
inplace alter does nothing about key info for MyISAM/Aria storage
engines this results in discrepancy between frm and storage engine key
definitions.
The fix avoids the sorting of keys when no new keys added by ALTER
(and this is ok for MyISAM/Aria since it cannot add new keys inplace).
Notes:
mi_keydef_write()/mi_keyseg_write() are used only in mi_create(). They
should be used in ha_inplace_alter_table() as well.
Aria corruption detection is unimplemented: maria_check_definition()
is never used!
MySQL 8.0 has this bug as well as of 8.0.26.
This breaks main.long_unique in 10.4. The new result is correct and
should be applied as it just different (original) order of keys.
There is a server startup option --gdb a.k.a. --debug-gdb that requests
signals to be set for more convenient debugging. Most notably, SIGINT
(ctrl-c) will not be ignored, and you will be able to interrupt the
execution of the server while GDB is attached to it.
When we are debugging, the signal handlers that would normally display
a terse stack trace are useless.
When we are debugging with rr, the signal handlers may interfere with
a SIGKILL that could be sent to the process by the environment, and ruin
the rr replay trace, due to a Linux kernel bug
https://lkml.org/lkml/2021/10/31/311
To be able to diagnose bugs in kill+restart tests, we may really need
both a trace before the SIGKILL and a trace of the failure after a
subsequent server startup. So, we had better avoid hitting the problem
by simply not installing those signal handlers.
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
When transaction creates or drops temporary tables and afterward its statement
faces an error even the transactional table statement's cached ROW
format events get involved into binlog and are visible after the transaction's commit.
Fixed with proper analysis of whether the errored-out statement needs
to be rolled back in binlog.
For instance a fact of already cached CREATE or DROP for temporary
tables by previous statements alone
does not cause to retain the being errored-out statement events in the
cache.
Conversely, if the statement creates or drops a temporary table
itself it can't be rolled back - this rule remains.
for example:
sql/sql_prepare.cc:5714:63: error: 'static void Ed_result_set::operator delete(void*, MEM_ROOT*)' called on pointer returned from a mismatched allocation function [-Werror=mismatched-new-delete]
The assert inside String::copy() prevents copying from from "str"
if its own String::Ptr also points to the same memory.
The idea of the assert is that copy() performs memory reallocation,
and this reallocation can free (and thus invalidate) the memory pointed by Ptr,
which can lead to further copying from a freed memory.
The assert was incomplete: copy() can free the memory pointed by its Ptr
only if String::alloced is true!
If the String is not alloced, it is still safe to copy even from
the location pointed by Ptr.
This scenario demonstrates a safe copy():
const char *tmp= "123";
String str1(tmp, 3);
String str2(tmp, 3);
// This statement is safe:
str2.copy(str1->ptr(), str1->length(), str1->charset(), cs_to, &errors);
Inside the copy() the parameter "str" is equal to String::Ptr in this example.
But it's still ok to reallocate the memory for str2, because str2
was a constant before the copy() call. Thus reallocation does not
make the memory pointed by str1->ptr() invalid.
Adjusting the assert condition to allow copying for constant strings.
Also fixes:
MDEV-25399 Assertion `name.length == strlen(name.str)' failed in Item_func_sp::make_send_field
Also fixes a problem that in this scenario:
SET NAMES binary;
SELECT 'some not well-formed utf8 string';
the auto-generated column name copied the binary string value directly
to the Item name, without checking utf8 well-formedness.
After this change auto-generated column names work as follows:
- Zero bytes 0x00 are copied to the name using HEX notation
- In case of "SET NAMES binary", all bytes sequences that do not make
well-formed utf8 characters are copied to the name using HEX notation.