Remove table_count from Query_tables_list (not used, moved to MYSQL_LOCK).
Rename table_count from LEX to avoid mixing it with other counters of tables.
Window Functions code tries to minimize the number of times it
needs to sort the select's resultset by finding "compatible"
OVER (PARTITION BY ... ORDER BY ...) clauses.
This employs compare_order_elements(). That function assumed that
the order expressions are Item_field-derived objects (that refer
to a temp.table). But this is not always the case: one can
construct queries order expressions are arbitrary item expressions.
Add handling for such expressions: sort them according to the window
specification they appeared in.
This means we cannot detect that two compatible PARTITION BY clauses
that use expressions can share the sorting step.
But at least we won't crash.
Per Marko's comment in JIRA, sql_kill is passing the thread id
as long long. We change the format of the error messages to match,
and cast the thread id to long long in sql_kill_user.
The 10.5 test error main.grant_kill showed up a incorrect
thread id on a big endian architecture.
The cause of this is the sql_kill_user function assumed the
error was ER_OUT_OF_RESOURCES, when the the actual error was
ER_KILL_DENIED_ERROR. ER_KILL_DENIED_ERROR as an error message
requires a thread id to be passed as unsigned long, however a
user/host was passed.
ER_OUT_OF_RESOURCES doesn't even take a user/host, despite
the optimistic comment. We remove this being passed as an
argument to the function so that when MDEV-21978 is implemented
one less compiler format warning is generated (which would
have caught this error sooner).
Thanks Otto for reporting and Marko for analysis.
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is
wsrep_thd_LOCK()
wsrep_kill_victim()
lock_rec_other_has_conflicting()
lock_clust_rec_read_check_and_lock()
row_search_mvcc()
ha_innobase::index_read()
ha_innobase::rnd_pos()
handler::ha_rnd_pos()
handler::rnd_pos_by_record()
handler::ha_rnd_pos_by_record()
Rows_log_event::find_row()
Update_rows_log_event::do_exec_row()
Rows_log_event::do_apply_event()
Log_event::apply_event()
wsrep_apply_events()
and mutexes are taken in the order
lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data
When a normal KILL statement is executed, the stack is
innobase_kill_query()
kill_handlerton()
plugin_foreach_with_mask()
ha_kill_query()
THD::awake()
kill_one_thread()
and mutexes are
victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This also fixed unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This patch also fixes mutex locking order and unprotected
THD member accesses on bf aborting case. We try to hold
THD::LOCK_thd_data during bf aborting. Only case where it
is not possible is at wsrep_abort_transaction before
call wsrep_innobase_kill_one_trx where we take InnoDB
mutexes first and then THD::LOCK_thd_data.
This will also fix possible race condition during
close_connection and while wsrep is disconnecting
connections.
Added wsrep_bf_kill_debug test case
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
In the code existed just before this patch binding of a table reference to
the specification of the corresponding CTE happens in the function
open_and_process_table(). If the table reference is not the first in the
query the specification is cloned in the same way as the specification of
a view is cloned for any reference of the view. This works fine for
standalone queries, but does not work for stored procedures / functions
for the following reason.
When the first call of a stored procedure/ function SP is processed the
body of SP is parsed. When a query of SP is parsed the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished the basic info on the table references from this chain except
table references to derived tables and information schema tables is put
in one hash table associated with SP. When parsing of the body of SP is
finished this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and link them into the list of such
objects passed to a pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered the view is opened and its
specification is parsed. For any table reference occurred in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking have been
looked through the tables mentioned in the list are locked. Note that the
objects referenced CTEs are just skipped here as it is impossible to
resolve these references without any info on the context where they occur.
Now the statements from the body of SP are executed one by one that.
At the very beginning of the execution of a query the tables used in the
query are opened and open_and_process_table() now is called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
For each table reference first the reference is checked against CTEs
definitions in whose scope it occurred. If such definition is found the
reference is considered resolved and if this is not the first reference
to the found CTE the the specification of the CTE is re-parsed and the
result of the parsing is added to the parsing tree of the query as a
sub-tree. If this sub-tree contains table references to other tables they
are added to the list of TABLE_LIST objects associated with the query in
order the referenced tables to be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE it discovers that the referenced table instance is not
locked and reports an error.
Thus processing non-first table references to a CTE similar to how
references to view are processed does not work for queries used in stored
procedures / functions. And the main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow to save the context in which a CTE reference occur. It's not trivial
to save the info about the context where a CTE reference occurs while the
resolution of the table reference cannot be done without this context and
consequentially the specification for the table reference cannot be
determined.
This patch solves the above problem by moving resolution of all CTE
references at the parsing stage. More exactly references to CTEs occurred in
a query are resolved right after parsing of the query has finished. After
resolution any CTE reference it is marked as a reference to to derived
table. So it is excluded from the hash table created for pre-locking used
base tables and view when the first call of a stored procedure / function
is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
This patch sets the proper name resolution context for outer references
used in a subquery from an ON clause. Usually this context is more narrow
than the name resolution context of the parent select that were used before
this fix.
This fix revealed another problem that concerned ON expressions used in
from clauses of specifications of derived tables / views / CTEs. The name
resolution outer context for such ON expression must be set to NULL to
prevent name resolution beyond the derived table where it is used.
The solution to resolve this problem applied in sql_derived.cc was provided
by Sergei Petrunia <sergey@mariadb.com>.
The change in sql_parse.cc is not good for 10.4+. A corresponding diff for
10.4+ will be provided in JIRA entry for this bug.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Store old value of binlog format before wsrep code so that
if we bail out because wsrep is not ready for connections
we can restore binlog format correctly.
A bogus error message was issued for any outer references occurred in
ON expressions used in subqueries. This prevented execution of queries
containing subqueries as soon as they used outer references in their ON
clauses. This happened because the Name_resolution_context structure
created for any ON expression erroneously had the field outer_context set
to NULL. The fields select_lex of this structure was not set correctly
either.
The idea of the fix was taken from mysql code of the function
push_new_name_resolution_context().
Approved by dmitry.shulga@mariadb.com
Running statements with SET STATEMENT FOR clause is handled incorrectly in
case the whole statement is executed in prepared statement mode.
For example, running of the following statement
SET STATEMENT sql_mode = 'NO_ENGINE_SUBSTITUTION' FOR CREATE TABLE t1 AS SELECT CONCAT('abc') AS c1;
results in different definition of the table t1 depending on whether
the statement is executed as a prepared or as a regular statement.
In first case the column c1 is defined as
`c1` varchar(3) DEFAULT NULL
in the last case the column c1 is defined as
`c1` varchar(3) NOT NULL
Different definition for the column c1 arise due to the fact that
a value of the data memeber Item_func_concat::maybe_null depends on
whether strict mode is on or off. Below is definition of the method
fix_fields() of the class Item_str_func that is base class for the
class Item_func_concat that is created on parsing the
SET STATEMENT FOR clause.
bool Item_str_func::fix_fields(THD *thd, Item **ref)
{
bool res= Item_func::fix_fields(thd, ref);
/*
In Item_str_func::check_well_formed_result() we may set null_value
flag on the same condition as in test() below.
*/
maybe_null= maybe_null || thd->is_strict_mode();
return res;
}
Although the clause SET STATEMENT sql_mode = 'NO_ENGINE_SUBSTITUTION' FOR
is parsed on PREPARE phase during processing of the prepared statement,
real setting of the sql_mode system variable is done on EXECUTION phase.
On the other hand, the method Item_str_func::fix_fields is called on PREPARE
phase. In result, thd->is_strict_mode() returns true during calling the method
Item_str_func::fix_fields(), the data member maybe_null is assigned the value
true and column c1 is defined as DEFAULT NULL.
To fix the issue the system variables listed in the SET STATEMENT FOR clause
are set at the beginning of handling the PREPARE phase just right before
calling the function check_prepared_statement() and their original values
restored immediate after return from this function.
Additionally, to avoid code duplication the source code used in the function
mysql_execute_command for setting variables, specified by SET STATEMENT
clause, were extracted to the standalone functions
run_set_statement_if_requested(). This new function is called from
the function mysql_execute_command() and the method
Prepared_statement::prepare().
1. only call calc_sum_of_all_status() if a global
SHOW_xxx_STATUS variable is to be returned
2. only lock LOCK_status when copying global_status_var,
but not when iterating all threads
The reason for the failure is that
thd->mdl_context.release_transactional_locks()
was called after commit & rollback even in cases where the current
transaction is still active.
For 10.2, 10.3 and 10.4 the fix is simple:
- Replace all calls to thd->mdl_context.release_transactional_locks() with
thd->release_transactional_locks(). The thd function will only call
the mdl_context function if there are no active transactional locks.
In 10.6 we will better fix where we will change the return value for
some trans_xxx() functions to indicate if transaction did close the
transaction or not. This will avoid the need of the indirect call.
Other things:
- trans_xa_commit() and trans_xa_rollback() will automatically
call release_transactional_locks() if the transaction is closed.
- We can't do that for the other functions as the caller of many of these
are doing additional work (like close_thread_tables) before calling
release_transactional_locks().
- Added missing abort_result_set() and missing DBUG_RETURN in
select_create::send_eof()
- Fixed wrong indentation in injector::transaction::commit()
ANALYSIS:
=========
During Bootstrap, while executing the statements from sql
file passed to the init-file server option, transaction
mem_root was being freed for every statement. This creates
an issue with multi statement transactions especially when a
statement in the transaction has to access the memory used
by the previous statement in the transaction.
FIX:
====
Transaction mem_root is freed whenever a transaction is
committed or rolled-back. Hence explicitly freeing it is not
necessary in the bootstrap implementation.
Change-Id: I40f71d49781bf7ad32d474bb176bd6060c9377dc
`LOCK TABLES view_name` should require
* invoker to have SELECT and LOCK TABLES privileges on the view
* either invoker or definer (only if sql security definer) to
have SELECT and LOCK TABLES privileges on the used tables/views.
Reimplement MDEV-14275 Improving memory utilization for information schema
Postpone temp table instantiation until after setup_fields().
Replace all unused (not marked in read_set) columns in an I_S table
with CHAR(0). This can drastically reduce the footprint of a MEMORY
table (a TABLE_CATALOG alone is 1538 bytes per row).
This does not change the engine. If the table was decided to be Aria
(because of, say, blobs) then after optimization it'll stay Aria
even if all blobs were removed.
Note 1: when transforming table structure, share->blob_fields is
preserved, otherwise Aria might switch from DYNAMIC to STATIC row format
and expect a special field for a deleted mark, which create_tmp_tabe
didn't provide.
Note 2: optimizer was doing handler::info() (to know the number of rows)
before the temp table is populated. That didn't make much sense. Now
it's done before the table is even instantiated. Preserve the old
behavior and report 0 rows.
This reverts e2664ee836 and a8458a2345
In case of NATURAL JOIN / USING mark all field (one table can not be opened
in any case so optimisation does not worth it).
IMHO table should be checked for used fields and filled after prepare,
when we will fave whole info about used fields but it is too big change
for a bugfix. Which will be made later by Serg patch
* Allocate items on thd->mem_root while refixing vcol exprs
* Make vcol tree changes register and roll them back after the statement is executed.
Explanation:
Due to collation implementation specifics an Item tree could change while fixing.
The tricky thing here is to make it on a proper arena.
It's usually not a problem when a field is deterministic, however, makes a pain vice-versa, during allocation allocating.
A non-deterministic field should be refixed on each statement, since it depends on the environment state.
Changing the tree will be temporary and therefore it should be reverted after the statement execution.
Allocate space for fields inside the window function (arguments, PARTITION BY and ORDER BY clause)
in the ref pointer array. All fields inside the window function are part of the temporary
table that is required for the window function computation.