Running a query that uses a cursor could crash the server while
building the temporary table used for handling the query.
For example, the following cursor
DECLARE cur1 CURSOR FOR
SELECT t2.c1 AS c1 FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1
WHERE EXISTS (SELECT 1 FROM t1 WHERE c2 = -1) ORDER BY c1;
declared and executed inside a stored routine could crash the server
while creating the temporary table used for handling the ORDER BY clause.
The crash occurred on an attempt to create the temporary table's fields based
on fields whose data was located in a memory root that had already been freed.
It happens inside the function return_zero_rows(), where the method
Select_materialize::send_result_set_metadata() is invoked in the cursor case.
This method calls st_select_lex_unit::get_column_types() in order to
get a list of items with the column types for the temporary table being created.
The method st_select_lex_unit::get_column_types() returns
first_select()->join->fields
when it is invoked for a cursor. Unfortunately, this memory has already been
deallocated a bit earlier by calling
join->join_free();
inside the function return_zero_rows().
When the query from the example is run in the conventional way (without
a cursor), the method st_select_lex_unit::get_column_types()
returns first_select()->item_list, which is not touched by the call to
join->join_free(), so no problem arises there.
So, to fix the issue, the resources allocated for the JOIN class should be
released after all activity involving the JOIN class has completed,
that is, as the last statement before returning from the function
return_zero_rows().
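The ordering rule can be illustrated with a small, self-contained sketch. It is not the server code: ToyJoin, its members and return_zero_rows_sketch() are hypothetical stand-ins that only mirror the names used above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for JOIN: `fields` lives in memory that
// join_free() releases, mirroring the situation described above.
struct ToyJoin {
  std::unique_ptr<std::vector<std::string>> fields;

  explicit ToyJoin(std::vector<std::string> cols)
      : fields(std::make_unique<std::vector<std::string>>(std::move(cols))) {}

  void join_free() { fields.reset(); }
};

// Stand-in for the metadata step: it needs the column list to be alive.
std::size_t send_result_set_metadata_sketch(const ToyJoin &join) {
  assert(join.fields && "column list must not be freed yet");
  return join.fields->size();
}

// Fixed ordering: everything that touches the join runs first, and
// releasing its resources is the last statement before returning.
std::size_t return_zero_rows_sketch(ToyJoin &join) {
  std::size_t columns = send_result_set_metadata_sketch(join);
  join.join_free();
  return columns;
}
```

Swapping the two statements in return_zero_rows_sketch() would trip the assertion, which is the toy equivalent of reading the freed memory root.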
This patch includes tests both for the case when a cursor is run explicitly
from within a stored routine and for the case when a cursor is opened
implicitly as prescribed by the STMT_ATTR_CURSOR_TYPE attribute of
the binary protocol (the case of a prepared statement).
table: rows are counted twice
Analysis: When the table we are trying to insert into and the SELECT table
are the same for INSERT ... SELECT, rows from the SELECT table are copied into
an internal temporary table and then to the INSERT table. We only want to
count the rows when we start inserting into the table.
Fix: Reset the counter to 1 before starting to copy from the internal temporary
table to the select table and then increment the counter.
Consider the following use case:
MariaDB [test]> CREATE TABLE t1 (field1 BIGINT DEFAULT -1);
MariaDB [test]> CREATE VIEW v1 AS SELECT DISTINCT field1 FROM t1;
Repeated execution of the following query as a Prepared Statement
MariaDB [test]> PREPARE stmt FROM 'SELECT * FROM v1 WHERE field1 <=> NULL';
MariaDB [test]> EXECUTE stmt;
results in a crash for a server built with DEBUG.
MariaDB [test]> EXECUTE stmt;
ERROR 2013 (HY000): Lost connection to MySQL server during query
Assertion failed: (!result), function convert_const_to_int, file item_cmpfunc.cc, line 476.
Abort trap: 6 (core dumped)
The crash inside the function convert_const_to_int() happens because
the value -1 is stored in an instance of the class Field_longlong
when its original value is restored by the statement
result= field->store(orig_field_val, TRUE);
which assigns the value 1 to the variable 'result' and then trips
the DBUG_ASSERT statement that follows it
DBUG_ASSERT(!result);
The main question here is why this assertion failure happens on the second
execution of the prepared statement and doesn't on the first one.
On first handling of the statement
'EXECUTE stmt;'
a temporary table is created for serving the query involving the view 'v1'.
The table is created by the function create_tmp_table() in the following
call trace (trace #1):
JOIN::prepare (at sql_select.cc:725)
st_select_lex::handle_derived
LEX::handle_list_of_derived
TABLE_LIST::handle_derived
mysql_handle_single_derived
mysql_derived_prepare
select_union::create_result_table
create_tmp_table
Note that the data member TABLE::status of a TABLE instance returned by the
function create_tmp_table() has the value 0.
Later the function setup_table_map() is called on the TABLE instance just
created for the temporary table (call trace #2 is below):
JOIN::prepare (at sql_select.cc:737)
setup_tables_and_check_access
setup_tables
setup_table_map
where the data member TABLE::status is set to the value STATUS_NO_RECORD.
After that, when execution of the method JOIN::prepare reaches the call to
the function setup_without_group(), the following call trace is executed:
JOIN::prepare
setup_without_group
setup_conds
Item_func::fix_fields
Item_func_equal::fix_length_and_dec
Item_bool_rowready_func2::fix_length_and_dec
Item_func::setup_args_and_comparator
Item_func::convert_const_compared_to_int_field
convert_const_to_int
There is the following code snippet in the function convert_const_to_int()
at the line item_cmpfunc.cc:448
bool save_field_value= (field_item->const_item() ||
!(field->table->status & STATUS_NO_RECORD));
Since field->table->status has the STATUS_NO_RECORD bits set, the variable
save_field_value is false and therefore neither the method
Field_longlong::val_int() nor the method Field_longlong::store() is called
on the Field instance that has the numeric value -1.
That is the reason why the first execution of the Prepared Statement for the query
'SELECT * FROM v1 WHERE field1 <=> NULL'
is successful.
On the second run of the statement 'EXECUTE stmt' a new temporary table
is also created by running call trace #1, but trace #2 is not executed
because the data member SELECT_LEX::first_cond_optimization was set
to false on the first execution of the prepared statement (in the method
JOIN::optimize_inner()). As a consequence, the data member TABLE::status of
the temporary table just created doesn't have the STATUS_NO_RECORD flags set,
and therefore on re-execution of the prepared statement the methods
Field_longlong::val_int() and Field_longlong::store() are called for the field
having the value -1 and the DBUG_ASSERT(!result) is fired.
To fix the issue, the data member TABLE::status has to be assigned the value
STATUS_NO_RECORD in every place where the macro empty_record() is called
to empty a record of a just-instantiated TABLE object created for
the new temporary table.
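The effect of the status flag on the quoted save_field_value condition can be shown with a minimal sketch. The flag value, ToyTmpTable and both helper functions are assumptions made for illustration; only the shape of the condition is taken from the snippet quoted above.

```cpp
#include <cassert>

// Assumed value for illustration; the real constant is defined by the server.
constexpr unsigned STATUS_NO_RECORD_SKETCH = 1u | 2u;

// Hypothetical stand-in for a freshly created temporary TABLE.
struct ToyTmpTable {
  unsigned status = 0;  // create_tmp_table() leaves the status at 0
};

// Stand-in for emptying the record of a new temporary table. Following
// the fix, the table is also marked as holding no record.
void empty_record_and_mark(ToyTmpTable &table) {
  table.status = STATUS_NO_RECORD_SKETCH;
}

// Mirrors the save_field_value condition quoted from convert_const_to_int().
bool should_save_field_value(bool field_is_const, const ToyTmpTable &table) {
  return field_is_const || !(table.status & STATUS_NO_RECORD_SKETCH);
}
```

With the status left at 0 the function returns true, which corresponds to the second execution, where the field value is saved and restored and the assertion fires.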
Followup to fix for MDEV-25858: When test_if_skip_sort_order() decides
to use an index to satisfy ORDER BY ... LIMIT clause, it should
disable "Range Checked for Each Record" optimization.
Do this in all cases.
This bug was introduced by commit be00e279c6.
The commit was applied for the task MDEV-6480, which allowed removing
top-level disjuncts from WHERE conditions if the range optimizer evaluated them
as always equal to FALSE/NULL.
If such disjuncts are removed, the WHERE condition may become an AND formula,
and if this formula contains multiple equalities, the field JOIN::item_equal
must be updated to refer to these equalities. The above-mentioned commit
forgot to do this, which could cause crashes for some queries.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Do not print illegal table field names for a non-top-level SELECT list:
they will not be referred to in any case, but they create problems for parsing
the printed result.
On deadlock the transaction is rolled back (and trx->state is cleared), but
SELECT continued the loop because evaluate_join_record() ignored the
error status returned from lower join evaluation. val_int() does not
return an error status, so the error is checked with thd->is_error().
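A minimal sketch of that check, assuming a toy session object: ToyThd, LoopState and both functions below are hypothetical and do not reproduce the server's types or return codes.

```cpp
#include <cassert>

// Hypothetical stand-in for THD with only the error flag.
struct ToyThd {
  bool error = false;
  bool is_error() const { return error; }
};

enum class LoopState { Ok, NoMatch, Error };

// Like val_int(), this returns only a value. A failure (for example a
// deadlock in a lower-level read) is visible only through the error flag.
long eval_condition_sketch(ToyThd &thd, bool lower_level_fails) {
  if (lower_level_fails) {
    thd.error = true;
    return 0;
  }
  return 1;
}

LoopState evaluate_record_sketch(ToyThd &thd, bool lower_level_fails) {
  long matched = eval_condition_sketch(thd, lower_level_fails);
  if (thd.is_error())
    return LoopState::Error;  // stop instead of continuing the loop
  return matched ? LoopState::Ok : LoopState::NoMatch;
}
```

Without the is_error() check, a failed evaluation would be indistinguishable from a row that simply did not match.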
Test case was created by Thirunarayanan Balathandayuthapani
<thiru@mariadb.com>
SQL processor failed to catch references to unknown columns and other
errors of the phase of semantic analysis in the specification of a
hanging recursive CTE. This happened because the function
With_clause::prepare_unreferenced_elements() failed to detect a CTE as
a hanging CTE if the CTE was recursive.
Fixing this problem in the code of the mentioned function opened another
problem: EXPLAIN started including the lines for the specifications of
hanging recursive CTEs in its output. This problem also was fixed in this
patch.
Approved by Dmitry Shulga <dmitry.shulga@mariadb.com>
If test_if_skip_sort_order() decides to use an index to produce required
ordering, it should disable "Range Checked for each record" optimization.
This is because Range-Checked-for-each-record may decide to use an index
(or an index_merge) which will not produce the required ordering.
Reformulate mark_columns_used_by_index* function family in a more laconic
way:
mark_columns_used_by_index -> mark_index_columns
mark_columns_used_by_index_for_read_no_reset -> mark_index_columns_for_read
mark_columns_used_by_index_no_reset -> mark_index_columns_no_reset
static mark_index_columns -> do_mark_index_columns
A less-intrusive fix: don't have table_cond_selectivity() assume that
there are fewer than MAX_REF_PARTS hash-join KEYUSEs.
If there are more than that, switch to using an array. Allocate the array
on the heap: we can't allocate it on MEM_ROOT, as table_cond_selectivity()
is called many times during the optimization.
(Variant 2, with review input addressed)
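The "fixed storage for the common case, heap array otherwise" idea can be sketched as follows. KeyPartBuffer and MAX_REF_PARTS_SKETCH are hypothetical; the limit is an assumed small value chosen for illustration, and the class is not the data structure used in table_cond_selectivity().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Assumed small limit for illustration only.
constexpr std::size_t MAX_REF_PARTS_SKETCH = 4;

// Per-call scratch storage: a fixed inline array in the common case and
// a heap array when more entries are needed. The heap array is released
// when the object goes out of scope, so repeated calls do not accumulate
// memory the way allocations on a long-lived arena would.
class KeyPartBuffer {
 public:
  explicit KeyPartBuffer(std::size_t count) : size_(count) {
    if (count > MAX_REF_PARTS_SKETCH) {
      heap_ = std::make_unique<int[]>(count);  // value-initialized to 0
      data_ = heap_.get();
    } else {
      data_ = inline_;
    }
  }
  KeyPartBuffer(const KeyPartBuffer &) = delete;
  KeyPartBuffer &operator=(const KeyPartBuffer &) = delete;

  int &operator[](std::size_t i) {
    assert(i < size_);
    return data_[i];
  }
  bool on_heap() const { return heap_ != nullptr; }

 private:
  std::size_t size_;
  int inline_[MAX_REF_PARTS_SKETCH] = {};
  std::unique_ptr<int[]> heap_;
  int *data_ = nullptr;
};
```

Copying is disabled because data_ may point into the object's own inline array.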
If a select query contained an ORDER BY clause that followed a LIMIT clause,
an ORDER BY clause, or ORDER BY with LIMIT, the EXPLAIN output for the
query showed an execution plan different from the one that was actually executed.
Approved by Roman Nozdrin <roman.nozdrin@mariadb.com>
(trivial backport to 10.2)
The optimizer removes redundant GROUP BY operations. If a GROUP BY element
is a subselect, it is "eliminated".
However, one must not eliminate the item if it is used both in the select
list and in the GROUP BY, like so:
select (select ... ) as SUBQ from ... group by SUBQ
Do not eliminate such items.
At the second execution of the PS
1. mark_as_dependent() is called with the same parameters as at the first
execution (select#4 and select#3)
2. as outer_select (select#3) has been already merged at the first
execution of PS it cannot be reached using the outer_select() function
anymore (and so can not stop iteration).
3. as a result all selects towards the top level select including the
select for 'ca' are marked as uncacheable.
4. Being marked uncacheable, it is executed incorrectly, which triggers filling
its temporary table several times and using freed memory at the end.
To avoid the problem we use the name resolution context to go "up".
NOTE: the problem also exists in 10.2 but has no visible effect on execution.
That is why the problem is fixed in 10.2.
The patch also adds debug logging of important procedures and
specifies the parameter types of st_select_lex::mark_as_dependent more precisely.
The row_number() over () window function can be used without any column in the
OVER clause. Additionally, the item doesn't effectively reference any table.
Rather, it is built specifically on the final temporary table used for
window function computation.
This caused the remove_const function to wrongly drop it from the ORDER
list. Effectively, we shouldn't be dropping any window function from the
ORDER clause, so adjust remove_const to account for that.
Reviewed by: Sergei Petrunia sergey@mariadb.com
An attempt to execute an EXPLAIN statement on a multi-table DELETE statement
fires the assertion
DBUG_ASSERT(! is_set());
in the method Diagnostics_area::set_eof_status.
For example, the above-mentioned assertion failure happens
when any of the following statements
EXPLAIN DELETE FROM t1.* USING t1
EXPLAIN DELETE b FROM t1 AS a JOIN t1 AS b
are executed in prepared statement mode, provided the table t1
exists.
This assertion is hit because the status of the
Diagnostics_area is set twice. The first time it is set from
the function do_select(), when the method multi_delete::send_eof()
is called. The second time it is set when the method
Explain_query::send_explain() calls the method select_send::send_eof()
(this method invokes the method Diagnostics_area::set_eof_status(), which
finally hits the assertion).
The second invocation of a setter method of the class Diagnostics_area
is correct: it is run to send a response containing the explain data.
But the first invocation of a setter method of the class Diagnostics_area
is wrong, since the function do_select() shouldn't be called at all
when handling an EXPLAIN statement.
The reason that the function do_select() is called during handling of
the EXPLAIN statement is that the flag SELECT_DESCRIBE is not set in the
data member JOIN::select_options. The flag SELECT_DESCRIBE
is copied from the value of select_lex->options.
During parsing of the EXPLAIN statement this flag is set, but it is later
reset by the function reinit_stmt_before_use(), which is called on
execution of a prepared statement.
  void reinit_stmt_before_use(THD *thd, LEX *lex)
  {
    ...
    for (; sl; sl= sl->next_select_in_list())
    {
      if (sl->changed_elements & TOUCHED_SEL_COND)
      {
        /* remove option which was put by mysql_explain_union() */
        sl->options&= ~SELECT_DESCRIBE;
        ...
      }
      ...
    }
So, to fix the issue, the flag SELECT_DESCRIBE is set forcibly in the
mysql_select() function when thd->lex->describe is set,
that is, when an EXPLAIN is being executed.
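The interaction between the reset and the forced flag can be sketched with plain bit operations. The bit value, ToySelect and the three helpers are hypothetical; only the flag name and the reset expression come from the excerpt above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit value; the real constant is defined in the server.
constexpr std::uint64_t SELECT_DESCRIBE_SKETCH = 1ULL << 2;

struct ToySelect {
  std::uint64_t options = 0;
};

// Stand-in for the re-initialization step that strips the flag before
// a prepared statement is executed again.
void reinit_before_use_sketch(ToySelect &sl) {
  sl.options &= ~SELECT_DESCRIBE_SKETCH;
}

// The fix in miniature: when the statement is an EXPLAIN, force the flag
// on, regardless of what the re-initialization did.
std::uint64_t effective_select_options(const ToySelect &sl, bool describe) {
  std::uint64_t options = sl.options;
  if (describe)
    options |= SELECT_DESCRIBE_SKETCH;
  return options;
}

// In this sketch, the row-producing path runs only when the flag is absent.
bool runs_do_select(std::uint64_t options) {
  return !(options & SELECT_DESCRIBE_SKETCH);
}
```

After the reset alone, runs_do_select() is true even for an EXPLAIN, which is the toy analogue of the double status update described above.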
For an IN/ANY/ALL subquery without an aggregate function and HAVING clause,
the GROUP BY clause is removed.
Because the GROUP BY list was removed, an invalid reference in the GROUP BY
clause was never resolved.
Remove the GROUP BY list only when all the items in the GROUP BY list
are resolved.
Also, removing the GROUP BY list later would not affect the extension that allows
using a non-aggregated field in an aggregate function (when ONLY_FULL_GROUP_BY
is not set), because the GROUP BY list is removed only when there is
NO aggregate function in the IN/ALL/ANY subquery.
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from a table with an index on a virtual column.
This was a debug-only failure, though the problem is much wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is almost
always), make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where the tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by a dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
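A rough sketch of the pointer-swapping approach, using std::bitset as a stand-in for MY_BITMAP. The function names carry a _sketch suffix and their parameter lists are assumptions; they are not the server's prototypes.

```cpp
#include <bitset>
#include <cassert>

using ToyBitmap = std::bitset<8>;  // stand-in for MY_BITMAP

// Points *set at all_set and returns the previous pointer so that the
// caller can restore it later. No bitmap contents are modified.
ToyBitmap *tmp_use_all_columns_sketch(ToyBitmap **set, ToyBitmap *all_set) {
  ToyBitmap *old = *set;
  *set = all_set;
  return old;
}

// Restores the pointer saved by tmp_use_all_columns_sketch().
void tmp_restore_column_map_sketch(ToyBitmap **set, ToyBitmap *old) {
  *set = old;
}
```

Because only the pointer changes, clearing tmp_set while read_set temporarily points at all_set leaves all_set intact, which is the property the raw-bitmap substitution lacked.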
This bug could cause a crash when executing queries that used mutually
recursive CTEs with the system variable big_tables set to 1. It happened due
to several bugs in the code that handled recursive table references
that referred to mutually recursive CTEs. For each recursive table reference a
temporary table is created that contains all rows generated for the
corresponding recursive CTE table on the previous step of recursion.
This temporary table should be created in the same way as the temporary
table created for a regular materialized derived table using the
method select_union::create_result_table(). In this case when the
temporary table is created it uses the select_union::TMP_TABLE_PARAM
structure as the parameter for the table construction. However the
code created the temporary table using just the function create_tmp_table()
and passed pointers to certain fields of the TMP_TABLE_PARAM structure
used for accumulation of rows of the recursive CTE table as parameters
for update. This was a mistake because now different temporary tables
cannot share some TMP_TABLE_PARAM fields in the general case. Besides,
depending on how mutually recursive CTE tables were defined and which
of them were referred to in the executed query, the select_union object
allocated for a recursive table reference could be allocated again after
the temporary table had been created. In this case the TMP_TABLE_PARAM
object associated with the temporary table created for the recursive
table reference contained unassigned fields needed for execution when
the Aria engine is employed as the engine for temporary tables.
This patch ensures that
- select_union object is created only once for any recursive table
reference
- any temporary table created for recursive CTEs uses its own
TMP_TABLE_PARAM structure
The patch also fixes a problem caused by incomplete cleanup of join tables
associated with recursive table references.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Due to a premature cleanup of the unit that specified a recursive CTE
used in the second operand of union the server fell into an infinite
loop in the reported test case. In other cases this premature cleanup
could cause other problems.
The bug is the result of a not quite correct fix for MDEV-17024. The
unit that specifies a recursive CTE has to be cleaned only after the
cleanup of the last external reference to this CTE. It means that
cleanups of the unit that are not triggered by the cleanup of an external
reference to the CTE must be blocked.
Usage of local table chains in selects to get external references to
recursive CTEs was not correct either because of possible merges of
some selects.
Also fixed a minor bug in st_select_lex::set_explain_type() that caused
typing 'RECURSIVE UNION' instead of 'UNION' in EXPLAIN output for external
references to a recursive CTE.
This follows up
commit 94a520ddbe and
commit 7c5519c12d.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
Diagnostics_area::set_error_status
Analysis: When strict mode is enabled, all warnings are converted to errors
including those which do not occur because of bad data.
Fix: The query should not be aborted when a warning is raised because the
limit on examined rows was reached, since that warning is not caused by bad data.
So thd->abort_on_warning should be false.
The issue here was that the query was using the ORDER BY LIMIT optimization, where
the access method was changed from EQ_REF access to an index scan (on an index that
would resolve the ORDER BY clause).
But the parameter READ_RECORD::unlock_row was not reset to rr_unlock_row, which is
used when the access method is not EQ_REF access.
The issue here arises when records are read from the temporary file
(the filesort result in this case) via a cache (rr_from_cache).
The cache is initialized with init_rr_cache.
For a correlated subquery the cache is allocated at each execution
of the subquery, but it was deallocated only once, when the query
execution was done.
So generally for subqueries we do two types of cleanup:
1) Full cleanup: we should free all resources of the query (like temp tables).
This is generally done when the query execution is complete or the subquery
re-execution is not needed (the case of an uncorrelated subquery).
2) Partial cleanup: minor cleanup that is required if
the subquery needs recalculation. This is done for all the structures that
need to be allocated for each execution (for example, SORT_INFO for filesort
is allocated for each execution of the correlated subquery).
The fix here is to free the cache used by rr_from_cache in the partial
cleanup phase.
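The split between the two cleanup levels can be sketched like this. CorrelatedSubquerySketch, its methods and the cache size are hypothetical; the class only models "allocate per execution, free in the partial cleanup".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Assumed buffer size for illustration only.
constexpr std::size_t CACHE_BYTES_SKETCH = 1024;

class CorrelatedSubquerySketch {
 public:
  // Runs once per outer row: allocates a fresh read cache every time.
  void execute() {
    assert(!cache_ && "cache from the previous execution must be freed");
    cache_ = std::make_unique<std::vector<char>>(CACHE_BYTES_SKETCH);
    ++executions_;
  }

  // Partial cleanup: releases what is allocated per execution.
  void partial_cleanup() { cache_.reset(); }

  // Full cleanup: everything from the partial cleanup, plus resources that
  // live for the whole query (temporary tables and the like, omitted here).
  void full_cleanup() { partial_cleanup(); }

  bool holds_cache() const { return cache_ != nullptr; }
  int executions() const { return executions_; }

 private:
  std::unique_ptr<std::vector<char>> cache_;
  int executions_ = 0;
};
```

Calling execute() twice without partial_cleanup() in between trips the assertion, which stands in for the per-execution allocation that was never released.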
* Fix the crash: if the IN-to-EXISTS rewrite causes an error (and so
JOIN::optimize() fails with an error, too), don't call
update_used_tables(). Terminate the query execution instead.
* Fix the cause of the error in the IN-to-EXISTS rewrite: don't do
the rewrite if doing it will cause an error of this kind:
This version of MariaDB doesn't yet support 'SUBQUERY in ROW in left
expression of IN/ALL/ANY'
* Fix another issue exposed by this testcase:
JOIN::setup_subquery_caches() may be invoked before any select has
saved its query plan, and will crash because none of the SELECTs
has called create_explain_query_if_not_exists() to create the Explain
Data Structure for this SELECT.
TODO: When merging this to 10.2, remove the poorly-placed call to
create_explain_query_if_not_exists made by fix for M_D_E_V-16153
When a prepared statement parameter '?' is used in a CTE that is used
multiple times, the following happens:
- The CTE definition is re-parsed multiple times.
- There are multiple Item_param objects referring to the same "?" in
the original query.
- Prepared_statement::param has a pointer to the first of them, the
others are "clones".
- When the prepared statement parameter gets a value, it should be passed
over to the clones with a param->sync_clones() call.
This call is made in insert_params(), etc. It was not made in
insert_params_with_log().
This would cause Item_param to not have any value which would confuse
the query optimizer.
Added the missing call.
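The clone synchronization can be pictured with a toy parameter object. ToyParam and bind_param_sketch() are hypothetical; only the sync_clones() name is taken from the text.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Item_param: the first occurrence keeps
// non-owning pointers to the clones created when the CTE is re-parsed.
struct ToyParam {
  bool has_value = false;
  long value = 0;
  std::vector<ToyParam *> clones;

  void set(long v) {
    value = v;
    has_value = true;
  }

  // Copies the current value to every clone.
  void sync_clones() {
    for (ToyParam *clone : clones) {
      clone->value = value;
      clone->has_value = has_value;
    }
  }
};

// Stand-in for a parameter-binding routine. The sync_clones() call is the
// step that was missing from the logging variant.
void bind_param_sketch(ToyParam &param, long v) {
  param.set(v);
  param.sync_clones();
}
```

Without the sync_clones() call, the clone would still report has_value == false after binding.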
For a SELECT without tables, which returns either 0 or 1 rows,
JOIN::exec_inner() did not check whether the flag representing
SQL_CALC_FOUND_ROWS was set, and send_records was directly assigned 0. So
SELECT FOUND_ROWS() returned 0. Now it checks the flag: if it is set,
send_records is 1, otherwise 0. 1 is the number of rows that could have been sent
to the client if the SELECT query had SQL_CALC_FOUND_ROWS.
It is 0 when no rows were sent because the SELECT query did not have
SQL_CALC_FOUND_ROWS.
A temporary table is needed for window function computation, but if only a
named window specification is used and there is no window function, there is
no need to create a temporary table, as there is no stage that computes a
window function.
It was:
implicit conversion from 'ha_rows' (aka 'unsigned long long') to 'double'
changes value from 18446744073709551615 to 18446744073709551616
Follow what JOIN::get_examined_rows() does for similar code.
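The rounding reported by the warning can be reproduced in isolation. In the sketch below, ha_rows_sketch is a stand-in typedef, and double_to_rows_sketch() shows one possible clamping conversion; it is an assumption made for illustration and is not claimed to match what JOIN::get_examined_rows() does.

```cpp
#include <cassert>
#include <limits>

using ha_rows_sketch = unsigned long long;  // stand-in for ha_rows

// The largest 64-bit value has no exact double representation, so the
// conversion rounds up to 2^64. The explicit cast makes that intentional.
double rows_as_double(ha_rows_sketch rows) {
  return static_cast<double>(rows);
}

// One possible way back without overflow: clamp before casting.
ha_rows_sketch double_to_rows_sketch(double value) {
  constexpr double two_pow_64 = 18446744073709551616.0;
  if (value >= two_pow_64)
    return std::numeric_limits<ha_rows_sketch>::max();
  if (!(value > 0))  // also covers NaN
    return 0;
  return static_cast<ha_rows_sketch>(value);
}
```

Comparing against the double constant directly, rather than against an implicitly converted integer maximum, avoids the implicit conversion that the compiler flagged.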
The issue here is that for degenerate joins we should execute the window
function, but it does not get executed in all cases.
To get the window function values, the window function always needs to be
executed. This currently does not happen in a few cases
where the join would return 0 or 1 rows, like
1) IMPOSSIBLE WHERE
2) MIN/MAX optimization
3) EMPTY CONST TABLE
The fix is to make sure that window functions get executed
and the temporary table is set up for the execution of window functions.
The query requires 2 temporary tables for execution, the window function
is always attached to the last temporary table, but in this case the
result field of the window function points to the first temporary table
rather than the last one.
Fixed this by not replacing window function items with temporary table
items of the first temporary table.
The issue here is a wrong estimate of the cardinality of a partial join:
the cardinality is too high because the function table_cond_selectivity()
returns an absurd number, 100, while selectivity cannot be greater than 1.
When accessing table t by the outer reference t1.a via an index, we do not
perform any range analysis for t. Yet TABLE::quick_key_parts[key] and
TABLE::quick_rows[key] contain non-zero values, though these should have
remained untouched and equal to 0.
Thus the real cause of the problem is that TABLE::init does not clear the arrays
TABLE::quick_key_parts[] and TABLE::quick_rows[].
It should have done so, because the TABLE structure created for any
instance of a table can be reused for many queries.
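The stale-state problem can be shown with a toy reusable structure. ReusableTable, MAX_KEY_SKETCH and the element types are assumptions; only the two array names follow the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Assumed array length for illustration only.
constexpr std::size_t MAX_KEY_SKETCH = 8;

// Hypothetical stand-in for a cached TABLE object that is handed to one
// query after another.
struct ReusableTable {
  unsigned quick_key_parts[MAX_KEY_SKETCH] = {};
  unsigned long long quick_rows[MAX_KEY_SKETCH] = {};

  // Called each time the structure is picked up by a new query: clears
  // the per-query range-analysis results left behind by the previous one.
  void init() {
    std::fill(std::begin(quick_key_parts), std::end(quick_key_parts), 0u);
    std::fill(std::begin(quick_rows), std::end(quick_rows), 0ull);
  }
};
```

If init() skipped the two std::fill calls, a later query that performs no range analysis would still see the previous query's estimates.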