This is MDEV-7601, including it's sub tasks MDEV-7594, MDEV-7555, MDEV-7590, MDEV-7581, MDEV-7589
The problem was that select_lex->non_agg_fields was not properly reset for re-execution and this caused an overwrite of a random memory position.
The fix was move non_agg_fields from select_lext to JOIN, which is properly reset.
JOIN::cur_dups_producing_tables was not maintained correctly in
the cases of greedy optimization (search_depth < n_tables).
Moved it to POSITION structure where it will be maintained automatically.
Removed POSITION::prefix_dups_producing_tables since its value can now
be calculated.
- When traversing JOIN_TABs with first_linear_tab/next_linear_tab(), don't forget
to enter the semi-join nest when it is the first table in the join order.
Failure to do so could cause e.g. I_S tables not to be filled.
- The problem was that JOIN::prepare() tried to set TABLE::maybe_null
for a table in join. Non-merged semi-join tables 1) are present as
join's base tables on second EXECUTE, but 2) do not yet have a TABLE
object.
Worked around the problem by putting mixed_implicit_grouping into JOIN
object, and then passing it to JTBM tables in setup_jtbm_semi_joins().
Analysis:
st_select_lex_unit::prepare() computes can_skip_order_by as TRUE.
As a result join->prepare() gets called with order == NULL, and
doesn't do name resolution for the inner ORDER clause. Due to this
the prepare phase doesn't detect that the query references non-exiting
function and field.
Later join->optimize() calls update_used_tables() for a non-resolved
Item_field, which understandably has no Field object. This call results
in a crash.
Solution:
Resolve unnecessary ORDER BY clauses to detect if they reference non-exising
objects. Then remove such clauses from the JOIN object.
Apparently in a general case a short-cut for the distinct optimization
is invalid if join buffers are used to join tables after the tables whose
values are to selected.
ORDER BY does not work
Use "dynamic" row format (instead of "block") for MARIA internal
temporary tables created for cursors.
With "block" row format MARIA may shuffle rows, with "dynamic" row
format records are inserted sequentially (there are no gaps in data
file while we fill temporary tables).
This is needed to preserve row order when scanning materialized cursors.
The patch to fix mdev-4418 turned out to be incorrect.
At the substitution of single row tables in make_join_statistics()
the used multiple equalities may change and references to the new multiple
equalities must be updated. The function remove_eq_conds() takes care of it and
it should be called right after the substitution of single row tables.
Calling it after the call of make_join_statistics was a mistake.
Apply the patch from Patryk Pomykalski:
- create_internal_tmp_table_from_heap() will now return information whether
the last row that we tried to write was a duplicate row.
(mysql-5.6 also has this change)
-Added test and extra code to ensure we don't leave keyread on for a handler table.
-Create on disk temporary files always with long data pointers if SQL_SMALL_RESULT is not used. This ensures that we can handle temporary files bigger than 4G.
mysql-test/include/default_mysqld.cnf:
Run test suite with smaller aria keybuffer size
mysql-test/suite/maria/maria3.result:
Run test suite with smaller aria keybuffer size
mysql-test/suite/sys_vars/r/aria_pagecache_buffer_size_basic.result:
Run test suite with smaller aria keybuffer size
sql/handler.cc:
Disable key read (extra safety if something went wrong)
sql/multi_range_read.cc:
Ensure we have don't leave keyread on for secondary_file
sql/opt_range.cc:
Simplify code with mark_columns_used_by_index_no_reset()
Ensure that read_keys_and_merge() disableds keyread if it enables it
sql/opt_subselect.cc:
Remove not anymore used argument for create_internal_tmp_table()
sql/sql_derived.cc:
Remove not anymore used argument for create_internal_tmp_table()
sql/sql_select.cc:
Use 'enable_keyread()' instead of calling HA_EXTRA_RESET. (Makes debugging easier)
Create on disk temporary files always with long data pointers if SQL_SMALL_RESULT is not used. This ensures that we can handle temporary files bigger than 4G.
Remove not anymore used argument for create_internal_tmp_table()
More DBUG
sql/sql_select.h:
Remove not anymore used argument for create_internal_tmp_table()
This bug happened because the executor tried to use a wrong
TABLE REF object when building access keys. It constructed
keys from fields of a materialized table from a ref object
created to construct keys from the fields of the underlying
base table. This could happen only when materialized table
was created for a non-correlated IN subquery and only
when the materialized table used for lookups.
In this case we are guaranteed to be able to construct the
keys from the fields of tables that would be outer tables
for the tables of the IN subquery.
The patch makes sure that no ref objects constructed from
fields of materialized lookup tables are to be used.
WITH A VARIABLE AND ORDER BY
Bug#16035412 MYSQL SERVER 5.5.29 WRONG SORTING USING COMPLEX INDEX
This is a fix for a regression introduced by Bug#12667154:
Bug#12667154 attempted to fix a performance problem with subqueries
that did filesort. For doing filesort, the optimizer creates a quick
select object to use when building the sort index. This quick select
object was deleted after the first call to create_sort_index(). Thus,
for queries where the subquery was executed multiple times, the quick
object was only used for the first execution. For all later executions
of the subquery, filesort used a complete table scan for building the
sort index. The fix for Bug#12667154 tried to fix this by not deleting
the quick object after the first execution of create_sort_index() so
that it would be re-used for building the sort index by the following
executions of the subquery.
This regression introduced in Bug#12667154 is that due to not deleting
the quick select object after building the sort index, the quick
object could in some cases be used also during the second phase of the
execution of the subquery instead of using the created sort
index. This caused wrong results to be returned.
The fix for this issue is to delete the reference to the select object
after it has been used in create_sort_index(). In this way the select
and quick objects will not be available when doing the second phase
of the execution of the select operation. To ensure that the select
object can be re-used for the following executions of the subquery
we make a copy of the select pointer. This is used for restoring the
select object after the select operation is completed.
mysql-test/suite/innodb/r/innodb_mysql.result:
Changed explain output: The explain now contains "Using where" since we
have restored the select pointer after doing the filesort operation.
sql/sql_select.cc:
Change create_sort_index() so that it always sets the pointer to
the select object to NULL. This is done in order to avoid that the
select->quick object can be used when execution the main part of
the select operation.
sql/sql_select.h:
New member in JOIN_TAB: saved_select. Used by create_sort_index to
make a backup copy of the select pointer.
The problem was that in debugging binaries it try to print item to assign human readable name to the item.
But subquery item was already freed (join_free/cleanup with full cleanup) so Item_field refers to temporary
table which memory had been already freed.
MDEV-567: Wrong result from a query with correlated subquery if ICP is allowed:
backport the fix developed for SHOW EXPLAIN:
revision-id: psergey@askmonty.org-20120719115219-212cxmm6qvf0wlrb
branch nick: 5.5-show-explain-r21
timestamp: Thu 2012-07-19 15:52:19 +0400
BUG#992942 & MDEV-325: Pre-liminary commit for testing
and adjust it so that it handles DS-MRR scans correctly.
This task fixes an ineffeciency that is a remainder from MySQL 5.0/5.1. There, subqueries
were optimized in a lazy manner, when executed for the first time. During this lazy optimization
it may happen that the server finds a more efficient subquery engine, and substitute the current
engine of the query being executed with the new engine. This required re-execution of the engine.
MariaDB 5.3 pre-optimizes subqueries in almost all cases, and the engine is chosen in most cases,
except when subquery materialization found that it must use partial matching. In this case, the
current code was performing one extra re-execution although it was not needed at all. The patch
performs the re-execution only if the engine was changed while executing.
In addition the patch performs small cleanup by removing "enum store_key_result" because it is
essentially a boolean, and the code that uses it already maps it to a boolean.
The fix backports from MWL#182: Explain running statements the logic that
saves the original JOIN_TAB array of a query plan after optimization. This
array is later used during EXPLAIN to iterate over the original JOIN plan
nodes in the cases when this plan could be changed by early subquery
execution during the optimization phase of the outer query.
The patch enables back constant subquery execution during
query optimization after it was disabled during the development
of MWL#89 (cost-based choice of IN-TO-EXISTS vs MATERIALIZATION).
The main idea is that constant subqueries are allowed to be executed
during optimization if their execution is not expensive.
The approach is as follows:
- Constant subqueries are recursively optimized in the beginning of
JOIN::optimize of the outer query. This is done by the new method
JOIN::optimize_constant_subqueries(). This is done so that the cost
of executing these queries can be estimated.
- Optimization of the outer query proceeds normally. During this phase
the optimizer may request execution of non-expensive constant subqueries.
Each place where the optimizer may potentially execute an expensive
expression is guarded with the predicate Item::is_expensive().
- The implementation of Item_subselect::is_expensive has been extended
to use the number of examined rows (estimated by the optimizer) as a
way to determine whether the subquery is expensive or not.
- The new system variable "expensive_subquery_limit" controls how many
examined rows are considered to be not expensive. The default is 100.
In addition, multiple changes were needed to make this solution work
in the light of the changes made by MWL#89. These changes were needed
to fix various crashes and wrong results, and legacy bugs discovered
during development.
Analysis:
The reason for the wrong result is the interaction between constant
optimization (in this case 1-row table) and subquery optimization.
- First the outer query is optimized, and 'make_join_statistics' finds that
table t2 has one row, reads that row, and marks the whole table as constant.
This also means that all fields of t2 are constant.
- Next, we optimize the subquery in the end of the outer 'make_join_statistics'.
The field 'f2' is considered constant, with value '3'. The subquery predicate
is rewritten as the constant TRUE.
- The outer query execution detects early that the whole query result is empty
and calls 'return_zero_rows'. Since the query is with implicit grouping, we
have to produce one row with special values for the aggregates (depending on
each aggregate function), and NULL values for all non-aggregate fields. This
function calls 'no_rows_in_result' to set each aggregate function to the
default value when it aggregates over an empty result, and then calls
'send_data', which in turn evaluates each Item in the SELECT list.
- When evaluation reaches the subquery predicate, it executes the subquery
with field 'f2' having a constant value '3', and the subquery produces the
incorrect result '7'.
Solution:
Implement Item::no_rows_in_result for all subquery predicates. In order to
make this work, it is also needed to make all val_* methods of all subquery
predicates respect the Item_subselect::forced_const flag. Otherwise subqueries
are executed anyways, and override the default value set by no_rows_in_result
with whatever result is produced from the subquery evaluation.
- Remove all references of MAX_TABLES from JOIN struct and make these dynamic
- Updated Join_plan_state to allocate just as many elements as it's needed
sql/opt_subselect.cc:
Optimized version of Join_plan_state
sql/sql_select.cc:
Set join->positions and join->best_positions dynamicly
Don't call update_virtual_fields() if table->vfield is not set.
sql/sql_select.h:
Remove all references of MAX_TABLES from JOIN struct and Join_plan_state and make these dynamic
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for looseScan. This is only true when range or full index
scan are used. In case of ref access, the index is in tab->ref.key (and
tab->index==0 which explains how LooseScan passed tests with ref access: they
used one index)
Fixed by setting/using loosescan_key, which always the correct index#.
The function subselect_uniquesubquery_engine::copy_ref_key has to take into
account that when EXPLAIN is processed the array of store_key object created
for any TABLE_REF may contain elements for constant items. These items should
be ignored by thefunction.
Problem: When building the condition for JOIN::outer_ref_cond the optimizer forgot to take into account
that this condition could depend on constant tables as well.
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.