If an IN-subquery is used in a table-less select the current code
should never consider it as candidate for semi-join optimizations.
Yet the function check_and_do_in_subquery_rewrites() improperly
checked the property "to be a table-less select". As a result
such select in IN subquery was used in INSERT .. SELECT then
the IN subquery by mistake was registered as a semi-join subquery
and convert_subq_to_sj() was called for it. However the code of
this function does not assume that the parent select of the subquery
could be a table-less select.
In this case we are setting the field Item_func_eq::in_eqaulity_no for the semi-join equalities.
This helps us to remove these equalites as the inner tables are not available during parent select execution
while the outer tables are not available during materialization phase.
We only have it set for the equalites for the fields involved with the IN subquery
and reset it for the equalities which do not belong to the IN subquery.
For example in case of nested IN subqueries:
SELECT t1.a FROM t1 WHERE t1.a IN
(SELECT t2.a FROM t2 where t2.b IN
(select t3.b from t3 where t3.c=27 ))
there are two equalites involving the fields of the IN subquery
1) t2.b = t3.b : the field Item_func_eq::in_eqaulity_no is set when we merge the grandchild select into the child select
2) t1.a = t2.a : the field Item_func_eq::in_eqaulity_no is set when we merge the child select into the parent select
But when we perform case 2) we should ensure that we reset the equalities in the child's WHERE clause.
with join_cache_level>2
During muliple equality propagation for a query in which we have an IN subquery, the items in the select list of the
subquery may not be part of the multiple equality because there might be another occurence of the same field in the
where clause of the subquery.
So we keyuse_is_valid_for_access_in_chosen_plan function which expects the items in the select list of the subquery to
be same to the ones in the multiple equality (through these multiple equalities we create keyuse array).
The solution would be that we expect the same field not the same Item because when we have SEMI JOIN MATERIALIZATION SCAN,
we use copy back technique to copies back the materialised table fields to the original fields of the base tables.
The problem was that SJ (semi-join) used secondary list (array) of subquery select list. The items there was prepared once then cleaned up (but not really freed from memory because it was made in statement memory).
Original list was not prepared after first execution because select was removed by conversion to SJ.
The solution is to use original list but prepare it first.
Any expensive WHERE condition for a table-less query with
implicit aggregation was lost. As a result the used aggregate
functions were calculated over a non-empty set of rows even
in the case when the condition was false.
Conversion of a subquery to a semi-join is blocked when we have an
IN subquery predicate in the on_expr of an outer join. Currently this
scenario is handled but the cases when an IN subquery predicate is wrapped
inside a Item_in_optimizer item then this blocking is not done.
If the optimizer chose an execution plan where
a semi-join nest were materialized and the
result of materialization was scanned to access
other tables by ref access it could build a key
over columns of the tables from the nest that
were actually inaccessible.
The patch performs a proper check whether a key
that uses columns of the tables from a materialized
semi-join nest can be employed to access outer tables.
This is another correction of the patch for bug mdev-12670.
If a derived table is merged into a select with STRAIGHT_JOIN
modifier all IN subquery predicates contained in the
specification of the derived table cannot be subject to
conversion to semi-joins.
This patch is a correction of the patch for bug mdev-12670.
With the current code handling semi-joins the following must
be taken into account.
Conversion of an IN subquery predicate into semi-join
has to be blocked if the predicate occurs:
(a) in the ON expression of an outer join
(b) in the ON expression of an inner join embedded directly
or indirectly in the inner nest of an outer join.
The patch for mdev-12670 blocked conversion to semi-joins only
in the case (a), but not in the case (b). This patch blocks
the conversion in both cases.
The code that blocked conversion of a IN subselect pedicate to a semi-join
if it occurred in the ON expression of an outer join did not do it correctly.
As a result, the conversion was blocked for IN subselect predicates
encountered in ON expressions of INNER joins or in WHERE conditions
of mergeable views / derived tables. This patch fixes this problem.
The select mentioned in the bug attempted to create a temporary table
using the maria storage engine. The table needs to have primary keys such that
duplicates can be removed. Unfortunately this use case has a longer
than allowed key and the tmp table got created without a temporary key.
We must not allow materialization for the subquery if the total key
length and key parts is greater than what the storage engine supports.
Consider a query with subquery in form t.key=(select ...). Suppose, the
parent query uses this equality for ref access.
It will attempt to evaluate the subquery in get_best_combination(),
right before the join->join_tab[...] array is filled. The problem was
that subquery optimization will attempt to look at parent's join->join_tab
to check how many times subquery will be executed (and crash).
Fixed by not doing that when the subquery is constant (non-constant
subqueries are only be evaluated during join execution, so they are not
affected)
Problem was in rewriting left expression which had 2 references on it. Solved with making subselect reference main.
Item_in_optimized can have not Item_in_subselect reference in left part so type casting with no check is dangerous.
Item::cols() should be checked after Item::fix_fields().
convert_subq_to_sj() must check the results of in_equality->fix_fields()
call. It can fail in a meaningful way when e.g. we're trying to compare
columns with incompatible collations.
JOIN::cur_dups_producing_tables was not maintained correctly in
the cases of greedy optimization (search_depth < n_tables).
Moved it to POSITION structure where it will be maintained automatically.
Removed POSITION::prefix_dups_producing_tables since its value can now
be calculated.
- The problem was that JOIN::prepare() tried to set TABLE::maybe_null
for a table in join. Non-merged semi-join tables 1) are present as
join's base tables on second EXECUTE, but 2) do not yet have a TABLE
object.
Worked around the problem by putting mixed_implicit_grouping into JOIN
object, and then passing it to JTBM tables in setup_jtbm_semi_joins().
- convert_subq_to_sj() must connect child select's tables into
parent select's TABLE_LIST::next_local chain.
- The problem was that it took child's leaf_tables.head() which
is different. This could cause certain tables (in this bug's case,
child select's non-merged semi-join) not to be present in
TABLE_LIST::next_local chain. Which would cause non-merged semi-join
not to be initialized in setup_tables(), which would lead to
NULL pointer dereference.
- convert_subq_to_sj() must connect child select's tables into
parent select's TABLE_LIST::next_local chain.
- The problem was that it took child's leaf_tables.head() which
is different. This could cause certain tables (in this bug's case,
child select's non-merged semi-join) not to be present in
TABLE_LIST::next_local chain. Which would cause non-merged semi-join
not to be initialized in setup_tables(), which would lead to
NULL pointer dereference.
- Don't pull out a table out of a semi-join if it is on the inner side of an outer join.
- Make join->sort_by_table= get_sort_by_table(...) call after const table detection
is done. That way, the value of join->sort_by_table will match the actual execution.
Which will allow the code in setup_semijoin_dups_elimination() (search for
"Make sure that possible sorting of rows from the head table is not to be employed."
to see that "Using filesort" is going to be used together with Duplicate Elimination (
and change it to Using temporary + Using filesort)
1. Transformation of row IN subquery made the same as single value.
2. replace_where_subcondition() made working on several layers of OR/AND because it called on expression before fix_fields().
Apply the patch from Patryk Pomykalski:
- create_internal_tmp_table_from_heap() will now return information whether
the last row that we tried to write was a duplicate row.
(mysql-5.6 also has this change)
-Added test and extra code to ensure we don't leave keyread on for a handler table.
-Create on disk temporary files always with long data pointers if SQL_SMALL_RESULT is not used. This ensures that we can handle temporary files bigger than 4G.
mysql-test/include/default_mysqld.cnf:
Run test suite with smaller aria keybuffer size
mysql-test/suite/maria/maria3.result:
Run test suite with smaller aria keybuffer size
mysql-test/suite/sys_vars/r/aria_pagecache_buffer_size_basic.result:
Run test suite with smaller aria keybuffer size
sql/handler.cc:
Disable key read (extra safety if something went wrong)
sql/multi_range_read.cc:
Ensure we have don't leave keyread on for secondary_file
sql/opt_range.cc:
Simplify code with mark_columns_used_by_index_no_reset()
Ensure that read_keys_and_merge() disableds keyread if it enables it
sql/opt_subselect.cc:
Remove not anymore used argument for create_internal_tmp_table()
sql/sql_derived.cc:
Remove not anymore used argument for create_internal_tmp_table()
sql/sql_select.cc:
Use 'enable_keyread()' instead of calling HA_EXTRA_RESET. (Makes debugging easier)
Create on disk temporary files always with long data pointers if SQL_SMALL_RESULT is not used. This ensures that we can handle temporary files bigger than 4G.
Remove not anymore used argument for create_internal_tmp_table()
More DBUG
sql/sql_select.h:
Remove not anymore used argument for create_internal_tmp_table()
This bug happened because the executor tried to use a wrong
TABLE REF object when building access keys. It constructed
keys from fields of a materialized table from a ref object
created to construct keys from the fields of the underlying
base table. This could happen only when materialized table
was created for a non-correlated IN subquery and only
when the materialized table used for lookups.
In this case we are guaranteed to be able to construct the
keys from the fields of tables that would be outer tables
for the tables of the IN subquery.
The patch makes sure that no ref objects constructed from
fields of materialized lookup tables are to be used.