MDEV-32329 (patch) pushdown from having into where: Server crashes at sub_select
When generating an Item_equal with a Item_ref that refers to a field
outside of a subselect, remove_item_direct_ref() causes the dependency
(depended_from) on the outer select to be lost, which causes trouble
for code downstream that can no longer determine the scope of the Item.
Not calling remove_item_direct_ref() retains the Item's dependency.
Test cases from MDEV-32395 and MDEV-32329 are included.
Some fixes from other developers:
Monty:
- Fixed wrong code in Item_equal::create_pushable_equalities()
that could cause wrong item to be used if there was no matching items.
Daniel Black:
- Added test cases from MDEV-32329
Igor Babaev:
- Provided fix for removing call to remove_item_direct_ref() in
eliminate_item_equal()
MDEV-32395: update_depend_map_for_order: SEGV at /mariadb-11.3.0/sql/sql_select.cc:16583
Include test cases from MDEV-32329.
Two problem solved:
1) Item_default_value makes a shallow copy so the copy
should not delete field belong to the Item
2) Item_default_value should not inherit
derived_field_transformer_for_having and
derived_field_transformer_for_where (in this variant
pushing DEFAULT(f) is prohibited (return NULL) but
if return "this" it will be allowed (should go with
a lot of tests))
Statements affected by this bug need all the following to be true
1) a derived table table or view whose specification contains a set
operation at the top level.
2) a grouping operator (group by/having) operating on a column alias
other than in the first select of the union/intersect
3) an outer condition that will be pushed into all selects in this
union/intersect, either into the where or having clause
When pushing a condition into all selects of a unit with more than one
select, pushdown_cond_for_derived() renames items so we can re-use the
condition being pushed.
These names need to be saved and reset for correct name resolution on
second execution of prepared statements.
Reviewed by Igor Babaev (igor@mariadb.com)
This commits adds the "materialization" block to the output of
EXPLAIN/ANALYZE FORMAT=JSON when materialized subqueries are involved
into processing. In the case of ANALYZE additional runtime information
is displayed, such as:
- chosen strategy of materialization
- number of partial match/index lookup loops
- sizes of partial match buffers
This bug could affect queries with IN subqueries in WHERE clause and using
derived tables to which split optimization potentially could be applied.
When looking for the best split of a splittable derived table T any key
access from a semi-join materialized table used for lookups S to table T
must be excluded from consideration because in the current implementation
of such tables as S the values from its records cannot be used to access
other tables.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Author: Sergei Petrunia <sergey@mariadb.com>
Date: Wed Oct 11 19:02:25 2023 +0300
MDEV-32301: Server crashes at Arg_comparator::compare_row
In Item_bool_rowready_func2::build_clone(): if we're setting
clone->cmp.comparators=0
also set
const_item_cache=0
as the Item is currently in a state where one cannot compute it.
In Item_bool_rowready_func2::build_clone(): if we're setting
clone->cmp.comparators=0
also set
const_item_cache=0
as the Item is currently in a state where one cannot compute it.
ANALYZE FORMAT=JSON output now includes table.r_engine_stats which
has the engine statistics. Only non-zero members are printed.
Internally: EXPLAIN data structures Explain_table_acccess and
Explain_update now have handler* handler_for_stats pointer.
It is used to read statistics from handler_for_stats->handler_stats.
The following applies only to 10.9+, backport doesn't use it:
Explain data structures exist after the tables are closed. We avoid
walking invalid pointers using this:
- SQL layer calls Explain_query::notify_tables_are_closed() before
closing tables.
- After that call, printing of JSON output is disabled. Non-JSON output
can be printed but we don't access handler_for_stats when doing that.
This bug could affect queries containing a subquery over splittable derived
tables and having an outer references in its WHERE clause. If such subquery
contained an equality condition whose left part was a reference to a column
of the derived table and the right part referred only to outer columns
then the server crashed in the function st_join_table::choose_best_splitting()
The crashing code was added in the commit ce7ffe61d8
that made the code of the function sensitive to presence of the flag
OUTER_REF_TABLE_BIT in the KEYUSE_EXT::needed_in_prefix fields.
The field needed_in_prefix of the KEYUSE_EXT structure should not contain
table maps with OUTER_REF_TABLE_BIT or RAND_TABLE_BIT.
Note that this fix is quite conservative: for affected queries it just
returns the query plans that were used before the above mentioned commit.
In fact the equalities causing crashes should be pushed into derived tables
without any usage of split optimization.
Approved by Sergei Petrunia <sergey@mariadb.com>
This bug could manifest itself at the first execution of prepared statement
created for queries using a materialized view defined as union. A crash
could happen for sure if the query contained a condition pushable into
the view and this condition was over the column defined via a complex string
expression requiring implicit conversion from one charset to another for
some of its sub-expressions. The bug could cause crashes when executing
PS for some other queries whose optimization needed building clones for
such expressions.
This bug was introduced in the patch for MDEV-29988 where the class
Item_direct_ref_to_item was added. The implementations of the virtual
methods get_copy() and build_clone() were invalid for the class and this
could cause crashes after the method build_clone() was called for
expressions containing objects of the Item_direct_ref_to_item type.
Approved by Sergei Golubchik <serg@mariadb.com>
This bug manifested itself in very rare situations when splitting
optimization was applied to a materialized derived table with group clause
by key over a constant meargeable derived table that was in inner part of
an outer join. In this case the used tables for the key to access the
split table incorrectly was evaluated to a not empty table map.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Part of:
MDEV-28073 Slow query performance in MariaDB when using many tables
s->key_dependent has a list of tables that are compared with key fields
in the current table. However it does not take into account if a key
field could be resolved by another table.
This is because MariaDB expands 'join_tab->keyuse' to include all generated
comparisons.
For example:
SELECT * from t1,t2,t3 where t1.key=t2.key and t2.key=t3.key
In this case keyuse for t1 includes t2.key and t3.key and key_dependent
contains 't2.map | t3.map'
If we in best_extension_by_limited_search() consider t2,t1 then t1's
key is fully defined, but we cannot do any prune of plans as
s->key_dependent indicates that t3 is still needed.
Fixed by calculating in best_access_patch the current key_dependent map
of tables that is needed to satisfy all keys. This allows us to prune
more bad plans earlier as soon as all keys can be used.
We also set key_dependent to 0 if we found an EQ_REF key, as this an
optimal key for the table and there is no reason to check more keys.
best_extension_by_limited_search() assumes that tables should be sorted
according to size to be able to quickly disregard bad plans. However the
current usage of swap_variables() will change the table order to a not
sorted one for the next recursive call. This breaks the assumtion and
causes performance issues when using many tables (we have to examine
many more plans).
This patch fixes this by ensuring that the original table order is kept
for the not yet used tables when best_extension_by_limited_search() is
called.
This was done by always calling swap_variables() for each table and
restoring the original table order at exit.
Some test changed:
- In a majority of the test the change was that two "identical tables"
where swapped and the optimzer is now using the first/smaller table
- In few test the table order was changed. The new plan looks identical
or slighly better than the original.
UNION ALL queries are a subject of optimization introduced in MDEV-334
when creation of a temporary table is skipped.
While there is a check for this optimization in Explain_union::print_explain()
there was no such in Explain_union::print_explain_json(). This resulted in
printing irrelevant data like:
"union_result": {
"table_name": "<union2,3>",
"access_type": "ALL",
"r_loops": 0,
"r_rows": null
in case when creation of the temporary table was actually optimized out.
This commits adds a check whether the temporary table was actually created
during the UNION ALL processing and eliminates printing of the irrelevant data.
This bug may affect the queries that uses a grouping derived table with
grouping list containing references to columns from different tables if
the optimizer decides to employ the split optimization for the derived
table. In some very specific cases it may affect queries with a grouping
derived table that refers only one base table.
This bug was caused by an improper fix for the bug MDEV-25128. The fix
tried to get rid of the equality conditions pushed into the where clause
of the grouping derived table T to which the split optimization had been
applied. The fix erroneously assumed that only those pushed equalities
that were used for ref access of the tables referenced by T were needed.
In fact the function remove_const() that figures out what columns from the
group list can be removed if the split optimization is applied can uses
other pushed equalities as well.
This patch actually provides a proper fix for MDEV-25128. Rather than
trying to remove invalid pushed equalities referencing the fields of SJM
tables with a look-up access the patch attempts not to push such equalities.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This bug could affect queries with a grouping derived table containing
equalities in the where clause of its specification if the optimizer
chose to apply split optimization to access the derived table. In such
cases wrong results could be returned from the queries.
When the optimizer considers a possibility of using split optimization
to a derived table it injects equalities joining the derived table with
other tables into the where condition of the derived table. After the join
order for the execution using split optimization has been chosen as the
cheapest the injected equalities that are not used to access the derived
table are removed from the where condition of the derived table.
For this removal the optimizer looks through the conjuncts of the where
condition of the derived table, fetches the equalities and checks whether
they belong to the list of injected equalities.
As the injection of the list was performed just by the insertion of it
into the list of top level AND condition of the where condition some extra
conjuncts from the where condition could be automatically attached to the
end of the list of injected equalities. If such attached conjunct happened
to be an equality predicate it was removed from the where condition of the
derived table and thus lost for checking at the execution phase.
The bug has been fixed by injecting of a shallow copy of the list of the
pushed equalities rather than the list itself leaving the latter intact.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
If a join query uses a derived table (view / CTE) with GROUP BY clause then
the execution plan for such join may employ split optimization. When this
optimization is employed the derived table is not materialized. Rather only
some partitions of the derived table are subject to grouping. Split
optimization can be applied only if:
- there are some indexes over the tables used in the join specifying the
derived table whose prefixes partially cover the field items used in the
GROUP BY list (such indexes are called splitting indexes)
- the WHERE condition of the join query contains conjunctive equalities
between columns of the derived table that comprise major parts of
splitting indexes and columns of the other join tables.
When the optimizer evaluates extending of a partial join by the rows of the
derived table it always considers a possibility of using split optimization.
Different splitting indexes can be used depending on the extended partial
join. At some rare conditions, for example, when there is a non-splitting
covering index for a table joined in the join specifying the derived table
usage of a splitting index to produce rows needed for grouping may be still
less beneficial than usage of such covering index without any splitting
technique. The function JOIN_TAB::choose_best_splitting() must take this
into account.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
If a join query uses a derived table (view / CTE) with GROUP BY clause then
the execution plan for such join may employ split optimization. When this
optimization is employed the derived table is not materialized. Rather only
some partitions of the derived table are subject to grouping. Split
optimization can be applied only if:
- there are some indexes over the tables used in the join specifying the
derived table whose prefixes partially cover the field items used in the
GROUP BY list (such indexes are called splitting indexes)
- the WHERE condition of the join query contains conjunctive equalities
between columns of the derived table that comprise major parts of
splitting indexes and columns of the other join tables.
When the optimizer evaluates extending of a partial join by the rows of the
derived table it always considers a possibility of using split optimization.
Different splitting indexes can be used depending on the extended partial
join. At some rare conditions, for example, when there is a non-splitting
covering index for a table joined in the join specifying the derived table
usage of a splitting index to produce rows needed for grouping may be still
less beneficial than usage of such covering index without any splitting
technique. The function JOIN_TAB::choose_best_splitting() must take this
into account.
Approved by Oleksandr Byelkin <sanja@mariadb.com>