A condition can be pushed from the HAVING clause into the WHERE clause
if it depends only on fields that are used in the GROUP BY list
or on fields that are equal to the grouping fields.
Aggregate functions can't be pushed down.
An example of how the pushdown is performed:
SELECT t1.a,MAX(t1.b)
FROM t1
GROUP BY t1.a
HAVING (t1.a>2) AND (MAX(c)>12);
=>
SELECT t1.a,MAX(t1.b)
FROM t1
WHERE (t1.a>2)
GROUP BY t1.a
HAVING (MAX(c)>12);
The implementation scheme:
1. Extract the most restrictive condition cond from the HAVING clause of
the select that depends only on the fields that are used in the GROUP BY
list of the select (directly or indirectly through equalities)
2. Save cond as a condition that can be pushed into the WHERE clause
of the select
3. Remove cond from the HAVING clause if it is possible
The optimization is implemented in the function
st_select_lex::pushdown_from_having_into_where().
New test file having_cond_pushdown.test is created.
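A minimal, hypothetical example (table and data are illustrative only) of the
equality-based case: b is not in the GROUP BY list, but WHERE a=b makes it
equal to the grouping field a, so the HAVING condition on b can be pushed
(assuming the default sql_mode, which permits referencing b in HAVING here):
CREATE TABLE t1 (a INT, b INT, c INT);
INSERT INTO t1 VALUES (1,1,5),(3,3,20),(4,4,30);
SELECT a, MAX(c)
FROM t1
WHERE a=b
GROUP BY a
HAVING (b>2) AND (MAX(c)>12);
-- (b>2) depends on b, which is equal to the grouping field a,
-- so it can be moved into WHERE; MAX(c)>12 must stay in HAVING.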
MDEV-17631 select_handler for a full query pushdown
Interfaces + Proof of Concept for federatedx with test cases.
The interfaces have been developed for the integration of the ColumnStore engine.
This patch contains a full implementation of the optimization
that allows using in-memory rowid / primary key filters built for range
conditions over indexes. In many cases the usage of such filters reduces
the number of disk seeks spent on fetching table rows.
In this implementation the choice of which filter to apply
(if any) is made purely on cost-based considerations.
This implementation re-architected the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.
Besides this, the patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections to the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.
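A hypothetical example (table, index and constant values are illustrative only)
of a query shape where such a filter may be chosen:
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT,
  order_date DATE,
  amount DECIMAL(10,2),
  INDEX idx_customer (customer_id),
  INDEX idx_date (order_date)
) ENGINE=InnoDB;
-- If the range over idx_date is selective enough, the optimizer may build an
-- in-memory rowid filter from it and check that filter while scanning the
-- range over idx_customer, avoiding many row fetches by rowid; whether the
-- filter is used depends purely on the estimated costs.
SELECT * FROM orders
WHERE customer_id BETWEEN 100 AND 200
  AND order_date BETWEEN '2018-01-01' AND '2018-01-31';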
Detailed changes:
1. Moving Field specific code into new methods on Field:
- Field *Field::create_tmp_field(...)
- virtual void init_for_tmp_table(...)
2. Removing virtual Item::create_tmp_field().
Adding instead a new virtual method Item::create_tmp_field_ex().
Note that a virtual create_tmp_field() still exists, but only for Item_sum.
This resembles the 10.0 code structure. Perhaps create_tmp_field() should
be removed from Item_sum, and Item_sum descendants should override
create_tmp_field_ex() directly. This can be done in a separate commit.
3. Adding helper classes Tmp_field_src and Tmp_field_param,
to make the API for Item::create_tmp_field_ex() smaller
and easier to extend in the future.
4. Decomposing the public function create_tmp_field() into
virtual implementations for Item and a number of its descendants:
- Item_basic_value
- Item_sp_variable
- Item_name_const
- Item_result_field
- Item_field
- Item_ref
- Item_type_holder
- Item_row
- Item_func_sp
- Item_func_user_var
- Item_sum
- Item_sum_field
- Item_proc
5. Adding DBUG_ASSERT-only virtual implementations for
Item types that should not appear in create_tmp_table_ex(),
for easier debugging:
- Item_nodeset_func
- Item_nodeset_to_const_comparator
- Item_null_result
- Item_copy
- Item_ident_for_show
- Item_user_var_as_out_param
6. Moving public function create_tmp_field_from_field()
as a method to Item_field.
7. Removing Item::set_result_field(). It's not needed any more.
8. Cleanup: Removing the enum value "EXPR_CACHE_ITEM",
as it hasn't been used for a very long time.
Preserve positions if the multi-update join is using a tmp table:
* store positions in the tmp table if needed:
  JOIN::add_fields_for_current_rowid()
* take positions from the tmp table, not from file->position():
  multi_update::prepare2()
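A hypothetical example (tables are illustrative only) of the kind of multi-table
UPDATE affected; when the join result is buffered in a tmp table, the positions
of the rows to update must be stored there as well:
CREATE TABLE t1 (id INT PRIMARY KEY, v INT);
CREATE TABLE t2 (id INT PRIMARY KEY, v INT);
INSERT INTO t1 VALUES (1,1),(2,2);
INSERT INTO t2 VALUES (1,10),(2,20);
-- If the join goes through a tmp table, the stored positions are later read
-- back in multi_update::prepare2() instead of calling file->position().
UPDATE t1, t2 SET t1.v = t2.v WHERE t1.id = t2.id;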
The logic and the implementation scheme are similar to those of
MDEV-9197 Pushdown conditions into non-mergeable views/derived tables.
An example of how the pushdown is made:
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y) from t2 group by x);
-->
select * from t1
where a>3 and b>10 and
(a,b) in (select x,max(y)
from t2
where x>3
group by x
having max(y)>10);
The implementation scheme:
1. Search for the condition cond that depends only on the fields
from the left part of the IN subquery (left_part)
2. Find fields F_group in the select of the right part of the
IN subquery (right_part) that are used in the GROUP BY
3. Extract from cond the condition cond_where that depends only on the
fields from the left_part that occupy the same positions in the left_part
(have the same indexes) as the F_group fields in the projection of the
right_part
4. Transform cond_where so it can be pushed into the WHERE clause of the
right_part and delete cond_where from cond
5. Transform cond so it can be pushed into the HAVING clause of the right_part
The optimization is implemented in
Item_in_subselect::pushdown_cond_for_in_subquery() and is controlled by the
variable condition_pushdown_for_subquery.
New test file in_subq_cond_pushdown.test is created.
There are also some changes made to setup_jtbm_semi_joins().
It is now decomposed into two procedures: setup_degenerate_jtbm_semi_joins(),
which is called before optimize_cond() for cond, and setup_jtbm_semi_joins(),
which is called after optimize_cond().
The new setup_jtbm_semi_joins() is written so that the result of its work is
the same as if it were called before optimize_cond().
The code that is common to pushdown into materialized derived tables and into
materialized IN subqueries is factored out into pushdown_cond_for_derived(),
Item_in_subselect::pushdown_cond_for_in_subquery() and
st_select_lex::pushdown_cond_into_where_clause().
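A runnable sketch of the example above (data and the use of EXPLAIN are
illustrative only; the pushed conditions can typically be inspected with
EXPLAIN FORMAT=JSON):
CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (x INT, y INT);
INSERT INTO t1 VALUES (4,20),(5,30),(1,5);
INSERT INTO t2 VALUES (4,20),(4,5),(5,30),(1,100);
-- With condition_pushdown_for_subquery enabled, a>3 is pushed into the WHERE
-- clause and b>10 into the HAVING clause of the materialized IN subquery.
SELECT * FROM t1
WHERE a>3 AND b>10 AND
      (a,b) IN (SELECT x, MAX(y) FROM t2 GROUP BY x);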
The issue here is that window function execution is not called for the correct
join tab. When we have GROUP BY and create extra temporary tables, window
function execution needs to be called for the last join tab, but the current
code does not take JOIN::aggr_tables into account when determining it.
Fixed by introducing a new function, JOIN::total_join_tab_cnt(), that also takes
the temporary tables into account.
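A hypothetical query shape (table and data are illustrative only) that exercises
this path: GROUP BY creates an extra aggregation tmp table, and the window
function must be evaluated on the grouped result, i.e. for the last join tab
counted including JOIN::aggr_tables:
CREATE TABLE t1 (a INT, b INT);
INSERT INTO t1 VALUES (1,1),(1,2),(2,3),(2,4),(3,5);
SELECT a,
       SUM(b)                        AS s,
       RANK() OVER (ORDER BY SUM(b)) AS r
FROM t1
GROUP BY a;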
Compiler warning: no matching operator delete found; memory will not be freed
if initialization throws an exception.
Added a no-op operator delete() for the MEM_ROOT-based placement new().
Virtual_tmp_table did not set the "field_index" member for its Fields.
Fixing Virtual_tmp_table::add() to set "field_index" to the Field's ordinal position
inside the table, like a normal TABLE does, for consistency.
Although this flaw did not seem to cause any bugs, having field_index properly
set is helpful for debugging purposes.
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
After MDEV-14212, the Virtual_tmp_table instance that stores a ROW
variable's elements is accessible from the underlying Field_row
(rather than Item_field_row).
This patch makes some further changes by moving the code from
sp_instr_xxx, sp_rcontext, Item_xxx to Virtual_tmp_table and Field_xxx.
The data type specific code (scalar vs ROW) now resides in
a new virtual method Field_xxx::sp_prepare_and_store_item().
The code in sp_rcontext::set_variable() and sp_eval_expr()
is now symmetric for scalar and ROW values.
The code in sp_rcontext::set_variable_row_field() and sp_rcontext::set_variable_row()
is now symmetric for ROW elements (i.e. scalar and ROW elements inside a ROW).
Rationale:
Prepare the code to make it easier to implement these tasks soon:
- MDEV-12252 ROW data type for stored function return values
- MDEV-12307 ROW data type for built-in function return values
- MDEV-6121 Data type: Array
- MDEV-10593 sql_mode=ORACLE: TYPE .. AS OBJECT: basic functionality
- ROW with ROW fields (no MDEV yet)
Details:
1. Moving the code in sp_eval_expr() responsible for backing up/restoring
thd->count_cuted_fields, thd->abort_on_warning,
thd->transaction.stmt.modified_non_trans_table
into a new helper class Sp_eval_expr_state, to make it easier to reuse.
Fixing sp_eval_expr() to use this new class.
2. Moving sp_eval_expr() and sp_prepare_func_item() from public functions
to methods in THD, so they can be reused in *.cc files more easily without
needing to include "sp_head.h".
Splitting sp_prepare_func_item() into two parts.
Adding a new function sp_fix_func_item(), which fixes
the underlying items, but does not do check_cols() for them.
Reusing sp_fix_func_item() in Field_row::sp_prepare_and_store_item().
3. Moving the code that finds ROW fields by name from Item to Virtual_tmp_table.
Moving the code searching for ROW fields by their names
from Item_field_row::element_index_by_name() to a new method
Virtual_tmp_table::sp_find_field_by_name().
Adding wrapper methods sp_rcontext::find_row_field_by_name() and
find_row_field_by_name_or_error(), to search for a ROW variable's
fields by the variable offset and its field name.
Changing Item_splocal_row_field_by_name::fix_fields() to
use sp_rcontext::find_row_field_by_name_or_error().
Removing virtual Item::element_index_by_name().
4. Splitting sp_rcontext::set_variable()
Adding a new virtual method Field::sp_prepare_and_store_item().
Splitting the two branches of the code in sp_rcontext::set_variable()
into two virtual implementations of Field::sp_prepare_and_store_item()
(for Field and for Field_row).
Moving the former part of sp_rcontext::set_variable() with the loop
doing set_null() for all ROW fields into a new method
Virtual_tmp_table::set_all_fields_to_null() and using it in
Field_row::sp_prepare_and_store_item().
Moving the former part of sp_rcontext::set_variable() with the loop
doing set_variable_row_field() into a new method
Virtual_tmp_table::set_all_fields_from_item() and using it in
Field_row::sp_prepare_and_store_item().
The loop in the new method now uses sp_prepare_and_store_item()
instead of set_variable_row_field(), because saving/restoring
THD flags is now done at the upper level. There is no need to save/restore
on every iteration.
5. Fixing sp_eval_expr() to simply do two things:
- backup/restore THD flags
- call result_field->sp_prepare_and_store_item()
So now sp_eval_expr() can be used for both scalar and ROW variables.
Reusing it in sp_rcontext::set_variable*().
6. Moving the loop in sp_rcontext::set_variable_row() into a
new method Virtual_tmp_table::sp_set_all_fields_from_item_list().
Changing the loop body to call field->sp_prepare_and_store_item()
instead of doing set_variable_row_field(). This removes
saving/restoring of the THD flags from every iteration.
Instead, adding the code to save/restore the flags around
the entire loop in set_variable_row(), using Sp_eval_expr_state.
So now saving/restoring is done only once for the entire ROW
(a slight performance improvement).
7. Removing the code in sp_instr_set::exec_core() that sets
a variable to NULL if the value evaluation failed.
sp_rcontext::set_variable() now makes sure to reset
the variable properly by effectively calling sp_eval_expr(),
which calls virtual Field::sp_prepare_and_store_item().
Removing the similar code from sp_instr_set_row_field::exec_core()
and sp_instr_set_row_field_by_name::exec_core().
Removing the method sp_rcontext::set_variable_row_field_to_null(),
as it's not used any more.
8. Removing the call for sp_prepare_func_item() from
sp_rcontext::set_variable_row_field(), as it was a duplicate:
it was done inside sp_eval_expr(). Now it's done inside
virtual Field::sp_prepare_and_store_item().
9. Moving the code from sp_instr_set_row_field_by_name::exec_core()
into sp_rcontext::set_variable_row_field_by_name(), for symmetry
with other sp_instr_set*::exec_core()/sp_rcontext::set_variable*() pairs.
Now sp_instr_set_row_field_by_name::exec_core() calls
sp_rcontext::set_variable_row_field_by_name().
10. Misc:
- Adding a helper private method sp_rcontext::virtual_tmp_table_for_row(),
reusing it in the new sp_rcontext methods.
- Removing Item_field_row::get_row_field(), as it's not used any more.
- Removing the "Item *result_item" from sp_eval_expr(),
as it's not needed any more.
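A small illustrative sketch (procedure and variable names are hypothetical) of
the statements handled by the refactored sp_rcontext::set_variable*() /
Field::sp_prepare_and_store_item() paths:
DELIMITER $$
CREATE PROCEDURE p1()
BEGIN
  DECLARE v INT;                         -- scalar variable
  DECLARE r ROW(a INT, b VARCHAR(10));   -- ROW variable
  SET v = 10;    -- whole-variable assignment: sp_rcontext::set_variable()
  SET r.a = 1;   -- ROW field assignment: sp_rcontext::set_variable_row_field()
  SET r.b = 'x';
  SELECT v, r.a, r.b;
END$$
DELIMITER ;
CALL p1();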
The assertion failure was caused by an incorrectly set read_set for
functions in the ORDER BY clause in part of a union, when we are using
a mergeable view and the order by clause can be skipped (removed).
An ORDER BY clause can be skipped if it belongs to one part of the UNION, as
the ordering of an individual SELECT is not meaningful in the UNIONed result
set. The server is aware of this optimization and tries to remove the ORDER BY
clause before JOIN::prepare. The problem is that we need to throw an
error when the ORDER BY clause contains invalid columns. To do this, we
attempt resolving the ORDER BY expressions, then subsequently drop them
if resolution succeeded. However, ORDER BY resolution had the side
effect of adding the expressions to the all_fields list, which is used
to construct temporary tables to store the result. We may be ignoring
the ORDER BY statement, but the tmp table still tried to compute the
values for the expressions, even if the columns are never used.
The assertion only shows itself if the order by clause contains members
which were not previously in the select list, and are part of a
function.
There is an additional question as to why this only manifests when using
VIEWS and not when using a regular table. The difference lies with the
"reset" of the read_set for the temporary table during
SELECT_LEX::update_used_tables() in JOIN::optimize(). The changes
introduced in fdf789a7ea cleared the
read_set when a mergeable view is encountered in the TABLE_LIST
definition.
Upon initial order_list resolution, the table's read_set is updated
correctly. JOIN::optimize() will only reset the read_set if it
encounters a VIEW. Since we no longer have an ORDER BY clause in
JOIN::optimize(), we never get to correctly update the read_set again.
Another relevant commit, by Timour, which first introduced the order
resolution when we "can_skip_sort_order":
883af99e7d
Solution:
Don't add the resolved ORDER BY elements to all_fields. We only resolve
them to check if an error should be returned for the query. Ignore them
completely otherwise.
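A hypothetical example (view and expression are illustrative only) of the query
shape described above: the ORDER BY in one part of the UNION references, through
an expression, a column that is not in the select list, and the first table is a
mergeable view:
CREATE TABLE t1 (a INT, b INT);
CREATE VIEW v1 AS SELECT a, b FROM t1;   -- mergeable view
-- The ORDER BY of the first UNION part is resolved only to verify that its
-- columns are valid, then dropped; its expressions must not be added to
-- all_fields (and hence to the tmp table).
(SELECT a FROM v1 ORDER BY a+b)
UNION
(SELECT a FROM t1);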
This was done to make things consistent. It gives the additional benefit
that EXPLAIN EXTENDED now treats null_tables like constant tables and replaces
their columns with NULL, in a similar way to how it replaces columns with
constants for constant tables.
- Null tables are tables where all columns are always NULL. The most common
null table is a table used in a LEFT JOIN that is never true.
- All result changes come from replacing columns with NULL for null_tables.
- "Impossible where" now also shows constants for const columns.
- Removed duplicated s->type= JT_CONST
- Reset found_const_table_map when JOIN is created (safety fix)
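A hypothetical example (tables are illustrative only) of a null table and how
the change may show up in the rewritten query:
CREATE TABLE t1 (a INT);
CREATE TABLE t2 (b INT);
INSERT INTO t1 VALUES (1),(2);
-- t2 is joined with a condition that is never true, so all its columns are
-- always NULL: t2 is a null table. In the rewritten query printed after
-- EXPLAIN EXTENDED, t2's columns are expected to appear as NULL, similar to
-- how constant tables get their columns replaced by constants.
EXPLAIN EXTENDED
SELECT t1.a, t2.b FROM t1 LEFT JOIN t2 ON 1=0;
SHOW WARNINGS;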
Most "new" failures fixed in the following files:
- sql_select.cc
- item.cc
- item_func.cc
- opt_subselect.cc
Other things:
- Allocate udf_handler strings in mem_root
- Required changes in sql_string.h
- Add mem_root as argument to some new [] calls
- Mark udf_handler strings as thread specific
- Removed some comment blocks with code
in joined table + GROUP BY + GROUP_CONCAT + HAVING + ORDER BY
[by field from HAVING] + 1 row expected
The fix is actually a port of the fix for bug #17055185 from the MySQL
code line (see commit f289aeeef0743508ff87211084453b3b88a6d017
by Mithun C Y in mysql-5.6). The test case for bug #17055185
was also ported.
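A hypothetical query of roughly the shape named in the bug title (tables and
data are illustrative only), combining a join, GROUP BY, GROUP_CONCAT, HAVING,
and an ORDER BY on a field referenced from HAVING:
CREATE TABLE t1 (a INT, b VARCHAR(10));
CREATE TABLE t2 (a INT, c INT);
INSERT INTO t1 VALUES (1,'x'),(1,'y'),(2,'z');
INSERT INTO t2 VALUES (1,1),(2,2);
SELECT t1.a, GROUP_CONCAT(t1.b) AS g, MAX(t2.c) AS m
FROM t1 JOIN t2 ON t1.a = t2.a
GROUP BY t1.a
HAVING m > 1
ORDER BY m;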