- When doing join optimization, pre-sort the tables so that they mimic the execution
order we've had with 'semijoin=off'.
- That way, we will not get regressions when there are two query plans (the old and the
new) that have indentical costs but different execution times (because of factors that
the optimizer was not able to take into account).
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.
- if we're considering FirstMatch access with one inner table, and
@@optimizer_switch has semijoin_with_cache flag, calculate costs
as if we used join cache (because we will be able to do so)
in EXPLAIN as select_type==MATERIALIZED.
Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
- Make EXPLAIN display "Start temporary" at the start of the fanout (it used to display
at the first table whose rowid gets into temp. table which is not that useful for
the user)
- Updated test results (all checked)
If the optimizer switch 'semijoin_with_cache' is set to 'off' then
join cache cannot be used to join inner tables of a semijoin.
Also fixed a bug in the function check_join_cache_usage() that led
to wrong output of the EXPLAIN commands for some test cases.
of the 5.3 code line after a merge with 5.2 on 2010-10-28
in order not to allow the cost to access a joined table to be equal
to 0 ever.
Expanded data sets for many test cases to get the same execution plans
as before.
semijoin=on,firstmatch=on,loosescan=on
to
semijoin=off,firstmatch=off,loosescan=off
Adjust the testcases:
- Modify subselect*.test and join_cache.test so that all tests
use the same execution paths as before (i.e. optimizations that
are being tested are enabled)
- Let all other test files run with the new default settings (i.e.
with new optimizations disabled)
- Copy subquery testcases from these files into t/subselect_extra.test
which will run them with new optimizations enabled.
Resolved all conflicts, bad merges and fixed a few minor bugs in the code.
Commented out the queries from multi_update, view, subselect_sj, func_str,
derived_view, view_grant that failed either with crashes in ps-protocol or
with wrong results.
The failures are clear indications of some bugs in the code and these bugs
are to be fixed.
Analysis:
Build_equal_items_for_cond() rewrites the WHERE clause in such a way,
that it may merge the list join->cond_equal->current_level with the
list of child Items in an AND condition of the WHERE clause.
The place where this is done is:
static COND *build_equal_items_for_cond(THD *thd, COND *cond,
COND_EQUAL *inherited)
{
...
if (and_level)
{
args->concat(&eq_list);
args->concat((List<Item> *)&cond_equal.current_level);
}
...
}
As a result, later transformations on the WHERE clause may change the
structure of the list join->cond_equal->current_level without knowing this.
Specifically in this bug, Item_in_subselect::inject_in_to_exists_cond
creates a new AND of the old WHERE clause and the IN->EXISTS conditions.
It then calls fix_fields() for the new AND. Among other things, fix_fields
flattens all nested ANDs into one by merging the AND argument lists.
When there is a cond_equal for the JOIN, its list of Item_equal objects
is attached to the end of the original AND. When a lower-level AND is
merged into the top-level one, the argument list of the lower-level AND
is concatenated to the list of multiple equalities in the upper-level AND.
As a result, when substitute_for_best_equal_field processes the
multiple equalities, it turns out that the multiple equality list contains
the Items from the lower-level AND which were concatenated to the end of
the join->cond_equal->current_level list. This results in a crash because
this list must not contain any other Items except for the previously found
Item_equal ones.
Solution:
When performing IN->EXIST predicate injection, and the where clause is an
AND, detach the list of Item_equal objects before calling fix_fields on
the injected where clause.
After fix_fields is done, reattach back the multiple equalities list to
the end of the argument list of the new AND.
Analysis:
The wrong result is a consquence of sorting the subquery
result and then selecting only the first row due to the
artificial LIMIT 1 introduced by the fix_fields phase.
Normally, if there is an ORDER BY in a subquery, the ORDER
is removed (Item_in_subselect::select_in_like_transformer),
however if a GROUP BY is transformed into ORDER, this happens
later, after the removal of the ORDER clause of subqueries, so
we end up with a subquery with an ORDER clause, and an artificially
added LIMIT 1.
The reason why the same works in the main 5.3 without MWL#89, is
that the 5.3 performs all subquery transformations, including
IN->EXISTS before JOIN::optimize(). The beginning of JOIN::optimize
does:
if (having || (select_options & OPTION_FOUND_ROWS))
select_limit= HA_POS_ERROR;
which sets the limit back to infinity, thus 5.3 sorts the whole
subquery result, and IN performs the lookup into all subquery result
rows.
Solution:
Sorting of subqueries without LIMIT is meaningless. Since LIMIT in
subqueries is not supported, the patch removes sorting by setting
join->skip_sort_order= true
for each subquery JOIN object. This improves a number of execution
plans to not perform unnecessary sorting at all.
- Let advance_sj_state() save the value of JOIN::cur_dups_producing_tables
in POSITION::prefix_dups_producing_tables, and restore_sj_state() restore
it.
- "Using MRR" is no longer shown with range access.
- Instead, both range and BKA accesses will show one of the following:
= "Rowid-ordered scan"
= "Key-ordered scan"
= "Key-ordered Rowid-ordered scan"
depending on whether DS-MRR implementation will do scan keys in order, rowids in order,
or both.
- The patch also introduces a way for other storage engines/MRR implementations to
pass information to EXPLAIN output about the properties of employed MRR scans.
Merge 5.3-mwl89 into 5.3 main.
There is one remaining test failure in this merge:
innodb_mysql_lock2. All other tests have been checked to
deliver the same results/explains as 5.3-mwl89, including
the few remaining wrong results.
plans or wrong results due to the fact that JOIN_CACHE functions
ignored the possibility of interleaving materialized semijoin
tables with tables whose records were stored in join buffers.
This fixes would become mostly unnecessary if the new code of
mwl 90 was merged into 5.3 right now.
Yet the fix the code of optimize_wo_join_buffering was needed
in any case.
Phase 3: Implementation of re-optimization of subqueries with injected predicates
and cost comparison between Materialization and IN->EXISTS strategies.
The commit contains the following known problems:
- The implementation of EXPLAIN has not been re-engineered to reflect the
changes in subquery optimization. EXPLAIN for subqueries is called during
the execute phase, which results in different code paths during JOIN::optimize
and thus in differing EXPLAIN messages for constant/system tables.
- There are some valgrind warnings that need investigation
- Several EXPLAINs with minor differences need to be reconsidered after fixing
the EXPLAIN problem above.
This patch also adds one extra optimizer_switch: 'in_to_exists' for complete
manual control of the subquery execution strategies.
Applied the fix for bug #47217 from the mysql-6.0 codebase.
The patch adds not null predicates generated for the left parts
of the equality predicates used for ref accesses. This is done
for such predicates both in where conditions and on conditions.
For the where conditions the not null predicates were generated
but in 5.0/5.1 they actually never were used due to some lame
merge from 4.1 to 5.0. The fix for bug #47217 made these
predicates to be used in the condition pushed to the tables.
Yet only this patch generates not null predicates for equality
predicated from on conditions of outer joins.
This patch introduces a performance regression that can be
observed on a test case from null_key.test. The regression
will disappear after the fix for bug #57024 from mariadb-5.1
is pulled into mariadb-5.3.
The patch contains many changes in the outputs of the EXPLAIN
commands since generated not null predicates are considered as
parts of the conditions pushed to join tables and may add
'Usingwhere' in some rows of EXPLAINs where there used
to be no such comments.
- Corrected a wrong result that was recorded by the MySQL fix for BUG#39069.
- Removed Item_func_isnull::cached_value and all the logic around this custom-made
caching of the NULL result because MWL#89 optimizes subqueries before the outer
query is being executed, and this cache cannot be made easily to work for all
kinds of Items (specifically Item_sum_sum, but others too).
libmysqld/Makefile.am:
The new file added.
mysql-test/r/index_merge_myisam.result:
subquery_cache optimization option added.
mysql-test/r/myisam_mrr.result:
subquery_cache optimization option added.
mysql-test/r/subquery_cache.result:
The subquery cache tests added.
mysql-test/r/subselect3.result:
Subquery cache switched off to avoid changing read statistics.
mysql-test/r/subselect3_jcl6.result:
Subquery cache switched off to avoid changing read statistics.
mysql-test/r/subselect_no_mat.result:
subquery_cache optimization option added.
mysql-test/r/subselect_no_opts.result:
subquery_cache optimization option added.
mysql-test/r/subselect_no_semijoin.result:
subquery_cache optimization option added.
mysql-test/r/subselect_sj.result:
subquery_cache optimization option added.
mysql-test/r/subselect_sj_jcl6.result:
subquery_cache optimization option added.
mysql-test/t/subquery_cache.test:
The subquery cache tests added.
mysql-test/t/subselect3.test:
Subquery cache switched off to avoid changing read statistics.
sql/CMakeLists.txt:
The new file added.
sql/Makefile.am:
The new files added.
sql/item.cc:
Expression cache item (Item_cache_wrapper) added.
Item_ref and Item_field fixed for correct usage of result field and fast resolwing in SP.
sql/item.h:
Expression cache item (Item_cache_wrapper) added.
Item_ref and Item_field fixed for correct usage of result field and fast resolwing in SP.
sql/item_cmpfunc.cc:
Subquery cache added.
sql/item_cmpfunc.h:
Subquery cache added.
sql/item_subselect.cc:
Subquery cache added.
sql/item_subselect.h:
Subquery cache added.
sql/item_sum.cc:
Registration of subquery parameters added.
sql/mysql_priv.h:
subquery_cache optimization option added.
sql/mysqld.cc:
subquery_cache optimization option added.
sql/opt_range.cc:
Fix due to subquery cache.
sql/opt_subselect.cc:
Parameters of the function cahnged.
sql/procedure.h:
.h file guard added.
sql/sql_base.cc:
Registration of subquery parameters added.
sql/sql_class.cc:
Option to allow add indeces to temporary table.
sql/sql_class.h:
Item iterators added.
Option to allow add indeces to temporary table.
sql/sql_expression_cache.cc:
Expression cache for caching subqueries added.
sql/sql_expression_cache.h:
Expression cache for caching subqueries added.
sql/sql_lex.cc:
Registration of subquery parameters added.
sql/sql_lex.h:
Registration of subqueries and subquery parameters added.
sql/sql_select.cc:
Subquery cache added.
sql/sql_select.h:
Subquery cache added.
sql/sql_union.cc:
A new parameter to the function added.
sql/sql_update.cc:
A new parameter to the function added.
sql/table.cc:
Procedures to manage temporarty tables index added.
sql/table.h:
Procedures to manage temporarty tables index added.
storage/maria/ha_maria.cc:
Fix of handler to allow destoy a table in case of error during the table creation.
storage/maria/ha_maria.h:
.h file guard added.
storage/myisam/ha_myisam.cc:
Fix of handler to allow destoy a table in case of error during the table creation.
- Add Item_in_subselect::get_identifier() that returns subquery's id
- Change select_describe() to produce output in new format
- Update test results (checked)
This patch does three things:
- It adds the possibility to force the execution of top-level [NOT] IN
subquery predicates via the IN=>EXISTS transformation. This is done by
setting both optimizer switches partial_match_rowid_merge and
partial_match_table_scan to "off".
- It adjusts all test cases where the complete optimizer_switch is
selected because now we have two more switches.
- For those test cases where the plan changes because of the new available
strategies, we switch off both partial match strategies in order to
force the "old" IN=>EXISTS strategy. This is done because most of these
test cases specifically test bugs in this strategy.
sql/opt_subselect.cc:
Adds the possibility to force the execution of top-level [NOT] IN
subquery predicates via the IN=>EXISTS transformation. This is done by
setting both optimizer switches partial_match_rowid_merge and
partial_match_table_scan to "off".
The problem is that not all column names retrieved from a SELECT
statement can be used as view column names due to length and format
restrictions. The server failed to properly check the conformity
of those automatically generated column names before storing the
final view definition on disk.
Since columns retrieved from a SELECT statement can be anything
ranging from functions to constants values of any format and length,
the solution is to rewrite to a pre-defined format any names that
are not acceptable as a view column name.
The name is rewritten to "Name_exp_%u" where %u translates to the
position of the column. To avoid this conversion scheme, define
explict names for the view columns via the column_list clause.
Also, aliases are now only generated for top level statements.
mysql-test/include/view_alias.inc:
Add test case for Bug#40277
mysql-test/r/compare.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/group_by.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/ps.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/subselect.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/subselect3.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/type_datetime.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/union.result:
Bug#40277: SHOW CREATE VIEW returns invalid SQL
mysql-test/r/view.result:
Add test case result for Bug#40277
mysql-test/r/view_alias.result:
Add test case result for Bug#40277
mysql-test/t/view_alias.test:
Add test case for Bug#40277
sql/sql_view.cc:
Check if auto generated column names are conforming. Also, the
make_unique_view_field_name function is not used as it uses the
original name to construct a new one, which does not work if the
name is invalid.