The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).
The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?
Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
Improve performance of queries like
SELECT * FROM t1 WHERE field = NAME_CONST('a', 4);
by, in this example, replacing the WHERE clause with field = 4
in the case of ref access.
The rewrite is done during fix_fields and we disambiguate this
case from other cases of NAME_CONST by inspecting where we are
in parsing. We rely on THD::where to accomplish this. To
improve performance there, we change the type of THD::where to
be an enumeration, so we can avoid string comparisons during
Item_name_const::fix_fields. Consequently, this patch also
changes all usages of THD::where to conform likewise.
The code inside Item_subselect::fix_fields() could fail to check
that left expression had an Item_row, like this:
(('x', 1.0) ,1) IN (SELECT 'x', 1.23 FROM ... UNION ...)
In order to hit the failure, the first SELECT of the subquery had
to be a degenerate no-tables select. In this case, execution will
not enter into Item_in_subselect::create_row_in_to_exists_cond()
and will not check if left_expr is composed of scalars.
But the subquery is a UNION so as a whole it is not degenerate.
We try to create an expression cache for the subquery.
We create a temp.table from left_expr columns. No field is created
for the Item_row. Then, we crash when trying to add an index over a
non-existent field.
Fixed by moving the left_expr cardinality check to a point in
check_and_do_in_subquery_rewrites() which gets executed for all
cases.
It's better to make the check early so we don't have to care about
subquery rewrite code hitting Item_row in left_expr.
This patch also fixes
MDEV-31391 Assertion `((best.records_out) == 0.0 ... failed
Cost changes caused by this change:
- range queries with join buffer now have a notable smaller cost.
- range ranges are bit more expensive as the MULTI_RANGE_COST is now
properly applied to it in all cases (this extra cost is equal to a
key lookup).
- table scan cost is slight smaller as we now assume data is cached in
the engine after the first scan pass. (We did this before for range
scans and other access methods).
- partition tables had wrong values for max_row_blocks and
max_index_blocks. Correcting this, causes range access on
partitioned tables to have slightly higher cost because of the
increased estimated IO.
- Using first match + join buffer caused 'filtered' to be calcualted
wrong. (Only affected EXPLAIN, not query costs).
- Added cost_without_join_buffer to optimizer_trace.
- check_quick_select() adjusted the number of rows according to persistent
statistics, but did not adjust cost. Now fixed.
The big change in the patch are:
- In best_access_path(), where we now are using storing the cost in
'ALL_READ_COST cost' and only converting it to a double at the end.
This allows us to more exactly calculate the effect of the join_cache.
- In JOIN_TAB::estimate_scan_time(), store the cost also in a
ALL_READ_COST object.
One of effect if this change is that when joining very small tables:
t1 some_access_method
t2 range
t3 ALL Use join buffer
This is swiched to
t1 some_access_method
t3 ALL
t2 range use join buffer
Both plans has the same cost, but as table scan in this case has less
cost than rang, the table scan will be considered first and thus have
precidence.
Test case changes:
- optimizer_trace - Addition of cost_without_join_buffer
- subselect_mat_cost_bugs - Small tables and scan versus range
- range & range_mrr_icp - Range + join_cache is faster than ref
- optimizer_trace - cost_without_join_buffer, smaller scan cost,
range setup cost.
- mrr - range+join_buffer used as smaller cost
The code in create_internal_tmp_table() didn't take into account that
now temporary (derived) tables may have multiple indexes:
- one index due to duplicate removal
= In this example created by conversion of big-IN(...) into subquery
= this index might be converted into a "unique constraint" if the key
length is too large.
- one index added by derived_with_keys optimization.
Make create_internal_tmp_table() handle multiple indexes.
Before this patch, use of a unique constraint was indicated in
TABLE_SHARE::uniques. This was ok as unique constraint was the only index
in the table. Now it's no longer the case so TABLE_SHARE::uniques is removed
and replaced with an in-memory-only flag HA_UNIQUE_HASH.
This patch is based on Monty's patch.
Co-Author: Monty <monty@mariadb.org>
This bug affected EXPLAIN EXTENDED command for single-table DELETE that
used an IN subquery in its WHERE clause. A crash happened if the optimizer
chose to employ index_subquery or unique_subquery access when processing
such command.
The crash happened when the command tried to print the transformed query.
In the current code of 10.4 for single-table DELETE statements the output
of any explain command is produced after the join structures of all used
subqueries have been destroyed. JOIN::destroy() sets the field tab of the
JOIN_TAB structures created for subquery tables to NULL. As a result
subselect_indexsubquery_engine::print(), subselect_indexsubquery_engine()
cannot use this field to get the alias name of the joined table.
This patch suggests to use the field TABLE_LIST::TAB that can be accessed
from JOIN_TAB::tab_list to get the alias name of the joined table.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch allows to use semi-join optimization at the top level of
single-table update and delete statements.
The problem of supporting such optimization became easy to resolve after
processing a single-table update/delete statement started using JOIN
structure. This allowed to use JOIN::prepare() not only for multi-table
updates/deletes but for single-table ones as well. This was done in the
patch for mdev-28883:
Re-design the upper level of handling UPDATE and DELETE statements.
Note that JOIN::prepare() detects all subqueries that can be considered
as candidates for semi-join optimization. The code added by this patch
looks for such candidates at the top level and if such candidates are found
in the processed single-table update/delete the statement is handled in
the same way as a multi-table update/delete.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch fixes not only the assertion failure in the function
Field_iterator_table_ref::set_field_iterator() but also:
- fixes the problem of forced materialization of derived tables used
in subqueries contained in WHERE clauses of single-table and multi-table
UPDATE and DELETE statements
- fixes the problem of MDEV-17954 that prevented execution of multi-table
DELETE statements if they use in their WHERE clauses references to
the tables that are updated.
The patch must be considered a complement to the patch for MDEV-28883.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch introduces a new way of handling UPDATE and DELETE commands at
the top level after the parsing phase. This new way of processing update
and delete statements can be seen in the implementation of the prepare()
and execute() methods from the new Sql_cmd_dml class. This class derived
from the Sql_cmd class can be considered as an interface class for processing
such commands as SELECT, INSERT, UPDATE, DELETE and other comands
manipulating data in tables.
With this patch processing of update and delete statements after parsing
proceeds by the following schema:
- precheck of the access rights is performed for the used tables
- the used tables are opened
- context analysis phase is performed for the statement
- the used tables are locked
- the statement is optimized and executed
- clean-up is performed for the statement
The implementation of the method Sql_cmd_dml::execute() adheres this schema.
The virtual functions of the class Sql_cmd_dml used for precheck of the
access rights, context analysis, optimization and execution allow to adjust
this schema for processing data manipulation statements of any types.
This schema of processing data manipulation statements is taken from the
current MySQL code. Moreover the definition the class Sql_cmd_dml introduced
in this patch is almost a full replica of such class in the existing MySQL.
However the implementation of the derived classes for update and delete
statements is quite different. This implementation employs the JOIN class
for all kinds of update and delete statements. It allows to perform main
bulk of context analysis actions by the function JOIN::prepare(). This
guarantees that characteristics and properties of the statement tree
discovered for optimization phase when doing context analysis are the same
for single-table and multi-table updates and deletes.
With this patch the following functions are gone:
mysql_prepare_update(), mysql_multi_update_prepare(),
mysql_update(), mysql_multi_update(),
mysql_prepare_delete(), mysql_multi_delete_prepare(), mysql_delete().
The code within these functions have been used as much as possible though.
The functions mysql_test_update() and mysql_test_delete() are also not
needed anymore. The method Sql_cmd_dml::prepare() serves processing
- update/delete statement
- PREPARE stmt FROM "<update/delete statement>"
- EXECUTE stmt when stmt is prepared from update/delete statement.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Firstmatch_picker::check_qep() has an optimization that allows firstmatch
to be used together with join buffer under some conditions. In this
case the cost was assumed to be same as what best_access_path()
had calculated.
However if HASH+join_buffer was used, then
fix_semijoin_strategies_for_picked_join_order() would remove the
join_buffer (which would cause a full join to be used) and the cost
assumption by Firstmatch_picker::check_qep() would be wrong.
Later check_join_cache_usage() sees that it's a full scan and decides
it can use join buffering, (But not the hash join).
Fixed by also allowing HASH joins with firstmatch.
This removes the need to change disable and re-enable join buffer.
Test case changes:
- HASH join used with firstmatch (Using join buffer (flat, BNLH join))
- Filtered could change with firstmatch as the conversion with and without
join_buffered lost the filtering information.
- The not "re-enabling join buffer" is shown in main.optimizer_trace
Original code by Sergei, optimized by Monty.
Author: Sergei Petrunia <sergey@mariadb.com>, monty@mariadb.org
This error was discovered while working on
MDEV-30540 Wrong result with IN list length reaching
IN_PREDICATE_CONVERSION_THRESHOLD
If there is read error from handler::ha_rnd_next() during a recursive
query, st_select_lex_unit::exec_recursive() will crash as it will try to
get the error code from a structure that was deleted by the callee.
The code was using the construct:
sl->join->exec();
saved_error=sl->join->error;
This does not work as sl->join was freed by the exec() and sl->join would
be set to 0.
Fixed by having JOIN::exec() return the error code.
The included test case simulates the error in ha_rnd_next(), which causes
a crash without the patch.
scovered whle working on
MDEV-30540 Wrong result with IN list length reaching
IN_PREDICATE_CONVERSION_THRESHOLD
If there is read error from handler::ha_rnd_next() during a recursive
query, st_select_lex_unit::exec_recursive() will crash as it will try to
get the error code from a structure that was deleted by the callee.
The code was using the construct:
sl->join->exec();
saved_error=sl->join->error;
This does not work as sl->join was freed by the exec() and sl->join was
set to 0.
Fixed by having JOIN::exec() return the error code.
The included test case simulates the error in ha_rnd_next(), which causes
a crash without the patch.
This patch also fixes some bugs detected by valgrind after this
patch:
- Not enough copy_func elements was allocated by Create_tmp_table() which
causes an memory overwrite in Create_tmp_table::add_fields()
I added an ASSERT() to be able to detect this also without valgrind.
The bug was that TMP_TABLE_PARAM::copy_fields was not correctly set
when calling create_tmp_table().
- Aria::empty_bits is not allocated if there is no varchar/char/blob
fields in the table. Fixed code to take this into account.
This cannot cause any issues as this is just a memory access
into other Aria memory and the content of the memory would not be used.
- Aria::last_key_buff was not allocated big enough. This may have caused
issues with rtrees and ma_extra(HA_EXTRA_REMEMBER_POS) as they
would use the same memory area.
- Aria and MyISAM didn't take extended key parts into account, which
caused problems when copying rec_per_key from engine to sql level.
- Mark asan builds with 'asan' in version strihng to detect these in
not_valgrind_build.inc.
This is needed to not have main.sp-no-valgrind fail with asan.
DuplicateWeedout semi-join optimization requires that the tables in
the parent subquery provide rowids that can be compared across table
scans. Most engines support this, federated is the only exception.
DuplicateWeedout is the default catch-all semi-join strategy, which
must be always available. If it is not available for some edge case,
it's better to disable semi-join conversion altogether.
This is what was done in the fix for MDEV-30395. However that fix
has put the check before the view processing, so it didn't detect
federated tables inside mergeable VIEWs.
This patch moves the check to be done at a later phase, when mergeable
views are already merged.
The reason for this is that we call file->index_flags(index, 0, 1)
multiple times in best_access_patch()when optimizing a table.
For example, in InnoDB, the calls is not trivial (4 if's and 2 assignments)
Now the function is inlined and is just a memory reference.
Other things:
- handler::is_clustering_key() and pk_is_clustering_key() are now inline.
- Added TABLE::can_use_rowid_filter() to simplify some code.
- Test if we should use a rowid_filter only if can_use_rowid_filter() is
true.
- Added TABLE::is_clustering_key() to avoid a memory reference.
- Simplify some code using the fact that HA_KEYREAD_ONLY is true implies
that HA_CLUSTERED_INDEX is false.
- Added DBUG_ASSERT to TABLE::best_range_rowid_filter() to ensure we
do not call it with a clustering key.
- Reorginized elements in struct st_key to get better memory alignment.
- Updated ha_innobase::index_flags() to not have
HA_DO_RANGE_FILTER_PUSHDOWN for clustered index
- table_after_join_selectivity() should use records_init (new bug)
- get_examined_rows() changed to double to get similar results
as in MariaDB 10.11
- Fixed bug where table_after_join_selectivity() did not correct
selectivity in the case where a RANGE is used instead of a REF.
This can happen if the range can use more key_parts than the REF.
WHERE key_part1=10 and key_part2 < 10
Other things:
- Use JT_RANGE instead of JT_ALL for RANGE access in all parts of the code.
Before we used JT_ALL for RANGE.
- Force RANGE be used in best_access_path() if the range used more key
parts than ref. In the original code, this was done much later in
make_join_select)(). However we need to know in
table_after_join_selectivity() if we have used RANGE or not.
- Added more information about filtering to optimizer_trace.
Allows FirstMatch to handle the case where the fanout of firstmatch tables
is already less than 1.
Also Fixes LooseScan strategy to set position->{records_init, records_out}
(They were set to 0 which also caused assertion failures)
Author: Sergei Petrunia <sergey@mariadb.com>
Reviewer: Monty
This solves the current problem in the optimizer
- SELECT FROM big_table
- SELECT from small_table where small_table.eq_ref_key=big_table.id
The old code assumed that each eq_ref access will cause an IO.
As the cost of IO is high, this dominated the cost for the later table
which caused the optimizer to prefer table scans + join cache over
index reads.
This patch fixes this issue by limit the number of expected IO calls,
for rows and index separately, to the size of the table or index or
the number of accesses that we except in a range for the index.
The major changes are:
- Adding a new structure ALL_READ_COST that is mainly used in
best_access_path() to hold the costs parts of the cost we are
calculating. This allows us to limit the number of IO when multiplying
the cost with the previous row combinations.
- All storage engine cost functions are changed to return IO_AND_CPU_COST.
The virtual cost functions should now return in IO_AND_CPU_COST.io
the number of disk blocks that will be accessed instead of the cost
of the access.
- We are not limiting the io_blocks for table or index scans as we
assume that engines may not store these in the 'hot' part of the
cache. Table and index scan also uses much less IO blocks than
key accesses, so the original issue is not as critical with scans.
Other things:
OPT_RANGE now holds a 'Cost_estimate cost' instead a lot of different
costs. All the old costs, like index_only_read, can be extracted
from 'cost'.
- Added to the start of some functions 'handler *file= table->file'
to shorten the code that is using the handler.
- handler->cost() is used to change a ALL_READ_COST or IO_AND_CPU_COST
to 'cost in milliseconds'
- New functions: handler::index_blocks() and handler::row_blocks()
which are used to limit the IO.
- Added index_cost and row_cost to Cost_estimate and removed all not
needed members.
- Removed cost coefficients from Cost_estimate as these don't make sense
when costs (except IO_BLOCKS) are in milliseconds.
- Removed handler::avg_io_cost() and replaced it with DISK_READ_COST.
- Renamed best_range_rowid_filter_for_partial_join() to
best_range_rowid_filter() as using the old name made rows too long.
- Changed all SJ_MATERIALIZATION_INFO 'Cost_estimate' variables to
'double' as Cost_estimate power was not used for these and thus
just caused storage and performance overhead.
- Changed cost_for_index_read() to use 'worst_seeks' to only limit
IO, not number of table accesses. With this patch worst_seeks is
probably not needed anymore, but I kept it around just in case.
- Applying cost for filter got to be much shorter and easier thanks
to the API changes.
- Adjusted cost for fulltext keys in collaboration with Sergei Golubchik.
- Most test changes caused by this patch is that table scans are changed
to use indexes.
- Added ha_seq::keyread_time() and ha_seq::key_scan_time() to get
make checking number of potential IO blocks easier during debugging.
This makes it easier to compare different costs and also allows
the optimizer to optimizer different storage engines more reliably.
- Added tests/check_costs.pl, a tool to verify optimizer cost calculations.
- Most engine costs has been found with this program. All steps to
calculate the new costs are documented in Docs/optimizer_costs.txt
- User optimizer_cost variables are given in microseconds (as individual
costs can be very small). Internally they are stored in ms.
- Changed DISK_READ_COST (was DISK_SEEK_BASE_COST) from a hard disk cost
(9 ms) to common SSD cost (400MB/sec).
- Removed cost calculations for hard disks (rotation etc).
- Changed the following handler functions to return IO_AND_CPU_COST.
This makes it easy to apply different cost modifiers in ha_..time()
functions for io and cpu costs.
- scan_time()
- rnd_pos_time() & rnd_pos_call_time()
- keyread_time()
- Enhanched keyread_time() to calculate the full cost of reading of a set
of keys with a given number of ranges and optional number of blocks that
need to be accessed.
- Removed read_time() as keyread_time() + rnd_pos_time() can do the same
thing and more.
- Tuned cost for: heap, myisam, Aria, InnoDB, archive and MyRocks.
Used heap table costs for json_table. The rest are using default engine
costs.
- Added the following new optimizer variables:
- optimizer_disk_read_ratio
- optimizer_disk_read_cost
- optimizer_key_lookup_cost
- optimizer_row_lookup_cost
- optimizer_row_next_find_cost
- optimizer_scan_cost
- Moved all engine specific cost to OPTIMIZER_COSTS structure.
- Changed costs to use 'records_out' instead of 'records_read' when
recalculating costs.
- Split optimizer_costs.h to optimizer_costs.h and optimizer_defaults.h.
This allows one to change costs without having to compile a lot of
files.
- Updated costs for filter lookup.
- Use a better cost estimate in best_extension_by_limited_search()
for the sorting cost.
- Fixed previous issues with 'filtered' explain column as we are now
using 'records_out' (min rows seen for table) to calculate filtering.
This greatly simplifies the filtering code in
JOIN_TAB::save_explain_data().
This change caused a lot of queries to be optimized differently than
before, which exposed different issues in the optimizer that needs to
be fixed. These fixes are in the following commits. To not have to
change the same test case over and over again, the changes in the test
cases are done in a single commit after all the critical change sets
are done.
InnoDB changes:
- Updated InnoDB to not divide big range cost with 2.
- Added cost for InnoDB (innobase_update_optimizer_costs()).
- Don't mark clustered primary key with HA_KEYREAD_ONLY. This will
prevent that the optimizer is trying to use index-only scans on
the clustered key.
- Disabled ha_innobase::scan_time() and ha_innobase::read_time() and
ha_innobase::rnd_pos_time() as the default engine cost functions now
works good for InnoDB.
Other things:
- Added --show-query-costs (\Q) option to mysql.cc to show the query
cost after each query (good when working with query costs).
- Extended my_getopt with GET_ADJUSTED_VALUE which allows one to adjust
the value that user is given. This is used to change cost from
microseconds (user input) to milliseconds (what the server is
internally using).
- Added include/my_tracker.h ; Useful include file to quickly test
costs of a function.
- Use handler::set_table() in all places instead of 'table= arg'.
- Added SHOW_OPTIMIZER_COSTS to sys variables. These are input and
shown in microseconds for the user but stored as milliseconds.
This is to make the numbers easier to read for the user (less
pre-zeros). Implemented in 'Sys_var_optimizer_cost' class.
- In test_quick_select() do not use index scans if 'no_keyread' is set
for the table. This is what we do in other places of the server.
- Added THD parameter to Unique::get_use_cost() and
check_index_intersect_extension() and similar functions to be able
to provide costs to called functions.
- Changed 'records' to 'rows' in optimizer_trace.
- Write more information to optimizer_trace.
- Added INDEX_BLOCK_FILL_FACTOR_MUL (4) and INDEX_BLOCK_FILL_FACTOR_DIV (3)
to calculate usage space of keys in b-trees. (Before we used numeric
constants).
- Removed code that assumed that b-trees has similar costs as binary
trees. Replaced with engine calls that returns the cost.
- Added Bitmap::find_first_bit()
- Added timings to join_cache for ANALYZE table (patch by Sergei Petrunia).
- Added records_init and records_after_filter to POSITION to remember
more of what best_access_patch() calculates.
- table_after_join_selectivity() changed to recalculate 'records_out'
based on the new fields from best_access_patch()
Bug fixes:
- Some queries did not update last_query_cost (was 0). Fixed by moving
setting thd->...last_query_cost in JOIN::optimize().
- Write '0' as number of rows for const tables with a matching row.
Some internals:
- Engine cost are stored in OPTIMIZER_COSTS structure. When a
handlerton is created, we also created a new cost variable for the
handlerton. We also create a new variable if the user changes a
optimizer cost for a not yet loaded handlerton either with command
line arguments or with SET
@@global.engine.optimizer_cost_variable=xx.
- There are 3 global OPTIMIZER_COSTS variables:
default_optimizer_costs The default costs + changes from the
command line without an engine specifier.
heap_optimizer_costs Heap table costs, used for temporary tables
tmp_table_optimizer_costs The cost for the default on disk internal
temporary table (MyISAM or Aria)
- The engine cost for a table is stored in table_share. To speed up
accesses the handler has a pointer to this. The cost is copied
to the table on first access. If one wants to change the cost one
must first update the global engine cost and then do a FLUSH TABLES.
This was done to be able to access the costs for an open table
without any locks.
- When a handlerton is created, the cost are updated the following way:
See sql/keycaches.cc for details:
- Use 'default_optimizer_costs' as a base
- Call hton->update_optimizer_costs() to override with the engines
default costs.
- Override the costs that the user has specified for the engine.
- One handler open, copy the engine cost from handlerton to TABLE_SHARE.
- Call handler::update_optimizer_costs() to allow the engine to update
cost for this particular table.
- There are two costs stored in THD. These are copied to the handler
when the table is used in a query:
- optimizer_where_cost
- optimizer_scan_setup_cost
- Simply code in best_access_path() by storing all cost result in a
structure. (Idea/Suggestion by Igor)
One effect of this change in the test suite is that tests with very few
rows changed to use sub queries instead of materialization. This is
correct and expected as for these the materialization overhead is too high.
A lot of tests where fixed to still use materialization by adding a
few rows to the tables (most tests has only 2-3 rows and are thus easily
affected when cost computations are changed).
Other things:
- Added more variables to TMPTABLE_COSTS for better cost calculation
- Added cost of copying rows to TMPTABLE_COSTS lookup and write
- Added THD::optimizer_cache_hit_ratio for easier cost calculations
- Added DISK_FAST_READ_SIZE to be used when calculating costs when
reading big blocks from a disk
This cleans up the interface for choose_plan() as it is not depending
on setting join->emb_sj_nest.
choose_plan() now sets up join->emb_sj_nest and join->allowed_tables before
calling optimize_straight_join() and best_extension_by_limited_search().
Other things:
- Converted some 'if' to DBUG_ASSERT() as these should always be true.
- Calculate 'allowed_tables' in choose_plan() as this never changes in
the childs.
- Added assert to check that next_emb->nested_join->n_tables doesn't
get to a wrong value.
- Documented some variables in sql_select.h