Raise notes if indexes cannot be used:
- in case of data type or collation mismatch (diferent error messages).
- in case if a table field was replaced to something else
(e.g. Item_func_conv_charset) during a condition rewrite.
Added option to write warnings and notes to the slow query log for
slow queries.
New variables added/changed:
- note_verbosity, with is a set of the following options:
basic - All old notes
unusable_keys - Print warnings about keys that cannot be used
for select, delete or update.
explain - Print unusable_keys warnings for EXPLAIN querys.
The default is 'basic,explain'. This means that for old installations
the only notable new behavior is that one will get notes about
unusable keys when one does an EXPLAIN for a query. One can turn all
of all notes by either setting note_verbosity to "" or setting sql_notes=0.
- log_slow_verbosity has a new option 'warnings'. If this is set
then warnings and notes generated are printed in the slow query log
(up to log_slow_max_warnings times per statement).
- log_slow_max_warnings - Max number of warnings written to
slow query log.
Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
For example using "note_verbosity=ALL" in a config file or
"SET @@note_verbosity=ALL' in SQL.
- mysqldump will in the future use @@note_verbosity=""' instead of
@sql_notes=0 to disable notes.
- Added "enum class Data_type_compatibility" and changing the return type
of all Field::can_optimize*() methods from "bool" to this new data type.
Reviewer & Co-author: Alexander Barkov <bar@mariadb.com>
- The code that prints out the notes comes mainly from Alexander
best_extension_by_limited_search() assumes that tables should be sorted
according to size to be able to quickly disregard bad plans. However the
current usage of swap_variables() will change the table order to a not
sorted one for the next recursive call. This breaks the assumtion and
causes performance issues when using many tables (we have to examine
many more plans).
This patch fixes this by ensuring that the original table order is kept
for the not yet used tables when best_extension_by_limited_search() is
called.
This was done by always calling swap_variables() for each table and
restoring the original table order at exit.
Some test changed:
- In a majority of the test the change was that two "identical tables"
where swapped and the optimzer is now using the first/smaller table
- In few test the table order was changed. The new plan looks identical
or slighly better than the original.
The issue was that best_extension_by_limited_search() had to go through
too many plans with the same cost as there where many EQ_REF tables.
Fixed by shortcutting EQ_REF (AND REF) when the result only contains one
row. This got the optimization time down from hours to sub seconds.
The only known downside with this patch is that in some cases a table
with ref and 1 record may be used before on EQ_REF table. The faster
optimzation phase should compensate for this.
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
(Backported to 10.3, addressed review input)
Sj_materialization_picker::check_qep(): fix error in cost/fanout
calculations:
- for each join prefix, add #prefix_rows / TIME_FOR_COMPARE to the cost,
like best_extension_by_limited_search does
- Remove the fanout produced by the subquery tables.
- Also take into account join condition selectivity
optimize_wo_join_buffering() (used by LooseScan and FirstMatch)
- also add #prefix_rows / TIME_FOR_COMPARE to the cost of each prefix.
- Also take into account join condition selectivity
Due to inconsistent usage of different cost models to calculate
the cost of ref accesses we have to make the calculation of the
gain promising by usage a range filter more complex.
This patch contains a full implementation of the optimization
that allows to use in-memory rowid / primary filters built for range
conditions over indexes. In many cases usage of such filters reduce
the number of disk seeks spent for fetching table rows.
In this implementation the choice of what possible filter to be applied
(if any) is made purely on cost-based considerations.
This implementation re-achitectured the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.
Besides this patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections of the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.