MDEV-28073 Slow query performance in MariaDB when using many table
The idea is to prefer and chain EQ_REF tables (tables that uses an
unique key to find a row) when searching for the best table combination.
This significantly reduces row combinations that has to be examined.
This is optimization is enabled when setting optimizer_prune_level=2
(which is now default).
Implementation:
- optimizer_prune_level has a new level, 2, which enables EQ_REF
optimization in addition to the pruning done by level 1.
Level 2 is now default.
- Added JOIN::eq_ref_tables that contains bits of tables that could use
potentially use EQ_REF access in the query. This is calculated
in sort_and_filter_keyuse()
Under optimizer_prune_level=2:
- When the greedy_optimizer notices that the preceding table was an
EQ_REF table, it tries to add an EQ_REF table next. If an EQ_REF
table exists, only this one will be considered at this level.
We also collect all EQ_REF tables chained by the next levels and these
are ignored on the starting level as we have already examined these.
If no EQ_REF table exists, we continue as normal.
This optimization speeds up the greedy_optimizer combination test with
~25%
Other things:
- I ported the changes in MySQL 5.7 to greedy_optimizer.test to MariaDB
to be able to ensure we can handle all cases that MySQL can do.
- I have run all tests with --mysqld=--optimizer_prune_level=1 to verify that
there where no test changes.
MDEV-28073 Slow query performance in MariaDB when using many tables
The faster we can find a good query plan, the more options we have for
finding and pruning (ignoring) bad plans.
This patch adds sorting of plans to best_extension_by_limited_search().
The plans, from best_access_path() are sorted according to the numbers
of found rows. This allows us to faster find 'good tables' and we are
thus able to eliminate 'bad plans' faster.
One side effect of this patch is that if two tables have equal cost,
the table that which was used earlier in the query is preferred.
This allows users to improve plans by reordering eq_ref tables in the
order they would like them to be uses.
Result changes caused by the patch:
- Traces are different as now we print the cost for using tables before
we start considering them in the plan.
- Table order are changed for some plans. In most cases this is because
the plans are equal and tables are in this case sorted according to
their usage in the original query.
- A few plans was changed as the optimizer was able to find a better
plan (that was pruned by the original code).
Other things:
- Added a new statistic variable: "optimizer_join_prefixes_check_calls",
which counts number of calls to best_extension_by_limited_search().
This can be used to check the prune efficiency in greedy_search().
- Added variable "JOIN_TAB::embedded_dependent" to be able to handle
XX IN (SELECT..) in the greedy_optimizer. The idea is that we
should prune a table if any of the tables in embedded_dependent is
not yet read.
- When using many tables in a query, there will be some additional
memory usage as we need to pre-allocate table of
table_count*table_count*sizeof(POSITION) objects (POSITION is 312
bytes for now) to hold the pre-calculated best_access_path()
information. This memory usage is offset by the expected
performance improvement when using many tables in a query.
- Removed the code from an earlier patch to keep the table order in
join->best_ref in the original order. This is not needed anymore as we
are now sorting the tables for each best_extension_by_limited_search()
call.
The issue was that best_extension_by_limited_search() had to go through
too many plans with the same cost as there where many EQ_REF tables.
Fixed by shortcutting EQ_REF (AND REF) when the result only contains one
row. This got the optimization time down from hours to sub seconds.
The only known downside with this patch is that in some cases a table
with ref and 1 record may be used before on EQ_REF table. The faster
optimzation phase should compensate for this.
sprintf() format of double changed from '%lg' to '%-.11lg'
The change was to make it easier to read optimizer trace output
with tables that has millions of records.
- Print the rowid filters that are available for use with each table.
- Make print_best_access_for_table() print which filter it has picked.
- Make best_access_path() print the filter for considered ref accesses.
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
not 1.0. In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
The default keyread_time() was optimized for blocks and not suitable for
HEAP. The effect was the HEAP prefered table scans over ranges for btree
indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
This was to ensure that handler::keyread_time() doesn't give
higher cost for heap tables than for normal tables. One effect of
this is that heap and derived tables stored in heap will prefer
key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
multi_range_read_info_const(). All index cost calculation is now
done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
This was to remove a performance regression between 10.3 and 10.4
In 10.5 we will have a better implementation of records_in_range
that will enable us to get more statistics.
This change was not done in 10.4 because the 10.5 will be part of
a larger change that is not suitable for the GA 10.4 version
Other things:
- Changed default handler block_size to 8192 to fix things statistics
for engines that doesn't set the block size.
- Fixed a bug in spider when using multiple part const ranges
(Patch from Kentoku)
Changed the function append_range_all_keyparts to use sel_arg_range_seq_init / sel_arg_range_seq_next to produce ranges.
Also adjusted to print format for the ranges, now the ranges are printed as:
(keypart1_min, keypart2_min,..) OP (keypart1_name,keypart2_name, ..) OP (keypart1_max,keypart2_max, ..)
Also added more tests for range and index merge access for optimizer trace
This task involves the implementation for the optimizer trace.
This feature produces a trace for any SELECT/UPDATE/DELETE/,
which contains information about decisions taken by the optimizer during
the optimization phase (choice of table access method, various costs,
transformations, etc). This feature would help to tell why some decisions were
taken by the optimizer and why some were rejected.
Trace is session-local, controlled by the @@optimizer_trace variable.
To enable optimizer trace we need to write:
set @@optimizer_trace variable= 'enabled=on';
To display the trace one can run:
SELECT trace FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
This task also involves:
MDEV-18489: Limit the memory used by the optimizer trace
introduces a switch optimizer_trace_max_mem_size which limits
the memory used by the optimizer trace. This was implemented by
Sergei Petrunia.