MDEV-28073 Slow query performance in MariaDB when using many tables
The faster we can find a good query plan, the more options we have for
finding and pruning (ignoring) bad plans.
This patch adds sorting of plans to best_extension_by_limited_search().
The plans, from best_access_path() are sorted according to the numbers
of found rows. This allows us to faster find 'good tables' and we are
thus able to eliminate 'bad plans' faster.
One side effect of this patch is that if two tables have equal cost,
the table that which was used earlier in the query is preferred.
This allows users to improve plans by reordering eq_ref tables in the
order they would like them to be uses.
Result changes caused by the patch:
- Traces are different as now we print the cost for using tables before
we start considering them in the plan.
- Table order are changed for some plans. In most cases this is because
the plans are equal and tables are in this case sorted according to
their usage in the original query.
- A few plans was changed as the optimizer was able to find a better
plan (that was pruned by the original code).
Other things:
- Added a new statistic variable: "optimizer_join_prefixes_check_calls",
which counts number of calls to best_extension_by_limited_search().
This can be used to check the prune efficiency in greedy_search().
- Added variable "JOIN_TAB::embedded_dependent" to be able to handle
XX IN (SELECT..) in the greedy_optimizer. The idea is that we
should prune a table if any of the tables in embedded_dependent is
not yet read.
- When using many tables in a query, there will be some additional
memory usage as we need to pre-allocate table of
table_count*table_count*sizeof(POSITION) objects (POSITION is 312
bytes for now) to hold the pre-calculated best_access_path()
information. This memory usage is offset by the expected
performance improvement when using many tables in a query.
- Removed the code from an earlier patch to keep the table order in
join->best_ref in the original order. This is not needed anymore as we
are now sorting the tables for each best_extension_by_limited_search()
call.
Due to inconsistent usage of different cost models to calculate
the cost of ref accesses we have to make the calculation of the
gain promising by usage a range filter more complex.
This patch contains a full implementation of the optimization
that allows to use in-memory rowid / primary filters built for range
conditions over indexes. In many cases usage of such filters reduce
the number of disk seeks spent for fetching table rows.
In this implementation the choice of what possible filter to be applied
(if any) is made purely on cost-based considerations.
This implementation re-achitectured the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.
Besides this patch contains a better implementation of the generic
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections of the
implementation of the handler function records_in_range() for MyISAM.
This patch supports the feature for InnoDB and MyISAM.