(With trivial fixes by sergey@mariadb.com)
Added option fix_innodb_cardinality to optimizer_adjust_secondary_key_costs
Using fix_innodb_cardinality disables the 'divide by 2' of rec_per_key_int
in InnoDB that in effect doubles the Cardinality for secondary keys.
This has the biggest effect for indexes where a few rows have the same key
value. Using this may also cause table scans for very small tables (which
in some cases may be better than an index scan).
The user visible effect is that 'SHOW INDEX FROM table_name' will for
InnoDB show the true Cardinality (and not 2x the real value). It will
also allow the optimizer to choose a better index in some cases as the
division by 2 could have a bad effect for tables with 2-5 identical values
per key.
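The effect of the 'divide by 2' on the displayed Cardinality can be
sketched as below. This is a simplified model and not the InnoDB code:
the function name, the integer division and the floor of 1 for
rec_per_key are assumptions made for illustration.

```python
def shown_cardinality(table_rows: int, true_rec_per_key: int, halve: bool) -> int:
    """Model Cardinality as table_rows / rec_per_key.

    halve=True mimics the 'divide by 2' of rec_per_key_int.
    halve=False mimics fix_innodb_cardinality.
    """
    rec_per_key = true_rec_per_key // 2 if halve else true_rec_per_key
    # Assumption: rec_per_key is never allowed to drop below 1.
    rec_per_key = max(rec_per_key, 1)
    return table_rows // rec_per_key


if __name__ == "__main__":
    # 1000 rows with 2 or 4 identical values per key.
    for duplicates in (2, 4):
        print(duplicates,
              shown_cardinality(1000, duplicates, halve=True),
              shown_cardinality(1000, duplicates, halve=False))
```

In this model a key with 2 identical values per key looks unique when
the division is applied, which is the kind of case where a worse index
can be chosen.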
A few notes about using fix_innodb_cardinality:
- It has a direct effect on SHOW INDEX FROM table_name. SHOW INDEX
will also update the statistics in the table share.
- The effect of fix_innodb_cardinality on query plans or EXPLAIN
is only visible after the first open of the table. This is why one must
run FLUSH TABLES or use SHOW INDEX for the option to take effect.
- Using fix_innodb_cardinality can thus affect the query plans of all
users that use the same tables.
Because of this, it is strongly recommended that one uses
optimizer_adjust_secondary_key_costs=fix_innodb_cardinality mainly
in configuration files to not cause issues for other users.
In MariaDB up to 10.11, the test_if_cheaper_ordering() code (that tries
to optimize how GROUP BY is executed) assumes that if a table scan is
used, then any index usable by GROUP BY will be used.
The reason MariaDB 10.4 provides a better plan is because of two differences:
- Plans using 'ref' have a cost of 1/10 of what it should be (as a
protection against table scans). This is why 'ref' is used in 10.4
and not in 10.5.
- When 'ref' is used, then GROUP BY will not use an index for GROUP BY.
In MariaDB 10.5 the chosen plan is a table scan (as it is calculated to be
faster), but as 'ref' is not used, the test_if_cheaper_ordering()
optimizer phase decides to use an index for GROUP BY, which has bad
performance.
Description of fix:
- All new code is protected by the "optimizer_adjust_secondary_key_costs"
variable, which is now a bit map, and is only executed if the option
"disable_forced_index_in_group_by" is set.
- Corrects GROUP BY handling in test_if_cheaper_ordering() by making
the choice of using an index with GROUP BY cost based instead of rule
based.
- Adds TIME_FOR_COMPARE to all costs, when using group by, to make
read_time, index_scan_time and range_cost comparable.
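A minimal sketch of why the compare cost matters when comparing plans.
It assumes that the compare cost is rows / TIME_FOR_COMPARE and that
TIME_FOR_COMPARE is 5.0. The function name and the input numbers are
hypothetical and not taken from the server code.

```python
# Assumption: 5 WHERE compares cost about the same as one seek.
TIME_FOR_COMPARE = 5.0


def with_compare_cost(access_cost: float, rows: float) -> float:
    """Return the access cost plus the cost of comparing 'rows' rows."""
    return access_cost + rows / TIME_FOR_COMPARE


if __name__ == "__main__":
    # Hypothetical inputs: a table scan and an index scan over 1000 rows,
    # and a range scan that only has to examine 200 rows.
    read_time = with_compare_cost(100.0, 1000)
    index_scan_time = with_compare_cost(150.0, 1000)
    range_cost = with_compare_cost(120.0, 200)
    print(read_time, index_scan_time, range_cost)
```

With the compare cost included in all three values, the range scan in
this example is the cheapest even though its raw access cost is higher
than the table scan.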
Other things:
- Made optimizer_adjust_secondary_key_costs a bit map (compatible with old
code).
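How a bit map can stay compatible with the old numeric values can be
sketched as below. The bit positions and the names of the first two
flags are assumptions for illustration. Only the option names
disable_forced_index_in_group_by and fix_innodb_cardinality appear in
this description.

```python
from enum import IntFlag


class AdjustSecondaryKeyCosts(IntFlag):
    # Hypothetical names for the old numeric values 1 and 2.
    ADJUST_SECONDARY_KEY_COST = 1
    DISABLE_MAX_SEEK = 2
    # Assumed bit positions for the new options.
    DISABLE_FORCED_INDEX_IN_GROUP_BY = 4
    FIX_INNODB_CARDINALITY = 8


def is_enabled(value: int, flag: AdjustSecondaryKeyCosts) -> bool:
    """Return True if 'flag' is set in the numeric variable value."""
    return flag in AdjustSecondaryKeyCosts(value)


if __name__ == "__main__":
    # The old setting '2' still means the same thing under the bit map.
    print(is_enabled(2, AdjustSecondaryKeyCosts.DISABLE_MAX_SEEK))
    # New options can be combined with old ones.
    print(is_enabled(1 | 8, AdjustSecondaryKeyCosts.FIX_INNODB_CARDINALITY))
```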
Notes:
Current code ignores costs for the algorithm used when doing GROUP
BY on the first table:
- Create an in-memory temporary table for handling group by and doing a
filesort of the result file
We can probably in 10.6 continue to ignore this cost.
This patch should NOT be merged to 11.0 series (not needed in 11.0).
optimizer_adjust_secondary_key_costs is added to provide 2 small
adjustments to the 10.x optimizer cost model. This can be used in the
case where the optimizer wrongly uses a secondary key instead of a
clustered primary key.
The reason behind this change is that MariaDB 10.x does not take into
account that for engines like InnoDB, scanning a primary key can be
up to 7x faster than scanning a secondary key + reading the row data
through the primary key.
The different values for optimizer_adjust_secondary_key_costs are:
optimizer_adjust_secondary_key_costs=0
- No changes to current model
optimizer_adjust_secondary_key_costs=1
- Ensure that the cost of secondary indexes is at least 5x the cost
of a clustered primary key (if one exists).
This disables part of the worst_seek optimization described below.
optimizer_adjust_secondary_key_costs=2
- Disable "worst_seek optimization" and adjust filter cost slightly
(add cost of 1 if filter is used).
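The adjustment for optimizer_adjust_secondary_key_costs=1 can be
sketched as a lower bound on the secondary key cost. The function and
parameter names are hypothetical, and exactly which costs the server
compares is not specified here.

```python
from typing import Optional


def adjusted_secondary_key_cost(secondary_cost: float,
                                clustered_pk_cost: Optional[float]) -> float:
    """Keep the secondary key cost at 5x the clustered key cost or more.

    clustered_pk_cost is None when the table has no clustered primary
    key, in which case the cost is left unchanged.
    """
    if clustered_pk_cost is None:
        return secondary_cost
    return max(secondary_cost, 5 * clustered_pk_cost)


if __name__ == "__main__":
    print(adjusted_secondary_key_cost(3.0, 2.0))    # raised to the floor
    print(adjusted_secondary_key_cost(40.0, 2.0))   # already above the floor
    print(adjusted_secondary_key_cost(3.0, None))   # no clustered key
```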
The idea behind 'worst_seek optimization' is that we limit the
cost for all non clustered ref access to the least of:
- best-rows-by-range (or all rows if no range found) / 10
- scan-time-table (roughly number of file blocks to scan table) * 3
In addition we also do not try to use rowid_filter if number of rows
estimated for 'ref' access is less than the worst_seek limitation.
The idea is that worst_seek is trying to take into account that if
we do a lot of accesses through a key, this is likely to be cached.
However it only does this for secondary keys, and not for clustered
keys or index only reads.
The effects of worst_seek are:
- In some cases 'ref' will have a much lower cost than range or using
a clustered key.
- Some possible rowid filters for secondary keys will be ignored.
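The worst_seek rules described above can be sketched as below. This is
a model of the description only and not the server code. All function
and parameter names are hypothetical.

```python
from typing import Optional


def worst_seek_limit(best_rows_by_range: Optional[float],
                     table_rows: float,
                     scan_time_table: float) -> float:
    """Return the least of (rows / 10) and (scan time of table * 3)."""
    # Fall back to all rows in the table if no range was found.
    rows = table_rows if best_rows_by_range is None else best_rows_by_range
    return min(rows / 10, scan_time_table * 3)


def ref_cost_with_worst_seek(raw_ref_cost: float, limit: float) -> float:
    """Clip the cost of a non clustered ref access to the limit."""
    return min(raw_ref_cost, limit)


def rowid_filter_considered(ref_rows: float, limit: float) -> bool:
    """A rowid filter is skipped if ref finds fewer rows than the limit."""
    return ref_rows >= limit


if __name__ == "__main__":
    limit = worst_seek_limit(5000, table_rows=100000, scan_time_table=400)
    print(limit)
    print(ref_cost_with_worst_seek(2000.0, limit))
    print(rowid_filter_considered(300, limit))
```

In this example a ref access with a raw cost of 2000 is clipped to the
limit, which shows how 'ref' can end up with a much lower cost than
range or a clustered key, and how a possible rowid filter for 300 rows
is ignored.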
When implementing optimizer_adjust_secondary_key_costs=2, I noticed
that the costs for ref+filter and range+filter are calculated
slightly differently. This caused a lot of range and
range+filter to change to ref+filter, which is not good as
range+filter provides the optimizer a better estimate of how many
accepted rows there will be in the result set.
Adding an extra small cost (1 seek) when using filter mitigated the
above problems in almost all cases.
This patch should not be applied to MariaDB 11.0 as worst_seeks is
removed in 11.0 and the cost calculation for clustered keys, secondary
keys, index scan and filter is more exact.
Test case changes for --optimizer-adjust_secondary_key_costs=1
(Fix secondary key costs to be 5x of primary key):
- stat_tables_innodb:
- Complex change (probably ok as number of rows are really small)
- ref over 1 row changed to range over 10 rows with join buffer
- ref over 5 rows changed to eq_ref
- secondary ref over 1 row changed to ref of primary key over 4 rows
- Change of key to use longer key with index pushdown (a little
bit worse but not significant).
- Change to use secondary (1 row) -> primary (4 rows)
- rowid_filter_innodb:
- index_merge (2 rows) & ref (1) -> all (23 rows) -> primary eq_ref.
Test case changes for --optimizer-adjust_secondary_key_costs=2
(removal of worst_seeks & adjust filter cost):
- stat_tables_innodb:
- Join order change (probably ok as number of rows are really small)
- ref (5 rows) & ref(1 row) changed to range (10 rows & join buffer)
& eq_ref.
- selectivity_innodb:
- ref -> ref|filter (ok)
- rowid_filter_innodb:
- ref -> ref|filter (ok)
- range|filter (64 rows) changed to ref|filter (128 rows).
ok as ref|filter outputs wrong number of rows in explain.
- range, range_mrr_icp:
- ref (500 rows) -> ALL (1000 rows) (ok)
- select_pkeycache, select, select_jcl6:
- ref|filter (2 rows) -> ref (2 rows) (ok)
- selectivity:
- ref -> ref|filter (ok)
- range:
- Change of 'filtered' but no stat or plan change (ok)
- selectivity:
- ref -> ref+filter (ok)
- Change of filtered but no plan change (ok)
- join_nested_jcl6:
- range -> ref|filter (ok as only 2 rows)
- subselect3, subselect3_jcl6:
- ref_or_null (4 rows) -> ALL (10 rows) (ok)
- Index_subquery (4 rows) -> ALL (10 rows) (ok)
- partition_mrr_myisam, partition_mrr_aria and partition_mrr_innodb:
- Uses ALL instead of REF for a key value that is the same for > 50%
of rows. (good)
- order_by_innodb:
- range (200 rows) -> ref (20 rows)+filesort (ok)
- subselect_sj2_mat:
- One test changed. One ALL removed and replaced with eq_ref. Likely
to be better.
- join_cache:
- Changed ref over 60% of the rows to use hash join (ok)
- opt_tvc:
- Changed to use eq_ref instead of ref with plan change (probably ok)
- opt_trace:
- No worst/max seeks clipping (good).
- Almost double range_scan_time and index_scan_time (ok).
- rowid_filter:
- ref -> ref|filtered (ok)
- range|filter (77 rows) changed to ref|filter (151 rows). Probably
ok as ref|filter outputs wrong number of rows in explain.
Reviewer: Sergei Petrunia <sergey@mariadb.com>