MDEV-38164: Fix the estimates reported by TABLE::key_storage_length()

They were based on the maximum possible key tuple length, which can be
much larger than the real data size.

The return value is used by handler::keyread_time(), which is used
to estimate the cost of range access.
This could cause range access not to be picked, even if it uses
the clustered PK and reads about 8% of the table.

The fix is to add KEY::stat_storage_length (next to KEY::rec_per_key) and
have the storage engine fill it in handler::info(HA_STATUS_CONST).

Currently, only InnoDB fills this based on its internal statistics:
index->stat_index_size and ib_table->stat_n_rows.
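
As a rough illustration of the estimate described above, here is a minimal
stand-alone sketch. The function name and parameters are hypothetical and do
not exist in the server; they only mirror the quantities named in this commit
(index size in pages, page size, row count, and the older DDL-based value used
as an upper bound). Returning the DDL-based value when the row count is zero
is an assumption of this sketch, not something the patch itself does.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only (not part of the server code).
//   index_pages  ~ index->stat_index_size (index size, in pages)
//   page_size    ~ srv_page_size (bytes per page)
//   n_rows       ~ ib_table->stat_n_rows
//   ddl_max_len  ~ the DDL-based estimate used before this change
std::size_t estimate_storage_length(std::uint64_t index_pages,
                                    std::uint64_t page_size,
                                    std::uint64_t n_rows,
                                    std::size_t ddl_max_len)
{
  // Assumption of this sketch: without row statistics, fall back to the
  // DDL-based value.
  if (n_rows == 0)
    return ddl_max_len;

  // Average bytes per index record: total index bytes divided by row count.
  std::uint64_t avg = (index_pages * page_size) / n_rows;

  // Tiny tables and partially full pages inflate the average, so clip it
  // to the DDL-based value.
  if (avg > ddl_max_len)
    return ddl_max_len;
  return static_cast<std::size_t>(avg);
}
```

With 100 pages of 16384 bytes and 10000 rows the sketch yields 163 bytes per
record. With a single page and 2 rows the raw average (8192) exceeds any
realistic key length, so the clipped DDL-based value is returned instead.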

Also changed:
- In handler::calculate_costs(), use ha_keyread_clustered_time() when
  computing clustered PK read cost, not ha_keyread_time().

The fix is OFF by default; it is enabled by setting the FIX_INDEX_LOOKUP_COST
flag in @@new_mode.
Sergei Petrunia 2025-11-21 13:17:19 +02:00
commit 4cff562f3f
14 changed files with 170 additions and 15 deletions

@@ -15003,6 +15003,28 @@ stats_fetch:
}
KEY* key = &table->key_info[i];
/*
  Provide statistics about how many bytes an index record takes on disk,
  on average.
*/
if (ib_table->stat_n_rows && !(key->flags & (HA_FULLTEXT | HA_SPATIAL))) {
  /*
    Start with total space used by the index divided by number of rows
  */
  key->stat_storage_length = (size_t) ((index->stat_index_size * srv_page_size) /
                                       ib_table->stat_n_rows);
  /*
    The above can be too large
    A) in case of tables with very few rows
    B) in case of indexes with partially full pages.
    So, clip the above estimate by a conservative estimate we've used
    before:
  */
  size_t conservative_estimate = table->key_storage_length_from_ddl(i);
  if (key->stat_storage_length > conservative_estimate)
    key->stat_storage_length = conservative_estimate;
}
for (j = 0; j < key->ext_key_parts; j++) {