Followup: remove this line from get_column_range_cardinality()
set_if_bigger(res, col_stats->get_avg_frequency());
and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
Also report JSON histogram load errors into error log, like it is already
done with other histogram/statistics load errors.
Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
Previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents was parsed again and again.
Switch to using a lower-level parsing API which allows to do
parsing in an efficient way.
- Make Histogram_json_hb::range_selectivity handle singleton buckets
specially when computing selectivity of the max. endpoint bound.
(for min. endpoint, we already do that).
- Also, fixed comments for Histogram_json_hb::find_bucket
When loading the histogram, use table->field[N], not table->s->field[N].
When we used the latter we would corrupt the fields's default value. One
of the consequences of that would be that AUTO_INCREMENT fields would
stop working correctly.
The problem was introduced in fix for MDEV-26724. That patch has made it
possible for histogram collection to fail. In particular, it fails for
non-assigned characters.
When histogram construction fails, we also abort the computation of
COUNT(DISTINCT). When we try to use the value, we get valgrind failures.
Switched the code to abort the statistics collection in this case.
Part#3:
- make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
- Use String::c_ptr_safe() instead of String::c_ptr
- Do proper datatype conversions in Histogram_json_hb::parse
- Remove Histogram_json_hb::Bucket::end_value. Introduce
get_end_value() instead.
Basic ideas:
1. Store "popular" values in their own buckets.
2. Also store ndv (Number of Distinct Values) in each bucket.
Because of #1, the buckets are now variable-size, so store the size in
each bucket.
Adjust selectivity estimation functions accordingly.
- Fix bad tests in statistics_json test: make them meaningful and make them
work on windows
- Fix analyze_debug.test: correctly handle errors during ANALYZE
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector
Change from Histogram_builder and its descendant Histogram_builder_json
to Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.
In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.