Histogram_json_hb::range_selectivity() may return small negative
numbers due to rounding errors in the histogram.
Make sure the returned value is non-negative.
Add an assert to catch negative values that are not small.
(attempt #2)
GCC does not understand that the variable have_ndv determines
whether the variable ndv_ll is initialized. Let us add a
redundant initialization to pacify GCC.
In Histogram_json_hb::point_selectivity(), do return selectivity of 0.0
when the histogram says so.
The logic of "Do not return 0.0 estimate as it causes a multiply-by-zero
meltdown in cost and cardinality calculations" is moved into
records_in_column_ranges() where it is one *once* per column pair (as
opposed to doing once per range, which can cause the error to add-up
to large number when there are many ranges)
Followup: remove this line from get_column_range_cardinality()
set_if_bigger(res, col_stats->get_avg_frequency());
and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
Also report JSON histogram load errors into error log, like it is already
done with other histogram/statistics load errors.
Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
Previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents was parsed again and again.
Switch to using a lower-level parsing API which allows to do
parsing in an efficient way.
- Make Histogram_json_hb::range_selectivity handle singleton buckets
specially when computing selectivity of the max. endpoint bound.
(for min. endpoint, we already do that).
- Also, fixed comments for Histogram_json_hb::find_bucket
Part#3:
- make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
Basic ideas:
1. Store "popular" values in their own buckets.
2. Also store ndv (Number of Distinct Values) in each bucket.
Because of #1, the buckets are now variable-size, so store the size in
each bucket.
Adjust selectivity estimation functions accordingly.