Histogram_json_hb::range_selectivity() may return small negative
numbers due to rounding errors in the histogram.
Make sure the returned value is non-negative.
Add an assert to catch negative values that are not small.
(attempt #2)
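A minimal sketch of the clamping, assuming a plain assert stands in for the
server's DBUG_ASSERT; the tolerance below is illustrative, not the actual
threshold from the patch:

    #include <algorithm>
    #include <cassert>

    /* Sketch: make a computed selectivity non-negative. */
    static double clamp_selectivity(double sel)
    {
      assert(sel >= -1.0e-6);     /* negatives that are not small are bugs */
      return std::max(sel, 0.0);  /* rounding noise becomes exactly 0.0 */
    }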
In Histogram_json_hb::point_selectivity(), do return a selectivity of 0.0
when the histogram says so.
The logic of "Do not return a 0.0 estimate as it causes a multiply-by-zero
meltdown in cost and cardinality calculations" is moved into
records_in_column_ranges(), where it is done *once* per column pair (as
opposed to once per range, which can let the error add up to a large
number when there are many ranges).
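A sketch of the idea with illustrative names (the real
records_in_column_ranges() walks the server's internal range list, not a
vector):

    #include <vector>

    /* Sum the estimates for all ranges on one column, then clamp the total
       once, so the "never zero" adjustment cannot accumulate per range. */
    static double records_in_ranges(double table_rows,
                                    const std::vector<double> &range_sels)
    {
      double rows= 0.0;
      for (double sel : range_sels)
        rows+= table_rows * sel;   /* sel may now legitimately be 0.0 */
      if (rows == 0.0)
        rows= 1.0;                 /* avoid multiply-by-zero meltdown later */
      return rows;
    }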
Followup: remove this line from get_column_range_cardinality():

    set_if_bigger(res, col_stats->get_avg_frequency());

and make sure it is only used with binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
Also report JSON histogram load errors in the error log, as is already
done for other histogram/statistics load errors.
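For example, something along these lines at load time (a hypothetical call
site: sql_print_error() is the server's error-log channel, but the parse
call and names below are illustrative):

    /* parse_histogram() and the identifiers below are placeholders */
    if (parse_histogram(json_text, &err_msg))
      sql_print_error("Failed to parse histogram_json_hb for %s.%s.%s: %s",
                      db_name, table_name, column_name, err_msg);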
Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
The previous JSON parser used an API that made parsing inefficient: the
same JSON contents were parsed again and again. Switch to a lower-level
parsing API that allows the parsing to be done efficiently.
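The gist of the lower-level API (MariaDB's json_lib scanner; the loop is a
sketch, with end-of-document and error handling omitted):

    #include "json_lib.h"  /* json_engine_t, json_scan_start/json_scan_next */

    json_engine_t je;
    json_scan_start(&je, system_charset_info,
                    (const uchar *) json, (const uchar *) json + json_len);
    /* Each token is visited once as the scanner steps through the document;
       nothing is re-parsed for every value we look up. */
    while (json_scan_next(&je) == 0)
    {
      if (je.state == JST_KEY)
      {
        /* read the key, then either consume its value or skip it */
      }
    }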
- Make Histogram_json_hb::range_selectivity handle singleton buckets
  specially when computing the selectivity of the max. endpoint bound
  (for the min. endpoint, we already do that); see the sketch after
  this list.
- Also, fix the comments for Histogram_json_hb::find_bucket.
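Roughly the intended special case, in illustrative pseudocode (the bucket
accessors here are hypothetical):

    /* Selectivity contribution of the bucket containing the max endpoint */
    if (bucket_is_singleton(b))          /* bucket holds one value: [v, v] */
    {
      /* All-or-nothing: the whole bucket counts for an inclusive endpoint
         (<= v), none of it for a strict one (< v). Interpolating inside a
         singleton bucket makes no sense. */
      sel+= endpoint_is_inclusive ? bucket_fraction(b) : 0.0;
    }
    else
    {
      /* Regular bucket: include the interpolated fraction of the bucket
         that lies below the endpoint */
      sel+= bucket_fraction(b) * position_in_bucket(max_endpoint, b);
    }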
When loading the histogram, use table->field[N], not table->s->field[N].
When we used the latter, we would corrupt the field's default value. One
of the consequences was that AUTO_INCREMENT fields would stop working
correctly.
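The distinction, schematically (the store call is illustrative):

    /* table->s is the shared TABLE_SHARE; its Field objects point into the
       shared default-values record. Storing through them silently rewrites
       the column defaults for everyone using this table definition. */
    Field *shared_field= table->s->field[i];   /* WRONG target for stores */

    /* table->field[] belongs to this open TABLE instance and points into
       its own record buffer, so storing a loaded histogram value is safe. */
    Field *own_field= table->field[i];         /* correct */
    own_field->store(value.str, value.length, &my_charset_bin);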
The problem was introduced by the fix for MDEV-26724. That patch made it
possible for histogram collection to fail; in particular, it fails for
non-assigned characters. When histogram construction fails, we also abort
the computation of COUNT(DISTINCT), and when we later try to use that
value we get valgrind failures. Switched the code to abort the whole
statistics collection in this case, as sketched below.
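Schematically (the names here are illustrative, not the patch itself):

    /* Inside the per-column statistics collection loop */
    if (histogram_build_failed)
    {
      /* COUNT(DISTINCT) was aborted together with the histogram, so its
         value was never computed; reading it later is what triggered the
         valgrind failures. Abort the whole collection instead. */
      return true;               /* propagate the failure to the caller */
    }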
Part#3:
- Make json_escape() return different errors for a conversion error and
  for an out-of-space condition (see the sketch after this list).
- Make the histogram code handle conversion errors.
- Fix bad tests in the statistics_json test: make them meaningful and
  make them work on Windows.
- Fix analyze_debug.test: correctly handle errors during ANALYZE.
  - Also add an "explain select" statement to the test so that the
    fprintf calls can print the computed intervals to mysqld.1.err.
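A sketch of the json_escape() change from the first bullet (the error-code
names are hypothetical; only the idea of two distinguishable failures comes
from this patch):

    #define JSON_ERROR_ILLEGAL_SYMBOL  (-1)  /* input not convertible */
    #define JSON_ERROR_OUT_OF_SPACE    (-2)  /* output buffer too small */

    int rc= json_escape(field_cs, (const uchar *) src,
                        (const uchar *) src_end,
                        &my_charset_utf8mb4_bin, out, out_end);
    if (rc == JSON_ERROR_ILLEGAL_SYMBOL)
    {
      /* the value cannot be represented: fail histogram collection */
    }
    else if (rc == JSON_ERROR_OUT_OF_SPACE)
    {
      /* enlarge the output buffer and retry the escape */
    }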
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the memory allocation for the JSON histogram builder and adds
more column types for testing.
Some challenges at the moment include:
* A garbage value at the end of the JSON array still persists.
* A garbage value also gets appended to bucket values if the column is a
  primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>