Followup: remove this line from get_column_range_cardinality()
set_if_bigger(res, col_stats->get_avg_frequency());
and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
Previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents was parsed again and again.
Switch to using a lower-level parsing API which allows to do
parsing in an efficient way.
When loading the histogram, use table->field[N], not table->s->field[N].
When we used the latter we would corrupt the fields's default value. One
of the consequences of that would be that AUTO_INCREMENT fields would
stop working correctly.
The problem was introduced in fix for MDEV-26724. That patch has made it
possible for histogram collection to fail. In particular, it fails for
non-assigned characters.
When histogram construction fails, we also abort the computation of
COUNT(DISTINCT). When we try to use the value, we get valgrind failures.
Switched the code to abort the statistics collection in this case.
Part#3:
- make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector
Change from Histogram_builder and its descendant Histogram_builder_json
to Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.
In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.
* it also adds an "explain select" statement to the test so that the fprintf calls
can print the computed intervals to mysqld.1.err
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.
It also restores get/set size functions as they were useful in calculating fields
for binary histogram.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.
grep for GSOC-TODO for notes.
Preparation for handling different kinds of histograms:
- In Column_statistics, change "Histogram histogram" into
"Histogram *histogram_". This allows for different kinds
of Histogram classes with virtual functions.
- [Almost] remove the usage of Histogram->set_values and
Histogram->set_size. The code outside the histogram should
not make any assumptions about what/how is stored in the Histogram.
- Introduce drafts of methods to read/save histograms to/from disk.