Commit graph

283 commits

Author SHA1 Message Date
Marko Mäkelä
3c2a5ad3e8 Merge 10.7 into 10.8 2022-07-01 17:53:06 +03:00
Marko Mäkelä
62a20f8047 Merge 10.5 into 10.6 2022-07-01 15:24:50 +03:00
Marko Mäkelä
f09687094c Merge 10.4 into 10.5 2022-07-01 14:42:02 +03:00
Marko Mäkelä
392ee571c1 Merge 10.3 into 10.4 2022-07-01 13:10:36 +03:00
Marko Mäkelä
045771c050 Fix most clang-15 -Wunused-but-set-variable
Also, refactor trx_i_s_common_fill_table() to remove dead code.

Warnings about yynerrs in Bison-generated yyparse() will remain for now.
2022-07-01 09:48:36 +03:00
Sergei Petrunia
ce4956f322 Code cleanup 2022-01-19 18:14:07 +03:00
Sergei Petrunia
db8f15be93 MDEV-27229: Estimation for filtered rows less precise ... #5
Followup: remove this line from get_column_range_cardinality()

      set_if_bigger(res, col_stats->get_avg_frequency());

and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
1d14176ec4 MDEV-26519: Improved histograms: Make JSON parser efficient
Previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents was parsed again and again.

Switch to using a lower-level parsing API which allows to do
parsing in an efficient way.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
106c785e2d MDEV-26911: Unexpected ER_DUP_KEY, ASAN errors, double free detected in ...
When loading the histogram, use table->field[N], not table->s->field[N].

When we used the latter we would corrupt the fields's default value. One
of the consequences of that would be that AUTO_INCREMENT fields would
stop working correctly.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
05877df472 MDEV-26849: JSON Histograms: point selectivity estimates are off
.. for non-existent values.

Handle this special case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
27539cd2c8 MDEV-26801: Valgrind/MSAN errors in Column_statistics_collected::finish ...
The problem was introduced in fix for MDEV-26724. That patch has made it
possible for histogram collection to fail. In particular, it fails for
non-assigned characters.

When histogram construction fails, we also abort the computation of
COUNT(DISTINCT). When we try to use the value, we get valgrind failures.

Switched the code to abort the statistics collection in this case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
3936dc3353 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Part#3:
- make json_escape() return different errors on conversion error
  and on out-of-space condition.
- Make histogram code handle conversion errors.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
943b8fccf9 MDEV-26710: Histogram field in mysql.column_stats is too short
Change it to LONGBLOB.
Also, update_statistics_for_table() should not "swallow" an error
from open_stat_tables.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
702f4efcd9 More "straightforward" memory management
Do not put Histogram objects on MEM_ROOT at all
2022-01-19 18:10:10 +03:00
Sergei Petrunia
9271bd17f7 More code cleanups
Remove Histogram_*::is_available(), it is not applicable anymore.
Fix compilation on Windows
2022-01-19 18:10:10 +03:00
Sergei Petrunia
1d98168547 Move JSON histograms code into its own files 2022-01-19 18:10:10 +03:00
Sergei Petrunia
4ab2b78b65 Histogram code cleanup and fixes
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector

Change from Histogram_builder and its descendant Histogram_builder_json
to  Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.

In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
e675f44624 Code cleanup: don't duplicate the position-in-interval code 2022-01-19 18:10:09 +03:00
Sergei Petrunia
899dfb078e Code cleanups part #3 2022-01-19 18:10:09 +03:00
Sergei Petrunia
859c14ff01 Better names: s/histogram_/histogram/, s/Histogram_json/Histogram_json_hb/ 2022-01-19 18:10:09 +03:00
Sergei Petrunia
fc6a4a33b2 Cleanup histogram collection code 2022-01-19 18:10:09 +03:00
Sergei Petrunia
3486bf4110 Code cleanup + reduce the diff size 2022-01-19 18:10:09 +03:00
Sergei Petrunia
229c836d6a Fix valgrind failure 2022-01-19 18:10:09 +03:00
Sergei Petrunia
dde6d76995 Trivial code cleanup 2022-01-19 18:10:09 +03:00
Sergei Petrunia
a93b377863 Fix histogram memory management
There are "local" histograms that are allocated by one thread for
one TABLE object, and "global" that are allocated for TABLE_SHARE.
2022-01-19 18:10:09 +03:00
Sergei Petrunia
032587e2dc Code cleanup part #3 2022-01-19 18:10:09 +03:00
Sergei Petrunia
fcf58a5e0f Code cleanup part#2: do not copy key values in xxx_selectivity() functions 2022-01-19 18:10:09 +03:00
Sergei Petrunia
2a1cdbabec Fix JSON parsing: future-proof data representation in JSON, code cleanup 2022-01-19 18:10:09 +03:00
Sergei Petrunia
a0b4a86822 Code cleanup part #2. 2022-01-19 18:10:09 +03:00
Sergei Petrunia
72c0ba43b2 Code cleanup part #1 2022-01-19 18:10:09 +03:00
Sergei Petrunia
f76e310ace Rename histogram_type=JSON to JSON_HB 2022-01-19 18:10:09 +03:00
Sergei Petrunia
a48e63c5fe Fix compile error and test failure:
- Don't use 'res' uninitialized
- multiply it by col_non_nulls before set_if_bigger(...) call.
2022-01-19 18:10:09 +03:00
Michael Okoko
096995b106 Fix column range cardinality crash when histogram is null
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:09 +03:00
Michael Okoko
058a90e6f5 Use existing statistics test to improve coverage for JSON statistics
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:09 +03:00
Michael Okoko
3692adebd4 Fix avg_frequency statistics and remove stderr dumps
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bff65a813e Implement point selectivity for JSON histograms
* Also merges tests relating to JSON statistics into one file

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
547f805311 Refactor histogram point selectivity
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
e10d99ce87 Backfill json histogram bounds during building
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
3d952cd8bd Improve tests and test results to cover larger cases
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
63cbd0748b replace range_selectivity methods for Histograms and add tests
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
c129689ddc Use binary search to compute range selectivity
* it also adds an "explain select" statement to the test so that the fprintf calls
  can print the computed intervals to mysqld.1.err

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
c605285bb8 fix returned value type for empty json objects
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
69f24c238e Use generic Histogram_base class for Histogram_builders
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.

It also restores get/set size functions as they were useful in calculating fields
for binary histogram.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
21e0f5487f MDEV-21130: Histograms: use JSON as on-disk format
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.

grep for GSOC-TODO for notes.
2022-01-19 18:10:08 +03:00
Michael Okoko
e778d12f83 report parse error when parsing JSON histogram fails
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
fe2e516a50 inform test result of zero hist_size for json histogram
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bf4d0dcfe2 implement parse and serialize for histogram json 2022-01-19 18:10:08 +03:00
Michael Okoko
9bba595528 remove unneeded shared methods
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
1fa7af749e Split histogram classes and into JSON and binary classes
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
1998b787ac MDEV-21130: Histograms: use JSON as on-disk format
Preparation for handling different kinds of histograms:

- In Column_statistics, change "Histogram histogram" into
  "Histogram *histogram_".  This allows for different kinds
  of Histogram classes with virtual functions.

- [Almost] remove the usage of Histogram->set_values and
  Histogram->set_size. The code outside the histogram should
  not make any assumptions about what/how is stored in the Histogram.

- Introduce drafts of methods to read/save histograms to/from disk.
2022-01-19 18:10:08 +03:00