30 commits

Author SHA1 Message Date
Sergei Petrunia
d8d57d2c27 MDEV-26764: JSON_HB Histograms: handle BINARY and unassigned characters
Encode such characters in hex.
2022-01-19 18:10:12 +03:00
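To illustrate the hex-encoding idea in the commit above, here is a minimal sketch. The function name `to_hex` is hypothetical and is not taken from the server source; it only shows how arbitrary bytes can be turned into text that is always safe to embed in a JSON document.

```cpp
#include <string>

// Hypothetical helper (not the server's function): encode every byte of
// `bytes` as two uppercase hex digits, so binary data or bytes that do not
// form valid characters can still be stored as plain ASCII text.
std::string to_hex(const std::string& bytes)
{
  static const char digits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (unsigned char c : bytes) {
    out += digits[c >> 4];    // high nibble
    out += digits[c & 0x0F];  // low nibble
  }
  return out;
}
```

The result is twice as long as the input, which is presumably why this encoding is reserved for values that cannot be written as ordinary text.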
Sergei Petrunia
748b293c14 More test coverage 2022-01-19 18:10:12 +03:00
Sergei Petrunia
c2d2c1e727 MDEV-26519: Improved histograms
Save extra information in the histogram:

    "target_histogram_size": nnn,
    "collected_at": "(date and time)",
    "collected_by": "(server version)",
2022-01-19 18:10:12 +03:00
Sergei Petrunia
a0916cf5a2 MDEV-26519: Improved histograms: Better error reporting, test coverage
Also report JSON histogram load errors to the error log, as is already
done for other histogram/statistics load errors.

Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
a0f93f433a Rename histogram_hb_v2 -> histogram_hb 2022-01-19 18:10:11 +03:00
Sergei Petrunia
1d14176ec4 MDEV-26519: Improved histograms: Make JSON parser efficient
The previous JSON parser used an API which made parsing
inefficient: the same JSON content was parsed again and again.

Switch to a lower-level parsing API which allows
parsing to be done efficiently.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
be55ad0d34 MDEV-27062: Make histogram_type=JSON_HB the new default 2022-01-19 18:10:11 +03:00
Sergei Petrunia
eb6a9ad705 MDEV-26886: Estimation for filtered rows less precise with JSON histogram
- Make Histogram_json_hb::range_selectivity handle singleton buckets
  specially when computing selectivity of the max. endpoint bound.
  (for min. endpoint, we already do that).

- Also, fixed comments for Histogram_json_hb::find_bucket
2022-01-19 18:10:11 +03:00
Sergei Petrunia
ac0194bd0e MDEV-26892: JSON histograms become invalid with a specific (corrupt) value ..
Handle the case where the last value in the table cannot be represented
in utf8mb4.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
05877df472 MDEV-26849: JSON Histograms: point selectivity estimates are off
.. for non-existent values.

Handle this special case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
f3f78bed85 MDEV-26750: Estimation for filtered rows is far off with JSON_HB histogram
Fix a bug in position_in_interval(). Do not overwrite one interval endpoint
with another.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
93d5980435 MDEV-26709: JSON histogram may contain more buckets than histogram_size allows
When computing bucket_capacity= records/histogram->get_width(), round
the value UP, not down.
2022-01-19 18:10:11 +03:00
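The effect of the rounding direction in the commit above can be sketched as follows. Both function names are hypothetical and only mirror the expression `records/histogram->get_width()` quoted in the message; this is not the server code.

```cpp
#include <cstdint>

// Hypothetical helper: rows per bucket, rounded UP, so that `records` rows
// never need more than `width` buckets. Assumes width > 0.
inline std::uint64_t bucket_capacity_round_up(std::uint64_t records,
                                              std::uint64_t width)
{
  return records / width + (records % width != 0 ? 1 : 0);
}

// Hypothetical helper: how many buckets are needed when each bucket holds
// at most `capacity` rows. Assumes capacity > 0.
inline std::uint64_t buckets_needed(std::uint64_t records,
                                    std::uint64_t capacity)
{
  return records / capacity + (records % capacity != 0 ? 1 : 0);
}
```

With 10 rows and a width of 4, rounding down gives a capacity of 2 and therefore 5 buckets, one more than allowed. Rounding up gives a capacity of 3 and 4 buckets.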
Sergei Petrunia
3936dc3353 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Part#3:
- make json_escape() return different errors on conversion error
  and on out-of-space condition.
- Make histogram code handle conversion errors.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5d66eeb3a1 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
.. part#2: correctly pass the charset to JSON [un]escape functions
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5c709ef18c MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Correctly handle empty string when [un]escaping JSON
2022-01-19 18:10:11 +03:00
Sergei Petrunia
61cd4f4412 MDEV-26711: Values in JSON histograms are not properly quoted
Escape values when serializing to JSON. Un-escape when reading back.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
d03daaf8a8 Use JSON_NAME, not the "histogram_hb_v2" constant 2022-01-19 18:10:10 +03:00
Sergei Petrunia
28ad128585 Fix off-by-one error in Histogram_json_hb::find_bucket 2022-01-19 18:10:10 +03:00
Sergei Petrunia
b179640219 MDEV-26590: Stack smashing/buffer overflow in Histogram_json_hb::parse
Provide buffer of sufficient size.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
382250c05c Address review input 2022-01-19 18:10:10 +03:00
Sergei Petrunia
cf8927e9cb Fix the previous cset: next() should have element_count as parameter 2022-01-19 18:10:10 +03:00
Sergei Petrunia
b6121ca36a Fix compile warnings/error on Windows 2022-01-19 18:10:10 +03:00
Sergei Petrunia
6375873c9a Fixes in opt_histogram_json.cc in the last commits
Also add more test coverage
2022-01-19 18:10:10 +03:00
Sergei Petrunia
ace961a1e7 Fix compile error on windows 2022-01-19 18:10:10 +03:00
Sergei Petrunia
f460272054 MDEV-26519: JSON Histograms: improve histogram collection
Basic ideas:
1. Store "popular" values in their own buckets.
2. Also store ndv (Number of Distinct Values) in each bucket.

Because of #1, the buckets are now variable-size, so store the size in
each bucket.

Adjust selectivity estimation functions accordingly.
2022-01-19 18:10:10 +03:00
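The two ideas listed in the commit above can be sketched roughly as follows. The `Bucket` layout and `build_buckets` are hypothetical and use `int` values for brevity; they are not the server's data structures or collection algorithm, only an illustration of variable-size buckets that carry their own size and ndv.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bucket layout for illustration.
struct Bucket {
  int start;          // smallest value in the bucket
  double size;        // fraction of all rows that fall into the bucket
  std::uint64_t ndv;  // number of distinct values in the bucket
};

// Build buckets from values sorted in ascending order. A value that alone
// fills a bucket's capacity is "popular" and gets a bucket of its own.
std::vector<Bucket> build_buckets(const std::vector<int>& sorted,
                                  std::size_t width)
{
  std::vector<Bucket> out;
  const std::size_t n = sorted.size();
  if (n == 0 || width == 0) return out;
  const std::size_t capacity = n / width + (n % width != 0 ? 1 : 0);

  std::size_t rows = 0;   // rows in the bucket being filled
  std::uint64_t ndv = 0;  // distinct values in the bucket being filled
  int start = 0;
  auto flush = [&]() {
    if (rows != 0) {
      out.push_back({start, static_cast<double>(rows) / n, ndv});
      rows = 0;
      ndv = 0;
    }
  };

  for (std::size_t i = 0; i < n; ) {
    std::size_t j = i;
    while (j < n && sorted[j] == sorted[i]) ++j;  // run of equal values
    const std::size_t cnt = j - i;
    if (cnt >= capacity) {                        // popular value
      flush();
      out.push_back({sorted[i], static_cast<double>(cnt) / n, 1});
    } else {
      if (rows + cnt > capacity) flush();
      if (rows == 0) start = sorted[i];
      rows += cnt;
      ++ndv;
    }
    i = j;
  }
  flush();
  return out;
}
```

For the input `{1,2,3,5,5,5,5,5,7,8}` with a width of 4, the sketch yields three buckets: one starting at 1 with three distinct values, a singleton bucket for the popular value 5 covering half of the rows, and one starting at 7. Because bucket sizes differ, an estimator has to read each bucket's `size` rather than assume equal-height buckets.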
Sergei Petrunia
d64e104810 Fix compilation on windows 2022-01-19 18:10:10 +03:00
Sergei Petrunia
5ddbd72af4 Correctly decode string field values for pos_in_interval_for_string call 2022-01-19 18:10:10 +03:00
Sergei Petrunia
e0f42d32e5 Fix compilation on windows part #3 2022-01-19 18:10:10 +03:00
Sergei Petrunia
9271bd17f7 More code cleanups
Remove Histogram_*::is_available(), it is not applicable anymore.
Fix compilation on Windows
2022-01-19 18:10:10 +03:00
Sergei Petrunia
1d98168547 Move JSON histograms code into its own files 2022-01-19 18:10:10 +03:00