Commit graph

194734 commits

Author SHA1 Message Date
Sergei Petrunia
db8f15be93 MDEV-27229: Estimation for filtered rows less precise ... #5
Followup: remove this line from get_column_range_cardinality()

      set_if_bigger(res, col_stats->get_avg_frequency());

and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
d3e511d421 MDEV-27243: Estimation for filtered rows less precise ... #7
Added a testcase
2022-01-19 18:10:12 +03:00
Sergei Petrunia
531dd708ef MDEV-27229: Estimation for filtered rows less precise ... #5
Fix special handling for values that are right next to buckets with ndv=1.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
67d4d0426f Update test results 2022-01-19 18:10:12 +03:00
Sergei Petrunia
905634dc3f MDEV-27230: Estimation for filtered rows less precise ...
Fix the code in Histogram_json_hb::range_selectivity that handles
special cases: a non-inclusive endpoint hitting a bucket boundary...
2022-01-19 18:10:12 +03:00
Sergei Petrunia
08f1c4a2e0 MDEV-27203: Valgrind / MSAN errors in Histogram_json_hb::parse_bucket
In read_bucket_endpoint(), handle all possible parser states.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
d8d57d2c27 MDEV-26764: JSON_HB Histograms: handle BINARY and unassigned characters
Encode such characters in hex.
2022-01-19 18:10:12 +03:00
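The hex encoding mentioned in the commit above can be illustrated with a minimal sketch. The helper names `to_hex` and `from_hex` are hypothetical and are not taken from the server source; the sketch only shows that any byte sequence, including bytes that are not valid text in the column's character set, round-trips through a plain ASCII hex string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode arbitrary bytes as upper-case hex digits.
std::string to_hex(const std::string& bytes)
{
  static const char digits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (unsigned char c : bytes) {
    out.push_back(digits[c >> 4]);    // high nibble
    out.push_back(digits[c & 0x0F]);  // low nibble
  }
  return out;
}

// Hypothetical helper: decode a hex string produced by to_hex().
std::string from_hex(const std::string& hex)
{
  assert(hex.size() % 2 == 0);
  auto nibble = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;  // not a hex digit
  };
  std::string out;
  out.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    int hi = nibble(hex[i]);
    int lo = nibble(hex[i + 1]);
    assert(hi >= 0 && lo >= 0);
    out.push_back(static_cast<char>((hi << 4) | lo));
  }
  return out;
}
```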
Sergei Petrunia
748b293c14 More test coverage 2022-01-19 18:10:12 +03:00
Sergei Petrunia
c2d2c1e727 MDEV-26519: Improved histograms
Save extra information in the histogram:

    "target_histogram_size": nnn,
    "collected_at": "(date and time)",
    "collected_by": "(server version)",
2022-01-19 18:10:12 +03:00
Sergei Petrunia
a0916cf5a2 MDEV-26519: Improved histograms: Better error reporting, test coverage
Also report JSON histogram load errors into error log, like it is already
done with other histogram/statistics load errors.

Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
a0f93f433a Rename histogram_hb_v2 -> histogram_hb 2022-01-19 18:10:11 +03:00
Sergei Petrunia
1d14176ec4 MDEV-26519: Improved histograms: Make JSON parser efficient
The previous JSON parser used an API which made the parsing
inefficient: the same JSON content was parsed again and again.

Switch to a lower-level parsing API which allows parsing to be
done efficiently.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
be55ad0d34 MDEV-27062: Make histogram_type=JSON_HB the new default 2022-01-19 18:10:11 +03:00
Sergei Petrunia
eb6a9ad705 MDEV-26886: Estimation for filtered rows less precise with JSON histogram
- Make Histogram_json_hb::range_selectivity handle singleton buckets
  specially when computing selectivity of the max. endpoint bound.
  (for min. endpoint, we already do that).

- Also, fixed comments for Histogram_json_hb::find_bucket
2022-01-19 18:10:11 +03:00
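A minimal sketch of why singleton buckets need special handling, assuming numeric values and linear interpolation inside a bucket. The function `fraction_below` and its parameters are hypothetical; this is not the body of Histogram_json_hb::range_selectivity. In a bucket with ndv=1 every row holds the same value, so a range bound either covers the whole bucket or none of it, and interpolating would give a misleading fraction.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: fraction of one bucket's rows that satisfy
// "col <= v" (inclusive) or "col < v" (non-inclusive).
// [start, end] are the bucket's value bounds, ndv its number of distinct values.
double fraction_below(double start, double end, std::size_t ndv,
                      double v, bool inclusive)
{
  assert(start <= end);
  if (ndv == 1) {
    // Singleton bucket: all rows are equal to `start`, so the answer
    // is all-or-nothing and depends on whether the bound is inclusive.
    if (v > start) return 1.0;
    if (v == start) return inclusive ? 1.0 : 0.0;
    return 0.0;
  }
  // Regular bucket: assume values are spread evenly and interpolate.
  if (v <= start) return 0.0;
  if (v >= end) return 1.0;
  return (v - start) / (end - start);
}
```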
Sergei Petrunia
106c785e2d MDEV-26911: Unexpected ER_DUP_KEY, ASAN errors, double free detected in ...
When loading the histogram, use table->field[N], not table->s->field[N].

When we used the latter we would corrupt the field's default value. One
of the consequences of that would be that AUTO_INCREMENT fields would
stop working correctly.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
ac0194bd0e MDEV-26892: JSON histograms become invalid with a specific (corrupt) value ..
Handle the case where the last value in the table cannot be represented
in utf8mb4.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
05877df472 MDEV-26849: JSON Histograms: point selectivity estimates are off
.. for non-existent values.

Handle this special case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
f3f78bed85 MDEV-26750: Estimation for filtered rows is far off with JSON_HB histogram
Fix a bug in position_in_interval(). Do not overwrite one interval endpoint
with another.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
27539cd2c8 MDEV-26801: Valgrind/MSAN errors in Column_statistics_collected::finish ...
The problem was introduced in the fix for MDEV-26724. That patch made it
possible for histogram collection to fail. In particular, it fails for
non-assigned characters.

When histogram construction fails, we also abort the computation of
COUNT(DISTINCT). When we try to use the value, we get valgrind failures.

Switched the code to abort the statistics collection in this case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
93d5980435 MDEV-26709: JSON histogram may contain more buckets than histogram_size allows
When computing bucket_capacity= records/histogram->get_width(), round
the value UP, not down.
2022-01-19 18:10:11 +03:00
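The rounding direction matters because a capacity that is rounded down can require more buckets than the requested width. A small sketch of the arithmetic, using hypothetical helper names rather than the server's code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rows per bucket, rounded UP (ceiling division).
std::uint64_t bucket_capacity_round_up(std::uint64_t records,
                                       std::uint64_t width)
{
  assert(width > 0);
  return (records + width - 1) / width;
}

// Hypothetical helper: how many buckets are needed if each bucket
// holds at most `capacity` rows.
std::uint64_t buckets_needed(std::uint64_t records, std::uint64_t capacity)
{
  assert(capacity > 0);
  return (records + capacity - 1) / capacity;
}
```

With 10 rows and a width of 4, rounding down gives a capacity of 2 and therefore 5 buckets, one more than allowed. Rounding up gives a capacity of 3 and 4 buckets.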
Sergei Petrunia
3936dc3353 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Part#3:
- make json_escape() return different errors on conversion error
  and on out-of-space condition.
- Make histogram code handle conversion errors.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
8e0a342b91 Update test results 2022-01-19 18:10:11 +03:00
Sergei Petrunia
b17f33a04b MDEV-26737: Outdated VARIABLE_COMMENT for HISTOGRAM_TYPE in I_S.SYSTEM_VARIABLES
Fix the description
2022-01-19 18:10:11 +03:00
Sergei Petrunia
943b8fccf9 MDEV-26710: Histogram field in mysql.column_stats is too short
Change it to LONGBLOB.
Also, update_statistics_for_table() should not "swallow" an error
from open_stat_tables.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5d66eeb3a1 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
.. part#2: correctly pass the charset to JSON [un]escape functions
2022-01-19 18:10:11 +03:00
Sergei Petrunia
43a8d9f156 MDEV-26595: ASAN use-after-poison my_strnxfrm_simple_internal / Histogram_json_hb::range_selectivity
Add testcase
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5ef350a7f1 MDEV-26589: Assertion failure upon DECODE_HISTOGRAM with NULLs
Item_func_decode_histogram::val_str should correctly set null_value
when "decoding" JSON histogram.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5c709ef18c MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Correctly handle empty string when [un]escaping JSON
2022-01-19 18:10:11 +03:00
Sergei Petrunia
61cd4f4412 MDEV-26711: Values in JSON histograms are not properly quoted
Escape values when serializing to JSON. Un-escape when reading back.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
d03daaf8a8 Use JSON_NAME, not the "histogram_hb_v2" constant 2022-01-19 18:10:10 +03:00
Sergei Petrunia
702f4efcd9 More "straightforward" memory management
Do not put Histogram objects on MEM_ROOT at all
2022-01-19 18:10:10 +03:00
Sergei Petrunia
28ad128585 Fix off-by-one error in Histogram_json_hb::find_bucket 2022-01-19 18:10:10 +03:00
Sergei Petrunia
b179640219 MDEV-26590: Stack smashing/buffer overflow in Histogram_json_hb::parse
Provide buffer of sufficient size.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
382250c05c Address review input 2022-01-19 18:10:10 +03:00
Sergei Petrunia
cf8927e9cb Fix the previous cset: next() should have element_count as parameter 2022-01-19 18:10:10 +03:00
Sergei Petrunia
b6121ca36a Fix compile warnings/error on Windows 2022-01-19 18:10:10 +03:00
Sergei Petrunia
6375873c9a Fixes in opt_histogram_json.cc in the last commits
Also add more test coverage
2022-01-19 18:10:10 +03:00
Sergei Petrunia
49a7bbb1f6 Valgrind fixes, poor .result fixes, code cleanups
- Use String::c_ptr_safe() instead of String::c_ptr
- Do proper datatype conversions in Histogram_json_hb::parse
- Remove Histogram_json_hb::Bucket::end_value. Introduce
  get_end_value() instead.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
ace961a1e7 Fix compile error on windows 2022-01-19 18:10:10 +03:00
Sergei Petrunia
f460272054 MDEV-26519: JSON Histograms: improve histogram collection
Basic ideas:
1. Store "popular" values in their own buckets.
2. Also store ndv (Number of Distinct Values) in each bucket.

Because of #1, the buckets are now variable-size, so store the size in
each bucket.

Adjust selectivity estimation functions accordingly.
2022-01-19 18:10:10 +03:00
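The two ideas in the commit above can be sketched as follows. This is an illustrative simplification over sorted integers, not the Histogram_json_builder code: the `Bucket` struct, `build_buckets`, and the rule "a value is popular when its row count reaches the bucket capacity" are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bucket layout: variable-size, so size is stored per bucket.
struct Bucket {
  int start;         // smallest value in the bucket
  std::size_t size;  // number of rows in the bucket
  std::size_t ndv;   // number of distinct values in the bucket
};

// Build buckets from values that are already sorted.
// `capacity` is the target number of rows per regular bucket.
std::vector<Bucket> build_buckets(const std::vector<int>& sorted,
                                  std::size_t capacity)
{
  assert(capacity > 0);
  std::vector<Bucket> out;
  bool open = false;  // true if out.back() may still accept values
  std::size_t i = 0;
  while (i < sorted.size()) {
    // Find the run of rows equal to sorted[i].
    std::size_t j = i;
    while (j < sorted.size() && sorted[j] == sorted[i]) ++j;
    std::size_t run = j - i;

    if (run >= capacity) {
      // "Popular" value: give it a bucket of its own with ndv=1.
      out.push_back({sorted[i], run, 1});
      open = false;
    } else {
      // Regular value: start a new bucket if the current one would overflow.
      if (!open || out.back().size + run > capacity) {
        out.push_back({sorted[i], 0, 0});
        open = true;
      }
      out.back().size += run;
      out.back().ndv += 1;
    }
    i = j;
  }
  return out;
}
```

For the input {1, 2, 3, 3, 3, 3, 4, 5, 6} with a capacity of 3, the value 3 ends up alone in a bucket of size 4, and the remaining values share two regular buckets.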
Sergei Petrunia
d64e104810 Fix compilation on windows 2022-01-19 18:10:10 +03:00
Sergei Petrunia
5ddbd72af4 Correctly decode string field values for pos_in_interval_for_string call 2022-01-19 18:10:10 +03:00
Sergei Petrunia
223fa6a891 Make tests pass
- Fix bad tests in statistics_json test: make them meaningful and make them
  work on windows
- Fix analyze_debug.test: correctly handle errors during ANALYZE
2022-01-19 18:10:10 +03:00
Sergei Petrunia
e0f42d32e5 Fix compilation on windows part #3 2022-01-19 18:10:10 +03:00
Sergei Petrunia
00377dbae8 Fix embedded to work 2022-01-19 18:10:10 +03:00
Sergei Petrunia
716c98b15d Fix compilation on windows part 2 2022-01-19 18:10:10 +03:00
Sergei Petrunia
1861a2a2cd Rollback a change from previous commit 2022-01-19 18:10:10 +03:00
Sergei Petrunia
9271bd17f7 More code cleanups
Remove Histogram_*::is_available(), it is not applicable anymore.
Fix compilation on Windows
2022-01-19 18:10:10 +03:00
Sergei Petrunia
1d98168547 Move JSON histograms code into its own files 2022-01-19 18:10:10 +03:00
Sergei Petrunia
4ab2b78b65 Histogram code cleanup and fixes
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector

Change from Histogram_builder and its descendant Histogram_builder_json
to Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.

In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.
2022-01-19 18:10:10 +03:00