Commit graph

93 commits

Author SHA1 Message Date
Sergei Petrunia
ce4956f322 Code cleanup 2022-01-19 18:14:07 +03:00
Sergei Petrunia
db8f15be93 MDEV-27229: Estimation for filtered rows less precise ... #5
Followup: remove this line from get_column_range_cardinality()

      set_if_bigger(res, col_stats->get_avg_frequency());

and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
1d14176ec4 MDEV-26519: Improved histograms: Make JSON parser efficient
Previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents was parsed again and again.

Switch to using a lower-level parsing API which allows to do
parsing in an efficient way.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
05877df472 MDEV-26849: JSON Histograms: point selectivity estimates are off
.. for non-existent values.

Handle this special case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
702f4efcd9 More "straightforward" memory management
Do not put Histogram objects on MEM_ROOT at all
2022-01-19 18:10:10 +03:00
Sergei Petrunia
9271bd17f7 More code cleanups
Remove Histogram_*::is_available(), it is not applicable anymore.
Fix compilation on Windows
2022-01-19 18:10:10 +03:00
Sergei Petrunia
1d98168547 Move JSON histograms code into its own files 2022-01-19 18:10:10 +03:00
Sergei Petrunia
4ab2b78b65 Histogram code cleanup and fixes
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector

Change from Histogram_builder and its descendant Histogram_builder_json
to  Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.

In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
859c14ff01 Better names: s/histogram_/histogram/, s/Histogram_json/Histogram_json_hb/ 2022-01-19 18:10:09 +03:00
Sergei Petrunia
fc6a4a33b2 Cleanup histogram collection code 2022-01-19 18:10:09 +03:00
Sergei Petrunia
02a67307d3 Fix compiation on windows 2022-01-19 18:10:09 +03:00
Sergei Petrunia
3486bf4110 Code cleanup + reduce the diff size 2022-01-19 18:10:09 +03:00
Sergei Petrunia
a93b377863 Fix histogram memory management
There are "local" histograms that are allocated by one thread for
one TABLE object, and "global" that are allocated for TABLE_SHARE.
2022-01-19 18:10:09 +03:00
Sergei Petrunia
fcf58a5e0f Code cleanup part#2: do not copy key values in xxx_selectivity() functions 2022-01-19 18:10:09 +03:00
Sergei Petrunia
2a1cdbabec Fix JSON parsing: future-proof data representation in JSON, code cleanup 2022-01-19 18:10:09 +03:00
Sergei Petrunia
a0b4a86822 Code cleanup part #2. 2022-01-19 18:10:09 +03:00
Sergei Petrunia
72c0ba43b2 Code cleanup part #1 2022-01-19 18:10:09 +03:00
Sergei Petrunia
f76e310ace Rename histogram_type=JSON to JSON_HB 2022-01-19 18:10:09 +03:00
Michael Okoko
bff65a813e Implement point selectivity for JSON histograms
* Also merges tests relating to JSON statistics into one file

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
547f805311 Refactor histogram point selectivity
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
63cbd0748b replace range_selectivity methods for Histograms and add tests
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
c129689ddc Use binary search to compute range selectivity
* it also adds an "explain select" statement to the test so that the fprintf calls
  can print the computed intervals to mysqld.1.err

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
69f24c238e Use generic Histogram_base class for Histogram_builders
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.

It also restores get/set size functions as they were useful in calculating fields
for binary histogram.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
21e0f5487f MDEV-21130: Histograms: use JSON as on-disk format
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.

grep for GSOC-TODO for notes.
2022-01-19 18:10:08 +03:00
Michael Okoko
fe2e516a50 inform test result of zero hist_size for json histogram
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bf4d0dcfe2 implement parse and serialize for histogram json 2022-01-19 18:10:08 +03:00
Michael Okoko
9bba595528 remove unneeded shared methods
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
1fa7af749e Split histogram classes and into JSON and binary classes
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
1998b787ac MDEV-21130: Histograms: use JSON as on-disk format
Preparation for handling different kinds of histograms:

- In Column_statistics, change "Histogram histogram" into
  "Histogram *histogram_".  This allows for different kinds
  of Histogram classes with virtual functions.

- [Almost] remove the usage of Histogram->set_values and
  Histogram->set_size. The code outside the histogram should
  not make any assumptions about what/how is stored in the Histogram.

- Introduce drafts of methods to read/save histograms to/from disk.
2022-01-19 18:10:08 +03:00
Michael Okoko
9954aecc2b Store bucket bounds and extend test cases for JSON histogram
This fixes the memory allocation for json histogram builder and add more column types for testing.
Some challenges at the moment include:
* Garbage value at the end of JSON array still persists.
* Garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
2aca7b0c33 Prepare JSON as valid histogram_type
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Sergei Golubchik
e841957416 Merge branch '10.3' into 10.4 2021-02-23 09:25:57 +01:00
Sergei Golubchik
0ab1e3914c Merge branch '10.2' into 10.3 2021-02-22 22:42:27 +01:00
Varun Gupta
a461e4d306 MDEV-19474: Histogram statistics are used even with optimizer_use_condition_selectivity=3
The issue here was histogram statistics were being used even when
the level of optimizer_use_condition_selectivity doesn't allow
usage of statistics from histogram.

The histogram statistics are read for a table only when
optimizer_use_condition_selectivity > 3. But the TABLE structure can be
stored in the internal table cache and be reused for the next query.
So in this case the histogram statistics will be available for the next query.

The fix would be to make sure to use the histogram statistics only when
optimizer_use_condition_selectivity > 3.
2021-02-16 11:53:13 +05:30
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Marko Mäkelä
acc58fd835 Merge 10.2 into 10.3 2020-07-20 15:11:59 +03:00
Marko Mäkelä
ca9276e37e Merge 10.1 into 10.2 2020-07-20 14:53:24 +03:00
Varun Gupta
dfdfeecb03 MDEV-22851: Engine independent index statistics are incorrect for large tables on Windows
An oveflow was happening on windows because on Windows sizeof(ulong) is 4 bytes
while it is 8 bytes on Linux.
Switched avg_frequency and avg length for column statistics to ulonglong.
Switched avg_frequency for index statistics to ulonglong.
2020-07-15 11:27:32 +05:30
Marko Mäkelä
8059148154 Merge 10.3 into 10.4 2020-06-03 07:32:09 +03:00
Marko Mäkelä
8300f639a1 Merge 10.2 into 10.3 2020-06-02 10:25:11 +03:00
Marko Mäkelä
d72eebaa3d Merge 10.1 into 10.2 2020-06-01 09:33:03 +03:00
Sergey Vojtovich
c279878493 Thread safe histograms loading
Previously multiple threads were allowed to load histograms concurrently.
There were no known problems caused by this. But given amount of data
races in this code, it'd happen sooner or later.

To avoid scalability bottleneck, histograms loading is protected by
per-TABLE_SHARE atomic variable.

Whenever histograms were loaded by preceding statement (hot-path), a
scalable load-acquire check is performed.

Whenever histograms have to be loaded anew, mutual exclusion for loaders
is established by atomic variable. If histograms are being loaded
concurrently, statement waits until load is completed.

- Table_statistics::total_hist_size moved to TABLE_STATISTICS_CB: only
  meaningful within TABLE_SHARE (not used for collected stats).
- TABLE_STATISTICS_CB::histograms_can_be_read and
  TABLE_STATISTICS_CB::histograms_are_read are replaced with a tri state
  atomic variable.
- Simplified away alloc_histograms_for_table_share().

Note: there's still likely a data race if a thread attempts accessing
histograms data after it failed to load it (because of concurrent load).
It was there previously and goes out of the scope of this effort. One way
of fixing it could be reviving TABLE::histograms_are_read and adding
appropriate checks whenever it is needed.

Part of MDEV-19061 - table_share used for reading statistical tables is
                     not protected
2020-05-29 21:53:54 +04:00
Marko Mäkelä
c11e5cdd12 Merge 10.3 into 10.4 2019-10-10 11:19:25 +03:00
Marko Mäkelä
892378fb9d Merge 10.2 into 10.3 2019-10-09 13:25:11 +03:00
Marko Mäkelä
24232ec12c Merge 10.1 into 10.2 2019-10-09 08:30:23 +03:00
Sergey Vojtovich
adefaeffcc MDEV-19536 - Server crash or ASAN heap-use-after-free in is_temporary_table /
read_statistics_for_tables_if_needed

Regression after 279a907, read_statistics_for_tables_if_needed() was
called after open_normal_and_derived_tables() failure.

Fixed by moving read_statistics_for_tables() call to a branch of
get_schema_stat_record() where result of open_normal_and_derived_tables()
is checked.

Removed THD::force_read_stats, added read_statistics_for_tables() instead.
Simplified away statistics_for_command_is_needed().
2019-10-07 13:30:22 +04:00
Sergey Vojtovich
e43791d4dc Cleanup EITS
Moved EITS allocation inside read_statistics_for_tables_if_needed().
Removed redundant is_safe argument.
2019-10-02 15:23:59 +04:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00