Commit graph

70 commits

Author SHA1 Message Date
Alexander Barkov
36eba98817 MDEV-19123 Change default charset from latin1 to utf8mb4
Changing the default server character set from latin1 to utf8mb4.
2024-07-11 10:21:07 +04:00
Sergei Golubchik
df10a945fc MDEV-28671 post-merge fixes
* use new deprecated printer for all deprecated server options
* restore alphabetic option sorting order
* move deprecated printer from mysqld.cc to my_getopt.c
* in --help print deprecation message at the end of the option help
* move 'ALL' help text where it belongs - to other SET options, and
  with a correct indentation.
* consistently end all or none command-line option help strings
  with a dot - my_print_help() needs that.
  It's about 50/50 now, so let's do none, less line wraps in --help
* remove trailing spaces from command-line option help strings
2024-05-27 12:39:02 +02:00
Marko Mäkelä
be24e75229 Merge 10.11 into 11.0 2023-10-19 08:12:16 +03:00
Marko Mäkelä
2ecc0443ec Merge 10.10 into 10.11 2023-10-17 16:04:21 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as is
a bug fix where ANALYZE TABLE wrongly built histograms for a
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Marko Mäkelä
2e431ff7e6 Merge 10.11 into 11.0 2023-02-16 13:34:45 +02:00
Daniel Black
483ddb5684 MDEV-30621: Türkiye is the correct current country naming
As requested of the UN, the country formerly known as Turkey is
to be referred to as Türkiye.
2023-02-10 08:44:14 +11:00
Sergei Petrunia
6c4076fac4 MDEV-30032: EXPLAIN FORMAT=JSON output: part #2: print 'loops'. 2023-02-03 11:22:17 +03:00
Sergei Petrunia
ffe0beca25 MDEV-30032: EXPLAIN FORMAT=JSON output: print costs
Basic printout for join and table execution costs.
2023-02-03 11:01:24 +03:00
Monty
727491b72a Added test cases for preceding test
This includes all test changes from
"Changing all cost calculation to be given in milliseconds"
and forwards.

Some of the things that caused changes in the result files:

- As part of fixing tests, I added 'echo' to some comments to make it
  easier to find out where things were wrong.
- MATERIALIZED has now a higher cost compared to X than before. Because
  of this some MATERIALIZED types have changed to DEPENDENT SUBQUERY.
  - Some test cases that required MATERIALIZED to repeat a bug were
    changed by adding more rows to force MATERIALIZED to happen.
- 'Filtered' in SHOW EXPLAIN has in many cases changed from 100.00 to
  something smaller. This is because filtered now also takes into
  account the smallest possible ref access and filters, even if they
  were not used. Another reason for 'Filtered' being smaller is that
  we now also take into account implicit filtering done for subqueries
  using FIRSTMATCH.
  (main.subselect_no_exists_to_in)
  This is calculated in best_access_path() and stored in records_out.
- Table order has changed because of more accurate costs.
- 'index' and 'ALL' for small tables have changed to use 'range' or
  'ref' because of optimizer_scan_setup_cost.
- index can be changed to 'range' as 'range' optimizer assumes we don't
  have to read the blocks from disk that range optimizer has already read.
  This can be confusing in the case where there is no obvious where clause
  but instead there is a hidden 'key_column > NULL' added by the optimizer.
  (main.subselect_no_exists_to_in)
- Scan on primary clustered key does not report 'Using Index' anymore
  (It's a table scan, not an index scan).
- For derived tables, the number of rows is now 100 instead of 2,
  which can be seen in EXPLAIN.
- More tests have "Using index for group by" as the cost of this
  optimization is now more correct (lower).
- A primary key could be preferred over a normal key, even if it would
  access more rows, as it's faster to do 1 lookup and 3 'index_next' on a
  clustered primary key than one lookup through a secondary.
  (main.stat_tables_innodb)

Notes:

- There were 4.7% more calls to best_extension_by_limited_search() in
  the main.greedy_optimizer test.  However, examining the test results,
  it looked like the plans were slightly better (eq_ref were more
  chained together) so I assume this is ok.
- I have verified a few test cases where there were notable/unexpected
  changes in the plan and in all cases the new optimizer plans were
  faster.  (main.greedy_optimizer and some others)
2023-02-03 00:00:35 +03:00
Monty
bc9805e954 Return >= 1 from matching_candidates_in_table if records > 0.0
Having rows >= 1.0 helps ensure that when we calculate total rows of joins
the number of resulting rows will not be less after the join.

Changes in test cases:
- Join order change for some tables with few records
- 'Filtered' is much higher for tables with few rows, as 1 row is a high
  percentage of a table with few rows.
2023-02-02 20:24:54 +03:00
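The effect described in bc9805e954 can be illustrated with a minimal sketch. The function names below are hypothetical and are not the server's matching_candidates_in_table(); the sketch only assumes that a non-empty table should never yield an estimate below one row.

```cpp
#include <cassert>

// Hypothetical helper: never report fewer than 1 matching row
// when the table itself is known to contain rows.
double at_least_one_row(double records, double estimate)
{
  if (records > 0.0 && estimate < 1.0)
    return 1.0;
  return estimate;
}

// Hypothetical helper: 'Filtered' expressed as a percentage of the table.
double filtered_percent(double rows_out, double table_rows)
{
  assert(table_rows > 0.0);
  return 100.0 * rows_out / table_rows;
}
```

Under these assumptions, an estimate of 0.2 rows on a 2-row table becomes 1 row, which corresponds to a 'Filtered' value of 50%. This is why small tables show much higher 'Filtered' values after the change.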
Marko Mäkelä
618d820646 Merge 10.7 into 10.8 2022-10-13 10:42:41 +03:00
Marko Mäkelä
4345d93100 Merge 10.7 into 10.8 2022-09-21 09:52:09 +03:00
Sergei Petrunia
51bce3c59a MDEV-28882: Assertion `tmp >= 0' failed in best_access_path
Histogram_json_hb::range_selectivity() may return small negative
numbers due to rounding errors in the histogram.

Make sure the returned value is non-negative.
Add an assert to catch negative values that are not small.

(attempt #2)
2022-06-22 13:39:48 +03:00
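The approach in 51bce3c59a (clamp small negative values, assert on large ones) might look roughly like the following sketch. The function name and the tolerance are assumptions made for illustration; the commit message does not state the threshold the server uses.

```cpp
#include <cassert>

// Hypothetical tolerance for rounding noise; not taken from the server source.
constexpr double kRoundingTolerance = 1e-9;

// Returns a non-negative selectivity.
// Tiny negative inputs are treated as rounding errors and clamped to 0.0.
double clamp_selectivity(double sel)
{
  // A clearly negative value points to a real bug, not to rounding noise.
  assert(sel >= -kRoundingTolerance);
  return sel < 0.0 ? 0.0 : sel;
}
```

Keeping the assert separate from the clamp means release builds still return a usable value, while debug builds catch estimates that are negative for reasons other than rounding.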
Sergei Petrunia
4842a56356 JSON_HB histogram: represent values of BIT() columns in hex always 2022-01-19 18:10:12 +03:00
Sergei Petrunia
dae20dde4e MDEV-26901: Estimation for filtered rows less precise ... #4
In Histogram_json_hb::point_selectivity(), do return selectivity of 0.0
when the histogram says so.

The logic of "Do not return 0.0 estimate as it causes a multiply-by-zero
meltdown in cost and cardinality calculations" is moved into
records_in_column_ranges() where it is done *once* per column pair (as
opposed to doing it once per range, which can cause the error to add up
to a large number when there are many ranges).
2022-01-19 18:10:12 +03:00
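To see why dae20dde4e applies the non-zero floor once rather than per range, consider this minimal sketch. Both functions and the `min_rows` parameter are hypothetical; they are not the server's records_in_column_ranges().

```cpp
#include <cassert>
#include <vector>

// Old behaviour (sketch): every range estimate is bumped to at least min_rows,
// so the error grows with the number of ranges.
double sum_with_floor_per_range(const std::vector<double>& range_rows,
                                double min_rows)
{
  double total = 0.0;
  for (double r : range_rows)
    total += (r < min_rows ? min_rows : r);
  return total;
}

// New behaviour (sketch): ranges may contribute 0.0; the floor is applied
// once to the combined estimate.
double sum_with_floor_once(const std::vector<double>& range_rows,
                           double min_rows)
{
  double total = 0.0;
  for (double r : range_rows)
    total += r;
  return total < min_rows ? min_rows : total;
}
```

With 100 ranges that each match no rows and a floor of 1.0, the per-range variant reports 100 rows while the single-floor variant reports 1 row.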
Sergei Petrunia
db8f15be93 MDEV-27229: Estimation for filtered rows less precise ... #5
Followup: remove this line from get_column_range_cardinality()

      set_if_bigger(res, col_stats->get_avg_frequency());

and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
d3e511d421 MDEV-27243: Estimation for filtered rows less precise ... #7
Added a testcase
2022-01-19 18:10:12 +03:00
Sergei Petrunia
531dd708ef MDEV-27229: Estimation for filtered rows less precise ... #5
Fix special handling for values that are right next to buckets with ndv=1.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
905634dc3f MDEV-27230: Estimation for filtered rows less precise ...
Fix the code in Histogram_json_hb::range_selectivity that handles
special cases: a non-inclusive endpoint hitting a bucket boundary...
2022-01-19 18:10:12 +03:00
Sergei Petrunia
08f1c4a2e0 MDEV-27203: Valgrind / MSAN errors in Histogram_json_hb::parse_bucket
In read_bucket_endpoint(), handle all possible parser states.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
d8d57d2c27 MDEV-26764: JSON_HB Histograms: handle BINARY and unassigned characters
Encode such characters in hex.
2022-01-19 18:10:12 +03:00
Sergei Petrunia
748b293c14 More test coverage 2022-01-19 18:10:12 +03:00
Sergei Petrunia
c2d2c1e727 MDEV-26519: Improved histograms
Save extra information in the histogram:

    "target_histogram_size": nnn,
    "collected_at": "(date and time)",
    "collected_by": "(server version)",
2022-01-19 18:10:12 +03:00
Sergei Petrunia
a0916cf5a2 MDEV-26519: Improved histograms: Better error reporting, test coverage
Also report JSON histogram load errors into error log, like it is already
done with other histogram/statistics load errors.

Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
a0f93f433a Rename histogram_hb_v2 -> histogram_hb 2022-01-19 18:10:11 +03:00
Sergei Petrunia
1d14176ec4 MDEV-26519: Improved histograms: Make JSON parser efficient
The previous JSON parser was using an API which made the parsing
inefficient: the same JSON content was parsed again and again.

Switch to using a lower-level parsing API which allows
parsing to be done in an efficient way.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
be55ad0d34 MDEV-27062: Make histogram_type=JSON_HB the new default 2022-01-19 18:10:11 +03:00
Sergei Petrunia
eb6a9ad705 MDEV-26886: Estimation for filtered rows less precise with JSON histogram
- Make Histogram_json_hb::range_selectivity handle singleton buckets
  specially when computing selectivity of the max. endpoint bound.
  (for min. endpoint, we already do that).

- Also, fixed comments for Histogram_json_hb::find_bucket
2022-01-19 18:10:11 +03:00
Sergei Petrunia
106c785e2d MDEV-26911: Unexpected ER_DUP_KEY, ASAN errors, double free detected in ...
When loading the histogram, use table->field[N], not table->s->field[N].

When we used the latter we would corrupt the field's default value. One
of the consequences of that would be that AUTO_INCREMENT fields would
stop working correctly.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
ac0194bd0e MDEV-26892: JSON histograms become invalid with a specific (corrupt) value ..
Handle the case where the last value in the table cannot be represented
in utf8mb4.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
05877df472 MDEV-26849: JSON Histograms: point selectivity estimates are off
.. for non-existent values.

Handle this special case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
f3f78bed85 MDEV-26750: Estimation for filtered rows is far off with JSON_HB histogram
Fix a bug in position_in_interval(). Do not overwrite one interval endpoint
with another.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
27539cd2c8 MDEV-26801: Valgrind/MSAN errors in Column_statistics_collected::finish ...
The problem was introduced in fix for MDEV-26724. That patch has made it
possible for histogram collection to fail. In particular, it fails for
non-assigned characters.

When histogram construction fails, we also abort the computation of
COUNT(DISTINCT). When we try to use the value, we get valgrind failures.

Switched the code to abort the statistics collection in this case.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
93d5980435 MDEV-26709: JSON histogram may contain more buckets than histogram_size allows
When computing bucket_capacity= records/histogram->get_width(), round
the value UP, not down.
2022-01-19 18:10:11 +03:00
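The rounding issue fixed in 93d5980435 can be reproduced with plain integer arithmetic. The sketch below is a simplified model with equal-sized buckets and hypothetical function names; it is not the server's collection code.

```cpp
#include <cassert>
#include <cstdint>

// Rows per bucket, rounded UP (ceiling division without overflow).
std::uint64_t bucket_capacity_rounded_up(std::uint64_t records,
                                         std::uint64_t width)
{
  assert(width > 0);
  return records / width + (records % width != 0 ? 1 : 0);
}

// Number of buckets produced when each bucket holds up to `capacity` rows.
std::uint64_t buckets_needed(std::uint64_t records, std::uint64_t capacity)
{
  assert(capacity > 0);
  return records / capacity + (records % capacity != 0 ? 1 : 0);
}
```

For 10 rows and a requested width of 4, rounding down gives a capacity of 2 and therefore 5 buckets, one more than allowed. Rounding up gives a capacity of 3 and exactly 4 buckets.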
Sergei Petrunia
3936dc3353 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Part#3:
- make json_escape() return different errors on conversion error
  and on out-of-space condition.
- Make histogram code handle conversion errors.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
b17f33a04b MDEV-26737: Outdated VARIABLE_COMMENT for HISTOGRAM_TYPE in I_S.SYSTEM_VARIABLES
Fix the description
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5d66eeb3a1 MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
.. part#2: correctly pass the charset to JSON [un]escape functions
2022-01-19 18:10:11 +03:00
Sergei Petrunia
43a8d9f156 MDEV-26595: ASAN use-after-poison my_strnxfrm_simple_internal / Histogram_json_hb::range_selectivity
Add testcase
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5ef350a7f1 MDEV-26589: Assertion failure upon DECODE_HISTOGRAM with NULLs
Item_func_decode_histogram::val_str should correctly set null_value
when "decoding" JSON histogram.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
5c709ef18c MDEV-26724 Endless loop in json_escape_to_string upon ... empty string
Correctly handle empty string when [un]escaping JSON
2022-01-19 18:10:11 +03:00
Sergei Petrunia
61cd4f4412 MDEV-26711: Values in JSON histograms are not properly quoted
Escape values when serializing to JSON. Un-escape when reading back.
2022-01-19 18:10:11 +03:00
Sergei Petrunia
28ad128585 Fix off-by-one error in Histogram_json_hb::find_bucket 2022-01-19 18:10:10 +03:00
Sergei Petrunia
b179640219 MDEV-26590: Stack smashing/buffer overflow in Histogram_json_hb::parse
Provide buffer of sufficient size.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
382250c05c Address review input 2022-01-19 18:10:10 +03:00
Sergei Petrunia
6375873c9a Fixes in opt_histogram_json.cc in the last commits
Also add more test coverage
2022-01-19 18:10:10 +03:00
Sergei Petrunia
49a7bbb1f6 Valgrind fixes, poor .result fixes, code cleanups
- Use String::c_ptr_safe() instead of String::c_ptr
- Do proper datatype conversions in Histogram_json_hb::parse
- Remove Histogram_json_hb::Bucket::end_value. Introduce
  get_end_value() instead.
2022-01-19 18:10:10 +03:00
Sergei Petrunia
f460272054 MDEV-26519: JSON Histograms: improve histogram collection
Basic ideas:
1. Store "popular" values in their own buckets.
2. Also store ndv (Number of Distinct Values) in each bucket.

Because of #1, the buckets are now variable-size, so store the size in
each bucket.

Adjust selectivity estimation functions accordingly.
2022-01-19 18:10:10 +03:00
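The two ideas in f460272054 (popular values get their own buckets, and each bucket records its size and ndv) can be sketched as follows. This is an illustration under stated assumptions, not the server's algorithm: the `Bucket` struct, the `build_histogram` name, the use of `int` values, and the rule that a value is "popular" when its run length reaches the bucket capacity are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Bucket
{
  int start;         // first value stored in the bucket
  std::size_t size;  // number of rows in the bucket (buckets vary in size)
  std::size_t ndv;   // number of distinct values in the bucket
};

// Builds buckets from values that are already sorted in ascending order.
std::vector<Bucket> build_histogram(const std::vector<int>& sorted,
                                    std::size_t n_buckets)
{
  assert(n_buckets > 0);
  std::vector<Bucket> out;
  if (sorted.empty())
    return out;

  const std::size_t n = sorted.size();
  const std::size_t capacity = n / n_buckets + (n % n_buckets != 0 ? 1 : 0);

  Bucket cur{0, 0, 0};  // size == 0 means "no open bucket"
  for (std::size_t i = 0; i < n;)
  {
    // Find the run of rows equal to sorted[i].
    std::size_t j = i;
    while (j < n && sorted[j] == sorted[i])
      ++j;
    const std::size_t run = j - i;

    if (run >= capacity)
    {
      // Assumed popularity rule: close the open bucket, then emit a
      // single-value bucket with ndv == 1.
      if (cur.size > 0)
      {
        out.push_back(cur);
        cur = Bucket{0, 0, 0};
      }
      out.push_back(Bucket{sorted[i], run, 1});
    }
    else
    {
      if (cur.size == 0)
        cur.start = sorted[i];
      cur.size += run;
      cur.ndv += 1;
      if (cur.size >= capacity)
      {
        out.push_back(cur);
        cur = Bucket{0, 0, 0};
      }
    }
    i = j;
  }
  if (cur.size > 0)
    out.push_back(cur);
  return out;
}
```

For the input 1, 2, 3, 5, 5, 5, 5, 5, 7, 8 with four requested buckets, this sketch yields three buckets: values 1 to 3 (size 3, ndv 3), the popular value 5 on its own (size 5, ndv 1), and values 7 to 8 (size 2, ndv 2). Because bucket sizes differ, a selectivity estimate has to read the stored size instead of assuming equal-height buckets. The sketch does not enforce the requested bucket count in every case.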
Sergei Petrunia
5ddbd72af4 Correctly decode string field values for pos_in_interval_for_string call 2022-01-19 18:10:10 +03:00
Sergei Petrunia
223fa6a891 Make tests pass
- Fix bad tests in statistics_json test: make them meaningful and make them
  work on Windows
- Fix analyze_debug.test: correctly handle errors during ANALYZE
2022-01-19 18:10:10 +03:00