Commit graph

43 commits

Author SHA1 Message Date
Alexander Barkov
4ced4898fd MDEV-32958 Unusable key notes do not get reported for some operations
Enable unusable key notes for non-equality predicates:
   <, <=, =>, >, BETWEEN, IN, LIKE

Note, in some scenarios it displays duplicate notes, e.g.
for queries with ORDER BY:

  SELECT * FROM t1
  WHERE    indexed_string_column >= 10
  ORDER BY indexed_string_column
  LIMIT 5;

This should be tolarable. Getting rid of the diplicate note
completely would need a much more complex patch, which is
not desiable in 10.6.

Details:

- Changing RANGE_OPT_PARAM::note_unusable_keys from bool
  to a new data type Item_func::Bitmap, so the caller can
  choose with a better granuality which predicates
  should raise unusable key notes inside the range optimizer:
    a. all predicates (=, <=>, <, <=, =>, >, BETWEEN, IN, LIKE)
    b. all predicates except equality (=, <=>)
    c. none of the predicates

  "b." is needed because in some scenarios equality predicates (=, <=>)
  send unusable key notes at an earlier stage, before the range optimizer,
  during update_ref_and_keys(). Calling the range optimizer with
  "all predicates" would produce duplicate notes for = and <=> in such cases.

- Fixing get_quick_record_count() to call the range optimizer
  with "all predicates except equality" instead of "none of the predicates".
  Before this change the range optimizer suppressed all notes for
  non-equality predicates: <, <=, =>, >, BETWEEN, IN, LIKE.
  This actually fixes the reported problem.

- Fixing JOIN::make_range_rowid_filters() to call the range optimizer
  with "all predicates except equality" instead of "all predicates".
  Before this change the range optimizer produced duplicate notes
  for = and <=> during a rowid_filter optimization.

- Cleanup:
  Adding the op_collation argument to Field::raise_note_cannot_use_key_part()
  and displaying the operation collation rather than the argument collation
  in the unusable key note. This is important for operations with more than
  two arguments: BETWEEN and IN, e.g.:

    SELECT * FROM t1
    WHERE column_utf8mb3_general_ci
          BETWEEN 'a' AND 'b' COLLATE utf8mb3_unicode_ci;

    SELECT * FROM t1
    WHERE column_utf8mb3_general_ci
          IN ('a', 'b' COLLATE utf8mb3_unicode_ci);

    The note for 'a' now prints utf8mb3_unicode_ci as the collation.
    which is the collation of the entire operation:

      Cannot use key key1 part[0] for lookup:
      "`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
      "'a'" of collation `utf8mb3_unicode_ci`

    Before this change it printed the collation of 'a',
    so the note was confusing:

      Cannot use key key1 part[0] for lookup:
      "`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
      "'a'" of collation `utf8mb3_general_ci`"
2023-12-11 08:55:27 +04:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
f52954ef42 Merge commit '10.4' into 10.5 2023-07-20 11:54:52 +02:00
Monty
6daccd4e48 Moved test from group_min_max.test to group_min_max_not_embedded.test 2023-06-25 16:26:28 +03:00
Sergei Petrunia
b3074128a6 MDEV-31380 post-fix: fix group_min_max.test with embedded and view-protocol
embedded doesn't have optimizer trace,
view-protocol doesn't work with long column names.
2023-06-08 11:14:50 +03:00
Marko Mäkelä
80585c9d6f Merge 10.5 into 10.6 2023-06-08 10:42:56 +03:00
Sergei Petrunia
a0e7bd735b MDEV-31380: Assertion `s->table->opt_range_condition_rows <= s->found_records' failed
LooseScan code set opt_range_condition_rows to be the

  MIN(loose_scan_plan->records, table->records)

totally ignoring possible quick range selects.  If there was a quick
select $QUICK on another index with

  $QUICK->records < loose_scan_plan->records

this would create a situation where

   opt_range_condition_rows > $QUICK->records

which causes an assert in 10.6+ and potentially wrong query plan
choice in 10.5.

Fixed by making opt_range_condition_rows to be the minimum #rows
of any quick select.

Approved-by: Monty <monty@mariadb.org>
2023-06-07 13:54:34 +03:00
Marko Mäkelä
270eeeb523 Merge 10.5 into 10.6 2023-05-23 12:25:39 +03:00
Monty
16258677b3 MDEV-6768 Wrong result with aggregate with join with no result set
When a query does implicit grouping and join operation produces an empty
result set, a NULL-complemented row combination is generated.
However, constant table fields still show non-NULL values.

What happens in the is that end_send_group() is called with a
const row but without any rows matching the WHERE clause.
This last part is shown by 'join->first_record' not being set.

This causes item->no_rows_in_result() to be called for all items to reset
all sum functions to their initial state. However fields are not set
to NULL.

The used fix is to produce NULL-complemented records for constant tables
as well. Also, reset the constant table's records back in case we're
in a subquery which may get re-executed.
An alternative fix would have item->no_rows_in_result() also work
with Item_field objects.

There is some other issues with the code:
- join->no_rows_in_result_called is used but never set.
- Tables that are used with group functions are not properly marked as
  maybe_null, which is required if the table rows should be regarded as
  null-complemented (not existing).
- The code that tries to detect if mixed_implicit_grouping should be set
  didn't take into account all usage of fields and sum functions.
- Item_func::restore_to_before_no_rows_in_result() called the wrong
  function.
- join->clear() does not use a table_map argument to clear_tables(),
  which caused it to ignore constant tables.
- unclear_tables() does not correctly restore status to what is
  was before clear_tables().

Main bug fix was to always use a table_map argument to clear_tables() and
always use join->clear() and clear_tables() together with unclear_tables().

Other fixes:
- Fixed Item_func::restore_to_before_no_rows_in_result()
- Set 'join->no_rows_in_result_called' when no_rows_in_result_set()
  is called.
- Removed not used argument from setup_end_select_func().
- More code comments
- Ensure that end_send_group() modifies the same fields as are in the
  result set.
- Changed return_zero_rows() to use pointers instead of references,
  similar to the rest of the code.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2023-05-22 17:15:46 +03:00
Monty
7f96dd50e2 MDEV-6768 Wrong result with aggregate with join with no result set
When a query does implicit grouping and join operation produces an empty
result set, a NULL-complemented row combination is generated.
However, constant table fields still show non-NULL values.

What happens in the is that end_send_group() is called with a
const row but without any rows matching the WHERE clause.
This last part is shown by 'join->first_record' not being set.

This causes item->no_rows_in_result() to be called for all items to reset
all sum functions to their initial state. However fields are not set
to NULL.

The used fix is to produce NULL-complemented records for constant tables
as well. Also, reset the constant table's records back in case we're
in a subquery which may get re-executed.
An alternative fix would have item->no_rows_in_result() also work
with Item_field objects.

There is some other issues with the code:
- join->no_rows_in_result_called is used but never set.
- Tables that are used with group functions are not properly marked as
  maybe_null, which is required if the table rows should be regarded as
  null-complemented (not existing).
- The code that tries to detect if mixed_implicit_grouping should be set
  didn't take into account all usage of fields and sum functions.
- Item_func::restore_to_before_no_rows_in_result() called the wrong
  function.
- join->clear() does not use a table_map argument to clear_tables(),
  which caused it to ignore constant tables.
- unclear_tables() does not correctly restore status to what is
  was before clear_tables().

Main bug fix was to always use a table_map argument to clear_tables() and
always use join->clear() and clear_tables() together with unclear_tables().

Other fixes:
- Fixed Item_func::restore_to_before_no_rows_in_result()
- Set 'join->no_rows_in_result_called' when no_rows_in_result_set()
  is called.
- Removed not used argument from setup_end_select_func().
- More code comments
- Ensure that end_send_group() modifies the same fields as are in the
  result set.
- Changed return_zero_rows() to use pointers instead of references,
  similar to the rest of the code.
2023-05-02 23:43:12 +03:00
Marko Mäkelä
abe4c7bfd6 Merge 10.5 into 10.6 2023-04-21 16:38:22 +03:00
Sergei Petrunia
be7ef6566f MDEV-30605: Wrong result while using index for group-by
A GROUP BY query which uses "MIN(pk)" and has "pk<>const" in the
WHERE clause would produce wrong result when handled with "Using index
for group-by".  Here "pk" column is the table's primary key.

The problem was introduced by fix for MDEV-23634. It made the range
optimizer to not produce ranges for conditions in form "pk != const".

However, LooseScan code requires that the optimizer is able to
convert the condition on the MIN/MAX column into an equivalent range.
The range is used to locate the row that has the MIN/MAX value.

LooseScan checks this in check_group_min_max_predicates(). This fix
makes the code in that function to take into account that "pk != const"
does not produce a range.
2023-04-18 14:42:47 +03:00
Marko Mäkelä
56c9b0bca0 Merge 10.5 into 10.6 2023-01-10 13:54:17 +02:00
Monty
d0603fc5ba MDEV-30240 Wrong result upon aggregate function with SQL_BUFFER_RESULT
The problem was that when storing rows into a temporary table,
MIN/MAX items that where marked as constants (as theire value had
been computed at start of query) would be reset.

Fixed by not reseting MIN/MAX items that are marked as const in
Item_sum_min_max::clear().
2023-01-03 19:44:19 +02:00
Marko Mäkelä
30914389fe Merge 10.5 into 10.6 2022-07-27 17:52:37 +03:00
Marko Mäkelä
098c0f2634 Merge 10.4 into 10.5 2022-07-27 17:17:24 +03:00
Oleksandr Byelkin
3bb36e9495 Merge branch '10.3' into 10.4 2022-07-27 11:02:57 +02:00
Alexander Barkov
57f5c319af MDEV-21445 Strange/inconsistent behavior of IN condition when mixing numbers and strings 2022-07-06 15:42:21 +04:00
Marko Mäkelä
d96433ad20 Merge 10.5 into 10.6 2022-03-01 17:40:27 +02:00
Sergei Petrunia
a1965b80e1 Make testcase for MDEV-26585 stable. 2022-03-01 17:19:58 +03:00
Marko Mäkelä
cce994057b Merge 10.5 into 10.6 2022-02-09 15:49:50 +02:00
Oleksandr Byelkin
34c5019698 Merge branch '10.5' into bb-10.5-release 2022-02-09 08:57:41 +01:00
Monty
38058c04a4 MDEV-26585 Wrong query results when using index for group-by
The problem was that "group_min_max optimization" does not work if
some aggregate functions, like COUNT(*), is used.
The function get_best_group_min_max() is using the join->sum_funcs
array to check which aggregate functions are used.
The bug was that aggregates in HAVING where not yet added to
join->sum_funcs at the time get_best_group_min_max() was called.

Fixed by populate join->sum_funcs already in prepare, which means that
all sum functions will be in join->sum_funcs in get_best_group_min_max().
A benefit of this approach is that we can remove several calls to
make_sum_func_list() from the code and simplify the function.

I removed some wrong setting of 'sort_and_group'.
This variable is set when alloc_group_fields() is called, as part
of allocating the cache needed by end_send_group() and does not need
to be set by other functions.

One problematic thing was that Spider is using *join->sum_funcs to detect
at which stage the optimizer is and do internal calculations of aggregate
functions. Updating join->sum_funcs early caused Spider to fail when trying
to find min/max values in opt_sum_query().
Fixed by temporarily resetting sum_funcs during opt_sum_query().

Reviewer: Sergei Petrunia
2022-02-08 14:32:29 +02:00
Monty
d314bd2664 MDEV-27442 Wrong result upon query with DISTINCT and EXISTS subquery
The problem was that get_best_group_min_max() did not check if fields used
by the "group_min_max optimization" where used in sub queries.
Because of this, it did not detect that a key (b,a) was used in the WHERE
clause for the statement:
SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 ).

Fixed by also traversing the sub queries when checking if a field is used.
This disables group_min_max_optimization for the above query.

Reviewer: Sergei Petrunia
2022-02-08 14:32:28 +02:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Monty
fdec885201 MDEV-25830 optimizer_use_condition_selectivity=4 sometimes produces worse plan than optimizer_use_condition_selectivity=1
The issue was that calc_cond_selectivity_for_table prefered ranges with
many parts and when deciding on which selectivity to use.

Fixed by going through ranges according to the number of rows in the range.

This ensures that selectivity from ranges with few rows will be prefered
over ranges with many rows for indexes that uses the same columns.
2022-01-19 18:49:53 +02:00
Marko Mäkelä
d2e2d32933 Merge 10.5 into 10.6 2021-04-14 12:32:27 +03:00
Marko Mäkelä
6c3e860cbf Merge 10.4 into 10.5 2021-04-14 11:35:39 +03:00
Sergei Petrunia
c03841ec0e MDEV-23634: Select query hanged the server and leads to OOM ...
Handle "col<>const" in the same way that MDEV-21958 did for
"col NOT IN(const-list)": do not use the condition for range/index_merge
accesses if there is a unique UNIQUE KEY(col).

The testcase is in main/range.test. The rest of test updates are
due to widespread use of 'pk<>1' in the testsuite. Changed the test
to use different but equivalent forms of the conditions.
2021-04-08 17:25:02 +03:00
Varun Gupta
d79c3f3297 MDEV-24353: Adding GROUP BY slows down a query
A heuristic in best_access_path says that if for an index
ref access involved key parts which are greater than equal to that
for range access, then range access should not be considered.
The assumption made by this heuristic does not hold when
the range optimizer opted to use the group-by min-max optimization.
So the fix here would be to not consider the heuristic if
the range optimizer picked the usage of group-by min-max
optimization.
2020-12-11 18:38:18 +05:30
Monty
eb483c5181 Updated optimizer costs in multi_range_read_info_const() and sql_select.cc
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
  not 1.0.  In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
  TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
  keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
  The default keyread_time() was optimized for blocks and not suitable for
  HEAP. The effect was the HEAP prefered table scans over ranges for btree
  indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
  Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
  we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
  as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
  HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
  This was to ensure that handler::keyread_time() doesn't give
  higher cost for heap tables than for normal tables. One effect of
  this is that heap and derived tables stored in heap will prefer
  key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
  multi_range_read_info_const(). All index cost calculation is now
  done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
  so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
2020-03-27 03:58:32 +02:00
Marko Mäkelä
c016ea660e Merge 10.2 into 10.3 2019-09-23 10:25:34 +03:00
Oleksandr Byelkin
2792c6e7b0 Merge branch '10.3' into 10.4 2019-07-28 13:43:26 +02:00
Galina Shalygina
7fe1ca7ed6 Merge branch '10.4' into bb-10.4-mdev7486 2019-02-19 11:00:39 +03:00
Sergei Petrunia
15a77a1a2c MDEV-18608: Defaults for 10.4: histogram size should be set
Change the defaults:

-histogram_size=0
+histogram_size=254

-histogram_type=SINGLE_PREC_HB
+histogram_type=DOUBLE_PREC_HB

Adjust the testcases:
- Some have ignorable changes in EXPLAIN outputs and
  more counter increments due to EITS table reads.
- Testcases that meaningfully depend on the old defaults
  are changed to use the old values.
2019-02-18 13:37:57 +03:00
Galina Shalygina
7a77b221f1 MDEV-7486: Condition pushdown from HAVING into WHERE
Condition can be pushed from the HAVING clause into the WHERE clause
if it depends only on the fields that are used in the GROUP BY list
or depends on the fields that are equal to grouping fields.
Aggregate functions can't be pushed down.

How the pushdown is performed on the example:

SELECT t1.a,MAX(t1.b)
FROM t1
GROUP BY t1.a
HAVING (t1.a>2) AND (MAX(c)>12);

=>

SELECT t1.a,MAX(t1.b)
FROM t1
WHERE (t1.a>2)
GROUP BY t1.a
HAVING (MAX(c)>12);

The implementation scheme:

1. Extract the most restrictive condition cond from the HAVING clause of
   the select that depends only on the fields that are used in the GROUP BY
   list of the select (directly or indirectly through equalities)
2. Save cond as a condition that can be pushed into the WHERE clause
   of the select
3. Remove cond from the HAVING clause if it is possible

The optimization is implemented in the function
st_select_lex::pushdown_from_having_into_where().

New test file having_cond_pushdown.test is created.
2019-02-17 23:38:44 -08:00
Igor Babaev
37deed3f37 Merge branch '10.4' into bb-10.4-mdev16188 2019-02-03 18:41:18 -08:00
Igor Babaev
658128af43 MDEV-16188 Use in-memory PK filters built from range index scans
This patch contains a full implementation of the optimization
that allows to use in-memory rowid / primary filters built for range  
conditions over indexes. In many cases usage of such filters reduce  
the number of disk seeks spent for fetching table rows.

In this implementation the choice of what possible filter to be applied  
(if any) is made purely on cost-based considerations.

This implementation re-achitectured the partial implementation of
the feature pushed by Galina Shalygina in the commit
8d5a11122c.

Besides this patch contains a better implementation of the generic  
handler function handler::multi_range_read_info_const() that
takes into account gaps between ranges when calculating the cost of
range index scans. It also contains some corrections of the
implementation of the handler function records_in_range() for MyISAM.

This patch supports the feature for InnoDB and MyISAM.
2019-02-03 14:56:12 -08:00
Varun Gupta
93c360e3a5 MDEV-15253: Default optimizer setting changes for MariaDB 10.4
use_stat_tables= PREFERABLY
optimizer_use_condition_selectivity= 4
2018-12-09 09:22:00 +05:30
Sergei Golubchik
57e0da50bb Merge branch '10.2' into 10.3 2018-09-28 16:37:06 +02:00
Marko Mäkelä
7830fb7f45 Merge 10.2 into 10.3 2018-08-28 12:22:56 +03:00
Michael Widenius
a7abddeffa Create 'main' test directory and move 't' and 'r' there 2018-03-29 13:59:44 +03:00
Renamed from mysql-test/r/group_min_max.result (Browse further)