Commit graph

141 commits

Author SHA1 Message Date
Sergei Golubchik
4a5d25c338 Merge branch '10.1' into 10.2 2016-12-29 13:23:18 +01:00
Sergei Golubchik
180065ebb0 Item::print(): remove redundant parentheses
by introducing a new Item::precedence() method and using it
to decide whether parentheses are required
2016-12-12 20:44:41 +01:00
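A minimal illustration of the effect (table and view names are hypothetical, not from the patch): with a precedence-aware printer, SHOW CREATE VIEW keeps only the parentheses that operator precedence actually requires.

  CREATE TABLE t1 (a INT, b INT, c INT);
  CREATE VIEW v1 AS SELECT ((a + b)) * c AS x, (a * b) + c AS y FROM t1;
  -- the stored definition is now printed roughly as (`a` + `b`) * `c`
  -- and `a` * `b` + `c`, instead of keeping every parenthesis of the input
  SHOW CREATE VIEW v1;
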
Sergei Golubchik
2f20d297f8 Merge branch '10.0' into 10.1 2016-12-11 09:53:42 +01:00
Sergei Golubchik
7f2fd34500 MDEV-11231 Server crashes in check_duplicate_key on CREATE TABLE ... SELECT
Be consistent and don't include the table name in the error message;
no other CREATE TABLE error does.

(the crash happened, because thd->lex->query_tables was NULL)
2016-12-04 01:59:35 +01:00
Monty
af7490f95d Remove trailing '.' from error messages to make them consistent
Fixed a few failing tests
2016-10-05 01:11:08 +03:00
Sergei Golubchik
6b1863b830 Merge branch '10.0' into 10.1 2016-08-25 12:40:09 +02:00
Monty
6f31dd093a Added new status variables to make it easier to debug certain problems:
- Handler_read_retry
- Update_scan
- Delete_scan
2016-08-21 20:18:39 +03:00
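The new counters can be inspected like any other status variable; one way to watch them (standard SHOW STATUS syntax):

  SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Handler_read_retry', 'Update_scan', 'Delete_scan');
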
Monty
24881437b7 Fixed bug found by bar where we didn't properly check the length of the last argument for BETWEEN
This should not have caused any notable errors in most cases.

After the fix, we are not using keys to solve MIN/MAX if the string used for comparison is longer than the column.
2015-07-10 09:18:17 +03:00
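A sketch of the query shape this affects (table and column names are illustrative, not from the patch): the upper BETWEEN bound is longer than the column, so after the fix the key is no longer used to solve MAX().

  CREATE TABLE t1 (a INT, b VARCHAR(5), KEY (a, b));
  SELECT a, MAX(b) FROM t1
  WHERE b BETWEEN 'aa' AND 'zzzzzzzz'   -- bound longer than VARCHAR(5)
  GROUP BY a;
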
Alexander Barkov
42bc08b347 MDEV-8229 GROUP_MIN_MAX is erroneously applied for BETWEEN in some cases 2015-06-25 12:51:32 +04:00
Alexander Barkov
bb3115b256 MDEV-6990 GROUP_MIN_MAX optimization is not applied in some cases when it could 2015-03-12 18:12:15 +04:00
Sergey Petrunya
00475d40d1 MDEV-7118: Anemometer stop working after upgrade to from...
When the optimizer considers an option to use Loose Scan, it should
still consider UNIQUE keys. (Previously, MDEV-4120 disabled loose scan
for all kinds of unique indexes. That was wrong.)

However, we should not use Loose Scan when trying to satisfy 
 "SELECT DISTINCT col1, col2, .. colN"
when using an index defined as UNIQUE(col1, col2, ... colN).
2014-11-19 17:14:49 +03:00
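An illustrative pair of queries (names are not from the patch) showing both sides of the rule:

  CREATE TABLE t1 (col1 INT, col2 INT, UNIQUE KEY u1 (col1, col2));
  -- Loose Scan may again be considered here: MIN() over a proper prefix
  -- of the UNIQUE key can still skip rows
  SELECT col1, MIN(col2) FROM t1 GROUP BY col1;
  -- ...but not here: every row of the UNIQUE key is already distinct,
  -- so a plain index scan is cheaper
  SELECT DISTINCT col1, col2 FROM t1;
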
Alexander Barkov
b52d4d0076 MDEV-6991 GROUP_MIN_MAX optimization is erroneously applied in some cases 2014-11-18 23:15:54 +04:00
Sergei Golubchik
6fb17a0601 5.5.39 merge 2014-08-07 18:06:56 +02:00
Sergei Golubchik
1c6ad62a26 mysql-5.5.39 merge
~40% of bugfixes(*) applied
~40% of bugfixes reverted (incorrect or we're not buggy)
~20% of bugfixes applied, despite us being not buggy
(*) only changes in the server code, e.g. not cmakefiles
2014-08-02 21:26:16 +02:00
unknown
4cd676cbd9 MDEV-6047: Make exists_to_in optimization ON by default 2014-06-09 13:42:21 +03:00
mithun
f220233512 Bug#17217128 : BAD INTERACTION BETWEEN MIN/MAX AND
"HAVING SUM(DISTINCT)": WRONG RESULTS.
ISSUE:
------
If a query uses loose index scan and has both
AGG(DISTINCT) and MIN()/MAX() functions, then the result
values of MIN()/MAX() are set improperly.
When a query has AGG(DISTINCT), end_select is set to
end_send_group. "end_send_group" keeps aggregating until
it sees a record from the next group, and only then sends
out the result row of that group.
Since the query also has MIN()/MAX() and loose index scan
is used, the values of MIN()/MAX() are set as part of the
loose index scan itself. Setting them there overwrites the
values computed in end_send_group, which caused the invalid
result.
For such queries to work, loose index scan should stop
performing MIN()/MAX() aggregation and let end_send_group
do it instead. But in the current design loose index scan
can produce only one row per group key. If we have both
MIN() and MAX(), it would have to give two records out,
which is not possible as the interface has to use the
common buffer record[0] for both records at a time.

SOLUTIONS:
----------
Supporting such queries would need a new interface for
loose index scan. Hence, do not choose loose index scan
for such cases: a new rule, SA7, is introduced to take
care of this.

SA7: "If Q has both AGG_FUNC(DISTINCT ...) and
      MIN/MAX() functions then loose index scan access
      method is not used."

mysql-test/r/group_min_max.result:
  Expected result.
mysql-test/t/group_min_max.test:
  1. Test with various combination of AGG(DISTINCT) and
  MIN(), MAX() functions.
  2. Corrected the plan for old queries.
sql/opt_range.cc:
  A new rule SA7 is introduced.
2014-05-15 11:46:57 +05:30
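A hypothetical query of the kind rule SA7 now rejects (schema is illustrative): it combines AGG(DISTINCT) with MIN()/MAX(), so the loose index scan access method is not chosen.

  CREATE TABLE t1 (a INT, b INT, c INT, KEY (a, b, c));
  SELECT SUM(DISTINCT b), MIN(c), MAX(c) FROM t1 GROUP BY a;
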
Igor Babaev
3e0f63c18f Fixed the problem of mdev-5947.
Back-ported from the mysql 5.6 code line the patch with
the following comment:

  Fix for Bug#11757108 CHANGE IN EXECUTION PLAN FOR COUNT_DISTINCT_GROUP_ON_KEY
                       CAUSES PERFORMANCE REGRESSION

  The cause for the performance regression is that the access strategy for the
  GROUP BY query is changed from using "index scan" in mysql-5.1 to using "loose
  index scan" in mysql-5.5. The index used for group by is unique and thus each
  "loose scan" group will only contain one record. Since loose scan needs to
  re-position on each "loose scan" group this query will do a re-position for
  each index entry. Compared to just reading the next index entry as a normal
  index scan does, the use of loose scan for this query becomes more expensive.

  The cause for selecting to use loose scan for this query is that in the current
  code when the size of the "loose scan" group is one, the formula for
  calculating the cost estimates becomes almost identical to the cost of using
  normal index scan. Differences in use of integer versus floating point arithmetic
  can cause one or the other access strategy to be selected.

  The main issue with the formula for estimating the cost of using loose scan is
  that it does not take into account that it is more costly to do a re-position
  for each "loose scan" group compared to just reading the next index entry.
  Both index scan and loose scan estimates the cpu cost as:

    "number of entries needed too read/scan" * ROW_EVALUATE_COST

  The results from testing with the query in this bug indicate that the real
  cost of doing a re-position is four to eight times higher than just reading the
  next index entry. Thus, the cpu cost estimate for loose scan should be increased.
  To account for the extra work to re-position in the index we increase the
  cost for loose index scan to include the cost of navigating the index.
  This is modelled as a function of the height of the b-tree:

    navigation cost= ceil(log(records in table)/log(indexes per block))
                   * ROWID_COMPARE_COST;

  This will avoid loose index scan being used for indexes where the "loose scan"
  group contains very few index entries.
2014-04-22 14:39:57 -07:00
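A worked instance of the added term, with illustrative numbers:

  \text{navigation cost} = \left\lceil \frac{\log(\text{records in table})}{\log(\text{indexes per block})} \right\rceil \times \text{ROWID\_COMPARE\_COST}

For 10^6 records and 100 index entries per block this gives ceil(6/2) = 3, i.e. every "loose scan" group re-position is charged three levels of b-tree navigation, which steers the optimizer away from loose scan when groups contain very few entries.
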
unknown
a695674001 Test with double index fixed. 2013-09-27 07:29:36 +03:00
Sergei Golubchik
4ec2e9d7ed 5.5 merge and fixes for compiler/test errors 2013-09-18 13:07:31 +02:00
Sergei Golubchik
b838d081ad mysql-5.5.33 merge 2013-09-06 22:31:30 +02:00
unknown
33c66eb7fb MDEV-4120: UNIQUE indexes should not be considered for loose index scan
Currently the loose scan code in opt_range.cc considers all indexes as
possible for the access method. Due to inexact statistics it may happen
that a loose scan is selected over a unique index.
  
This is clearly wrong since a "loose scan" over a unique index will read
the same keys as a direct index scan, but the loose scan has more overhead.
  
This task adds a rule to skip unique indexes for loose scan.
2013-08-21 10:51:08 +03:00
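Under the rule added here, a query like the following (illustrative schema) no longer picks loose scan: each "loose scan" group over the UNIQUE key holds exactly one row, so the scan reads the same entries as a plain index scan, plus re-positioning overhead.

  CREATE TABLE t1 (a INT, b INT, UNIQUE KEY (a, b));
  SELECT a, MIN(b) FROM t1 GROUP BY a;
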
unknown
66ec79fc87 Fix for MDEV-4219 A simple select query returns random data (upstream bug#68473)
In the case of loose scan used as input for order by, end_send()
didn't detect correctly that a loose scan was used, and didn't copy
the non-aggregated fields from the temp table used for ORDER BY.
  
The fix uses the fact that the quick select used for sorting is
attached to JOIN::pre_sort_join_tab instead of JOIN::join_tab.
2013-07-17 16:42:13 +03:00
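A hypothetical query of the affected shape (names are illustrative): loose scan computes the groups, and the ORDER BY goes through a temp table whose non-aggregated column was not copied back before the fix.

  CREATE TABLE t1 (a INT, b INT, KEY (a, b));
  SELECT a, MIN(b) FROM t1 GROUP BY a ORDER BY MIN(b);
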
Neeraj Bisht
35a3f9d76c Bug#12328597 - MULTIPLE COUNT(DISTINCT) IN SAME SELECT FALSE
WITH COMPOSITE KEY COLUMNS

Problem:-
While running a SELECT query with several AGGR(DISTINCT) functions
that refer to different fields of the same composite key,
an incorrect value was returned.

Analysis:-

In a table where we have a composite key like (a,b,c),
when we issue a query like

select COUNT(DISTINCT b), SUM(DISTINCT a) from ....

we first make a list of the items in the AGGR(DISTINCT)
functions (here a and b), where the order of items doesn't matter,
and then we check whether we have a composite key where the prefix
of the index columns matches the items of the aggregation functions
(in this case we have (a,b,c)).

If yes, we can use loose index scan and need not perform
duplicate removal for the DISTINCT in our aggregate functions.

In our table, we traverse the rows marked with <-- and get the result as

(a,b,c)       count(distinct b)     sum(distinct a)
              treated as count(b)   treated as sum(a)
(1,1,2) <--   1                     1
(1,2,2) <--   1++=2                 1+1=2
(1,2,3)
(2,1,2) <--   2++=3                 1+1+2=4
(2,2,2) <--   3++=4                 1+1+2+2=6
(2,2,3)

The result will be (4,6), but it should be (2,3).

So in this case our assumption is incorrect. If we have a
query like
select count(distinct a,b), sum(distinct a,b) from ..
then we can use loose index scan.

Solution:-
When our query has more than one AGGR(DISTINCT) function,
they should refer to the same fields, as in

select count(distinct a,b), sum(distinct a,b) from ..

--> we can use loose index scan, as both AGGR(DISTINCT) refer to the same fields a,b.

If they refer to different fields, as in

select count(distinct a), sum(distinct b) from ..

--> we will not use loose index scan, as the AGGR(DISTINCT) functions refer to different fields.
2013-05-13 17:15:25 +05:30
Sergei Golubchik
213f1c76a0 5.3->5.5 merge 2013-02-28 22:47:29 +01:00
Igor Babaev
6537b551ca Merge. 2013-02-20 19:22:02 -08:00
Igor Babaev
c9b63e6a49 Fixed bug mdev-3913.
The wrong result set returned by the left join query  from
the bug test case happened due to several inconsistencies 
and bugs of the legacy mysql code.

The bug test case uses an execution plan that employs a scan
of a materialized IN subquery from the WHERE condition.
When materializing such an IN subquery the optimizer injects
additional equalities into the WHERE clause. These equalities
express the constraints imposed by the subquery predicate.
The injected equality of the query in the  test case happens
to belong to the same equality class, and a new equality 
imposing a condition on the rows of the materialized subquery
is inferred from this class. Simultaneously the multiple
equality is added to the ON expression of the LEFT JOIN
used in the main query.
  
The inferred equality of the form f1=f2 is taken into account
when optimizing the scan of the rows of the temporary table
that is the result of the subquery materialization: only the 
values of the field f1 are read from the table into the record 
buffer. Meanwhile the inferred equality is removed from the
WHERE conditions altogether as a constraint on the fields
of the temporary table that has been used when filling this table. 
This equality is supposed to be removed from the ON expression
when the multiple equalities of the ON expression are converted
into an optimal set of equality predicates. It is supposed to be
removed from the ON expression as an equality inferred only from
equalities of the WHERE condition. Yet, it did not happen
due to the following bug in the code.

Erroneously, the code tried to build multiple equalities for the ON
expression twice: the first time when it called optimize_cond()
for the WHERE condition, and the second time when it called
this function for the HAVING condition. When executing
optimize_cond() for the WHERE condition, a reference
to the multiple equality of the WHERE condition is set
in the multiple equality of the ON expression. This reference
would later allow converting multiple equalities of the
ON expression into equality predicates. However, the
second call of build_equal_items() for the ON expression,
which happened when optimize_cond() was called for the
HAVING condition, reset this reference to NULL.

This bug fix blocks calling build_equal_items() for ON
expressions a second time. In general, it will be
beneficial for many queries as it removes from ON 
expressions any equalities that are to be checked for the
WHERE condition.
The patch also fixes two bugs in the list manipulation
operations and a bug in the function
substitute_for_best_equal_field() that resulted
in passing a wrong reference to the multiple equalities
of the WHERE conditions when processing multiple
equalities of ON expressions.

The code of substitute_for_best_equal_field() and
the code of the helper function eliminate_item_equal()
were also streamlined and cleaned up.
Now the conversion of the multiple equalities into
an optimal set of equality predicates first produces
the sequence of all equalities, processing the multiple
equalities one by one, and only after this inserts
the equalities at the beginning of the other conditions.

The multiple changes in the output of EXPLAIN
EXTENDED are mainly the result of this streamlining,
but in some cases they are the result of the removal of
unneeded equalities from ON expressions. In
some test cases this removal was reflected in the
output of EXPLAIN, resulting in the disappearance of
"Using where" in some rows of the execution plans.
2013-02-20 18:01:36 -08:00
unknown
d4b1e8f31a Fix for MDEV-4140
Analysis:
Range analysis detects that the subquery is expensive and doesn't
build a range access method. Later, the applicability test for loose
scan doesn't take that into account, and builds a loose scan method
without a range scan on the min/max column. As a result loose scan
fetches the first key in each group, rather than the first key that
satisfies the condition on the min/max column.

Solution:
Since there is no SEL_ARG tree to be used for the min/max column,
it is not possible to use loose scan if the min/max column is compared
with an expensive scalar subquery. Make the test for loose scan
applicability be in sync with the range analysis code by testing whether
the min/max argument is compared with an expensive predicate.
2013-02-13 11:58:16 +02:00
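A sketch of the pattern (schema illustrative; whether the subquery counts as expensive depends on the optimizer's estimate): the MIN/MAX column b is compared with a scalar subquery, so no SEL_ARG tree exists for it and loose scan must not be chosen.

  CREATE TABLE t1 (a INT, b INT, KEY (a, b));
  CREATE TABLE t2 (x INT);
  SELECT a, MIN(b) FROM t1
  WHERE b > (SELECT MAX(x) FROM t2)
  GROUP BY a;
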
unknown
0b2dc3fc59 Fix for bug MDEV-765 (LP:825075)
Analysis:
The cause for the wrong result was that the optimizer
incorrectly chose min/max loose scan when it is not
applicable. The applicability test missed the case when
a condition on the MIN/MAX argument was OR-ed with a
condition on some other field. In this case, the MIN/MAX
condition cannot be used for loose scan.

Solution:
Extend the test check_group_min_max_predicates() to check
that the WHERE clause is of the form: "cond1 AND cond2"
where 
  cond1 - does not use min_max_column at all.
  cond2 - is an AND/OR tree with leaves of the form "min_max_column $CMP$ const",
          where $CMP$ is a comparison operator or one of BETWEEN, IS [NOT] NULL
2013-02-04 17:35:48 +02:00
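An illustrative query that the extended check now rejects: the condition on the MIN/MAX argument b is OR-ed with a condition on another field, so it cannot drive loose scan.

  CREATE TABLE t1 (a INT, b INT, c INT, KEY (a, b));
  SELECT a, MIN(b) FROM t1 WHERE b = 5 OR c = 10 GROUP BY a;
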
Sergei Golubchik
87de27e46b 5.3 merge 2013-01-28 13:36:05 +01:00
Sergei Golubchik
34e84c227f 5.2 merge 2013-01-28 09:12:23 +01:00
Sergei Golubchik
e400450f2d 5.1 merge 2013-01-25 17:22:21 +01:00
Sergei Golubchik
7f208d3c35 MDEV-729 lp:998028 - Server crashes on normal shutdown in closefrm after executing a query from MyISAM table
don't write a key value into the record buffer - a key length can be larger than the record length.
2013-01-25 14:29:46 +01:00
Sergey Petrunya
b6eccf51c0 Update test results (checked) 2012-08-28 16:03:22 +04:00
unknown
da5214831d Fix for bug lp:944706, task MDEV-193
The patch re-enables constant subquery execution during
query optimization after it was disabled during the development
of MWL#89 (cost-based choice of IN-TO-EXISTS vs MATERIALIZATION).

The main idea is that constant subqueries are allowed to be executed
during optimization if their execution is not expensive.

The approach is as follows:
- Constant subqueries are recursively optimized in the beginning of
  JOIN::optimize of the outer query. This is done by the new method
  JOIN::optimize_constant_subqueries(). This is done so that the cost
  of executing these queries can be estimated.
- Optimization of the outer query proceeds normally. During this phase
  the optimizer may request execution of non-expensive constant subqueries.
  Each place where the optimizer may potentially execute an expensive
  expression is guarded with the predicate Item::is_expensive().
- The implementation of Item_subselect::is_expensive has been extended
  to use the number of examined rows (estimated by the optimizer) as a
  way to determine whether the subquery is expensive or not.
- The new system variable "expensive_subquery_limit" controls how many
  examined rows are considered to be not expensive. The default is 100.

In addition, multiple changes were needed to make this solution work
in the light of the changes made by MWL#89. These changes were needed
to fix various crashes and wrong results, and legacy bugs discovered
during development.
2012-05-17 13:46:05 +03:00
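The variable can be read and adjusted per session; the values here are only an example:

  SELECT @@expensive_subquery_limit;          -- default: 100
  SET SESSION expensive_subquery_limit = 500; -- allow costlier constant
                                              -- subqueries at optimize time
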
Igor Babaev
8b469eb515 Merge 5.3->5.5. 2012-03-01 14:22:22 -08:00
Igor Babaev
841a74a4d6 Fixed LP bug #939009.
The result of materialization of the right part of an IN subquery predicate
is placed into a temporary table. Each row of the materialized table is
distinct. A unique key over all fields of the temporary table is defined and
created. It allows performing key lookups into the table.
The table created for a materialized subquery can be accessed by key like
any other table. The function best_access_path() searches for the best access
path to join a table to a given partial join. With some WHERE conditions this
function considers the possibility of a ref_or_null access. If such an access
employs the unique key on the temporary table, then when estimating
the cost of this access the function tries to use the rec_per_key array. Yet,
no such array is built for this unique key. This causes a crash of the server.

Rows returned by the subquery that contain NULLs don't have to be placed
into the temporary table, as they cannot match any row produced by the
left part of the subquery predicate. So all fields of the temporary table
can be defined as non-nullable. In this case any ref_or_null access
to the temporary table does not make any sense, nor does it make sense
to estimate such an access.

The fix makes sure that the temporary table for a materialized IN subquery
is defined with columns that are all non-nullable. It also ensures that
any row with NULLs returned by the subquery is not placed into the
temporary table.
2012-02-24 16:50:22 -08:00
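A sketch of the affected construct (names illustrative): the temp table materializing (x, y) is now created with non-nullable columns, and subquery rows containing NULLs are never inserted, since they can never match the left-side pair.

  CREATE TABLE t1 (a INT, b INT);
  CREATE TABLE t2 (x INT, y INT);
  SELECT * FROM t1 WHERE (a, b) IN (SELECT x, y FROM t2);
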
Sergei Golubchik
4f435bddfd 5.3 merge 2012-01-13 15:50:02 +01:00
unknown
072073c09e Backport of WL#5953 from MySQL 5.6
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
  care has been taken to restore the original plan so that each
  test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
  Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
  and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
  array bounds checking, and because it looks risky.

Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
      
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are 
meaningless:
      
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate 
  functions
      
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
2011-12-19 23:05:44 +02:00
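An illustrative subquery this cleanup simplifies (in this backport ORDER BY is kept, as noted above; DISTINCT and the aggregate-free GROUP BY are removed during JOIN::prepare()):

  SELECT a FROM t1 WHERE a IN (SELECT DISTINCT b FROM t2 GROUP BY b);
  -- is processed as:
  SELECT a FROM t1 WHERE a IN (SELECT b FROM t2);
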
Igor Babaev
f5dac20f38 Made the optimizer switch flags 'outer_join_with_cache' and 'semijoin_with_cache'
default to 'on'.
2011-12-15 00:21:15 -08:00
Sergei Golubchik
745c53ec06 5.2->5.3 merge 2011-12-12 13:00:33 +01:00
unknown
6404504d0c Fixed bug lp:900375
The range optimizer incorrectly chose a loose scan for group by
when there is a correlated WHERE condition. This range access
method cannot be executed for correlated conditions, even with
"range checked for each record", because generally the range access
method can change for each outer record. Loose scan destructively
changes the query plan and removes the GROUP operation, which will
result in wrong query plans if another range access is chosen
dynamically.
2011-12-12 12:36:46 +02:00
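A hypothetical shape of the problem (schema illustrative): the subquery's WHERE condition is correlated with the outer table, so a range-based plan must be re-chosen for each outer record, and the destructive loose scan rewrite is not safe.

  CREATE TABLE t1 (a INT, b INT, KEY (a, b));
  CREATE TABLE t2 (c INT);
  SELECT c FROM t2
  WHERE c IN (SELECT MIN(b) FROM t1 WHERE t1.b > t2.c GROUP BY t1.a);
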
unknown
314c377422 Fixed bug lp:888456
Analysis:
The class member QUICK_GROUP_MIN_MAX_SELECT::seen_first_key
was not reset between subquery re-executions. Thus each
subsequent execution continued from the group that was
reached by the previous subquery execution. As a result
loose scan reached the end of the file much earlier and returned an
empty result where it shouldn't.

Solution:
Reset seen_first_key before each re-execution of the
loose scan.
2011-12-08 12:05:52 +02:00
Sergey Petrunya
255fd6c929 Make subquery Materialization, as well as semi-join Materialization be shown
in EXPLAIN as select_type==MATERIALIZED. 

Before, we had select_type==SUBQUERY and it was difficult to tell materialized
subqueries from uncorrelated scalar-context subqueries.
2011-12-05 01:31:42 +04:00
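For example (tables illustrative), the plan for a materialized IN subquery now reads:

  EXPLAIN SELECT * FROM t1 WHERE a IN (SELECT b FROM t2);
  -- the subquery row shows select_type = MATERIALIZED
  -- instead of the former SUBQUERY
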
Sergei Golubchik
effed09bd7 5.3->5.5 merge 2011-11-27 17:46:20 +01:00
Sergei Golubchik
d2755a2c9c 5.3->5.5 merge 2011-11-22 18:04:38 +01:00
unknown
511459bd14 Enable subquery materialization=ON by default. 2011-11-09 15:36:25 +02:00
Sergey Petrunya
47861a6577 Change the default @@optimizer_switch settings:
- semijoin=on
- firstmatch=on
- loosescan=on
2011-11-02 13:48:41 +04:00
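The same defaults can be set explicitly (standard optimizer_switch syntax):

  SET GLOBAL optimizer_switch = 'semijoin=on,firstmatch=on,loosescan=on';
  SELECT @@optimizer_switch;
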
Sergei Golubchik
76f0b94bb0 merge with 5.3
sql/sql_insert.cc:
  CREATE ... IF NOT EXISTS may do nothing, but
  it is still not a failure. don't forget to my_ok it.
sql/sql_table.cc:
  small cleanup
2011-10-19 21:45:18 +02:00
unknown
2df1914791 Fix bug lp:827416
Analysis:
Constant table optimization of the outer query finds that
the right side of the equality is a constant that can
be used for an eq_ref access to fetch one row from t1,
and to substitute t1 with a constant. Thus constant optimization
triggers evaluation of the subquery during the optimize
phase of the outer query.

The innermost subquery requires a plan with a temporary
table because with InnoDB tables the exact count of rows
is not known, and the empty tables cannot be optimized
away. JOIN::exec for the innermost subquery substitutes
the subquery tables with a temporary table.

When EXPLAIN gets to print the tables in the innermost
subquery, EXPLAIN needs to print the name of each table
through the corresponding TABLE_LIST object. However,
the temporary table created during execution doesn't
have a corresponding TABLE_LIST, so we get a null
pointer exception.

Solution:
The solution is to forbid using expensive constant
expressions for eq_ref access in constant table
optimization. Notice that eq_ref with a subquery
providing the value is still possible during regular
execution.
2011-08-27 00:40:29 +03:00
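A sketch of the forbidden case (names illustrative): the eq_ref value comes from a subquery, so it is no longer evaluated at optimize time to turn t1 into a constant table, though eq_ref on it still works during execution.

  CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
  CREATE TABLE t2 (x INT);
  EXPLAIN SELECT * FROM t1 WHERE a = (SELECT MAX(x) FROM t2);
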
Sergey Petrunya
0e19f3e36f Backport of:
revno: 2876.47.174
revision-id: jorgen.loland@oracle.com-20110519120355-qn7eprkad9jqwu5j
parent: mayank.prasad@oracle.com-20110518143645-bdxv4udzrmqsjmhq
committer: Jorgen Loland <jorgen.loland@oracle.com>
branch nick: mysql-trunk-11765831
timestamp: Thu 2011-05-19 14:03:55 +0200
message:
  BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER 
                        AWAY QUALIFYING ROWS
        
  The problem was that the ranges created when OR'ing two 
  conditions could be incorrect. Without the bugfix, 
  "I <> 6 OR (I <> 8 AND J = 5)" would create these ranges:
  
  "NULL < I < 6",
  "6 <= I <= 6 AND 5 <= J <= 5",
  "6 < I < 8",
  "8 <= I <= 8 AND 5 <= J <= 5",
  "8 < I"
  
  While the correct ranges are
  "NULL < I < 6",
  "6 <= I <= 6 AND 5 <= J <= 5",
  "6 < I"
  
  The problem occurs when key_or() ORs
  (1) "NULL < I < 6, 6 <= I <= 6 AND 5 <= J <= 5, 6 < I" with 
  (2) "8 < I AND 5 <= J <= 5"
  
  The reason for the bug is that in key_or(), SEL_ARG *tmp is 
  used to point to the range in (1) above that is merged with 
  (2) while key1 points to the root of the red-black tree of 
  (1). When merging (1) and (2), tmp refers to the "6 < I" 
  part whereas the root is the "6 <= ... AND 5 <= J <= 5" part. 
  
  key_or() decides that the tmp range needs to be split into
  "6 < I < 8, 8 <= I <= 8, 8 < I", in which next_key_part of the 
  second range should be that of tmp. However, next_key_part is
  set to key1->next_key_part ("5 <= J <= 5") instead of 
  tmp->next_key_part (empty). Fixing this gives the correct but
  not optimal ranges:
  "NULL < I < 6",
  "6 <= I <= 6 AND 5 <= J <= 5",
  "6 < I < 8",
  "8 <= I <= 8",
  "8 < I"
  
  A second problem can be seen above: key_or() may create
  adjacent ranges that could be replaced with a single range.
  Fixes for this are also included in the patch so that the ranges
  above become correct AND optimal:
  "NULL < I < 6",
  "6 <= I <= 6 AND 5 <= J <= 5",
  "6 < I"
  
  Merging adjacent ranges like this gives a slightly lower cost 
  estimate for the range access.
2011-08-05 22:01:49 +04:00
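The condition from the bug report, runnable against an illustrative two-column index; with the fix, range analysis for it produces the correct and optimal interval list shown above.

  CREATE TABLE t1 (i INT, j INT, KEY (i, j));
  SELECT * FROM t1 WHERE i <> 6 OR (i <> 8 AND j = 5);
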