remember range endpoints
The Loose Index Scan optimization keeps track of a sequence
of intervals. For the current interval it maintains the
current interval's endpoints. But the maximum endpoint was
not stored in the SQL layer; rather, it relied on the
storage engine to retain this value in-between reads. By
coincidence this holds for MyISAM and InnoDB. Not for the
partitioning engine, however.
Fixed by making the key values iterator
(QUICK_RANGE_SELECT) keep track of the current maximum endpoint.
This is also more efficient as we save a call through the
handler API in case of open-ended intervals.
The code to calculate endpoints was extracted into
separate methods in QUICK_RANGE_SELECT, and it was possible to
get rid of some code duplication as part of fix.
for each record'
There was an error in an internal structure in the range
optimizer (SEL_ARG). Bad design causes parts of a data
structure not to be initialized when it is in a certain
state. All client code must check that this state is not
present before trying to access the structure's data. Fixed
by
- Checking the state before trying to access data (in
several places, most of which not covered by test case.)
- Copying the keypart id when cloning SEL_ARGs
When merging ranges during calculation of the result of OR
to two range sets the current range may be obsoleted by the
resulting merged range.
The first overlapping range can be obsoleted as well.
Fixed by moving the pointer to the first overlapping range to the
pointer of the resulting union range.
Added few comments at key places in key_or().
When a query was using a DATE or DATETIME value formatted
using any other separator characters beside hyphen '-', a
query with a greater-or-equal '>=' condition matching only
the greatest value in an indexed column, the result was
empty if index range scan was employed.
The range optimizer got a new feature between 5.1.38 and
5.1.39 that changes a greater-or-equal condition to a
greater-than if the value matching that in the query was not
present in the table. But the value comparison function
compared the dates as strings instead of dates.
The bug was fixed by splitting the function
get_date_from_str in two: One part that parses and does
error checking. This function is now visible outside the
module. The old get_date_from_str now calls the new
function.
The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
covering index
When two range predicates were combined under an OR
predicate, the algorithm tried to merge overlapping ranges
into one. But the case when a range overlapped several other
ranges was not handled. This lead to
1) ranges overlapping, which gave repeated results and
2) a range that overlapped several other ranges was cut off.
Fixed by
1) Making sure that a range got an upper bound equal to the
next range with a greater minimum.
2) Removing a continue statement
WHERE f1 < n ignored row if f1 was indexed integer column and
f1 = TYPE_MAX ^ n = TYPE_MAX+1. The latter value when treated
as TYPE overflowed (obviously). This was not handled, it is now.
Two disjuncts containing equalities of the form key=const1 and key=const2 can
be merged into one if const1 is equal to const2. To check it the common
collation of the constants were used rather than the collation of the field key.
For example when the default collation of the constants was cases insensitive
while the collation of the field was case sensitive, then two or-ed equality
predicates key='b' and key='B' incorrectly were merged into one f='b'. As a
result ref access was used instead of range access and wrong result sets were
returned in many cases.
Fixed the problem by comparing constant in the or-ed predicate with collation of
the key field.
- added join cache indication in EXPLAIN (Extra column).
- prefer filesort over full scan over
index for ORDER BY (because it's faster).
- when switching from REF to RANGE because
RANGE uses longer key turn off sort on
the head table only as the resulting
RANGE access is a candidate for join cache
and we don't want to disable it by sorting
on the first table only.
Pushbuild fixes:
- Make MAX_SEL_ARGS smaller (even 16K records_in_range() calls is
more than it makes sense to do in typical cases)
- Don't call sel_arg->test_use_count() if we've already allocated
more than MAX_SEL_ARGs elements. The test will succeed but will take
too much time for the test suite (and not provide much value).
- Added PARAM::alloced_sel_args where we count the # of SEL_ARGs
created by SEL_ARG tree cloning operations.
- Made the range analyzer to shortcut and not do any more cloning
if we've already created MAX_SEL_ARGS SEL_ARG objects in cloning.
- Added comments about space complexity of SEL_ARG-graph
representation.
for queries using 'range checked for each record'.
The problem was fixed in 5.0 by the patch for bug 12291.
This patch down-ported the corresponding code from 5.0 into
QUICK_SELECT::init() and added a new test case.
We miss some records sometimes using RANGE method if we have
partial key segments.
Example:
Create table t1(a char(2), key(a(1)));
insert into t1 values ('a'), ('xx');
select a from t1 where a > 'x';
We call index_read() passing 'x' key and HA_READ_AFTER_KEY flag
in the handler::read_range_first() wich is wrong because we have
a partial key segment for the field and might miss records like 'xx'.
Fix: don't use open segments in such a case.
Select_type in the EXPLAIN output for the query SELECT * FROM t1 was
'SIMPLE', while for the query SELECT * FROM v1, where the view v1
was defined as SELECT * FROM t1, the EXPLAIN output contained 'PRIMARY'
for the select_type column.
when a range condition use an invalid DATETIME constant.
Now we do not use invalid DATETIME constants to form end keys for
range intervals: range analysis just ignores predicates with such
constants.