The problem was in incorrect handling of predicates involving
NULL as a constant value by the range optimizer.
For example, when creating a SEL_ARG node from a condition of
the form "field < const" (which would normally result in the
"NULL < field < const" SEL_ARG), the special case when "const"
is NULL was not taken into account, so "NULL < field < NULL"
was produced for the "field < NULL" condition.
As a result, SEL_ARG structures of this form could not be
further optimized which in turn could lead to incorrectly
constructed SEL_ARG trees. In particular, code assuming SEL_ARG
structures to always form a sequence of ordered disjoint
intervals could enter an infinite loop under some
circumstances.
Fixed by changing get_mm_leaf() so that for any sargable
predicate except "<=>" involving NULL as a constant, "empty"
SEL_ARG is returned, since such a predicate is always false.
covering index
When two range predicates were combined under an OR
predicate, the algorithm tried to merge overlapping ranges
into one. But the case when a range overlapped several other
ranges was not handled. This lead to
1) ranges overlapping, which gave repeated results and
2) a range that overlapped several other ranges was cut off.
Fixed by
1) Making sure that a range got an upper bound equal to the
next range with a greater minimum.
2) Removing a continue statement
WHERE f1 < n ignored row if f1 was indexed integer column and
f1 = TYPE_MAX ^ n = TYPE_MAX+1. The latter value when treated
as TYPE overflowed (obviously). This was not handled, it is now.
Two disjuncts containing equalities of the form key=const1 and key=const2 can
be merged into one if const1 is equal to const2. To check it the common
collation of the constants were used rather than the collation of the field key.
For example when the default collation of the constants was cases insensitive
while the collation of the field was case sensitive, then two or-ed equality
predicates key='b' and key='B' incorrectly were merged into one f='b'. As a
result ref access was used instead of range access and wrong result sets were
returned in many cases.
Fixed the problem by comparing constant in the or-ed predicate with collation of
the key field.
- added join cache indication in EXPLAIN (Extra column).
- prefer filesort over full scan over
index for ORDER BY (because it's faster).
- when switching from REF to RANGE because
RANGE uses longer key turn off sort on
the head table only as the resulting
RANGE access is a candidate for join cache
and we don't want to disable it by sorting
on the first table only.
Pushbuild fixes:
- Make MAX_SEL_ARGS smaller (even 16K records_in_range() calls is
more than it makes sense to do in typical cases)
- Don't call sel_arg->test_use_count() if we've already allocated
more than MAX_SEL_ARGs elements. The test will succeed but will take
too much time for the test suite (and not provide much value).
- Added PARAM::alloced_sel_args where we count the # of SEL_ARGs
created by SEL_ARG tree cloning operations.
- Made the range analyzer to shortcut and not do any more cloning
if we've already created MAX_SEL_ARGS SEL_ARG objects in cloning.
- Added comments about space complexity of SEL_ARG-graph
representation.
for queries using 'range checked for each record'.
The problem was fixed in 5.0 by the patch for bug 12291.
This patch down-ported the corresponding code from 5.0 into
QUICK_SELECT::init() and added a new test case.
We miss some records sometimes using RANGE method if we have
partial key segments.
Example:
Create table t1(a char(2), key(a(1)));
insert into t1 values ('a'), ('xx');
select a from t1 where a > 'x';
We call index_read() passing 'x' key and HA_READ_AFTER_KEY flag
in the handler::read_range_first() wich is wrong because we have
a partial key segment for the field and might miss records like 'xx'.
Fix: don't use open segments in such a case.
Select_type in the EXPLAIN output for the query SELECT * FROM t1 was
'SIMPLE', while for the query SELECT * FROM v1, where the view v1
was defined as SELECT * FROM t1, the EXPLAIN output contained 'PRIMARY'
for the select_type column.
when a range condition use an invalid DATETIME constant.
Now we do not use invalid DATETIME constants to form end keys for
range intervals: range analysis just ignores predicates with such
constants.
In fix for BUG#15872, a condition of type "t.key NOT IN (c1, .... cN)"
where N>1000, was incorrectly converted to
(-inf < X < c_min) OR (c_max < X)
Now this conversion is removed, we dont produce any range lists for such
conditions.
Under row-based replication, DELETE FROM will now always be
replicated as individual row deletions, while TRUNCATE TABLE will
always be replicated as a statement.
after merge fix.
range.result:
fixing result accordingly
,
cast.result:
after merge fix
range.test:
Avoid SELECT'ing of BINARY column not to output 0x00 bytes.
- CHAR() now returns binary string as default
- CHAR(X*65536+Y*256+Z) is now equal to CHAR(X,Y,Z) independent of the character set for CHAR()
- Test for both ETIMEDOUT and ETIME from pthread_cond_timedwait()
(Some old systems returns ETIME and it's safer to test for both values
than to try to write a wrapper for each old system)
- Fixed new introduced bug in NOT BETWEEN X and X
- Ensure we call commit_by_xid or rollback_by_xid for all engines, even if one engine has failed
- Use octet2hex() for all conversion of string to hex
- Simplify and optimize code