The RANGE method could miss some records if the key has
partial key segments.
Example:
CREATE TABLE t1 (a CHAR(2), KEY (a(1)));
INSERT INTO t1 VALUES ('a'), ('xx');
SELECT a FROM t1 WHERE a > 'x';
Here handler::read_range_first() called index_read() with the key 'x'
and the HA_READ_AFTER_KEY flag, which is wrong because the field has a
partial key segment (only the first character is indexed), so records
like 'xx' could be missed.
Fix: don't use open segments in such a case.
When using an index for GROUP BY and range access, the server isolates
a set of ranges based on the conditions over the key parts of the
index used. Then it uses only the ranges over the GROUP BY fields to
jump from one group to another. Since the GROUP BY fields may form a
prefix of the index, we may use only a prefix of the ranges produced
by the range optimizer.
Each range records whether it includes its border values.
The problem is that when using a range prefix, the last keypart of the
prefix is open, because the full range assumes a condition on the next
keypart. Thus, using a prefix range as it is excludes all border values.
The solution: when ignoring the suffix of the range conditions
(to jump over the GROUP BY prefix only), the server must change the
remaining intervals so that they always contain their borders. E.g.
if the whole range was
(1, -inf) <= (<group_by_col>, <min_max_arg_col>) < (1, 3)
we must make
(1) <= (<group_by_col>) <= (1)
because (a, b) < (c1, c2) means a < c1 OR (a = c1 AND b < c2).
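For illustration, a minimal sketch that produces the interval above
(table and data are hypothetical, not from the original report):
CREATE TABLE t1 (a INT, b INT, KEY (a, b));
INSERT INTO t1 VALUES (1, 1), (1, 2), (1, 3);
-- The range over (a, b) is (1, -inf) <= (a, b) < (1, 3);
-- jumping over the GROUP BY prefix must use (1) <= (a) <= (1),
-- or the border group a = 1 would be lost.
SELECT a, MIN(b) FROM t1 WHERE a = 1 AND b < 3 GROUP BY a;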
this key does not stop" (version for 5.0 only).
An UPDATE statement whose WHERE clause used a key, and which invoked a
trigger that modified a field in this key, ran indefinitely.
This problem occurred because, when the UPDATE statement was
executed in update-on-the-fly mode (in which a row is updated right
during evaluation of the select for the WHERE clause), the new version of
the row became visible to the select representing the WHERE clause and was
updated again and again.
We had already solved this problem for UPDATE statements that do not
invoke triggers, by detecting that we are going to update a
field in the key used for scanning and then performing the update in two
steps: during the first step we gather information about the rows to be
updated, and only then do the actual updates. We also do this for
MULTI-UPDATE, and in its case we even detect the situation when such
fields are updated in triggers (actually we simply assume that
we always update fields used in the key if there is a BEFORE UPDATE
trigger).
The fix simply extends this check, which is done in the check_if_key_used()/
QUICK_SELECT_I::check_if_keys_used() routine/method, so that
it also detects cases when a field used in the key is updated in a trigger.
As a nice side effect we get a more precise, and thus more optimal
performance-wise, check for MULTI-UPDATE.
Also, check_if_key_used()/QUICK_SELECT_I::check_if_keys_used() were
renamed to is_key_used()/QUICK_SELECT_I::is_keys_used() to
better reflect that they are boolean predicates.
Note that this check is implemented in a much more elegant way in 5.1.
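A minimal sketch of the affected pattern (names are made up for
illustration):
CREATE TABLE t1 (a INT, KEY (a));
INSERT INTO t1 VALUES (1);
CREATE TRIGGER t1_bu BEFORE UPDATE ON t1
FOR EACH ROW SET NEW.a = NEW.a + 1;
-- The WHERE clause scans the key on a while the trigger keeps moving
-- rows forward within that same key; without the two-step update the
-- scan could see its own updates and never terminate.
UPDATE t1 SET a = a WHERE a > 0;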
Only MyISAM tables locked with LOCK TABLES ... WRITE were affected:
a query optimized with index_merge did not reflect rows
inserted within LOCK TABLES.
MyISAM doesn't flush its state within LOCK TABLES, and the index_merge
optimization creates a copy of the handler, which thus gets an
outdated MyISAM state.
A new handler->clone() method is introduced to fix this problem.
For non-MyISAM storage engines it allocates a handler and opens
it with ha_open(). For MyISAM it additionally copies the MyISAM state
pointer to the cloned handler.
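A minimal repro sketch (hypothetical, for illustration):
CREATE TABLE t1 (a INT, b INT, KEY (a), KEY (b)) ENGINE=MyISAM;
LOCK TABLES t1 WRITE;
INSERT INTO t1 VALUES (1, 1);
-- The index_merge plan used a cloned handler with an outdated MyISAM
-- state, so the row inserted above could be missed.
SELECT * FROM t1 WHERE a = 1 OR b = 1;
UNLOCK TABLES;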
when a range condition uses an invalid DATETIME constant.
Now we do not use invalid DATETIME constants to form end keys for
range intervals: range analysis just ignores predicates with such
constants.
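For example (the constant is hypothetical, not from the original
report):
CREATE TABLE t1 (d DATETIME, KEY (d));
-- '2007-20-00 00:00:00' is not a valid DATETIME; range analysis now
-- ignores this predicate instead of forming an end key from it.
SELECT * FROM t1 WHERE d > '2007-20-00 00:00:00';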
The bug was caused by a too small buffer for a MY_BITMAP temporary
buffer allocated on the stack in the
function get_best_covering_ror_intersect().
Now a buffer of the proper size is allocated by a request from this
function in mem_root.
We succeeded in demonstrating the bug only on Windows with a very large
database. That's why no test case is provided with the patch.
Fix when __attribute__() is stubbed out, add ATTRIBUTE_FORMAT() for specifying
__attribute__((format(...))) safely, make more use of the format attribute,
and fix some of the warnings that this turns up (plus a bonus unrelated one).
In the fix for BUG#15872, a condition of the type "t.key NOT IN (c1, ..., cN)",
where N > 1000, was incorrectly converted to
(-inf < X < c_min) OR (c_max < X).
Now this conversion is removed: we don't produce any range lists for such
conditions.
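A sketch of the affected shape (the list is elided; assume more than
1000 constants):
CREATE TABLE t1 (a INT, KEY (a));
-- Previously collapsed to (-inf < a < c_min) OR (c_max < a), which
-- wrongly drops non-listed values between c_min and c_max; now no
-- range list is produced for this predicate.
SELECT * FROM t1 WHERE a NOT IN (1, 2, /* ... */ 1001);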
3.23 regression test failure:
the member SEL_ARG::min_flag was not initialized,
due to which the condition for no GEOM_FLAG in the function
key_or did not choose "Range checked for each record" as
the correct access method.
- When manually constructing a SEL_TREE for "t.key NOT IN(...)", take into account that
get_mm_parts may return a tree with type SEL_TREE::IMPOSSIBLE
- Added missing OOM checks
- Added comments
Re-work best_access_path() and find_best() to reuse E(#rows(range access)) as
E(#rows(ref[_or_null](const) access)) only when it is appropriate.
[This is the final cumulative patch]
For long "t.key NOT IN (...)" lists, don't construct a range tree that uses
too much memory. Instead, either create the equivalent SEL_TREE manually, or create only two ranges that
strictly include the area to scan.
(Note, just to reiterate: increasing NOT_IN_IGNORE_THRESHOLD will make the optimization run slower for big
IN-lists, but the server will not run out of memory; the O(N^2) memory use has been eliminated.)
The bug was due to a missed case in the detection of whether an index
can be used for loose scan. More precisely, the range optimizer chose
to use loose index scan for queries for which the condition(s) over
an index key part could not be pushed to the index together with the
loose scan.
As a result, loose index scan was selecting the first row in the
index with a given GROUP BY prefix and applying the WHERE
clause after that, while it should have inspected all rows with
the given prefix and applied the WHERE clause to all of them.
The fix detects and skips such cases.
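A hypothetical query of the affected shape (not the original test
case):
CREATE TABLE t1 (a INT, b INT, c INT, KEY (a, b, c));
-- The condition on c cannot be pushed to the index together with the
-- loose scan over the GROUP BY prefix (a), so loose index scan must
-- be skipped and all rows of each group inspected instead.
SELECT a, MIN(b) FROM t1 WHERE c > 0 GROUP BY a;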
- Added empty constructors and virtual destructors to many classes and structs
- Removed some usage of the offsetof() macro to instead use C++ class pointers
If check_quick_select returns a non-empty range, then the function cost_group_min_max
cannot return 0 as an estimate of the number of retrieved records.
Yet the function erroneously returned 0 as the estimate in some situations.
- Fixed tests
- Optimized new code
- Fixed some unlikely core dumps
- Better bug fixes for:
- #14397 - OPTIMIZE TABLE with an open HANDLER causes a crash
- #14850 (ERROR 1062 when querying a view using a GROUP BY on a column that can be NULL)
Allow for configuration of the maximum number of indexes per table.
Added and used a configure.in macro.
Replaced the fixed limits with the configurable limit.
Limited MyISAM indexes to their hard limit.
Fixed a bug in opt_range.cc for many indexes with InnoDB.
Tested with 2, 63, 64, 65, 127, 128, 129, 255, 256, and 257 indexes.
Testing this part of the bug fix requires rebuilding the server
with different options, which cannot be done with our test suite.
Therefore I added the necessary test files to the bug report.
If you repeat the tests, please note that the ps_* tests fail for
everything but 64 indexes. This is because of differences in the
metadata, namely field lengths for index names etc.
A covering ROR-intersection is now constructed by starting with an
empty index set and adding indexes to it until it
becomes covering. If the set becomes covering after adding the first index,
return NULL and don't try constructing a ROR-intersection of one index (which
caused a crash).