Part 2 :
There was a special optimization on the ref access method for
ORDER BY ... DESC that was set without actually looking on the type of the
selected index for ORDER BY.
Fixed the SELECT ... ORDER BY .. DESC (it uses a different code path compared
to the ASC that has been fixed with the previous fix).
values
We should re-set the access method functions when changing the access
method when switching to another index to avoid sorting.
Fixed by doing a little re-engineering : encapsulating all the function
assignment into a special function and calling it when flipping the
indexes.
buffering is used
FORCE INDEX FOR ORDER BY now prevents the optimizer from
using join buffering. As a result the optimizer can use
indexed access on the first table and doesn't need to
sort the complete resultset at the end of the statement.
Problem 1:
When the 'Using index' optimization is used, the optimizer may still - after
cost-based optimization - decide to use another index in order to avoid using
a temporary table. But when this happens, the flag to the storage engine to
read index only (not table) was still set. Fixed by resetting the flag in the
storage engine and TABLE structure in the above scenario, unless the new index
allows for the same optimization.
Problem 2:
When a 'ref' access method was employed by cost-based optimizer, (when the column
is non-NULLable), it was assumed that it needed no initialization if 'quick' access
methods (since they are based on range scan). When ORDER BY optimization overrides
the decision, however, it expects to have this initialized and hence crashes.
Fixed in 5.1 (was fixed in 6.0 already) by initializing 'quick' even when there's
'ref' access.
Server crashed during a sort order optimization
of a dependent subquery:
SELECT
(SELECT t1.a FROM t1, t2
WHERE t1.a = t2.b AND t2.a = t3.c
ORDER BY t1.a)
FROM t3;
Bitmap of tables, that the reference to outer table
column uses, in addition to the regular table bit
has the OUTER_REF_TABLE_BIT bit set.
The only_eq_ref_tables function traverses this map
bit by bit simultaneously with join->map2table list.
Obviously join->map2table never contains an entry
for the OUTER_REF_TABLE_BIT pseudo-table, so the
server crashed there.
The only_eq_ref_tables function has been modified
to traverse regular table bits only like the
update_depend_map function (resetting of the
OUTER_REF_TABLE_BIT there is enough, but
resetting of the whole set of PSEUDO_TABLE_BITS
is used there for sure).
The function test_if_skip_sort_order ignored any covering index used for ref
access of a table in a query with ORDER BY if this index was incompatible
with the ORDER BY list and there was another covering index compatible with
this list.
As a result sub-optimal execution plans were chosen for some queries with
ORDER BY clause.
The code for executing indexed ORDER BY was not setting all the
internal fields correctly when selecting to execute ORDER BY over
and index.
Fixed by change the access method to one that will use the
quick indexed access if one is selected while selecting indexed
ORDER BY.
The out of memory error was thrown when the sort buffer size were too small.
This led to a user confusion.
Now filesort throws the error message about sort buffer being too small.
The optimizer takes different execution paths during EXPLAIN than SELECT,
this fix relates only to EXPLAIN, hence no behavior changes.
The test of sort keys for ORDER BY was prohibited from considering keys
that were mentioned in IGNORE KEYS FOR ORDER BY. This led to two
inconsistencies: One was that IGNORE INDEX FOR GROUP BY and
IGNORE INDEX FOR ORDER BY gave apparently different EXPLAINs; the latter
erroneously claimed to do filesort. The second inconsistency
is that the test of sort keys is called twice, finding a sort key the first
time but not the second time, leading to the mentioned filesort.
Fixed by making the test of sort keys consider all enabled
keys on the table. This test rejects keys that are not covering, and for
covering keys the hint should be ignored anyway.
This patch adds cost estimation for the queries with ORDER BY / GROUP BY
and LIMIT.
If there was a ref/range access to the table whose rows were required
to be ordered in the result set the optimizer always employed this access
though a scan by a different index that was compatible with the required
order could be cheaper to produce the first L rows of the result set.
Now for such queries the optimizer makes a choice between the cheapest
ref/range accesses not compatible with the given order and index scans
compatible with it.
IN/BETWEEN predicates in sorting expressions.
Wrong results may occur when the select list contains an expression
with IN/BETWEEN predicate that differs from a sorting expression by
an additional NOT only.
Added the method Item_func_opt_neg::eq to compare correctly expressions
containing [NOT] IN/BETWEEN.
The eq method inherited from the Item_func returns TRUE when comparing
'a IN (1,2)' with 'a NOT IN (1,2)' that is not, of course, correct.
DATE/DATETIME values are out of the currently supported
4 basic value types (INT,STRING,REAL and DECIMAL).
So expressions (not fields) of compile type DATE/DATETIME are
generally considered as STRING values. This is not so
when they are compared : then they are compared as
INTEGER values.
But the rule for comparison as INTEGERS must be checked
explicitly each time when a comparison is to be performed.
filesort is one such place. However there the check was
not done and hence the expressions (not fields) of type
DATE/DATETIME were sorted by their string representation.
Fixed to compare them as INTEGER values for filesort.
Functions over sum functions wasn't set up correctly for the ORDER BY clause
which leads to a wrong order of the result set.
The split_sum_func() function is called now for each ORDER BY item that
contains a sum function to set it up correctly.
"update existingtable set anycolumn=nonexisting order by nonexisting" would crash
the server.
Though we would find the reference to a field, that doesn't mean we can then use
it to set some values. It could be a reference to another field. If it is NULL,
don't try to use it to set values in the Item_field and instead return an error.
Over the previous patch, this signals an error at the location of the error, rather
than letting the subsequent deref signal it.
st_table::const_key_parts member is used in determining if
certain key has a prefix that is compared to constant(s) in
the query predicates.
If there's such prefix the index can be used to get the data
from the remaining suffix columns in sorted order.
However if a field is compared to another field from a "const"
table the const_key_parts is not amended.
This makes the optimizer unable to detect that the key can be
used for sorting and adds an extra filesort.
Fixed by updating const_key_parts after reading in the "const"
table.
In the method Item_field::fix_fields we try to resolve the name of
the field against the names of the aliases that occur in the select
list. This is done by a call of the function find_item_in_list.
When this function finds several occurrences of the field name
it sends an error message to the error queue and returns 0.
Yet the code did not take into account that find_item_in_list
could return 0 and tried to dereference the returned value.
The parser is allocating Item_field for references by name in ORDER BY
expressions. Such expressions however may point not only to Item_field
in the select list (or to a table column) but also to an arbitrary Item.
This causes Item_field::fix_fields to throw an error about missing
column.
The fix substitutes Item_field for the reference with an Item_ref when
not pointing to Item_field.
table in a join
The optimizer removes redundant columns in ORDER BY. It is considering
redundant every reference to const table column, e.g b in :
create table t1 (a int, b int, primary key(a));
select 1 from t1 order by b where a = 1
But it must not remove references to const table columns if the
const table is an outer table because there still can be 2 values :
the const value and NULL. e.g.:
create table t1 (a int, b int, primary key(a));
select t2.b c from t1 left join t1 t2 on (t1.a = t2.a and t2.a = 5)
order by c;
When a default of '' was specified for TEXT/BLOB columns, the specification
was silently ignored. This is presumably to be nice to applications (or
people) who generate their column definitions in a not-very-clever fashion.
For clarity, doing this now results in a warning, or an error in strict
mode.