Treat queries with no FROM and aggregate functions as normal queries,
so the aggregate function get correctly calculated as if there is 1 row.
This means that they will be considered to have one row, so COUNT(*) will return
1 instead of 0. Other aggregates will behave in compatible manner.
- Make the range-et-al optimizer produce E(#table records after table
condition is applied),
- Make the join optimizer use this value,
- Add "filtered" column to EXPLAIN EXTENDED to show
fraction of records left after table condition is applied
- Adjust test results, add comments
The problem was in that opt_sum_query() replaced MIN/MAX functions
with the corresponding constant found in a key, but due to imprecise
representation of float numbers, when evaluating the where clause,
this comparison failed.
When MIN/MAX optimization detects that all tables can be removed,
also remove all conjuncts in a where clause that refer to these
tables. As a result of this fix, these conditions are not evaluated
twice, and in the case of float number comparisons we do not discard
result rows due to imprecise float representation.
As a side-effect this fix also corrects an unnoticed problem in
bug 12882.
The bug caused a crash of the server if a subquery with
ORDER BY DESC used the range access method.
The bug happened because the method QUICK_SELECT_DESC::reset
was not reworked after MRR interface had been introduced.
The bug was due to a loss happened during a refactoring made
on May 30 2005 that modified the function JOIN::reinit.
As a result of it for any subquery the value of offset_limit_cnt
was not restored for the following executions. Yet the first
execution of the subquery made it equal to 0.
The fix restores this value in the function JOIN::reinit.
DESCRIBE returned the type BIGINT for a column of a view if the column
was specified by an expression over values of the type INT.
E.g. for the view defined as follows:
CREATE VIEW v1 SELECT COALESCE(f1,f2) FROM t1
DESCRIBE returned type BIGINT for the only column of the view if f1,f2 are
columns of the INT type.
At the same time DESCRIBE returned type INT for the only column of the table
defined by the statement:
CREATE TABLE t2 SELECT COALESCE(f1,f2) FROM t1.
This inconsistency was removed by the patch.
Now the code chooses between INT/BIGINT depending on the
precision of the aggregated column type.
Thus both DESCRIBE commands above returns type INT for v1 and t2.
may return a wrong result.
An Item_sum_hybrid object has the was_values flag which indicates whether any
values were added to the sum function. By default it is set to true and reset
to false on any no_rows_in_result() call. This method is called only in
return_zero_rows() function. An ALL/ANY subquery can be optimized by MIN/MAX
optimization. The was_values flag is used to indicate whether the subquery
has returned at least one row. This bug occurs because return_zero_rows() is
called only when we know that the select will return zero rows before
starting any scans but often such information is not known.
In the reported case the return_zero_rows() function is not called and
the was_values flag is not reset to false and yet the subquery return no rows
Item_func_not_all and Item_func_nop_all functions return a wrong
comparison result.
The end_send_group() function now calls no_rows_in_result() for each item
in the fields_list if there is no rows were found for the (sub)query.
The ALL/ANY subqueries are the subject of MIN/MAX optimization. The matter
of this optimization is to embed MIN() or MAX() function into the subquery
in order to get only one row by which we can tell whether the expression
with ALL/ANY subquery is true or false.
But when it is applied to a subquery like 'select a_constant' the reported bug
occurs. As no tables are specified in the subquery the do_select() function
isn't called for the optimized subquery and thus no values have been added
to a MIN()/MAX() function and it returns NULL instead of a_constant.
This leads to a wrong query result.
For the subquery like 'select a_constant' there is no reason to apply
MIN/MAX optimization because the subquery anyway will return at most one row.
Thus the Item_maxmin_subselect class is more appropriate for handling such
subqueries.
The Item_in_subselect::single_value_transformer() function now checks
whether tables are specified for the subquery. If no then this subselect is
handled like a UNION using an Item_maxmin_subselect object.
The problem was that we restored SQL_CACHE, SQL_NO_CACHE flags in SELECT
statement from internal structures based on value set later at runtime, not
the original value set by the user.
The solution is to remember that original value.
The convert_constant_item() function converts constant items to ints on
prepare phase to optimize execution speed. In this case it tries to evaluate
subselect which contains a derived table and is contained in a derived table.
All derived tables are filled only after all derived tables are prepared.
So evaluation of subselect with derived table at the prepare phase will
return a wrong result.
A new flag with_subselect is added to the Item class. It indicates that
expression which this item represents is a subselect or contains a subselect.
It is set to 0 by default. It is set to 1 in the Item_subselect constructor
for subselects.
For Item_func and Item_cond derived classes it is set after fixing any argument
in Item_func::fix_fields() and Item_cond::fix_fields accordingly.
The convert_constant_item() function now doesn't convert a constant item
if the with_subselect flag set in it.
When a view statement is compiled on CREATE VIEW time, most of the
optimizations should not be done. Finding the right optimization
for a subquery is one of them.
Unfortunately the optimizer is resolving the column references of
the left expression of IN subqueries in the process of deciding
witch optimization to use (if needed). So there should be a
special case in Item_in_subselect::fix_fields() : check the
validity of the left expression of IN subqueries in CREATE VIEW
mode and then proceed as normal.
Re-work best_access_path() and find_best() to reuse E(#rows(range access)) as
E(#rows(ref[_or_null](const) access) only when it is appropriate.
[This is the final cumulative patch]
This performance degradation was due to the fact that some
cost evaluation code added into 4.1 in the function find_best was
not merged into the code of the function best_access_path added
together with other code for greedy optimizer.
Added a parameter to the function print_plan. The parameter contains
accumulated cost for a given partial join.
The patch does not include a special test case since this performance
degradation is hard to reproduse with a simple example.
TODO: make the function find_best use the function best_access_path
in order to remove duplication of code which might result in incomplete
merges in the future.
In the code that converts IN predicates to EXISTS predicates it is changing
the select list elements to constant 1. Example :
SELECT ... FROM ... WHERE a IN (SELECT c FROM ...)
is transformed to :
SELECT ... FROM ... WHERE EXISTS (SELECT 1 FROM ... HAVING a = c)
However there can be no FROM clause in the IN subquery and it may not be
a simple select : SELECT ... FROM ... WHERE a IN (SELECT f(..) AS
c UNION SELECT ...) This query is transformed to : SELECT ... FROM ...
WHERE EXISTS (SELECT 1 FROM (SELECT f(..) AS c UNION SELECT ...)
x HAVING a = c) In the above query c in the HAVING clause is made to be
an Item_null_helper (a subclass of Item_ref) pointing to the real
Item_field (which is not referenced anywhere else in the query anymore).
This is done because Item_ref_null_helper collects information whether
there are NULL values in the result. This is OK for directly executed
statements, because the Item_field pointed by the Item_null_helper is
already fixed when the transformation is done. But when executed as
a prepared statement all the Item instances are "un-fixed" before the
recompilation of the prepared statement. So when the Item_null_helper
gets fixed it discovers that the Item_field it points to is not fixed
and issues an error. The remedy is to keep the original select list
references when there are no tables in the FROM clause. So the above
becomes : SELECT ... FROM ... WHERE EXISTS (SELECT c FROM (SELECT f(..)
AS c UNION SELECT ...) x HAVING a = c) In this way c is referenced
directly in the select list as well as by reference in the HAVING
clause. So it gets correctly fixed even with prepared statements. And
since the Item_null_helper subclass of Item_ref_null_helper is not used
anywhere else it's taken out.
Multiple equalities were not adjusted after reading constant tables.
It resulted in neglecting good index based methods that could be
used to access of other tables.
When there is conjunction of conds, the substitute_for_best_equal_field()
will call the eliminate_item_equal() function in loop to build final
expression. But if eliminate_item_equal() finds that some cond will always
evaluate to 0, then that cond will be substituted by Item_int with value ==
0. In this case on the next iteration eliminate_item_equal() will get that
Item_int and treat it as Item_cond. This is leads to memory corruption and
server crash on cleanup phase.
To the eliminate_item_equal() function was added DBUG_ASSERT for checking
that all items treaten as Item_cond are really Item_cond.
The substitute_for_best_equal_field() now checks that if
eliminate_item_equal() returns Item_int and it's value is 0 then this
value is returned as the result of whole conjunction.
A subquery transformation changes the HAVING clause of the embedding query if the subquery contains
a GROUP BY clause. Yet the split_sum_func2 function was not applied to the modified HAVING clause.
This could result in wrong answers.