Completed the fix for this bug.
Note: in 5.3 the affected 'if' statement in Item_in_subselect::single_value_transformer()
starting with the condition (thd->variables.sql_mode & MODE_ONLY_FULL_GROUP_BY)
should be removed altogether. The change in table.cc is not needed either.
This is because in 5.3:
- the min/max transformation for subqueries is done at the optimization phase,
- evaluation of expensive subqueries is done at the execution phase.
Added an EXPLAIN EXTENDED to the test case for bug #12329653.
from a heap temptable, which uses pointers to records (that is, byte*
pointers) as rowids.
This meant that for rows with the same sort key value, the order
was determined by memory layout.
The MIN/MAX optimizer code in the function opt_sum_query() erroneously
did not take into account conjunctive conditions that did not depend on
any table yet were not identified as constant items. Such conditions
may contain items like rand() or PS/SP parameters, which are supposed
to be evaluated at the execution phase. That is why, if such conditions
can be extracted from the WHERE clause, the MIN/MAX optimization is
not applied: currently the optimization is always done at the
optimization phase.
(In 5.3 expensive subqueries are also evaluated only at the execution
phase. So, if a constant condition with such a subquery can be
extracted from the WHERE clause, the MIN/MAX optimization should not
be applied in 5.3.)
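A hypothetical query of the affected shape (table and column names
are illustrative):

  SELECT MIN(a) FROM t1 WHERE a > 3 AND RAND() < 0.5;

The condition RAND() < 0.5 depends on no table but is not a constant
item; it must be evaluated at the execution phase, so the MIN/MAX
optimization, which runs at the optimization phase, must be skipped.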
If an IN/ALL/SOME predicate with a constant left part is transformed
into an EXISTS subquery, the resulting subquery should not be considered
uncacheable if the right part of the predicate is not uncacheable.
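For illustration (hypothetical tables): here the left part is the
constant 5, so after the IN->EXISTS transformation the subquery can
still be cached between re-executions:

  SELECT * FROM t1 WHERE 5 IN (SELECT b FROM t2);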
Backported the function dbug_print_item() from 5.3. The function is used
only for debugging.
Fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'.
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP (see the cost sketch after
this list).
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost().
2) Furthermore, to get equal cost for a 'best' optimized query and a
straight_join, the new code subtracted the same '0.001' in
optimize_straight_join() as had already been done in
best_extension_by_limited_search().
3) When best_extension_by_limited_search() aggregated the 'best' plan,
a plan was considered 'best' by the check:
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1)' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached, even if
this partial query plan was more expensive than what we had already
found.
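A sketch of the corrected cost recurrence from item 1, using the names
above (a simplified model, not the exact code; fanout() and
access_cost() are shorthand, not actual functions):

  current_record_count = record_count * fanout(new_table)
  current_read_time    = read_time + access_cost(new_table)
                         + current_record_count / TIME_FOR_COMPARE

The last term is the CPU-cost that used to be computed only for the
local 'best' comparison; it is now accumulated into the total before
the recursive best_extension_by_limited_search() call.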
The function st_table::mark_virtual_columns_for_write() did not take
into account that st_table::vfield is 0 for any table whose definition
contains no virtual columns.
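A minimal sketch of a write that could hit this (hypothetical table;
since it has no virtual columns, st_table::vfield is 0 when columns
are marked for write):

  CREATE TABLE t1 (a INT);
  INSERT INTO t1 VALUES (1);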
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare().
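For example (hypothetical tables), the DISTINCT and ORDER BY below
have no observable effect and can be removed during JOIN::prepare():

  SELECT * FROM t1
  WHERE t1.a IN (SELECT DISTINCT b FROM t2 ORDER BY c);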
- The problem was that the const-table-reading code would try to evaluate
MATCH() before init_ftfuncs() was called.
- Fixed by making the MATCH function "expensive" so that nobody tries to
evaluate it at the optimization phase.
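A hypothetical shape that could trigger this: with a unique key on
t1.id, t1 is read as a const table during optimization, which used to
evaluate the remaining MATCH() condition before init_ftfuncs() had run:

  SELECT * FROM t1
  WHERE id = 1 AND MATCH(txt) AGAINST ('word');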
- Correctly handle the plan refinement stage for LooseScan plans: run
  create_ref_for_key() if the LooseScan plan includes a ref access, and
  if we don't have any fixed key components, switch to a full index scan.
- Let the JTBM optimization code handle the case where the subquery is
  degenerate and doesn't have a join query plan (see the example after
  this list). Regular materialization would fall back to IN->EXISTS for
  such cases; Semi-Join materialization does not have that option, so we
  introduce and use "constant JTBM join tabs".
- Do a "more thorough" cleanup of SJ-Materialization join tab in JOIN_TAB::cleanup. The bug
was due to the fact that JOIN_TAB::cleanup() may be called multiple times for the same tab
if the join has grouping.
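A hypothetical example of a degenerate subquery for the JTBM case
above: the subquery has no tables, so there is no join query plan to
materialize:

  SELECT * FROM t1 WHERE t1.a IN (SELECT 5);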
- Changed storage of the db length to 2 bytes instead of sizeof(size_t) (simple optimization)
- Fixed a bug when using query_cache_strip_comments and a query that started with '('
- Fixed DBUG_PRINT() that used wrong (not initialized) variables.
mysql-test/mysql-test-run.pl:
Added some space to make output more readable.
mysql-test/r/query_cache.result:
Updated test results
mysql-test/t/query_cache.test:
Added test with query_cache_strip_comments
sql/mysql_priv.h:
Added QUERY_CACHE_DB_LENGTH_SIZE
sql/sql_cache.cc:
Fixed a bug when using query_cache_strip_comments and a query that
started with '(' (see the example below).
Store db length in 2 bytes instead of size_t.
Get db length from the correct position (earlier we had an error when
the query started with ' ').
Fixed DBUG_PRINT() that used wrong (not initialized) variables.
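An illustrative case (hypothetical query): with
query_cache_strip_comments enabled, a query starting with '(' was
handled incorrectly by the query cache:

  ( SELECT a FROM t1 );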
SMALL KEY CACHE
The server crashed on division by zero because the key cache was not
initialized, so its block length was 0 and was used as a divisor.
The fix was to disallow CACHE INDEX if the key cache is not initialized,
and thus never try LOAD INDEX INTO CACHE for an uninitialized key cache.
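A sketch of the affected statements (the key cache name is
illustrative); with a zero-sized, uninitialized key cache, the CACHE
INDEX statement is now rejected instead of leading to a crash:

  SET GLOBAL small_cache.key_buffer_size = 0;
  CACHE INDEX t1 IN small_cache;
  LOAD INDEX INTO CACHE t1;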
Also added some windows files/directories to .bzrignore.
The range optimizer incorrectly chose a loose index scan for GROUP BY
when the WHERE clause contained a correlated condition. This range
access method cannot be executed for correlated conditions, even with
"range checked for each record", because in general the range access
method can change for each outer record. Loose scan destructively
changes the query plan and removes the GROUP operation, which would
result in wrong query plans if another range access were chosen
dynamically.
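A hypothetical query of the affected shape (an index on t2(c,b) is
assumed): the GROUP BY subquery contains a condition correlated with
t1, so a loose index scan must not be chosen for it:

  SELECT * FROM t1
  WHERE t1.a IN (SELECT MIN(t2.b) FROM t2
                 WHERE t2.c > t1.c GROUP BY t2.c);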
If the duplicate elimination strategy is used for a semi-join, and one
of the block-based join algorithms can potentially be employed to join
the inner tables of the semi-join, then sorting of the head (first
non-constant) table cannot be used for a query with ORDER BY / GROUP BY.
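A hypothetical shape: t1 is the head table and t2, t3 are the inner
tables of the semi-join; if duplicate elimination with a block-based
join algorithm is chosen for t2 and t3, the ORDER BY cannot be handled
by pre-sorting t1:

  SELECT * FROM t1
  WHERE t1.a IN (SELECT t2.a FROM t2, t3 WHERE t2.b = t3.b)
  ORDER BY t1.b;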
The function setup_sj_materialization_part1() forgot to set the value
of TABLE::map for any materialized IN subquery.
This could lead to wrong results for queries with subqueries that were
converted to queries with semijoins.
Analysis:
The class member QUICK_GROUP_MIN_MAX_SELECT::seen_first_key
was not reset between subquery re-executions. Thus each
subsequent execution continued from the group that was
reached by the previous subquery execution. As a result,
the loose scan reached end of file much earlier and returned
an empty result where it shouldn't have.
Solution:
Reset seen_first_key before each re-execution of the
loose scan.
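A hypothetical shape that re-executes a loose scan (an index on
t2(c,b) is assumed, with the correlation placed outside the range
condition): the subquery runs once per row of t1, and without the
reset the second execution resumed from the group where the first one
stopped:

  SELECT * FROM t1
  WHERE t1.a IN (SELECT MIN(t2.b) FROM t2
                 GROUP BY t2.c HAVING t2.c <> t1.c);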
The execution plan cannot use sorting on the first table from the
sequence of the joined tables if it plans to employ the block-based
hash join algorithm.
- Part 1 of the fix: for semi-join merged subqueries, do not call
child_join->optimize() until we're done with all PS-lifetime
optimizations in the parent.
- Make create_tmp_table() set KEY_PART_INFO attributes for the keys it creates.
This wasn't needed before but is needed now, when temp. tables that are
results of SJ-Materialization are being used for joins.
This particular bug depended on HA_VAR_LENGTH_PART being set, but the
patch also adds code to set HA_BLOB_PART and HA_NULL_PART when
appropriate.
KEYUSE elements for a possible hash join key are not sorted by the field
numbers of the second table T of the hash join operation. Besides, some
of these KEYUSE elements cannot be used to build any key, as their
key expressions depend on tables that are planned to be accessed
after table T.
The code before the patch did not take this into account and, as a result,
execution of a query employing the block-based hash join algorithm could
cause a crash or return a wrong result set.
If it has been decided that the first match strategy is to be used to
join table T from a semi-join nest while no join buffer can be employed
to join this table, then no join buffer can be used to join any table in
the join sequence between the first one belonging to the semi-join nest
and table T.