An outer join query with a semi-join subquery could return a wrong result
if the optimizer chose to materialize the subquery.
It happened because when substituting for the best field into a ref item
used to build access keys not all COND_EQUAL objects that could be employed
at substitution were checked.
Also refined some code in the function check_join_cache_usage to make it
safer.
Problem was that now we can merge derived table (subquery in the FROM clause).
Fix: in case of detected conflict and presence of derived table "over" the table which cased the conflict - try materialization strategy.
If the flag 'optimize_join_buffer_size' is set to 'off' and the value
of the system variable 'join_buffer_size' is greater than the value of
the system variable 'join_buffer_space_limit' than no join cache can
be employed to join tables of the executed query.
A bug in the function JOIN_CACHE::alloc_buffer allowed to use join
buffer even in this case while another bug in the function
revise_cache_usage could cause a crash of the server in this case if the
chosen execution plan for the query contained outer join or semi-join
operation.
For single table update/insert added deep check of single tables (single_table_updatable()).
For multi-table view insert added additional check of target table (check_view_single_update).
Multi-update was correct.
Test suite for all cases added.
- If LooseScan is used with quick select, require that quick select produces
data in key order (this disables use of MRR, which can return data in arbitrary order).
not be reproduced in the latest release of mariadb-5.3 as it was was fixed
by Sergey Petrunia when working on the problems concerning outer joins within
in subqueries converted to semi-joins.
Fixed Item* Item_equal::get_first(JOIN_TAB *context, Item *field_item) to work correctly in the case where:
- context!= NO_PARTICULAR_TAB, it points to a table within SJ-Materialization nest
- field_item points to an item_equal that has a constant Item_field but does not have any fields
from tables that are within semi-join nests.
- Disable use of join cache when we're using FirstMatch strategy, and the join
order is such that subquery's inner tables are interleaved with outer. Join
buffering code is incapable of handling such join orders.
- The testcase requires use of @@debug_optimizer_prefer_join_prefix to hit the bug,
but I'm pushing it anyway (including the mention of the variable in .test file),
so that it can be found and enabled when/if we get something comparable in the
main tree.
The problem was that LooseScan execution code assumed that tab->key holds
the index used for looseScan. This is only true when range or full index
scan are used. In case of ref access, the index is in tab->ref.key (and
tab->index==0 which explains how LooseScan passed tests with ref access: they
used one index)
Fixed by setting/using loosescan_key, which always the correct index#.
of mysql-5.6 code line. The bugs could not be reproduced in the latest release
of mariadb-5.3 as they were fixed either when the code of subquery optimization
was back-ported from mysql-6.0 or later when some other bugs were fixed.
of mysql-5.6 code line. The bugs could not be reproduced in the latest release
of mariadb-5.3 as they were fixed either when the code of subquery optimization
was back-ported from mysql-6.0 or later when some other bugs were fixed.
The function subselect_uniquesubquery_engine::copy_ref_key has to take into
account that when EXPLAIN is processed the array of store_key object created
for any TABLE_REF may contain elements for constant items. These items should
be ignored by thefunction.
- equality substitution code was geared towards processing WHERE/ON clauses.
that is, it assumed that it was doing substitions on the code that
= wasn't attached to any particular join_tab yet
= was going to be fed to make_join_select() which would take the condition
apart and attach various parts of it to tables inside/outside semi-joins.
- However, somebody added equality substition for ref access. That is, if
we have a ref access on TBL.key=expr, they would do equality substition in
'expr'. This possibility wasn't accounted for.
- Fixed equality substition code by adding a mode that does equality
substition under assumption that the processed expression will be
attached to a certain particular table TBL.
If the expression for a derived table of a query contained a LIMIT
clause the estimate of the number of rows in this derived table
returned by the EXPLAIN command could be badly off since the
optimizer ignored the limit number from the LIMIT clause when
getting the estimate.
The call of the method SELECT_LEX_UNIT->set_limit added in the code
of mysql_derived_optimize() will be needed also in maria-5.5 where
parameters in the LIMIT clause are supported.
Problem: When building the condition for JOIN::outer_ref_cond the optimizer forgot to take into account
that this condition could depend on constant tables as well.
mysql-test/r/information_schema_all_engines.result:
Update result
mysql-test/t/information_schema_all_engines.test:
Added --sorted-results as tables in information_schema are not sorted.
- Create/use do_copy_nullable_row_to_notnull() function for ref access, which is used
when copying from not-NULL field in table that can be NULL-complemented to not-NULL field.
Completed the fix for this bug.
Note: in 5.3 the affected 'if' statement in Item_in_subselect::single_value_transformer()
starting with the condition (thd->variables.sql_mode & MODE_ONLY_FULL_GROUP_BY)
should be removed altogether. The change from table.cc is not needed either.
This is because in 5.3
- min/max transformation for subqueries are done at the optimization phase
- evaluation of the expensive subqueries is done at the execution phase.
Added an EXPLAIN EXTENDED to the test case for bug #12329653.
from a heap temptable, which uses pointers to records (that is, byte*
pointers) as rowids.
This meant that for rows with the same sort key value, the order
was determined by memory layout.
The MIN/MAX optimizer code from the function opt_sum_query erroneously
did not take into account conjunctive conditions that did not depend on
any table, yet were not identified as constant items. These could be
items containing rand() or PS/SP parameters. These items are supposed
to be evaluated at the execution phase. That's why if such conditions
can be extracted from the WHERE condition the MIN/MAX optimization is
not applied as currently it is always done at the optimization phase.
(In 5.3 expensive subqueries are also evaluated only at the execution
phase. So, if a constant condition with such subquery can be extracted
from the WHERE clause the MIN/MAX optimization should not be applied
in 5.3.)
IF an IN/ALL/SOME predicate with a constant left part is transformed
into an EXISTS subquery the resulting subquery should not be considered
uncacheable if the right part of the predicate is not uncacheable.
Backported the function dbug_print_item() from 5.3. The function is used
only for debugging.
fixed several defects in the greedy optimization:
1) The greedy optimizer calculated the 'compare-cost' (CPU-cost)
for iterating over the partial plan result at each level in
the query plan as 'record_count / (double) TIME_FOR_COMPARE'
This cost was only used locally for 'best' calculation at each
level, and *not* accumulated into the total cost for the query plan.
This fix added the 'CPU-cost' of processing 'current_record_count'
records at each level to 'current_read_time' *before* it is used as
'accumulated cost' argument to recursive
best_extension_by_limited_search() calls. This ensured that the
cost of a huge join-fanout early in the QEP was correctly
reflected in the cost of the final QEP.
To get identical cost for a 'best' optimized query and a
straight_join with the same join order, the same change was also
applied to optimize_straight_join() and get_partial_join_cost()
2) Furthermore to get equal cost for 'best' optimized query and a
straight_join the new code substrcated the same '0.001' in
optimize_straight_join() as it had been already done in
best_extension_by_limited_search()
3) When best_extension_by_limited_search() aggregated the 'best' plan a
plan was 'best' by the check :
'if ((search_depth == 1) || (current_read_time < join->best_read))'
The term '(search_depth == 1' incorrectly caused a new best plan to be
collected whenever the specified 'search_depth' was reached - even if
this partial query plan was more expensive than what we had already
found.
The function st_table::mark_virtual_columns_for_write() did not take into
account the fact that for any table the value of st_table::vfield is 0
when there are no virtual columns in the table definition.
The patch differs from the original MySQL patch as follows:
- All test case differences have been reviewed one by one, and
care has been taken to restore the original plan so that each
test case executes the code path it was designed for.
- A bug was found and fixed in MariaDB 5.3 in
Item_allany_subselect::cleanup().
- ORDER BY is not removed because we are unsure of all effects,
and it would prevent enabling ORDER BY ... LIMIT subqueries.
- ref_pointer_array.m_size is not adjusted because we don't do
array bounds checking, and because it looks risky.
Original comment by Jorgen Loland:
-------------------------------------------------------------
WL#5953 - Optimize away useless subquery clauses
For IN/ALL/ANY/SOME/EXISTS subqueries, the following clauses are
meaningless:
* ORDER BY (since we don't support LIMIT in these subqueries)
* DISTINCT
* GROUP BY if there is no HAVING clause and no aggregate
functions
This WL detects and optimizes away these useless parts of the
query during JOIN::prepare()
- The problem was that const-table-reading code would try to evaluate MATCH()
before init_ftfuncs() was called.
- Fixed by making MATCH function "expensive" so that nobody tries to evaluate it
at optimization phase.
- Correctly handle plan refinement stage for LooseScan plans: run create_ref_for_key() if LooseScan
plan includes a ref access, and if we don't have any fixed key components, switch to a full index scan.
- Let JTBM optimization code handle the case where the subquery is degenerate and doesn't have a
join query plan. Regular materialization would fall back to IN->EXISTS for such cases. Semi-Join
materialization does not have such option, instead we introduce and use "constant JTBM join tabs".
- Do a "more thorough" cleanup of SJ-Materialization join tab in JOIN_TAB::cleanup. The bug
was due to the fact that JOIN_TAB::cleanup() may be called multiple times for the same tab
if the join has grouping.
- Changed storage to be 2 bytes instead of sizeof(size_t) (simple optimization)
- Fixed bug when using query_cache_strip_comments and query that started with '('
- Fixed DBUG_PRINT() that used wrong (not initialized) variables.
mysql-test/mysql-test-run.pl:
Added some space to make output more readable.
mysql-test/r/query_cache.result:
Updated test results
mysql-test/t/query_cache.test:
Added test with query_cache_strip_comments
sql/mysql_priv.h:
Added QUERY_CACHE_DB_LENGTH_SIZE
sql/sql_cache.cc:
Fixed bug when using query_cache_strip_comments and query that started with '('
Store db length in 2 characters instead of size_t.
Get db length from correct position (earlier we had an error when query started with ' ')
Fixed DBUG_PRINT() that used wrong (not initialized) variables.
SMALL KEY CACHE
The server crashed on division by zero because the key cache was not
initialized and the block length was 0 which was used in a division.
The fix was to not allow CACHE INDEX if the key cache was not initiallized.
Thus never try LOAD INDEX INTO CACHE for an uninitialized key cache.
Also added some windows files/directories to .bzrignore.
The range optimizer incorrectly chose a loose scan for group by
when there is a correlated WHERE condition. This range access
method cannot be executed for correlated conditions also with the
"range checked for each record" because generally the range access
method can change for each outer record. Loose scan destructively
changes the query plan and removes the GROUP operation, which will
result in wrong query plans if another range access is chosen
dynamically.
If the duplicate elimination strategy is used for a semi-join and potentially
one of the block-based join algorithms can be employed to join the inner
tables of the semi-join then sorting of the head (first non-constant) table
for a query with ORDER BY / GROUP BY cannot be used.
The function setup_sj_materialization_part1() forgot to set the value
of TABLE::map for any materialized IN subquery.
This could lead to wrong results for queries with subqueries that were
converted to queries with semijoins.
Analysis:
The class member QUICK_GROUP_MIN_MAX_SELECT::seen_first_key
was not reset between subquery re-executions. Thus each
subsequent execution continued from the group that was
reached by the previous subquery execution. As a result
loose scan reached end of file much earlier, and returned
empty result where it shouldn't.
Solution:
Reset seen_first_key before each re-execution of the
loose scan.
The execution plan cannot use sorting on the first table from the
sequence of the joined tables if it plans to employ the block-based
hash join algorithm.
- Part 1 of the fix: for semi-join merged subqueries, calling child_join->optimize() until we're done with all
PS-lifetime optimizations in the parent.
- Make create_tmp_table() set KEY_PART_INFO attributes for the keys it creates.
This wasn't needed before but is needed now, when temp. tables that are
results of SJ-Materialization are being used for joins.
This particular bug depended on HA_VAR_LENGTH_PART being set,
but also added code to set HA_BLOB_PART and HA_NULL_PART when appropriate.
KEYUSE elements for a possible hash join key are not sorted by field
numbers of the second table T of the hash join operation. Besides
some of these KEYUSE elements cannot be used to build any key as their
key expressions depend on the tables that are planned to be accessed
after the table T.
The code before the patch did not take this into account and, as a result,
execition of a query the employing block-based hash join algorithm could
cause a crash or return a wrong result set.
If has been decided that the first match strategy is to be used to join table T
from a semi-join nest while no buffer can be employed to join this table
then no join buffer can be used to join any table in the join sequence between
the first one belonging to the semi-join nest and table T.
Fixed feedback_plugin_send to not generate a random number of lines.
mysql-test/t/feedback_plugin_send.test:
Don't print more than 4 lines (sometimes there are 6 feedback lines in the log...)
mysql-test/valgrind.supp:
Added suppression for failure on work
support-files/compiler_warnings.supp:
Suppress warning from xtradb
2. dialog plugin now always returns mysql->password if non-empty and the first question is of password type
3. split get_tty_password into get_tty_password_buff and strdup.
4. dialog plugin now uses get_tty_password by default
5. dialog.test
6. moved small tests of individual plugins into a dedicated suite
After sending packet that is too large, clienrt can get either an error packet with
ER_NET_PACKET_TOO_LARGE, or a socket error. Both cases are valid, since the
server does not ensure reply was fully read by client, before shutting down and closing
the socket.
The tables from the same semi-join or outer join nest cannot use
join buffers if in the join sequence of the query execution plan
they are separated by a table that is planned to be joined without
usage of a join buffer.