- convert_subq_to_jtbm() didn't check that subuqery optimization was successful. If it wasn't (in this
example because of @@max_join_size violation), it would proceed further and eventually crash when
trying to execute the un-optimized subquery.
- The problem was that JOIN::save/restore_query_plan() did not save/restore parts of
the query plan that are located inside SJ_MATERIALIZATION_INFO structures. This could
cause parts of one plan to be used with another, which led get_best_combination() to
constructing non-sensical join plans (and crash).
Fixed by saving/restoring SJM parts of the query plans.
- check_and_do_in_subquery_rewrites() will not set SUBS_MATERIALIZATION flag when it
records that the subquery predicate is to be converted into semi-join.
If convert_join_subqueries_to_semijoins() later decides not to convert to semi-join,
let it set SUBS_MATERIALIZATION flag, if appropriate.
- If convert_join_subqueries_to_semijoins() decides to wrap Item_in_subselect in Item_in_optimizer,
it should do so in prep_on_expr/prep_where, too, as long as they are present.
There seems to be two possibilities of how we arrive in this function:
- prep_on_expr/prep_where==NULL, and will be set later by simplify_joins()
- prep_on_expr/prep_where!=NULL, and it is a copy_and_or_structure()-made copy of on_expr/where.
the latter can happen for some (but not all!) nested joins. This bug was that we didn't handle this case.
- Make subquery_types_allow_materialization() detect a case where
create_tmp_table() would create a blob column which would make it
impossible to use materialization
Non-semi-join materialization worked because it detected that this case
and felt back to use IN->EXISTS. Semi-join Materialization cannot easily
fallback, so we have to detect this case early.
- setup_sj_materialization() code failed to take into account that it can be that
the first [in join order ordering] table inside semi-join-materialization nest
is also an inner table wrt an outer join (that is embedded in the semi-join).
This can happen when all of the tables that are inside the semi-join but not inside
the outer join are constant.
- Made a trivial to not assume that table's embedding join nest is the semi-join
nest: instead, walk up the outer join nests until we reach the semi-join nest.
Analysis:
In the test query semi-join merges the inner-most subquery
into the outer subquery, and the optimization of the merged
subquery finds some new index access methods. Later the
IN-EXISTS transformation is applied to the unmerged subquery.
Since the optimizer is instructed to not consider
materialization, it reoptimizes the plan in-place to take into
account the new IN-EXISTS conditions. Just before reoptimization
JOIN::choose_subquery_plan resets the query plan, which also
resets the access methods found during the semi-join merge.
Then reoptimization discovers there are no new access methods,
but it leaves the query plan in its reset state. Later semi-join
crashes because it assumes these access methods are present.
Solution:
When reoptimizing in-place, reset the query plan only after new
access methods were discovered. If no new access methods were
discovered, leave the current plan as it was.
- The problem was that the code that made the check whether the subquery is an AND-part of the WHERE
clause didn't work correctly for nested subqueries. In particular, grand-child subquery in HAVING was
treated as if it was in the WHERE, which eventually caused an assert when replace_where_subcondition
looked for the subquery predicate in the WHERE and couldn't find it there.
- The fix: Removed implementation of "thd_marker approach". thd->thd_marker was used to determine the
location of subquery predicate: setup_conds() would set accordingly it when making the
{where|on_expr}->fix_fields(...)
call so that AND-parts of the WHERE/ON clauses can determine they are the AND-parts.
Item_cond_or::fix_fields(), Item_func::fix_fields(), Item_subselect::fix_fields (this one was missed),
and all other items-that-contain-items had to reset thd->thd_marker before calling fix_fields() for
their children items, so that the children can see they are not AND-parts of WHERE/ON.
- The "thd_marker approach" required that a lot of code in different locations maintains correct value of
thd->thd_marker, so it was replaced with:
- The new approach with mark_as_condition_AND_part does not keep context in thd->thd_marker. Instead,
setup_conds() now calls
{where|on_expr}->mark_as_condition_AND_part()
and implementations of that function make sure that:
- parts of AND-expressions get the mark_as_condition_AND_part() call
- Item_in_subselect objects record that they are AND-parts of WHERE/ON
Analysis:
Both the wrong result and the valgrind warning were a result
of incomplete cleanup of the MIN/MAX subquery rewrite. At the
first execution of the query, the non-aggregate subquery is
transformed into an aggregate MIN/MAX subquery. During the
fix_fields phase of the MIN/MAX function, it sets the property
st_select_lex::with_sum_func to true.
The second execution of the query finds this flag to be ON.
When optimization reaches the same MIN/MAX subquery
transformation, it tests if the subquery is an aggregate or not.
Since select_lex->with_sum_func == true from the previous
execution, the transformation executes the second branch that
handles aggregate subqueries. This substitutes the subquery
Item into a Item_maxmin_subselect. At the same time elsewhere
it is assumed that the subquery Item is of type
Item_allany_subselect. Ultimately this results in casting the
actual object to the wrong class, and calling the wrong
any_value() method from empty_underlying_subquery().
Solution:
Cleanup the st_select_lex::with_sum_func property in the case
when the MIN/MAX transformation was performed for a non-aggregate
subquery, so that the transformation can be repeated.
Also:
1. simplified the code of the function mysql_derived_merge_for_insert.
2. moved merge of views/dt for multi-update/delete to the prepare stage.
3. the list of the references to the candidates for semi-join now is
allocated in the statement memory.
(This is not a real fix for this bug, even though it makes it to no longer repeat)
- Semi-join subquery predicates, i.e. ... WHERE outer_expr IN (SELECT ...) may have null-rejecting properties,
may allow to convert outer joins into inner.
- When convert_subq_to_sj() injected IN-equality into parent's WHERE/ON clause, it didn't call
$new_cond->top_level_item(), which would cause null-rejecting properties to be lost.
- Fixed, now the mentioned outer-to-inner conversion will really take place.
The cause of the crash is sj_nest->sj_subq_pred->unit->first_select()->item_list
contains "stale" items for the second execution. By "stale" I mean that they have
item->fixed==FALSE, and they are Item_field object instead of Item_direct_view_ref.
The solution is to use sj_nest->sj_subq_pred->unit->first_select()->ref_pointer_array.
Surprisingly, that array contains items that are ok.
Oracle team has introduced and is using NESTED_JOIN::sj_inner_exprs, but we go without that
and always copy the ref_pointer_array.
- JOIN::prepare would have set JOIN::table_count to incorrect value (bad merge of MWL 106)
- optimize_keyuse() would use table-bit as table number
(the change in optimize_keyuse is also the reason for query plan changes. Not
expected to have much effect because only handles cases of no index statistics)
- st_select_lex::register_dependency_item() ignored the fact that some of the
selects on the dependency paths could have been merged to their parents (because they
were mergeable VIEWs)
- Undo the incorrect fix in Item_subselect::recalc_used_tables(): do not call
fix_after_pullout() for Item_subselect::Ref_to_outside members.
- Update test results
- Fix a problem with PS:
= convert_subq_to_sj() should not save where to prep_where or on_expr to prep_on_expr.
= After an unmerged subquery predicate has been pulled, it should call fix_after_pullout() for
outer_refs.
Analysis:
The failed assert ensured that the choice of subquery strategy
is performed only for queries with at least one table. If there
is a LIMIT 0 clause all tables are removed, and the subquery is
neither optimized, nor executed during actual optimization. However,
if the query is EXPLAIN-ed, the EXPLAIN execution path doesn't remove
the query tables if there is a LIMIT 0 clause. As a result, the
subquery optimization code is called, which violates the ASSERT
condition.
Solution:
Transform the assert into a condition, and if the outer query
has no tables assume that there will be at most one subquery
execution.
There is potentially a better solution by reengineering the
EXPLAIN/optimize code, so that subquery optimization is not
done if not needed. Such a solution would be a lot bigger and
more complex than a bug fix.
- Added regression test with queries over the WORLD database.
- Discovered and fixed several bugs in the related cost calculation
functionality both in the semijoin and non-semijon subquery code.
- Added DBUG printing of the cost variables used to decide between
IN-EXISTS and MATERIALIZATION.
The code that added semi-join transformations missed checking
the state of the fixed flag for the items built with the
and_items function before calls of the fix_fields method.
This could lead to an abort failure when the first argument
of and_items() happened to be NULL.
- Don't attempt to construct FirstMatch access method if we've
just figured three lines above that it can't be used (because join
prefix doesn't have the needed tables), and so have set
pos->first_firstmatch_table= MAX_TABLES
Attempts to analyze join->positions[MAX_TABLES] caused valgrind warnings
way of processing prepared statements:
- conversion subquery_predicate -> TABLE_LIST is once per-statement
- However, the code must take into account that materialized temptable
is dropped and re-created on each execution (the tricky part is that
at start of n-th EXECUTE we have TABLE_LIST object but not its TABLE object)
- IN-equality is injected into WHERE on every execution.
A lot of small fixes and new test cases.
client/mysqlbinlog.cc:
Cast removed
client/mysqltest.cc:
Added missing DBUG_RETURN
include/my_pthread.h:
set_timespec_time_nsec() now only takes one argument
mysql-test/t/date_formats.test:
Remove --disable_ps_protocl as now also ps supports microseconds
mysys/my_uuid.c:
Changed to use my_interval_timer() instead of my_getsystime()
mysys/waiting_threads.c:
Changed to use my_hrtime()
sql/field.h:
Added bool special_const_compare() for fields that may convert values before compare (like year)
sql/field_conv.cc:
Added test to get optimal copying of identical temporal values.
sql/item.cc:
Return that item_int is equal if it's positive, even if unsigned flag is different.
Fixed Item_cache_str::save_in_field() to have identical null check as other similar functions
Added proper NULL check to Item_cache_int::save_in_field()
sql/item_cmpfunc.cc:
Don't call convert_constant_item() if there is nothing that is worth converting.
Simplified test when years should be converted
sql/item_sum.cc:
Mark cache values in Item_sum_hybrid as not constants to ensure they are not replaced by other cache values in compare_datetime()
sql/item_timefunc.cc:
Changed sec_to_time() to take a my_decimal argument to ensure we don't loose any sub seconds.
Added Item_temporal_func::get_time() (This simplifies some things)
sql/mysql_priv.h:
Added Lazy_string_decimal()
sql/mysqld.cc:
Added my_decimal constants max_seconds_for_time_type, time_second_part_factor
sql/table.cc:
Changed expr_arena to be of type CONVENTIONAL_EXECUTION to ensure that we don't loose any items that are created by fix_fields()
sql/tztime.cc:
TIME_to_gmt_sec() now sets *in_dst_time_gap in case of errors
This is needed to be able to detect if timestamp is 0
storage/maria/lockman.c:
Changed from my_getsystime() to set_timespec_time_nsec()
storage/maria/ma_loghandler.c:
Changed from my_getsystime() to my_hrtime()
storage/maria/ma_recovery.c:
Changed from my_getsystime() to mmicrosecond_interval_timer()
storage/maria/unittest/trnman-t.c:
Changed from my_getsystime() to mmicrosecond_interval_timer()
storage/xtradb/handler/ha_innodb.cc:
Added support for new time,datetime and timestamp
unittest/mysys/thr_template.c:
my_getsystime() -> my_interval_timer()
unittest/mysys/waiting_threads-t.c:
my_getsystime() -> my_interval_timer()