If the duplicate elimination strategy is used for a semi-join and potentially
one of the block-based join algorithms can be employed to join the inner
tables of the semi-join then sorting of the head (first non-constant) table
for a query with ORDER BY / GROUP BY cannot be used.
The execution plan cannot use sorting on the first table from the
sequence of the joined tables if it plans to employ the block-based
hash join algorithm.
KEYUSE elements for a possible hash join key are not sorted by field
numbers of the second table T of the hash join operation. Besides
some of these KEYUSE elements cannot be used to build any key as their
key expressions depend on the tables that are planned to be accessed
after the table T.
The code before the patch did not take this into account and, as a result,
execition of a query the employing block-based hash join algorithm could
cause a crash or return a wrong result set.
The function setup_semijoin_dups_elimination erroneously assumed
that if join_cache_level is set to 3 or 4 then the type of the
access to a table cannot be JT_REF or JT_EQ_REF. This could lead
to wrong query result sets.
of the 5.3 code line after a merge with 5.2 on 2010-10-28
in order not to allow the cost to access a joined table to be equal
to 0 ever.
Expanded data sets for many test cases to get the same execution plans
as before.
- Set the default
- Adjust the testcases so that 'new' tests are run with optimizations turned on.
- Pull out relevant tests from "irrelevant" tests and run them with optimizations on.
- Run range.test and innodb.test with both mrr=on and mrr=off
semijoin=on,firstmatch=on,loosescan=on
to
semijoin=off,firstmatch=off,loosescan=off
Adjust the testcases:
- Modify subselect*.test and join_cache.test so that all tests
use the same execution paths as before (i.e. optimizations that
are being tested are enabled)
- Let all other test files run with the new default settings (i.e.
with new optimizations disabled)
- Copy subquery testcases from these files into t/subselect_extra.test
which will run them with new optimizations enabled.
This crashing bug could manifest itself at execution of join queries
over materialized derived tables with IN subquery predicates in the
where clause. If for such a query the optimizer chose to use duplicate
weed-out with duplicates in a materialized derived table and chose to
employ join cache the the execution could cause a crash of the server.
It happened because the JOIN_CACHE::init method assumed that the value
of TABLE::file::ref is set at the moment when the method was called
for the employed join cache. It's true for regular tables, but it's
not true for materialized derived tables that are filled now at the
first access to them, i.e. after the JOIN_CACHE::init has done its job.
To fix this problem for any ROWID field of materialized derived table
the procedure that copies fields from record buffers into the employed
join buffer first checks whether the value of TABLE::file::ref has
been set for the table, and if it's not so the procedure sets this value.
The bug in the function print_keyuse() caused crashes if
hash join could be used. It happened because the function
ignored the fact that KEYUSE structures could be created
for hash joins as well.
even in the cases when there existed range/index-merge scans that
were cheaper than the full table scan.
This was a defect/bug of the implementation of mwl #128.
Now hash join can work not only with full table scan of the joined
table, but also with full index scan, range and index-merge scans.
Accordingly, in the cases when hash join is used the column 'type'
in the EXPLAINs can contain now 'hash_ALL', 'hash_index', 'hash_range'
and 'hash_index_merge'. If hash join is coupled with a range/index_merge
scan then the columns 'key' and 'key_len' contain info not only on
the used hash index, but also on the indexes used for the scan.
This bug could manifest itself when hash join over a varchar column
with NULL values in some rows was used. It happened because the
function key_buf_cmp erroneously returned FALSE when one of the joined
key fields was null while the second was not.
Also fixed two other bugs in the functions key_hashnr and key_buf_cmp
that could possibly lead to wrong results for some queries that
used hash join over several columns with nulls.
Also reverted the latest addition of the test case for bug #45092. It
had been already backported earlier.
One of the hash functions employed by the BNLH join algorithm
calculates the the value of hash index for key value utilizing
every byte of the key buffer. To make this calculation valid
one has to ensure that for any key value unused bytes of the
buffer are filled with with a certain filler. We choose 0 as
a filler for these bytes.
Added an optional boolean parameter with_zerofill to the function
key_copy. If the value of the parameter is TRUE all unused bytes
of the key buffer is filled with 0.
In some cases the function make_cond_for_index() was mistaken
when detecting index only pushdown conditions for a table:
a pushdown condition that was not index only could be marked
as such.
It happened because the procedure erroneously used the markers
for index only conditions that remained from the calls of
this function that extracted the index conditions for other
tables.
Fixed by erasing index only markers as soon as they are need
anymore.
The bug happened when BKA join algorithm used an incremental buffer
and some of the fields over which access keys were constructed
- were allocated in the previous join buffers
- were non-nullable
- belonged to inner tables of outer joins.
For such fields an offset to the field value in the record is saved
in the postfix of the record, and a zero offset indicates that the value
is null. Before the key using the field value is constructed the
value is read into the corresponding field of the record buffer and
the null bit is set for the field if the offset is 0. However if
the field is non-nullable the table->null_row must be set to 1
for null values and to 0 for non-null values to ensure proper reading
of the value from the record buffer.
The condition that was supposed to check whether a join table
is an inner table of a nested outer join or semi-join was not
quite correct in the code of the function check_join_cache_usage.
That's why some queries with nested outer joins triggered
an assertion failure.
Encapsulated this condition in the new method called
JOIN_TAB::is_nested_inner and provided a proper code for it.
Also corrected a bug in the code of check_join_cache_usage()
that caused a downgrade of not first join buffers of the
level 5 and 7 to level 4 and 6 correspondingly.
When pushing the condition for a table in the function
JOIN_TAB::make_scan_filter the optimizer must not push
conditions from WHERE if the table is some inner table
of an outer join..
The condition over outer tables extracted from the on expression
for a outer join must be ANDed to the condition pushed to the
first inner table of this outer join only.
Nested outer joins cannot use flat join buffers. So if join_cache_level
is set to 1 then any join algorithm employing join buffers cannot be used
for nested outer joins.
The patch that introduced the new enumeration type Match_flag
for the values of match flags in the records put into join buffers
missed the necessary modifications in JOIN_CACHE::set_match_flag_if_none.
This could cause wrong results for outer joins with on expressions
only over outer tables.
A non-incremental join buffer cannot be used for inner tables of nested
outer joins. That's why when join_cache_level is set to 7 it must
be downgraded to level 6 for the inner tables of nested outer joins.
For the same reason with join_cache_level set to 3 no join buffer is
used for the inner tables of outer joins (we could downgrade it to
level 2, but this level does not support ref access).
Miscalculation of the minimum possible buffer size could trigger
an assert in JOIN_CACHE_HASHED::put_record when if join_buffer_size
was set to the values that is less than the length of one record to
stored in the join buffer.
It happened due to the following mistakes:
- underestimation of space needed for a key in the hash table
(we have to take into account that hash table can have more
buckets than the expected number of records).
- the value of maximum total length of all records stored in
the join buffer was not saved in the field max_used_fieldlength
by the function calc_used_field_length.
Currently BNLH join uses a simplified implementation of hash function
when hash function is calculated over the whole key buffer, not only
the significant bytes of it. It means that both building keys and
probing keys both must fill insignificant bytes with the same filler.
Usually 0 is used as such a filler.
Yet the code before patch filled insignificant bytes only for probing
keys.
When probing into the hash table of a hashed join cache is performed
the key value should not constructed in the buffer used to build keys
in the hash tables. The constant parts of these keys copied only once,
so they should not be ever overwritten. Otherwise wrong results
can be produced by queries that employ hashed join buffers.
Prohibited to use hash join algorithm BNLH if join attributes
need non-binary collations. It has to be done because BNLH does
not support join for such attributes yet.
Later this limitations will be lifted.
Changed default collations for the schemes of some test cases
to preserve the old execution plans.
When join buffers are employed no index scan for the first
table with grouping columns can be used.
mysql-test/r/join_cache.result:
Added a test case for bug #664508.
Sorted results for some other test cases.
mysql-test/t/join_cache.test:
Added a test case for bug #664508.
Sorted results for some other test cases.
When adding a new record into the join buffer that is employed by
BNLH join algorithm the writing procedure JOIN_CACHE::write_record_data
checks whether there is enough space for the record in the buffer.
When doing this it must take into account a possible new key entry
added to the buffer. It might happen, as it has been demonstrated by
the bug test case, that there is enough remaining space in the buffer
for the record, but not for the additional key entry for this record.
In this case the key entry overwrites the end of the record that might
cause a crash or wrong results.
Fixed by taking into account a possible addition of new key entry when
estimating the remaining free space in the buffer.
The fix aligns join_null_complements() with join_matching_records()
making both call generate_full_extensions().
There should not be any difference between how the WHERE clause
is applied to NULL-complemented records from a partial join and how
it is applied to other partially joined records:the latter happens in
join_matching_records(), precisely in generate_full_extensions().