Currently the optimizer evaluates loose index scan only for top-level SELECT
statements.
Extend loose index scan applicability by:
- Test the applicability of loose index scan for each sub-select instead of for
the whole query. This change enables loose index scan for subqueries.
- Allow non-SELECT statements with SELECT parts (e.g.
CREATE TABLE .. SELECT ...) to use loose index scan (both cases are sketched
below).
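A minimal sketch of the two newly covered cases (table and column names are
hypothetical, for illustration only):

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  -- Loose index scan may now be considered inside the subquery:
  SELECT * FROM t1 WHERE a IN (SELECT a FROM t1 GROUP BY a);
  -- ... and for the SELECT part of a non-SELECT statement:
  CREATE TABLE t2 SELECT a, MIN(b) AS min_b FROM t1 GROUP BY a;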
When using an index for GROUP BY and range access, the server isolates
a set of ranges based on the conditions over the key parts of the
index used. It then uses only the ranges over the GROUP BY fields to
jump from one group to another. Since the GROUP BY fields may form only a
prefix of the index, we may use only a prefix of the ranges produced
by the range optimizer.
Each range carries a flag indicating whether it includes its border values.
The problem is that when using a range prefix, the last range is open
because it assumes that there is a range on the next keypart. Thus when
we use a prefix range as it is, it excludes all border values.
The solution is that, when ignoring the suffix of the range conditions
(to jump over the GROUP BY prefix only), the server must change the
remaining intervals so they always include their borders, e.g.
if the whole range was:
(1,-inf) <= (<group_by_col>,<min_max_arg_col>) < (1, 3), we must make it
(1) <= (<group_by_col>) <= (1), because (a,b) < (c1,c2) means:
a < c1 OR (a = c1 AND b < c2).
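A sketch of the kind of query affected (table and column names are
hypothetical):

  CREATE TABLE t1 (gc INT, mc INT, KEY k(gc, mc));
  -- The range optimizer produces (1,-inf) <= (gc,mc) < (1,3); when only the
  -- GROUP BY prefix is kept for jumping between groups, it must become the
  -- closed interval (1) <= (gc) <= (1), otherwise the border value gc = 1
  -- would be excluded.
  SELECT gc, MIN(mc) FROM t1 WHERE gc = 1 AND mc < 3 GROUP BY gc;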
The problem was that store_top_level_join_columns() incorrectly assumed
that the left/right neighbor of a nested join table reference can only be
at the same level in the join tree.
The fix checks whether the current nested join table reference has no immediate
left/right neighbor, and if so, chooses the left/right neighbors from the
nearest upper level where these references are != NULL.
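One possible query shape exercising this code path (hypothetical table names,
for illustration only); the nested join (t2 NATURAL JOIN t3) has no right
neighbor at its own nesting level, so a neighbor must be found at an enclosing
level of the join tree:

  SELECT * FROM t1 LEFT JOIN (t2 NATURAL JOIN t3) ON t1.a = t2.a;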
When converting DISTINCT to GROUP BY, where the columns come from the covering
index and are referenced twice in the SELECT list, the optimizer created an
improper processing sequence. This is because the columns
of the covering index were not recognized as such and were treated as non-index
columns.
Generally speaking, duplicate columns can safely be removed from the GROUP
BY/DISTINCT list because this will not add or remove rows in the
result set. Duplicates can be removed even if they are not consecutive
(unlike ORDER BY, where duplicate columns can be removed
only if they are consecutive).
So we can safely transform "SELECT DISTINCT a,a FROM ... ORDER BY a" to
"SELECT a,a FROM ... GROUP BY a ORDER BY a" instead of
"SELECT a,a FROM .. GROUP BY a,a ORDER BY a". We can even transform
"SELECT DISTINCT a,b,a FROM ... ORDER BY a,b" to
"SELECT a,b,a FROM ... GROUP BY a,b ORDER BY a,b".
The fix for this bug consists of checking for duplicate columns in the SELECT
list when constructing the GROUP BY list while transforming DISTINCT to GROUP
BY, and skipping the columns that are already present.
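A minimal sketch (hypothetical names) of such a query; the index k(a, b)
covers the SELECT list and the column a is referenced twice:

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  SELECT DISTINCT a, a FROM t1 ORDER BY a;
  -- after the fix this is processed as if it were:
  -- SELECT a, a FROM t1 GROUP BY a ORDER BY a;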
The bug was due to a missed case in the detection of whether an index
can be used for loose scan. More precisely, the range optimizer chose
to use loose index scan for queries for which the condition(s) over
an index key part could not be pushed to the index together with the
loose scan.
As a result, loose index scan was selecting the first row in the
index with a given GROUP BY prefix, and was applying the WHERE
clause after that, while it should have inspected all rows with
the given prefix and applied the WHERE clause to all of them.
The fix detects and skips such cases.
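One query shape that illustrates the symptom (hypothetical names; whether this
exact form triggered the bug depends on the range analysis):

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  -- The condition on b is not an equality that can be attached to the group
  -- prefix; all rows within each group on a must be checked against the WHERE
  -- clause, not only the first row of the group.
  SELECT a FROM t1 WHERE b > 3 GROUP BY a;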
If check_quick_select() returns a non-empty range, then the function cost_group_min_max()
cannot return 0 as an estimate of the number of retrieved records.
Yet the function erroneously returned 0 as the estimate in some situations.
The cause of the bug was the use of end_write_group instead of end_write
in the case when ORDER BY required a temporary table; this did not take
into account the fact that loose index scan already computes the result
of MIN/MAX aggregate functions (and performs grouping).
The solution is to call end_write instead of end_write_group and to add
the MIN/MAX functions to the list of regular functions so that their
values are inserted into the temporary table.
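One query shape of this kind (hypothetical names): loose index scan delivers
one row per group with MIN() already computed, while the ORDER BY on the
aggregate may require a temporary table:

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  SELECT a, MIN(b) AS mb FROM t1 GROUP BY a ORDER BY mb;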
Loose index scan using only the second part of a multipart index was chosen, which
resulted in creating wrong keys and an endless loop.
get_best_group_min_max() now allows loose index scan for DISTINCT only if the used
keyparts form a prefix of the index.
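A minimal sketch (hypothetical names) of the new restriction:

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  SELECT DISTINCT a FROM t1;  -- keypart a is a prefix of k(a, b): loose scan may apply
  SELECT DISTINCT b FROM t1;  -- keypart b alone is not a prefix: loose scan is not used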
The problem was that when there was no MIN or MAX function, after finding the
group prefix based on the DISTINCT or GROUP BY attributes, we did not search further
for a key in the group that satisfies the equi-join conditions on attributes that
follow the group attributes. Thus we ended up with the wrong rows, and subsequent
calls to select_cond->val_int() in evaluate_join_record() were filtering those
rows out. Hence the query result set was empty.
The problem occurred both for GROUP BY queries without MIN/MAX and for queries
with DISTINCT (which were internally executed as GROUP BY queries).
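A sketch of the query shape involved (hypothetical names); b is a keypart that
follows the group attribute a in the index, and the equi-join condition on it
must be satisfied by searching within each group rather than by taking only the
first key of the group:

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  CREATE TABLE t2 (b INT);
  SELECT DISTINCT t1.a FROM t1, t2 WHERE t1.b = t2.b;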
"the server side preparedStatement error for LIMIT placeholder",
which moves all uses of LIMIT clause from PREPARE to OPTIMIZE
and later steps.
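With this change, a placeholder in the LIMIT clause is resolved at execution
time rather than at prepare time. A minimal usage sketch (hypothetical table
name):

  PREPARE stmt FROM 'SELECT a FROM t1 LIMIT ?';
  SET @lim = 10;
  EXECUTE stmt USING @lim;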
After-review fixes.
myisam_max_extra_sort_file_size is deprecated
Ensure that myisam_data_pointer_size is honoured when creating new MyISAM files
Changed default value of myisam_data_pointer_size from 4 to 6 to get rid of 'table-is-full' errors
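A small usage sketch; the variable is a global server variable and affects
MyISAM tables created afterwards (table name is hypothetical):

  SET GLOBAL myisam_data_pointer_size = 6;
  CREATE TABLE t1 (a INT) ENGINE=MyISAM;  -- created with 6-byte data pointers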
The problem was that the code that analyses the applicability of the
QUICK_GROUP_MIN_MAX access method for DISTINCT queries assumed that there
are no duplicate column references in the DISTINCT clause, and it added
non-existing key parts for the duplicate column references.
The solution adds a test that checks whether the select list already contains
a field with the same name. If such a field is already present, then it has
already been decided to use its key part for index access. In such a case we
must skip the duplicate field instead of counting it as a new field.
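A minimal sketch (hypothetical names) of a DISTINCT query with a duplicate
column reference that the applicability check must handle:

  CREATE TABLE t1 (a INT, b INT, KEY k(a, b));
  SELECT DISTINCT a, a FROM t1;  -- the second 'a' must not add another key part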
Split TABLE into TABLE and TABLE_SHARE (TABLE_SHARE is still allocated as part of TABLE; this will be fixed soon)
Created Field::make_field() and made Field_num::make_field() call it
Added TABLE_SHARE->db, which points to the database name; changed all usage of table_cache_key as the database name to use this instead
Changed field->table_name to point to a pointer to the alias. This allows us to change the alias for a table by updating just one pointer.
Renamed TABLE_SHARE->real_name to table_name
Renamed TABLE->table_name to alias
Renamed TABLE_LIST->real_name to table_name