The problem was due to a prior fix for BUG 9676, which limited
the rows stored in a temporary table to the LIMIT clause. This
optimization is not applicable to non-group queries with aggregate
functions. The fix disables the optimization in this case.
GROUP BY/DISTINCT pruning optimization must be done before ORDER BY
optimization because ORDER BY may be removed when GROUP BY/DISTINCT
sorts as a side effect, e.g. in
SELECT DISTINCT <non-key-col>,<pk> FROM t1
ORDER BY <non-key-col> DISTINCT
must be removed before ORDER BY as if done the other way around
it will remove both.
When processing aggregate functions all tables values are reset
to NULLs at the end of each group.
When doing that if there are no rows found for a group
the const tables must not be reset as they are not recalculated
by do_select()/sub_select() for each group.
Too many cursors (more than 1024) could lead to memory corruption.
This affects both, stored routines and C API cursors, and the
threshold is per-server, not per-connection. Similarly, the
corruption could happen when the server was under heavy load
(executing more than 1024 simultaneous complex queries), and this is
the reason why this bug is fixed in 4.1, which doesn't support
cursors.
The corruption was caused by a bug in the temporary tables code, when
an attempt to create a table could lead to a write beyond allocated
space. Note, that only internal tables were affected (the tables
created internally by the server to resolve the query), not tables
created with CREATE TEMPORARY TABLE. Another pre-condition for the
bug is TRUE value of --temp-pool startup option, which, however, is a
default.
The cause of a bug was that random memory was overwritten in
bitmap_set_next() due to out-of-bound memory access.
When optimizing conditions like 'a = <some_val> OR a IS NULL' so that they're
united into a single condition on the key and checked together the server must
check which value is the NULL value in a correct way : not only using ->is_null
but also check if the expression doesn't depend on any tables referenced in the
current statement.
This additional check must be performed because that optimization takes place
before the actual execution of the statement, so if the field was initialized
to NULL from a previous statement the optimization would be applied incorrectly.
The problem was in that opt_sum_query() replaced MIN/MAX functions
with the corresponding constant found in a key, but due to imprecise
representation of float numbers, when evaluating the where clause,
this comparison failed.
When MIN/MAX optimization detects that all tables can be removed,
also remove all conjuncts in a where clause that refer to these
tables. As a result of this fix, these conditions are not evaluated
twice, and in the case of float number comparisons we do not discard
result rows due to imprecise float representation.
As a side-effect this fix also corrects an unnoticed problem in
bug 12882.
* don't use join cache when the incoming data set is already ordered
for ORDER BY
This choice must be made because join cache will effectively
reverse the join order and the results will be sorted by the index
of the table that uses join cache.
may return a wrong result.
An Item_sum_hybrid object has the was_values flag which indicates whether any
values were added to the sum function. By default it is set to true and reset
to false on any no_rows_in_result() call. This method is called only in
return_zero_rows() function. An ALL/ANY subquery can be optimized by MIN/MAX
optimization. The was_values flag is used to indicate whether the subquery
has returned at least one row. This bug occurs because return_zero_rows() is
called only when we know that the select will return zero rows before
starting any scans but often such information is not known.
In the reported case the return_zero_rows() function is not called and
the was_values flag is not reset to false and yet the subquery return no rows
Item_func_not_all and Item_func_nop_all functions return a wrong
comparison result.
The end_send_group() function now calls no_rows_in_result() for each item
in the fields_list if there is no rows were found for the (sub)query.
To make MySQL compatible with some ODBC applications, you can find
the AUTO_INCREMENT value for the last inserted row with the following query:
SELECT * FROM tbl_name WHERE auto_col IS NULL.
This is done with a special code that replaces 'auto_col IS NULL' with
'auto_col = LAST_INSERT_ID'.
However this also resets the LAST_INSERT_ID to 0 as it uses it for a flag
so as to ensure that only the first SELECT ... WHERE auto_col IS NULL
after an INSERT has this special behaviour.
In order to avoid resetting the LAST_INSERT_ID a special flag is introduced
in the THD class. This flag is used to restrict the second and subsequent
SELECTs instead of LAST_INSERT_ID.
'SELECT DISTINCT a,b FROM t1' should not use temp table if there is unique
index (or primary key) on a.
There are a number of other similar cases that can be calculated without the
use of a temp table : multi-part unique indexes, primary keys or using GROUP BY
instead of DISTINCT.
When a GROUP BY/DISTINCT clause contains all key parts of a unique
index, then it is guaranteed that the fields of the clause will be
unique, therefore we can optimize away GROUP BY/DISTINCT altogether.
This optimization has two effects:
* there is no need to create a temporary table to compute the
GROUP/DISTINCT operation (or the temporary table will be smaller if only GROUP
is removed and DISTINCT stays or if DISTINCT is removed and GROUP BY stays)
* this causes the statement in effect to become updatable in Connector/Java
because the result set columns will be direct reference to the primary key of
the table (instead to the temporary table that it currently references).
Implemented a check that will optimize away GROUP BY/DISTINCT for queries like
the above.
Currently it will work only for single non-constant table in the FROM clause.
tables
Currently in INSERT ... SELECT ... LIMIT ... the compiler uses a
temporary table to store the results of SELECT ... LIMIT .. and then
uses that table as a source for INSERT. The problem is that in some cases
it actually skips the LIMIT clause in doing that and materializes the
whole SELECT result set regardless of the LIMIT.
This fix is limiting the process of filling up the temp table with only
that much rows that will be actually used by propagating the LIMIT value.
The bug report revealed two problems related to min/max optimization:
1. If the length of a constant key used in a SARGable condition for
for the MIN/MAX fields is greater than the length of the field an
unwanted warning on key truncation is issued;
2. If MIN/MAX optimization is applied to a partial index, like INDEX(b(4))
than can lead to returning a wrong result set.
A query with a group by and having clauses could return a wrong
result set if the having condition contained a constant conjunct
evaluated to FALSE.
It happened because the pushdown condition for table with
grouping columns lost its constant conjuncts.
Pushdown conditions are always built by the function make_cond_for_table
that ignores constant conjuncts. This is apparently not correct when
constant false conjuncts are present.
The bug was as follows: When merge_key_fields() encounters "t.key=X OR t.key=Y" it will
try to join them into ref_or_null access via "t.key=X OR NULL". In order to make this
inference it checks if Y<=>NULL, ignoring the fact that value of Y may be not yet known.
The fix is that the check if Y<=>NULL is made only if value of Y is known (i.e. it is a
constant).
TODO: When merging to 5.0, replace used_tables() with const_item() everywhere in merge_key_fields().
The bug caused wrong result sets for union constructs of the form
(SELECT ... ORDER BY order_list1 [LIMIT n]) ORDER BY order_list2.
For such queries order lists were concatenated and limit clause was
completely neglected.
Fixing part2 of this problem: AND didn't work well
with utf8_czech_ci and utf8_lithianian_ci in some cases.
The problem was because when during condition optimization
field was replaced with a constant, the constant's collation
and collation derivation was used later for comparison instead
of the field collation and derivation, which led to non-equal
new condition in some cases.
This patch copies collation and derivation from the field being removed
to the new constant, which makes comparison work using the same collation
with the one which would be used if no condition optimization were done.
In other words:
where s1 < 'K' and s1 = 'Y';
was rewritten to:
where 'Y' < 'K' and s1 = 'Y';
Now it's rewritten to:
where 'Y' collate collation_of_s1 < 'K' and s1 = 'Y'
(using derivation of s1)
Note, the first problem of this bug (with latin1_german2_ci) was fixed
earlier in 5.0 tree, in a separate changeset.
used
In a simple queries a result of the GROUP_CONCAT() function was always of
varchar type.
But if length of GROUP_CONCAT() result is greater than 512 chars and temporary
table is used during select then the result is converted to blob, due to
policy to not to store fields longer than 512 chars in tmp table as varchar
fields.
In order to provide consistent behaviour, result of GROUP_CONCAT() now
will always be converted to blob if it is longer than 512 chars.
Item_func_group_concat::field_type() is modified accordingly.
The GROUP_CONCAT uses its own temporary table. When ROLLUP is present
it creates the second copy of Item_func_group_concat. This copy receives the
same list of arguments that original group_concat does. When the copy is
set up the result_fields of functions from the argument list are reset to the
temporary table of this copy.
As a result of this action data from functions flow directly to the ROLLUP copy
and the original group_concat functions shows wrong result.
Since queries with COUNT(DISTINCT ...) use temporary tables to store
the results the COUNT function they are also affected by this bug.
The idea of the fix is to copy content of the result_field for the function
under GROUP_CONCAT/COUNT from the first temporary table to the second one,
rather than setting result_field to point to the second temporary table.
To achieve this goal force_copy_fields flag is added to Item_func_group_concat
and Item_sum_count_distinct classes. This flag is initialized to 0 and set to 1
into the make_unique() member function of both classes.
To the TMP_TABLE_PARAM structure is modified to include the similar flag as
well.
The create_tmp_table() function passes that flag to create_tmp_field().
When the flag is set the create_tmp_field() function will set result_field
as a source field and will not reset that result field to newly created
field for Item_func_result_field and its descendants. Due to this there
will be created copy func to copy data from old result_field to newly
created field.
Remove wrong fix for Bug#14397 - OPTIMIZE TABLE with an open HANDLER causes a crash
Safety fix for bug #13855 "select distinct with group by caused server crash"
Initialized usable_keys from table->keys_in_use instead of ~0
in test_if_skip_sort_order(). It was possible that a disabled
index was used for sorting.
Procedure analyse() redefines select's fields_list. setup_copy_fields() assumes
that fields_list is a part of all_fields_list. Because select have only
3 columns and analyse() redefines it to have 10 columns, int overrun in
setup_copy_fields() occurs and server goes to almost infinite loop.
Because fields_list used not only to send data ad fields types, it's wrong
to allow procedure redefine it. This patch separates select's fileds_list
and procedure's one. Now if procedure is present, copy of fields_list is
created in procedure_fields_list and it is used for sending data and fields.
Date field was declared as not null, thus expression 'datefield is null'
was always false. For SELECT special handling of such cases is used.
There 'datefield is null' converted to 'datefield eq "0000-00-00"'.
In mysql_update() before creation of select added remove_eq_conds() call.
It makes some optimization of conds and in particular performs conversion
from 'is null' to 'eq'.
Also remove_eq_conds() makes some evaluation of conds and if it founds that
conds is always false then update statement is not processed further.
All this allows to perform some update statements process faster due to
optimized conds, and not wasting resources if conds known to be false.
DISTINCT wasn't optimized away and caused creation of tmp table in wrong
case. This result in integer overrun and running out of memory.
Fix backported from 4.1. Now if optimizer founds that in result be only 1
row it removes distinct.
When fixing Item_func_plus in ORDER BY clause field c is searched in all
opened tables, but because c is an alias it wasn't found there.
This patch adds a flag to select_lex which allows Item_field::fix_fields()
to look up in select's item_list to find aliased fields.
For queries with GROUP BY and without hidden GROUP BY fields DISTINCT is
optimized away becuase such queries produce result set without duplicates.
But ROLLUP can add rows which may be same to some rows and this fact was
ignored.
Added check so if ROLLUP is present DISTINCT can't be optimized away.