"alter table from MyISAM to MERGE lost data without errors and warnings"
Add new handlerton flag which prevent user from altering table storage
engine to storage engines which would lose data. Both 'blackhole' and
'merge' are marked with the new flag.
Tests included.
When converting DISTINCT to GROUP BY where the columns are from the covering
index and they are quoted twice in the SELECT list the optimizer is creating
improper processing sequence. This is because of the fact that the columns
of the covering index are not recognized as such and treated as non-index
columns.
Generally speaking duplicate columns can safely be removed from the GROUP
BY/DISTINCT list because this will not add or remove new rows in the
resulting set. Duplicates can be removed even if they are not consecutive
(as is the case for ORDER BY, where the duplicate columns can be removed
only if they are consecutive).
So we can safely transform "SELECT DISTINCT a,a FROM ... ORDER BY a" to
"SELECT a,a FROM ... GROUP BY a ORDER BY a" instead of
"SELECT a,a FROM .. GROUP BY a,a ORDER BY a". We can even transform
"SELECT DISTINCT a,b,a FROM ... ORDER BY a,b" to
"SELECT a,b,a FROM ... GROUP BY a,b ORDER BY a,b".
The fix to this bug consists of checking for duplicate columns in the SELECT
list when constructing the GROUP BY list in transforming DISTINCT to GROUP
BY and skipping the ones that are already in.
The bug was as follows: When merge_key_fields() encounters "t.key=X OR t.key=Y" it will
try to join them into ref_or_null access via "t.key=X OR NULL". In order to make this
inference it checks if Y<=>NULL, ignoring the fact that value of Y may be not yet known.
The fix is that the check if Y<=>NULL is made only if value of Y is known (i.e. it is a
constant).
TODO: When merging to 5.0, replace used_tables() with const_item() everywhere in merge_key_fields().
The reason of the bug is in that `get_var_with_binlog' performs missed
assingment of
the variables as side-effect. Doing that it eventually calls
`free_underlaid_joins' to pass as an argument `thd->lex->select_lex' of the lex
which belongs to the user query, not
to one which is emulated i.e SET @var1:=NULL.
`get_var_with_binlog' is refined to supply a temporary lex to sql_set_variables's stack.
mysqldump / SHOW CREATE TABLE will show the NEXT available value for
the PK, rather than the *first* one that was available (that named in
the original CREATE TABLE ... AUTO_INCREMENT = ... statement).
This should produce correct and robust behaviour for the obvious use
cases -- when no data were inserted, then we'll produce a statement
featuring the same value the original CREATE TABLE had; if we dump
with values, INSERTing the values on the target machine should set the
correct next_ID anyway (and if not, we'll still have our AUTO_INCREMENT =
... to do that). Lastly, just the CREATE statement (with no data) for
a table that saw inserts would still result in a table that new values
could safely be inserted to).
There seems to be no robust way however to see whether the next_ID
field is > 1 because it was set to something else with CREATE TABLE
... AUTO_INCREMENT = ..., or because there is an AUTO_INCREMENT column
in the table (but no initial value was set with AUTO_INCREMENT = ...)
and then one or more rows were INSERTed, counting up next_ID. This
means that in both cases, we'll generate an AUTO_INCREMENT =
... clause in SHOW CREATE TABLE / mysqldump. As we also show info on,
say, charsets even if the user did not explicitly give that info in
their own CREATE TABLE, this shouldn't be an issue.
As per above, the next_ID will be affected by any INSERTs that have
taken place, though. This /should/ result in correct and robust
behaviour, but it may look non-intuitive to some users if they CREATE
TABLE ... AUTO_INCREMENT = 1000 and later (after some INSERTs) have
SHOW CREATE TABLE give them a different value (say, CREATE TABLE
... AUTO_INCREMENT = 1006), so the docs should possibly feature a
caveat to that effect.
It's not very intuitive the way it works now (with the fix), but it's
*correct*. We're not storing the original value anyway, if we wanted
that, we'd have to change on-disk representation?
If we do dump/load cycles with empty DBs, nothing will change. This
changeset includes an additional test case that proves that tables
with rows will create the same next_ID for AUTO_INCREMENT = ... across
dump/restore cycles.
Confirmed by support as likely solution for client's problem.
This performance degradation was due to the fact that some
cost evaluation code added into 4.1 in the function find_best was
not merged into the code of the function best_access_path added
together with other code for greedy optimizer.
Added a parameter to the function print_plan. The parameter contains
accumulated cost for a given partial join.
The patch does not include a special test case since this performance
degradation is hard to reproduse with a simple example.
TODO: make the function find_best use the function best_access_path
in order to remove duplication of code which might result in incomplete
merges in the future.