ha_partition::init_record_priority_queue()
Cherry-pick rev.6b0ee0c795499cee7f9deb649fb010801e0be4c2 from mysql-5.6.
Bug #18305270 BACKPORT BUG#18694052 FIX
FOR ASSERTION `!M_ORDERED_REC_BUFFER'
FAILED TO 5.6
PROBLEM
-------
Missed to remove record priority queue if
init_index failed for a partition which
was causing the crash.
FIX
---
Remove priority queue if init_index fails
for partition.
PROBLEM
Create time is calculated as last status change time of .frm file.
The first problem was that innodb was passing file name as
"table_name#po#p0.frm" to the stat() call which calculates the create time.
Since there is no frm file with this name create_time will be stored as NULL.
The second problem is ha_partition::info() updates stats for create time
when HA_STATUS_CONST flag was set ,where as innodb calculates this statistic
when HA_STATUS_TIME is set,which causes create_time to be set as NULL.
Fix
Pass proper .frm name to stat() call and calculate create time when
HA_STATUS_CONST flag is set.
Problem is on test it tried to verify that no files were left
on test database.
Fix: There's no need to list other file types, it can only
list *.par files
MySQL Bug#71095: Wrong results with PARTITION BY LIST COLUMNS()
MySQL Bug#72803: Wrong "Impossible where" with LIST partitioning
MDEV-6240: Wrong "Impossible where" with LIST partitioning
- Backprot the fix from MySQL Bug#71095.
(This is attempt at fix#2) (re-commit with fixed typo)
- Moved the testcase from partition_test to partition_innodb.test where it can really work.
- Made ordered index scans over ha_partition tables to satisfy ROR property for
the case where underlying table uses extended keys.
- Backport MySQL's fix: do set ha_partition::m_pkey_is_clustered for ha_partition
objects created with handler->clone() call.
- Also, include a testcase.
includes:
* remove some remnants of "Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING"
* introduce LOCK_share, now LOCK_ha_data is strictly for engines
* rea_create_table() always creates .par file (even in "frm-only" mode)
* fix a 5.6 bug, temp file leak on dummy ALTER TABLE
from MariaDB 10.0.
The bug in mdev-3948 was an instance of the problem fixed by Sergey's patch
in 10.0 - namely that the range optimizer could change table->[read | write]_set,
and not restore it.
revno: 3471
committer: Sergey Petrunya <psergey@askmonty.org>
branch nick: 10.0-serg-fix-imerge
timestamp: Sat 2012-11-03 12:24:36 +0400
message:
# MDEV-3817: Wrong result with index_merge+index_merge_intersection, InnoDB table, join, AND and OR conditions
Reconcile the fixes from:
#
# guilhem.bichot@oracle.com-20110805143029-ywrzuz15uzgontr0
# Fix for BUG#12698916 - "JOIN QUERY GIVES WRONG RESULT AT 2ND EXEC. OR
# AFTER FLUSH TABLES [-INT VS NULL]"
#
# guilhem.bichot@oracle.com-20111209150650-tzx3ldzxe1yfwji6
# Fix for BUG#12912171 - ASSERTION FAILED: QUICK->HEAD->READ_SET == SAVE_READ_SET
# and
#
and related fixes from: BUG#1006164, MDEV-376:
Now, ROR-merged QUICK_RANGE_SELECT objects make no assumptions about the values
of table->read_set and table->write_set.
Each QUICK_ROR_SELECT has (and had before) its own column bitmap, but now, all
QUICK_ROR_SELECT's functions that care: reset(), init_ror_merged_scan(), and
get_next() will set table->read_set when invoked and restore it back to what
it was before the call before they return.
This allows to avoid the mess when somebody else modifies table->read_set for
some reason.
Reconcile the fixes from:
#
# guilhem.bichot@oracle.com-20110805143029-ywrzuz15uzgontr0
# Fix for BUG#12698916 - "JOIN QUERY GIVES WRONG RESULT AT 2ND EXEC. OR
# AFTER FLUSH TABLES [-INT VS NULL]"
#
# guilhem.bichot@oracle.com-20111209150650-tzx3ldzxe1yfwji6
# Fix for BUG#12912171 - ASSERTION FAILED: QUICK->HEAD->READ_SET == SAVE_READ_SET
# and
#
and related fixes from: BUG#1006164, MDEV-376:
Now, ROR-merged QUICK_RANGE_SELECT objects make no assumptions about the values
of table->read_set and table->write_set.
Each QUICK_ROR_SELECT has (and had before) its own column bitmap, but now, all
QUICK_ROR_SELECT's functions that care: reset(), init_ror_merged_scan(), and
get_next() will set table->read_set when invoked and restore it back to what
it was before the call before they return.
This allows to avoid the mess when somebody else modifies table->read_set for
some reason.
PARTITION STATISTICS
Problem was the fix for bug#11756867; It always used the first
partitions, and stopped after it checked 10 [sub]partitions.
(or until it found a partition which would contain a match).
This results in bad statistics for tables where the first 10 partitions
don't represent the majority of the data (like when the first 10
partitions only contained a few rows in total).
The solution was to take statisics from the partitions containing
the most rows instead:
Added an array of partition ids which is sorted by number of records
in descending order.
this array is used in records_in_range to cover as many records as
possible in as few calls as possible.
Also changed the limit of how many partitions to use for the statistics
from a static max of 10 partitions, into a dynamic model:
Maximum number of partitions is now log2(total number of partitions)
taken from the ordered array.
It will continue calling partitions records_in_range until it has
checked:
(total rows in matching partitions) * (maximum number of partitions)
/ (number of used partitions)
Also reverted the changes for ha_partition::scan_time() and
ha_partition::estimate_rows_upper_bound() to before
the fix of bug#11756867. Since they are not as slow as
records_in_range.
PARTITION STATISTICS
Problem was the fix for bug#11756867; It always used the first
partitions, and stopped after it checked 10 [sub]partitions.
(or until it found a partition which would contain a match).
This results in bad statistics for tables where the first 10 partitions
don't represent the majority of the data (like when the first 10
partitions only contained a few rows in total).
The solution was to take statisics from the partitions containing
the most rows instead:
Added an array of partition ids which is sorted by number of records
in descending order.
this array is used in records_in_range to cover as many records as
possible in as few calls as possible.
Also changed the limit of how many partitions to use for the statistics
from a static max of 10 partitions, into a dynamic model:
Maximum number of partitions is now log2(total number of partitions)
taken from the ordered array.
It will continue calling partitions records_in_range until it has
checked:
(total rows in matching partitions) * (maximum number of partitions)
/ (number of used partitions)
Also reverted the changes for ha_partition::scan_time() and
ha_partition::estimate_rows_upper_bound() to before
the fix of bug#11756867. Since they are not as slow as
records_in_range.
leave the table unusable".
Failing ALTER statement on partitioned table could have left
this table in an unusable state. This has happened in cases
when ALTER was executed using "fast" algorithm, which doesn't
involve copying of data between old and new versions of table,
and the resulting new table was incompatible with partitioning
function in some way.
The problem stems from the fact that discrepancies between new
table definition and partitioning function are discovered only
when the table is opened. In case of "fast" algorithm this has
happened too late during ALTER's execution, at the moment when
all changes were already done and couldn't have been reverted.
In the cases when "slow" algorithm, which copies data, is used
such discrepancies are detected at the moment new table
definition is opened implicitly when new version of table is
created in storage engine. As result ALTER is aborted before
any changes to table were done.
This fix tries to address this issue by ensuring that "fast"
algorithm behaves similarly to "slow" algorithm and checks
compatibility between new definition and partitioning function
by trying to open new definition after .FRM file for it has
been created.
Long term we probably should implement some way to check
compatibility between partitioning function and new table
definition which won't involve opening it, as this should
allow much cleaner fix for this problem.
mysql-test/r/partition_innodb.result:
Added test for bug #57985 "ONLINE/FAST ALTER PARTITION can
fail and leave the table unusable".
mysql-test/t/partition_innodb.test:
Added test for bug #57985 "ONLINE/FAST ALTER PARTITION can
fail and leave the table unusable".
sql/sql_table.cc:
Ensure that in cases when .FRM for partitioned table is
created without creating table in storage engine (e.g.
during "fast" ALTER TABLE) we still open table definition.
This allows to check that definition of created table/.FRM
is compatible with its partitioning function.
leave the table unusable".
Failing ALTER statement on partitioned table could have left
this table in an unusable state. This has happened in cases
when ALTER was executed using "fast" algorithm, which doesn't
involve copying of data between old and new versions of table,
and the resulting new table was incompatible with partitioning
function in some way.
The problem stems from the fact that discrepancies between new
table definition and partitioning function are discovered only
when the table is opened. In case of "fast" algorithm this has
happened too late during ALTER's execution, at the moment when
all changes were already done and couldn't have been reverted.
In the cases when "slow" algorithm, which copies data, is used
such discrepancies are detected at the moment new table
definition is opened implicitly when new version of table is
created in storage engine. As result ALTER is aborted before
any changes to table were done.
This fix tries to address this issue by ensuring that "fast"
algorithm behaves similarly to "slow" algorithm and checks
compatibility between new definition and partitioning function
by trying to open new definition after .FRM file for it has
been created.
Long term we probably should implement some way to check
compatibility between partitioning function and new table
definition which won't involve opening it, as this should
allow much cleaner fix for this problem.
When having a sub query in partitioned innodb one could
make the partitioning engine to search for a 'index_next_same'
on a partition that had not been initialized.
Problem was that the subselect function looks at table->status
which was not set in the partitioning handler when it skipped
scanning due to no matching partitions found.
Fixed by setting table->status = STATUS_NOT_FOUND when
there was no partitions to scan. (If there are partitions to
scan, it will be set in the partitions handler.)
mysql-test/r/partition_innodb.result:
added result
mysql-test/t/partition_innodb.test:
added test
sql/ha_partition.cc:
set table status to not found, if there ar no partitions to scan.
When having a sub query in partitioned innodb one could
make the partitioning engine to search for a 'index_next_same'
on a partition that had not been initialized.
Problem was that the subselect function looks at table->status
which was not set in the partitioning handler when it skipped
scanning due to no matching partitions found.
Fixed by setting table->status = STATUS_NOT_FOUND when
there was no partitions to scan. (If there are partitions to
scan, it will be set in the partitions handler.)
The ALTER PARTITION and SELECT seemed to be deadlocked
when having innodb_thread_concurrency = 1.
Problem was that there was unreleased latches
in the ALTER PARTITION thread which was needed
by the SELECT thread to be able to continue.
Solution was to release the latches by commit
before requesting upgrade to exclusive MDL lock.
Updated according to reviewers comments (3).
mysql-test/r/partition_innodb.result:
updated test result
mysql-test/t/partition_innodb.test:
added test
sql/sql_partition.cc:
Moved implicit commit into mysql_change_partition
so that if latches are taken, they are always released
before waiting on exclusive lock.
sql/sql_table.cc:
refactored the code to prepare and commit
around copy_data_between_tables, to be able
to reuse it in mysql_change_partitions
sql/sql_table.h:
exporting mysql_trans_prepare/commit_alter_copy_data
The ALTER PARTITION and SELECT seemed to be deadlocked
when having innodb_thread_concurrency = 1.
Problem was that there was unreleased latches
in the ALTER PARTITION thread which was needed
by the SELECT thread to be able to continue.
Solution was to release the latches by commit
before requesting upgrade to exclusive MDL lock.
Updated according to reviewers comments (3).
corruption on ADD PARTITION and LOCK TABLE
Bug#53770: Server crash at handler.cc:2076 on
LOAD DATA after timed out COALESCE PARTITION
5.5 fix for:
Bug#51042: REORGANIZE PARTITION can leave table in an
inconsistent state in case of crash
Needs to be back-ported to 5.1
5.5 fix for:
Bug#50418: DROP PARTITION does not interact with
transactions
Main problem was non-persistent operations done
before meta-data lock was taken (53770+53676).
And 53676 needed to keep the table/partitions opened and locked
while copying the data to the new partitions.
Also added thorough tests to spot some additional bugs
in the ddl_log code, which could result in bad state
between the .frm and partitions.
Collapsed patch, includes all fixes required from the reviewers.
mysql-test/r/partition_innodb.result:
updated result with new test
mysql-test/suite/parts/inc/partition_crash.inc:
crash test include file
mysql-test/suite/parts/inc/partition_crash_add.inc:
test all states in fast_alter_partition_table
ADD PARTITION branch
mysql-test/suite/parts/inc/partition_crash_change.inc:
test all states in fast_alter_partition_table
CHANGE PARTITION branch
mysql-test/suite/parts/inc/partition_crash_drop.inc:
test all states in fast_alter_partition_table
DROP PARTITION branch
mysql-test/suite/parts/inc/partition_fail.inc:
recovery test including injecting errors
mysql-test/suite/parts/inc/partition_fail_add.inc:
test all states in fast_alter_partition_table
ADD PARTITION branch
mysql-test/suite/parts/inc/partition_fail_change.inc:
test all states in fast_alter_partition_table
CHANGE PARTITION branch
mysql-test/suite/parts/inc/partition_fail_drop.inc:
test all states in fast_alter_partition_table
DROP PARTITION branch
mysql-test/suite/parts/inc/partition_mgm_crash.inc:
include file that runs all crash and failure injection tests.
mysql-test/suite/parts/r/partition_debug_innodb.result:
new test result file
mysql-test/suite/parts/r/partition_debug_myisam.result:
new test result file
mysql-test/suite/parts/r/partition_special_innodb.result:
updated result
mysql-test/suite/parts/r/partition_special_myisam.result:
updated result
mysql-test/suite/parts/t/partition_debug_innodb-master.opt:
opt file for using with crashing tests of partitioned innodb
mysql-test/suite/parts/t/partition_debug_innodb.test:
partitioned innodb test that require debug builds
mysql-test/suite/parts/t/partition_debug_myisam-master.opt:
opt file for using with crashing tests of partitioned myisam
mysql-test/suite/parts/t/partition_debug_myisam.test:
partitioned myisam test that require debug builds
mysql-test/suite/parts/t/partition_special_innodb-master.opt:
added innodb-file-per-table to easier verify partition status on disk
mysql-test/suite/parts/t/partition_special_innodb.test:
added test case
mysql-test/suite/parts/t/partition_special_myisam.test:
added test case
mysql-test/t/partition_innodb.test:
added test case
sql/sql_base.cc:
Moved alter_close_tables to sql_partition.cc
sql/sql_base.h:
removed some non existing and duplicated functions.
sql/sql_partition.cc:
fast_alter_partition_table:
Spletted abort_and_upgrad_lock_and_close_table
to its parts (wait_while_table_is_used and
alter_close_tables) and always have
wait_while_table_is_used before any persistent
operations (including logs, which will be executed
on failure) and alter_close_tables after
create/read/write operations and before
drop operations.
moved alter_close_tables here from sql_base.cc
Added error injections for better test coverage.
write_log_final_change_partition:
fixed a log_entry linking bug (delete_frm was not
linked to change/drop partition)
and drop partition must be executed before
change partition (change partition can rename a
partition to an old name, like REORG p1 INTO (p1,p2).
write_log_add_change_partition:
need to use drop_frm first, and relinking that entry
and reusing its execute entry.
sql/sql_table.cc:
added initialization of next_active_log_entry.
sql/table.h:
removed a duplicate declaration.
corruption on ADD PARTITION and LOCK TABLE
Bug#53770: Server crash at handler.cc:2076 on
LOAD DATA after timed out COALESCE PARTITION
5.5 fix for:
Bug#51042: REORGANIZE PARTITION can leave table in an
inconsistent state in case of crash
Needs to be back-ported to 5.1
5.5 fix for:
Bug#50418: DROP PARTITION does not interact with
transactions
Main problem was non-persistent operations done
before meta-data lock was taken (53770+53676).
And 53676 needed to keep the table/partitions opened and locked
while copying the data to the new partitions.
Also added thorough tests to spot some additional bugs
in the ddl_log code, which could result in bad state
between the .frm and partitions.
Collapsed patch, includes all fixes required from the reviewers.