TO INCONSISTENCY
PROBLEM
--------
When we drop a partitoned table , we first gather the
information about partitions in the table from the
table_name.par file and store it in an internal data
structure.Then we delete this file and the data in
the table. If the server crashes after deleting the
file,then after recovering we cannot access the table
.Even we cannot drop the table ,because drop algorithm
requires par file to read the partition information.
FIX
---
1. We move the part of deleting par file after deleting
all the table data from the storage egine.
2. During drop operation if we detect that the par
file is missing then we delete the .frm file,since
there is no way of recovering without par file.
[Approved by Mattias rb#2576 ]
Additional patch to remove the part_id -> ref_buffer offset.
The partitioning id and the associate record buffer can
be found without having to calculate it.
By initializing it for each used partition, and then reuse
the key-buffer from the queue, it is not needed to have
such map.
The buffer for the current read row from each partition
(m_ordered_rec_buffer) used for sorted reads was
allocated on open and freed when the ha_partition handler
was closed or destroyed.
For tables with many partitions and big records this could
take up too much valuable memory.
Solution is to only allocate the memory when it is needed
and free it when nolonger needed. I.e. allocate it in
index_init and free it in index_end (and to handle failures
also free it on reset, close etc.)
Also only allocating needed memory, according to
partitioning pruning.
Manually tested that it does not use as much memory and
releases it after queries.
RESULT FROM PREVIOUS TRANSACTION
The current Query Cache API is not fully compatible with
the partitioning engine.
There is no good way to implement support for QC due to:
1) a static callback for ha_partition would need to have access
to all partition names and call the underlying callback for each
[sub]partition with the correct name.
2) pruning would be impossible, even if one used the ulonglong
engine_data due to if engine_data is changed, the table is
invalidated by the QC.
So the only viable solution to avoid incorrect data is to not allow
caching of queries using partitioned tables.
(There are some extra changes, due to removal of \r as line break)
Update for previous patch according to reviewers comments.
Updated the constructors for ha_partitions to use the common
init_handler_variables functions
Added use of defines for size and offset to get better readability for the code that reads
and writes the .par file. Also refactored the get_from_handler_file function.
When executing row-ordered-retrieval index merge,
the handler was cloned, but it used the wrong
memory root, so instead of allocating memory
on the thread/query's mem_root, it used the table's
mem_root, resulting in non released memory in the
table object, and was not freed until the table was
closed.
Solution was to ensure that memory used during cloning
of a handler was allocated from the correct memory root.
This was implemented by fixing handler::clone() to also
take a name argument, so it can be used with partitioning.
And in ha_partition only allocate the ha_partition's ref, and
call the original ha_partition partitions clone() and set at cloned
partitions.
Fix of .bzrignore on Windows with VS 2010
LOAD DATA into partitioned MyISAM table
Problem was that both partitioning and myisam
used the same table_share->mutex for different protections
(auto inc and repair).
Solved by adding a specific mutex for the partitioning
auto_increment.
Also adding destroying the ha_data structure in
free_table_share (which is to be propagated
into 5.5).
This is a 5.1 ONLY patch, already fixed in 5.5+.
Problem was that the handler call ::extra(HA_EXTRA_CACHE) was cached
but the ::extra(HA_EXTRA_PREPARE_FOR_UPDATE) was not.
Solution was to also cache the other call and forward it when moving
to a new partition to scan.
In bug-28430 HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
was disabled in the partitioning engine in the first patch,
That bug was later fixed a second time, but that flag
was not removed.
No need to disable this flag, as it leads to bad
choise in row replication.
The handler function for reading one row from a specific index
was not optimized in the partitioning handler since it
used the default implementation.
No test case since it is performance only, verified by hand.
Problem was that ha_partition::records_in_range called
records_in_range for all non pruned partitions, even if
an estimate should be given.
Solution is to only use 1/3 of the partitions (up to 10) for
records_in_range and estimate the total from this subset.
(And continue until a non zero return value from the called
partitions records_in_range is given, since 0 means no rows
will match.)
"insert into.. select * from"
When inserting into a partitioned table using 'insert into
<target> select * from <src>', read_buffer_size bytes of memory
are allocated for each partition in the target table.
This resulted in large memory consumption when the number of
partitions are high.
This patch introduces a new method which tries to estimate the
buffer size required for each partition and limits the maximum
buffer size used to maximum of 10 * read_buffer_size,
11 * read_buffer_size in case of monotonic partition functions.
(Backport)
Problem is that when insert (ha_start_bulk_insert) in i partitioned table,
it will call ha_start_bulk_insert for every partition, used or not.
Solution is to delay the call to the partitions ha_start_bulk_insert until
the first row is to be inserted into that partition
Inserting a negative value in the autoincrement column of a
partitioned innodb table was causing the value of the auto
increment counter to wrap around into a very large positive
value. The consequences are the same as if a very large positive
value was inserted into a column, e.g. reduced autoincrement
range, failure to read autoincrement counter.
The current patch ensures that before calculating the next
auto increment value, the current value is within the positive
maximum allowed limit.
on tables with partitions
Problem was that the handler function try_semi_consistent_read
was not propagated to the innodb handler.
Solution was to implement that function in the partitioning
handler.
Problem was that partitioning cached the table flags.
These flags could change due to TRANSACTION LEVEL changes.
Solution was to remove the cache and always return the table flags
from the first partition (if the handler was initialized).
breaks auto increment
The auto_increment value was not initialized if
the first statement after opening a table was
an 'UPDATE'.
solution was to check initialize if it was not,
before trying to increase it in update.
on non-partitioned table
Problem was that partitioning specific commands was accepted
for non partitioned tables and treated like
ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE, after bug-20129 was fixed,
which changed the code path from mysql_alter_table to
mysql_admin_table.
Solution was to check if the table was partitioned before
trying to execute the admin command
index column
There was actually two problems
1) when clustered pk, order by non pk index should also
compare with pk as last resort to differ keys from each
other
2) bug in the index search handling in ha_partition (was
found when extending the test case
Solution to 1 was to include the pk in key compare if
clustered pk and search on other index.
Solution for 2 was to remove the optimization from
ordered scan to unordered scan if clustered pk.
InnoDB Plugin locks table
The fast/on-line add/drop index handler calls was not implemented
whithin the partitioning.
This implements it in the partitioning handler.
Since this is only used by the not included InnoDB plugin, there
is no test case. (Have tested it manually with the plugin, and
it does not allow unique indexes not including partitioning
function, or removal of pk, which in innodb generates a new pk,
which is not in the partitioning function.)
NOTE: This introduces a new handler method, and because of that
changes the storage engine api. (One cannot use a handlerton to
see the capabilities of a table's handler if it is partitioned.
So I added a wrapper function in the handler that defaults to
the handlerton function, which the partitioning handler overrides.
and
Bug#33555: Group By Query does not correctly aggregate partitions
Backport of bug-33257 which is the same bug.
read_range_*() calls was not passed to the partition handlers,
but was translated to index_read/next family calls.
Resulting in duplicates rows and wrong aggregations.
Problem was a mutex added in bug n 27405 for solving a problem
with auto_increment in partitioned innodb tables.
(in ha_partition::write_row over partitions file->ha_write_row)
Solution is to use the patch for bug#33479, which refines the
usage of mutexes for auto_increment.
Backport of bug-33479 from 6.0:
Bug-33479: auto_increment failures in partitioning
Several problems with auto_increment in partitioning
(with MyISAM, InnoDB. Locking issues, not handling
multi-row INSERTs properly etc.)
Changed the auto_increment handling for partitioning:
Added a ha_data variable in table_share for storage engine specific data
such as auto_increment value handling in partitioning, also see WL 4305
and using the ha_data->mutex to lock around read + update.
The idea is this:
Store the table's reserved auto_increment value in
the TABLE_SHARE and use a mutex to, lock it for reading and updating it
and unlocking it, in one block. Only accessing all partitions
when it is not initialized.
Also allow reservations of ranges, and if no one has done a reservation
afterwards, lower the reservation to what was actually used after
the statement is done (via release_auto_increment from WL 3146).
The lock is kept from the first reservation if it is statement based
replication and a multi-row INSERT statement where the number of
candidate rows to insert is not known in advance (like INSERT SELECT,
LOAD DATA, unlike INSERT VALUES (row1), (row2),,(rowN)).
This should also lead to better concurrancy (no need to have a mutex
protection around write_row in all cases)
and work with any local storage engine.
partition is corrupt
The main problem was that ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR
PARTITION took another code path (over mysql_alter_table instead of
mysql_admin_table) which differs in two ways:
1) alter table opens the tables in a different way than admin tables do
resulting in returning with error before it tried the command
2) alter table does not start to send any diagnostic rows to the client
which the lower admin functions continue to use -> resulting in
assertion crash
The fix:
Remapped ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR PARTITION to use
the same code path as ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE t.
Adding check in mysql_admin_table to setup the partition list for
which partitions that should be used.
Partitioned tables will still not work with
REPAIR TABLE/PARTITION USE_FRM, since that requires moving partitions
to tables, REPAIR TABLE t USE_FRM, and check that the data still
fulfills the partitioning function and then move the table back to
being a partition.
NOTE: I have removed the following functions from the handler
interface:
analyze_partitions, check_partitions, optimize_partitions,
repair_partitions
Since they are not longer needed.
THIS ALTERS THE STORAGE ENGINE API
problem was that ha_partition::records was not implemented, thus
using the default handler::records, which is not correct if the engine
does not support HA_STATS_RECORDS_IS_EXACT.
Solution was to implement ha_partition::records as a wrapper around
the underlying partitions records.
The rows column in explain partitions will now include the total
number of records in the partitioned table.
(recommit after removing out-commented code)
Problem was that auto_repair, is_crashed and check_and_repair was not
implemented in ha_partition.
Solution, implemented them as loop over all partitions for is_crashed and
check_and_repair, and using the first partition for auto_repair.
(Recommit after fixing review comments)
- Reserver namespace and place in frm for TABLE_CHECKSUM and PAGE_CHECKSUM create options
- Added syncing of directory when creating .frm files
- Portability fixes
- Added missing cast that could cause bugs
- Code cleanups
- Made some bit functions inline
- Moved things out of myisam.h to my_handler.h to make them more accessable
- Renamed some myisam variables and defines to make them more globaly usable (as they are used outside of MyISAM)
- Fixed bugs in error conditions
- Use compiler time asserts instead of run time
- Fixed indentation
HA_EXTRA_PREPARE_FOR_DELETE -> HA_EXTRA_PREPARE_FOR_DROP as the old name was wrong
(Added a define for old value to ensure we don't break any old code)
Added HA_EXTRA_PREPARE_FOR_RENAME as a signal for rename (before we used a DROP signal which is wrong)
- Initialize error messages early to get better errors when mysqld or an engine fails to start
- Fix windows bug that query_performance_frequency was not initialized if registry code failed
- thread_stack -> my_thread_stack_size
In the ha_partition::position() we don't calculate the number
of the partition of the record, but use m_last_part value instead,
relying on that it's previously set by some other call like ::write_row().
Delete_rows_log_event::do_exec_row() calls find_and_fetch_row(),
where we used position() + rnd_pos() call for the InnoDB-based PARTITION-ed
table as there HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled.
fixed by introducing new handler::rnd_pos_by_record() method to be
used for random record-based positioning