Additional patch to remove the part_id -> ref_buffer offset.
The partitioning id and the associate record buffer can
be found without having to calculate it.
By initializing it for each used partition, and then reuse
the key-buffer from the queue, it is not needed to have
such map.
The buffer for the current read row from each partition
(m_ordered_rec_buffer) used for sorted reads was
allocated on open and freed when the ha_partition handler
was closed or destroyed.
For tables with many partitions and big records this could
take up too much valuable memory.
Solution is to only allocate the memory when it is needed
and free it when nolonger needed. I.e. allocate it in
index_init and free it in index_end (and to handle failures
also free it on reset, close etc.)
Also only allocating needed memory, according to
partitioning pruning.
Manually tested that it does not use as much memory and
releases it after queries.
RESULT FROM PREVIOUS TRANSACTION
The current Query Cache API is not fully compatible with
the partitioning engine.
There is no good way to implement support for QC due to:
1) a static callback for ha_partition would need to have access
to all partition names and call the underlying callback for each
[sub]partition with the correct name.
2) pruning would be impossible, even if one used the ulonglong
engine_data due to if engine_data is changed, the table is
invalidated by the QC.
So the only viable solution to avoid incorrect data is to not allow
caching of queries using partitioned tables.
(There are some extra changes, due to removal of \r as line break)
PARTITONING, ON INDEX CREATE
If the first partition succeeded in adding a index, but a successive partition failed,
then the first partition had still the new index.
The fix reverts the added indexes from previous partitions on failure.
In sql_class.cc, 'row_count', of type 'ha_rows', was used as last argument for
ER_TRUNCATED_WRONG_VALUE_FOR_FIELD which is
"Incorrect %-.32s value: '%-.128s' for column '%.192s' at row %ld".
So 'ha_rows' was used as 'long'.
On SPARC32 Solaris builds, 'long' is 4 bytes and 'ha_rows' is 'longlong' i.e. 8 bytes.
So the printf-like code was reading only the first 4 bytes.
Because the CPU is big-endian, 1LL is 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01
so the first four bytes yield 0. So the warning message had "row 0" instead of
"row 1" in test outfile_loaddata.test:
-Warning 1366 Incorrect string value: '\xE1\xE2\xF7' for column 'b' at row 1
+Warning 1366 Incorrect string value: '\xE1\xE2\xF7' for column 'b' at row 0
All error-messaging functions which internally invoke some printf-life function
are potential candidate for such mistakes.
One apparently easy way to catch such mistakes is to use
ATTRIBUTE_FORMAT (from my_attribute.h).
But this works only when call site has both:
a) the format as a string literal
b) the types of arguments.
So:
func(ER(ER_BLAH), 10);
will silently not be checked, because ER(ER_BLAH) is not known at
compile time (it is known at run-time, and depends on the chosen
language).
And
func("%s", a va_list argument);
has the same problem, as the *real* type of arguments is not
known at this site at compile time (it's known in some caller).
Moreover,
func(ER(ER_BLAH));
though possibly correct (if ER(ER_BLAH) has no '%' markers), will not
compile (gcc says "error: format not a string literal and no format
arguments").
Consequences:
1) ATTRIBUTE_FORMAT is here added only to functions which in practice
take "string literal" formats: "my_error_reporter" and "print_admin_msg".
2) it cannot be added to the other functions: my_error(),
push_warning_printf(), Table_check_intact::report_error(),
general_log_print().
To do a one-time check of functions listed in (2), the following
"static code analysis" has been done:
1) replace
my_error(ER_xxx, arguments for substitution in format)
with the equivalent
my_printf_error(ER_xxx,ER(ER_xxx), arguments for substitution in
format),
so that we have ER(ER_xxx) and the arguments *in the same call site*
2) add ATTRIBUTE_FORMAT to push_warning_printf(),
Table_check_intact::report_error(), general_log_print()
3) replace ER(xxx) with the hard-coded English text found in
errmsg.txt (like: ER(ER_UNKNOWN_ERROR) is replaced with
"Unknown error"), so that a call site has the format as string literal
4) this way, ATTRIBUTE_FORMAT can effectively do its job
5) compile, fix errors detected by ATTRIBUTE_FORMAT
6) revert steps 1-2-3.
The present patch has no compiler error when submitted again to the
static code analysis above.
It cannot catch all problems though: see Field::set_warning(), in
which a call to push_warning_printf() has a variable error
(thus, not replacable by a string literal); I checked set_warning() calls
by hand though.
See also WL 5883 for one proposal to avoid such bugs from appearing
again in the future.
The issues fixed in the patch are:
a) mismatch in types (like 'int' passed to '%ld')
b) more arguments passed than specified in the format.
This patch resolves mismatches by changing the type/number of arguments,
not by changing error messages of sql/share/errmsg.txt. The latter would be wrong,
per the following old rule: errmsg.txt must be as stable as possible; no insertions
or deletions of messages, no changes of type or number of printf-like format specifiers,
are allowed, as long as the change impacts a message already released in a GA version.
If this rule is not followed:
- Connectors, which use error message numbers, will be confused (by insertions/deletions
of messages)
- using errmsg.sys of MySQL 5.1.n with mysqld of MySQL 5.1.(n+1)
could produce wrong messages or crash; such usage can easily happen if
installing 5.1.(n+1) while /etc/my.cnf still has --language=/path/to/5.1.n/xxx;
or if copying mysqld from 5.1.(n+1) into a 5.1.n installation.
When fixing b), I have verified that the superfluous arguments were not used in the format
in the first 5.1 GA (5.1.30 'bteam@astra04-20081114162938-z8mctjp6st27uobm').
Had they been used, then passing them today, even if the message doesn't use them
anymore, would have been necessary, as explained above.
The partitioning engine checked the auto_increment column even if it was not to be written,
triggering a DBUG_ASSERT.
Fixed by checking if table->write_set for that column was set.
Partitions can have different ref_length (position data length).
Removed DBUG_ASSERT which crashed debug builds when using
MAX_ROWS on some partitions.
Update for previous patch according to reviewers comments.
Updated the constructors for ha_partitions to use the common
init_handler_variables functions
Added use of defines for size and offset to get better readability for the code that reads
and writes the .par file. Also refactored the get_from_handler_file function.
When executing row-ordered-retrieval index merge,
the handler was cloned, but it used the wrong
memory root, so instead of allocating memory
on the thread/query's mem_root, it used the table's
mem_root, resulting in non released memory in the
table object, and was not freed until the table was
closed.
Solution was to ensure that memory used during cloning
of a handler was allocated from the correct memory root.
This was implemented by fixing handler::clone() to also
take a name argument, so it can be used with partitioning.
And in ha_partition only allocate the ha_partition's ref, and
call the original ha_partition partitions clone() and set at cloned
partitions.
Fix of .bzrignore on Windows with VS 2010
Regression introduced in bug#52455. Problem was that the
fixed function did not set the last used partition variable, resulting
in wrong partition used when storing the position of the newly
retrieved row.
Fixed by setting the last used partition in ha_partition::index_read_idx_map.
with on duplicate key update
There was a missed corner case in the partitioning
handler, which caused the next_insert_id to be changed
in the second level handlers (i.e the hander of a partition),
which caused this debug assertion.
The solution was to always ensure that only the partitioning
level generates auto_increment values, since if it was done
within a partition, it may fail to match the partition
function.
LOAD DATA into partitioned MyISAM table
Problem was that both partitioning and myisam
used the same table_share->mutex for different protections
(auto inc and repair).
Solved by adding a specific mutex for the partitioning
auto_increment.
Also adding destroying the ha_data structure in
free_table_share (which is to be propagated
into 5.5).
This is a 5.1 ONLY patch, already fixed in 5.5+.
Bug#57113: ha_partition::extra(ha_extra_function):
Assertion `m_extra_cache' failed
Fix for bug#55458 included DBUG_ASSERTS causing
debug builds of the server to crash on
another multi-table update.
Removed the asserts since they where wrong.
(updated after testing the patch in 5.5).
When having a sub query in partitioned innodb one could
make the partitioning engine to search for a 'index_next_same'
on a partition that had not been initialized.
Problem was that the subselect function looks at table->status
which was not set in the partitioning handler when it skipped
scanning due to no matching partitions found.
Fixed by setting table->status = STATUS_NOT_FOUND when
there was no partitions to scan. (If there are partitions to
scan, it will be set in the partitions handler.)
As we check for the impossible partitions earlier, it's possible that we don't find any
suitable partitions at all. So this assertion just has to be corrected for this case.
per-file comments:
mysql-test/r/partition_innodb.result
Bug#55146 Assertion `m_part_spec.start_part == m_part_spec.end_part' in index_read_idx_map
test result updated.
mysql-test/t/partition_innodb.test
Bug#55146 Assertion `m_part_spec.start_part == m_part_spec.end_part' in index_read_idx_map
test case added.
sql/ha_partition.cc
Bug#55146 Assertion `m_part_spec.start_part == m_part_spec.end_part' in index_read_idx_map
Assertion changed to '>=' as the prune_partition_set() in the get_partition_set() can
do start_part= end_part+1 if no possible partitions were found.
Problem was that the handler call ::extra(HA_EXTRA_CACHE) was cached
but the ::extra(HA_EXTRA_PREPARE_FOR_UPDATE) was not.
Solution was to also cache the other call and forward it when moving
to a new partition to scan.
The handler function for reading one row from a specific index
was not optimized in the partitioning handler since it
used the default implementation.
No test case since it is performance only, verified by hand.
concurrent I_S query
There were two problem:
1) MYSQL_LOCK_IGNORE_FLUSH also ignored name locks
2) there was a race between abort_and_upgrade_locks and
alter_close_tables
(i.e. remove_table_from_cache and
close_data_files_and_morph_locks)
Which allowed the table to be opened with MYSQL_LOCK_IGNORE_FLUSH flag
resulting in renaming a partition that was already in use,
which could cause the table to be unusable.
Solution was to not allow IGNORE_FLUSH to skip waiting for
a named locked table.
And to not release the LOCK_open mutex between the
calls to remove_table_from_cache and
close_data_files_and_morph_locks by merging the functions
abort_and_upgrade_locks and alter_close_tables.
auto_increment on duplicate entry
The bug was that when INSERT_ID was used and the storage
engine was told to release any reserved but not used
auto_increment values, it set the highest auto_increment
value to INSERT_ID.
The fix was to check if the auto_increment value was forced
by user (INSERT_ID) or by slave-thread, i.e. not auto-
generated. So that it is only allowed to release generated
values.
Problem was block_size on partitioned tables was not set,
resulting in keys_per_block was not correct which affects
the cost calculation for read time of indexes (including
cost for group min/max).Which resulted in a bad optimizer
decision.
Fixed by setting stats.block_size correctly.
There was two problems:
The first was the symptom, caused by bad error handling in
ha_partition. It did not handle print_error etc. when
having no partitions (when used by dummy handler).
The second was the real problem that when dropping tables
it reused the table type (storage engine) from when the lock
was asked for, not the table type that it had when gaining
the exclusive name lock. So that it tried to delete tables
from wrong storage engines.
Solutions for the first problem was to accept some handler
calls to the partitioning handler even if it was not setup
with any partitions, and also if possible fallback
to use the base handler's default functions.
Solution for the second problem was to remove the optimization
to reuse the definition from the cache, instead always check
the frm-file when holding the LOCK_open mutex
(updated with a fix for a debug print crash and better
comments as required by reviewer, and removed optimization
to avoid reading the frm-file).
REORGANIZE PARTITION
There were several problems which lead to this this,
all related to bad error handling.
1) There was several bugs preventing the ddl-log to be used for
cleaning up created files on error.
2) The error handling after the copy partition rows did not close
and unlock the tables, resulting in deletion of partitions
which were in use, which lead InnoDB to put the partition to
drop in a background queue.
Problem was that ha_partition::records_in_range called
records_in_range for all non pruned partitions, even if
an estimate should be given.
Solution is to only use 1/3 of the partitions (up to 10) for
records_in_range and estimate the total from this subset.
(And continue until a non zero return value from the called
partitions records_in_range is given, since 0 means no rows
will match.)