get_table_share, drop_open_table
In the partition handler code, LOCK_open and share->LOCK_ha_data
are acquired in the wrong order in certain cases. When doing a
multi-row INSERT (i.e. an INSERT..SELECT) into a table with auto-
increment column(s), the increments must form a monotonically
increasing sequence (i.e. it can't have "holes"). To
achieve this, a lock is held for the duration of the operation;
share->LOCK_ha_data was used for this purpose.
Whenever there was a need to open a view _during_ the operation
(views are not currently pre-opened the way tables are) and
LOCK_open was grabbed, a deadlock could occur, since
share->LOCK_ha_data is used in other places _while_ holding LOCK_open.
A new mutex was introduced in the HA_DATA_PARTITION structure,
for exclusive use of the autoincrement data fields, so we don't
need to overload the use of LOCK_ha_data here.
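A minimal standalone sketch of the idea, with simplified names (the real
structure lives in the partition handler and uses pthread mutexes;
std::mutex is used here only for brevity):

```cpp
#include <cstdint>
#include <mutex>

// Sketch only, not the actual structure: the auto-increment fields
// get their own mutex, separate from share->LOCK_ha_data.
struct HA_DATA_PARTITION {
  uint64_t next_auto_inc_val = 0;  // cached next auto-increment value
  bool auto_inc_initialized = false;
  std::mutex auto_inc_mutex;       // guards ONLY the fields above
};

// Reserving a value never touches LOCK_ha_data, so the
// LOCK_open -> LOCK_ha_data ordering conflict cannot arise here.
uint64_t reserve_next_value(HA_DATA_PARTITION *ha_data) {
  std::lock_guard<std::mutex> guard(ha_data->auto_inc_mutex);
  return ha_data->next_auto_inc_val++;
}
```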
A module test case has not been supplied, since the problem occurs
as a result of a race condition, and testing for this condition
is thus not deterministic. Testing for it could be done by
setting up a test case as described in the bug report.
"insert into.. select * from"
When inserting into a partitioned table using 'insert into
<target> select * from <src>', read_buffer_size bytes of memory
were allocated for each partition in the target table.
This resulted in large memory consumption when the number of
partitions is high.
This patch introduces a new method that tries to estimate the
buffer size required for each partition and limits the maximum
buffer size used to 10 * read_buffer_size
(11 * read_buffer_size in the case of monotonic partition functions).
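A hedged sketch of the capping, assuming hypothetical inputs (only the
10x / 11x limits come from the description above; the estimation itself
is illustrative):

```cpp
#include <algorithm>
#include <cstddef>

// Sketch, not the actual patch: bound the total buffering used when
// bulk-inserting into a partitioned table.
size_t bulk_insert_buffer_size(size_t read_buffer_size,
                               size_t num_partitions,
                               bool monotonic_part_func) {
  // Naive allocation: one read_buffer_size per target partition.
  const size_t naive = read_buffer_size * num_partitions;
  // Cap at 10 * read_buffer_size (11x for monotonic functions).
  const size_t cap = (monotonic_part_func ? 11 : 10) * read_buffer_size;
  return std::min(naive, cap);
}
```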
(Backport)
Problem is that an insert (ha_start_bulk_insert) into a partitioned table
will call ha_start_bulk_insert for every partition, used or not.
Solution is to delay the call to each partition's ha_start_bulk_insert until
the first row is to be inserted into that partition.
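A standalone sketch of the delayed delegation, with hypothetical
simplified types standing in for the per-partition handlers:

```cpp
#include <cstddef>
#include <vector>

struct Partition {
  bool bulk_insert_started = false;
  void start_bulk_insert() { bulk_insert_started = true; }
  void write_row(/* row data */) {}
};

struct PartitionedTable {
  std::vector<Partition> parts;

  // Instead of starting bulk insert on every partition up front,
  // start it lazily on the first row a partition actually receives.
  void write_row_to(size_t part_id) {
    Partition &p = parts[part_id];
    if (!p.bulk_insert_started)
      p.start_bulk_insert();  // first row for this partition
    p.write_row();
  }
};
```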
Inserting a negative value into the autoincrement column of a
partitioned InnoDB table was causing the value of the auto-
increment counter to wrap around into a very large positive
value. The consequences are the same as if a very large positive
value had been inserted into the column, e.g. reduced autoincrement
range and failure to read the autoincrement counter.
The current patch ensures that, before the next auto-increment
value is calculated, the current value is within the positive
maximum allowed limit.
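A sketch of such a guard, with illustrative names (not the actual patch
code):

```cpp
#include <cstdint>

// Before computing the next auto-increment value, make sure the value
// read from the column is within the positive allowed range; a
// negative value must not be reinterpreted as a huge unsigned one.
uint64_t next_auto_inc(int64_t column_value, uint64_t cur_counter) {
  if (column_value < 0)
    return cur_counter;  // negative values leave the counter alone
  uint64_t candidate = (uint64_t)column_value + 1;
  return candidate > cur_counter ? candidate : cur_counter;
}
```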
on tables with partitions
Problem was that the handler function try_semi_consistent_read
was not propagated to the innodb handler.
Solution was to implement that function in the partitioning
handler.
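A minimal sketch of the propagation, with PartHandler standing in for
the per-partition handler array (m_file) of ha_partition:

```cpp
#include <vector>

struct PartHandler {
  void try_semi_consistent_read(bool yes) { /* engine-specific */ }
};

struct ha_partition_sketch {
  std::vector<PartHandler *> m_file;

  // Forward the call to every underlying partition handler so it
  // finally reaches the storage engine (e.g. InnoDB).
  void try_semi_consistent_read(bool yes) {
    for (PartHandler *h : m_file)
      h->try_semi_consistent_read(yes);
  }
};
```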
Problem was that partitioning cached the table flags.
These flags could change due to TRANSACTION LEVEL changes.
Solution was to remove the cache and always return the table flags
from the first partition (if the handler was initialized).
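Roughly, the resulting accessor could look like this sketch (simplified
types, not the actual code):

```cpp
struct PartHandler {
  unsigned long ha_table_flags() const { return 0; }
};

struct ha_partition_sketch {
  PartHandler **m_file = nullptr;    // per-partition handlers
  bool m_initialized = false;
  unsigned long m_const_flags = 0;   // flags usable before init

  unsigned long table_flags() const {
    // No cached copy: the flags may change with the transaction
    // isolation level, so always ask the first partition.
    return m_initialized ? m_file[0]->ha_table_flags() : m_const_flags;
  }
};
```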
breaks auto increment
The auto_increment value was not initialized if
the first statement after opening a table was
an 'UPDATE'.
Solution was to check whether it was initialized and, if not,
initialize it before trying to increase it in update.
on non-partitioned table
Problem was that partitioning-specific commands were accepted
for non-partitioned tables and treated like
ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE, after bug-20129 was fixed,
which changed the code path from mysql_alter_table to
mysql_admin_table.
Solution was to check whether the table was partitioned before
trying to execute the admin command.
index column
There were actually two problems:
1) with a clustered pk, an ordered scan on a non-pk index should also
compare the pk as a last resort to differentiate keys from each
other
2) a bug in the index search handling in ha_partition (found
when extending the test case)
Solution to 1 was to include the pk in the key compare when there is
a clustered pk and the search is on another index.
Solution for 2 was to remove the optimization from
ordered scan to unordered scan when there is a clustered pk.
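A standalone illustration of the first fix: an ordered scan merges
per-partition streams sorted on the non-pk index, and with a clustered
pk the pk must be the tiebreaker in the comparison, or equal secondary
keys cannot be told apart. All types are simplified stand-ins for the
priority-queue merge in ha_partition:

```cpp
#include <queue>
#include <tuple>
#include <vector>

struct Row {
  int sec_key;  // value of the ordered (non-pk) index
  int pk;       // clustered primary key, compared as last resort
};

struct RowCmp {
  bool operator()(const Row &a, const Row &b) const {
    // Secondary key first, then pk to differentiate equal keys.
    return std::tie(a.sec_key, a.pk) > std::tie(b.sec_key, b.pk);
  }
};

// Merge rows from all partitions into (sec_key, pk) order.
std::vector<Row> ordered_merge(const std::vector<std::vector<Row>> &parts) {
  std::priority_queue<Row, std::vector<Row>, RowCmp> pq;
  for (const auto &part : parts)
    for (const Row &r : part) pq.push(r);
  std::vector<Row> out;
  while (!pq.empty()) { out.push_back(pq.top()); pq.pop(); }
  return out;
}
```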
InnoDB Plugin locks table
The fast/on-line add/drop index handler calls were not implemented
within the partitioning handler.
This patch implements them in the partitioning handler.
Since this is only used by the InnoDB plugin, which is not included,
there is no test case. (Have tested it manually with the plugin, and
it does not allow unique indexes that do not include the partitioning
function, or removal of the pk, which in InnoDB generates a new pk
that is not in the partitioning function.)
NOTE: This introduces a new handler method, and because of that
changes the storage engine API. (One cannot use a handlerton to
see the capabilities of a table's handler if it is partitioned,
so I added a wrapper function in the handler that defaults to
the handlerton function, which the partitioning handler overrides.)
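A sketch of that wrapper pattern under assumed simplified names
(alter_table_flags is used here as the capability being queried; this
is not the actual API):

```cpp
struct handlerton_sketch {
  unsigned long alter_table_flags;  // engine-wide capabilities
};

struct handler_sketch {
  handlerton_sketch *ht = nullptr;
  virtual ~handler_sketch() = default;
  // Wrapper: by default, answer with the handlerton's capabilities...
  virtual unsigned long alter_table_flags() { return ht->alter_table_flags; }
};

struct ha_partition_sketch : handler_sketch {
  handler_sketch *first_partition = nullptr;
  // ...but the partitioning handler answers for the underlying
  // engine of its partitions instead.
  unsigned long alter_table_flags() override {
    return first_partition->alter_table_flags();
  }
};
```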
and
Bug#33555: Group By Query does not correctly aggregate partitions
Backport of bug-33257, which is the same bug.
read_range_*() calls were not passed to the partition handlers,
but were translated to index_read/next family calls,
resulting in duplicate rows and wrong aggregations.
Problem was a mutex added in Bug#27405 for solving a problem
with auto_increment in partitioned InnoDB tables
(in ha_partition::write_row over the partitions' file->ha_write_row).
Solution is to use the patch for Bug#33479, which refines the
usage of mutexes for auto_increment.
Backport of bug-33479 from 6.0:
Bug-33479: auto_increment failures in partitioning
Several problems with auto_increment in partitioning
(with MyISAM and InnoDB: locking issues, not handling
multi-row INSERTs properly, etc.).
Changed the auto_increment handling for partitioning:
Added an ha_data variable in TABLE_SHARE for storage-engine-specific data,
such as the auto_increment value handling in partitioning (also see WL 4305),
and used ha_data->mutex to lock around read + update.
The idea is this:
Store the table's reserved auto_increment value in
the TABLE_SHARE and use a mutex to lock it, read and update it,
and unlock it, all in one block. Only access all partitions
when it is not yet initialized.
Also allow reservations of ranges, and if no one has made a reservation
afterwards, lower the reservation to what was actually used after
the statement is done (via release_auto_increment from WL 3146).
The lock is kept from the first reservation if it is statement-based
replication and a multi-row INSERT statement where the number of
candidate rows to insert is not known in advance (like INSERT SELECT or
LOAD DATA, unlike INSERT VALUES (row1), (row2), ..., (rowN)).
This should also lead to better concurrency (no need for mutex
protection around write_row in all cases)
and works with any local storage engine.
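A minimal standalone sketch of this scheme, under assumed simplified
names (TableShareAutoInc stands in for the ha_data kept in TABLE_SHARE):

```cpp
#include <cstdint>
#include <mutex>

struct TableShareAutoInc {
  std::mutex mtx;           // the ha_data->mutex described above
  uint64_t next_value = 0;  // the table's reserved auto_increment value
  bool initialized = false;
};

// Reserve a range of n values. All partitions are consulted only
// once, when the shared value has not been initialized yet.
uint64_t reserve_range(TableShareAutoInc &s, uint64_t n,
                       uint64_t (*max_over_partitions)()) {
  std::lock_guard<std::mutex> guard(s.mtx);
  if (!s.initialized) {
    s.next_value = max_over_partitions();  // one-time partition scan
    s.initialized = true;
  }
  uint64_t first = s.next_value;
  s.next_value += n;  // lock, read and update in one block
  return first;
}

// When the statement is done (release_auto_increment), hand back the
// unused tail of the range if no one has reserved after us.
void release_unused(TableShareAutoInc &s, uint64_t our_range_end,
                    uint64_t next_unused) {
  std::lock_guard<std::mutex> guard(s.mtx);
  if (s.next_value == our_range_end)  // nobody reserved after us
    s.next_value = next_unused;
}
```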
partition is corrupt
The main problem was that ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR
PARTITION took another code path (over mysql_alter_table instead of
mysql_admin_table) which differs in two ways:
1) alter table opens the tables in a different way than the admin
commands do, resulting in returning with an error before it tried
the command
2) alter table does not start sending diagnostic rows to the client,
a protocol which the lower admin functions continue to use -> resulting
in an assertion crash
The fix:
Remapped ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR PARTITION to use
the same code path as ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE t.
Added a check in mysql_admin_table to set up the list of
partitions that should be used.
Partitioned tables will still not work with
REPAIR TABLE/PARTITION USE_FRM, since that requires moving partitions
to tables, running REPAIR TABLE t USE_FRM, checking that the data still
fulfills the partitioning function, and then moving the table back to
being a partition.
NOTE: I have removed the following functions from the handler
interface:
analyze_partitions, check_partitions, optimize_partitions,
repair_partitions
since they are no longer needed.
THIS ALTERS THE STORAGE ENGINE API
Problem was that ha_partition::records was not implemented, thus
using the default handler::records, which is not correct if the engine
does not support HA_STATS_RECORDS_IS_EXACT.
Solution was to implement ha_partition::records as a wrapper around
the underlying partitions' records.
The rows column in EXPLAIN PARTITIONS will now include the total
number of records in the partitioned table.
(recommit after removing out-commented code)
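A sketch of the wrapper, with simplified stand-in types:

```cpp
#include <cstdint>
#include <vector>

struct PartHandler {
  uint64_t records() const { return 0; }  // per-partition row count
};

// Sum the counts from each partition instead of relying on the
// default handler::records(), which assumes HA_STATS_RECORDS_IS_EXACT.
uint64_t ha_partition_records(const std::vector<PartHandler> &parts) {
  uint64_t total = 0;
  for (const PartHandler &p : parts)
    total += p.records();
  return total;
}
```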
Problem was that auto_repair, is_crashed and check_and_repair were not
implemented in ha_partition.
Solution: implemented is_crashed and check_and_repair as loops over all
partitions, and auto_repair using the first partition.
(Recommit after fixing review comments)
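A standalone sketch of the three implementations, with simplified
stand-in types:

```cpp
#include <vector>

struct PartHandler {
  bool is_crashed() const { return false; }
  int check_and_repair() { return 0; }
  bool auto_repair() const { return false; }
};

struct ha_partition_repair_sketch {
  std::vector<PartHandler> parts;

  bool is_crashed() const {
    for (const auto &p : parts)
      if (p.is_crashed()) return true;  // any crashed partition counts
    return false;
  }
  int check_and_repair() {
    for (auto &p : parts)
      if (int err = p.check_and_repair()) return err;  // stop on error
    return 0;
  }
  bool auto_repair() const {
    // All partitions use the same engine, so the first one's answer holds.
    return parts.empty() ? false : parts[0].auto_repair();
  }
};
```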
- Reserved namespace and place in frm for TABLE_CHECKSUM and PAGE_CHECKSUM create options
- Added syncing of directory when creating .frm files
- Portability fixes
- Added missing cast that could cause bugs
- Code cleanups
- Made some bit functions inline
- Moved things out of myisam.h to my_handler.h to make them more accessible
- Renamed some MyISAM variables and defines to make them more globally usable (as they are used outside of MyISAM)
- Fixed bugs in error conditions
- Use compile-time asserts instead of run-time checks
- Fixed indentation
HA_EXTRA_PREPARE_FOR_DELETE -> HA_EXTRA_PREPARE_FOR_DROP as the old name was wrong
(Added a define for old value to ensure we don't break any old code)
Added HA_EXTRA_PREPARE_FOR_RENAME as a signal for rename (before we used a DROP signal which is wrong)
- Initialize error messages early to get better errors when mysqld or an engine fails to start
- Fixed a Windows bug where query_performance_frequency was not initialized if the registry code failed
- thread_stack -> my_thread_stack_size
In ha_partition::position() we don't calculate the number
of the partition of the record, but use the m_last_part value instead,
relying on it having been set previously by some other call like ::write_row().
Delete_rows_log_event::do_exec_row() calls find_and_fetch_row(),
where we used a position() + rnd_pos() call for the InnoDB-based partitioned
table, as it has HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled.
Fixed by introducing a new handler::rnd_pos_by_record() method to be
used for random record-based positioning.
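A hedged sketch of the new method's shape (signatures simplified;
get_part_for_record stands in for evaluating the partitioning function
on the record):

```cpp
#include <cstdint>

struct PartHandler {
  unsigned char ref[8];  // stored row position
  void position(const unsigned char *record) { /* fill ref */ }
  int rnd_pos(unsigned char *buf, unsigned char *pos) { return 0; }
};

struct ha_partition_rnd_sketch {
  PartHandler **m_file = nullptr;

  uint32_t get_part_for_record(const unsigned char *record) {
    return 0;  // evaluate the partitioning function on the record
  }

  // Derive the partition from the record itself instead of trusting
  // m_last_part to have been set by some earlier call.
  int rnd_pos_by_record(unsigned char *record) {
    uint32_t part = get_part_for_record(record);
    m_file[part]->position(record);  // store the row position
    return m_file[part]->rnd_pos(record, m_file[part]->ref);
  }
};
```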
In ha_partition::position() we didn't calculate the number
of the partition of the record. We used the m_last_part value instead,
relying on it being set elsewhere, e.g. by a previous call of a method
like ::write_row(). In replication we don't call any of these before
position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row.
In the case of an InnoDB-based partitioned table, we have
HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled, so position() / rnd_pos()
calls are used to fetch the record.
Fixed by adding the partition_id calculation to ha_partition::position().
Faster thr_alarm()
Added 'Opened_files' status variable to track calls to my_open()
Don't give warnings when running mysql_install_db
Added option --source-install to mysql_install_db
I had to do the following renames, as the polymorphism used didn't work with the Forte compiler on 64-bit systems:
index_read() -> index_read_map()
index_read_idx() -> index_read_idx_map()
index_read_last() -> index_read_last_map()
causing update of a different column
For efficiency, some storage engines do not read a complete record
for update, but only the columns required for selecting the rows.
When updating a row of a partitioned table and modifying a column
that is part of the partition or subpartition expression, the row
may need to move from one [sub]partition to another one.
This is done by inserting the new row into the target
[sub]partition and deleting the old row from the originating one.
For the insert we need a complete record.
If one of the above-mentioned engines was used for a partitioned table, we
did not have a complete record in update_row(). The implicitly
executed write_row() got an incomplete record.
This is solved by instructing the engine to read a complete record
if one of the columns of the partition or subpartition expression is to
be updated.
No test case. This can be reproduced with Falcon only. The engines
contained in standard 5.1 always return complete records on
update.
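A standalone sketch of the rule, using bitsets as stand-ins for the
table's column bitmaps (read_set/write_set):

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kMaxCols = 64;

// If any updated column is part of the partition or subpartition
// expression, the row may move between partitions and the implicit
// write_row() needs a complete record: read all columns.
void prepare_for_update(std::bitset<kMaxCols> &read_set,
                        const std::bitset<kMaxCols> &write_set,
                        const std::bitset<kMaxCols> &part_expr_cols) {
  if ((write_set & part_expr_cols).any())
    read_set.set();  // instruct the engine to read the full record
}
```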
The following type conversions were done:
- Changed byte to uchar
- Changed gptr to uchar*
- Changed my_string to char *
- Changed my_size_t to size_t
- Changed size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
The following function parameter changes were done:
- All string functions in mysys/strings were changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocol functions changed to use size_t instead of uint
- Functions that used a pointer to a string length were changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrm(), ha_discover and handler::discover()
the type of the argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitly as this conflict was often hidden by casting the function to
hash_get_key.)
- Changed some buffers and memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables that hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1); see the sketch at the end of this section.
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed windows specific code to restore cursor position as this
causes slowdown on windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated function comment to
reflect this. Changed function that depended on original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it does not depend on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
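As referenced in the my_read()/my_write() item above, a standalone
sketch of the size_t error-sentinel pattern (read_some and copy_file
are illustrative stand-ins, not mysys functions):

```cpp
#include <cstddef>
#include <cstdio>

// (size_t)-1 plays the role of MY_FILE_ERROR: storing the result in
// an int or uint could silently lose the sentinel.
static const size_t FILE_ERROR_SENTINEL = (size_t)-1;

size_t read_some(FILE *f, void *buf, size_t count) {
  size_t n = fread(buf, 1, count, f);
  if (ferror(f))
    return FILE_ERROR_SENTINEL;  // error maps onto the sentinel
  return n;
}

bool copy_file(FILE *in, FILE *out) {
  char buf[4096];
  for (;;) {
    size_t n = read_some(in, buf, sizeof(buf));
    if (n == FILE_ERROR_SENTINEL) return false;  // reliably detected
    if (n == 0) return true;  // EOF
    if (fwrite(buf, 1, n, out) != n) return false;
  }
}
```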