on tables with partitions
Problem was that the handler function try_semi_consistent_read
was not propagated to the innodb handler.
Solution was to implement that function in the partitioning
handler.
Problem was that partitioning cached the table flags.
These flags could change due to TRANSACTION LEVEL changes.
Solution was to remove the cache and always return the table flags
from the first partition (if the handler was initialized).
breaks auto increment
The auto_increment value was not initialized if
the first statement after opening a table was
an 'UPDATE'.
solution was to check initialize if it was not,
before trying to increase it in update.
on non-partitioned table
Problem was that partitioning specific commands was accepted
for non partitioned tables and treated like
ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE, after bug-20129 was fixed,
which changed the code path from mysql_alter_table to
mysql_admin_table.
Solution was to check if the table was partitioned before
trying to execute the admin command
index column
There was actually two problems
1) when clustered pk, order by non pk index should also
compare with pk as last resort to differ keys from each
other
2) bug in the index search handling in ha_partition (was
found when extending the test case
Solution to 1 was to include the pk in key compare if
clustered pk and search on other index.
Solution for 2 was to remove the optimization from
ordered scan to unordered scan if clustered pk.
InnoDB Plugin locks table
The fast/on-line add/drop index handler calls was not implemented
whithin the partitioning.
This implements it in the partitioning handler.
Since this is only used by the not included InnoDB plugin, there
is no test case. (Have tested it manually with the plugin, and
it does not allow unique indexes not including partitioning
function, or removal of pk, which in innodb generates a new pk,
which is not in the partitioning function.)
NOTE: This introduces a new handler method, and because of that
changes the storage engine api. (One cannot use a handlerton to
see the capabilities of a table's handler if it is partitioned.
So I added a wrapper function in the handler that defaults to
the handlerton function, which the partitioning handler overrides.
and
Bug#33555: Group By Query does not correctly aggregate partitions
Backport of bug-33257 which is the same bug.
read_range_*() calls was not passed to the partition handlers,
but was translated to index_read/next family calls.
Resulting in duplicates rows and wrong aggregations.
Problem was a mutex added in bug n 27405 for solving a problem
with auto_increment in partitioned innodb tables.
(in ha_partition::write_row over partitions file->ha_write_row)
Solution is to use the patch for bug#33479, which refines the
usage of mutexes for auto_increment.
Backport of bug-33479 from 6.0:
Bug-33479: auto_increment failures in partitioning
Several problems with auto_increment in partitioning
(with MyISAM, InnoDB. Locking issues, not handling
multi-row INSERTs properly etc.)
Changed the auto_increment handling for partitioning:
Added a ha_data variable in table_share for storage engine specific data
such as auto_increment value handling in partitioning, also see WL 4305
and using the ha_data->mutex to lock around read + update.
The idea is this:
Store the table's reserved auto_increment value in
the TABLE_SHARE and use a mutex to, lock it for reading and updating it
and unlocking it, in one block. Only accessing all partitions
when it is not initialized.
Also allow reservations of ranges, and if no one has done a reservation
afterwards, lower the reservation to what was actually used after
the statement is done (via release_auto_increment from WL 3146).
The lock is kept from the first reservation if it is statement based
replication and a multi-row INSERT statement where the number of
candidate rows to insert is not known in advance (like INSERT SELECT,
LOAD DATA, unlike INSERT VALUES (row1), (row2),,(rowN)).
This should also lead to better concurrancy (no need to have a mutex
protection around write_row in all cases)
and work with any local storage engine.
partition is corrupt
The main problem was that ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR
PARTITION took another code path (over mysql_alter_table instead of
mysql_admin_table) which differs in two ways:
1) alter table opens the tables in a different way than admin tables do
resulting in returning with error before it tried the command
2) alter table does not start to send any diagnostic rows to the client
which the lower admin functions continue to use -> resulting in
assertion crash
The fix:
Remapped ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR PARTITION to use
the same code path as ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE t.
Adding check in mysql_admin_table to setup the partition list for
which partitions that should be used.
Partitioned tables will still not work with
REPAIR TABLE/PARTITION USE_FRM, since that requires moving partitions
to tables, REPAIR TABLE t USE_FRM, and check that the data still
fulfills the partitioning function and then move the table back to
being a partition.
NOTE: I have removed the following functions from the handler
interface:
analyze_partitions, check_partitions, optimize_partitions,
repair_partitions
Since they are not longer needed.
THIS ALTERS THE STORAGE ENGINE API
problem was that ha_partition::records was not implemented, thus
using the default handler::records, which is not correct if the engine
does not support HA_STATS_RECORDS_IS_EXACT.
Solution was to implement ha_partition::records as a wrapper around
the underlying partitions records.
The rows column in explain partitions will now include the total
number of records in the partitioned table.
(recommit after removing out-commented code)
Problem was that auto_repair, is_crashed and check_and_repair was not
implemented in ha_partition.
Solution, implemented them as loop over all partitions for is_crashed and
check_and_repair, and using the first partition for auto_repair.
(Recommit after fixing review comments)
- Reserver namespace and place in frm for TABLE_CHECKSUM and PAGE_CHECKSUM create options
- Added syncing of directory when creating .frm files
- Portability fixes
- Added missing cast that could cause bugs
- Code cleanups
- Made some bit functions inline
- Moved things out of myisam.h to my_handler.h to make them more accessable
- Renamed some myisam variables and defines to make them more globaly usable (as they are used outside of MyISAM)
- Fixed bugs in error conditions
- Use compiler time asserts instead of run time
- Fixed indentation
HA_EXTRA_PREPARE_FOR_DELETE -> HA_EXTRA_PREPARE_FOR_DROP as the old name was wrong
(Added a define for old value to ensure we don't break any old code)
Added HA_EXTRA_PREPARE_FOR_RENAME as a signal for rename (before we used a DROP signal which is wrong)
- Initialize error messages early to get better errors when mysqld or an engine fails to start
- Fix windows bug that query_performance_frequency was not initialized if registry code failed
- thread_stack -> my_thread_stack_size
In the ha_partition::position() we don't calculate the number
of the partition of the record, but use m_last_part value instead,
relying on that it's previously set by some other call like ::write_row().
Delete_rows_log_event::do_exec_row() calls find_and_fetch_row(),
where we used position() + rnd_pos() call for the InnoDB-based PARTITION-ed
table as there HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled.
fixed by introducing new handler::rnd_pos_by_record() method to be
used for random record-based positioning
In the ha_partition::position() we didn't calculate the number
of the partition of the record. We used m_last_part value instead,
relying on that it is set in other place like previous call of a method
like ::write_row(). In replication we don't call any of these befor
position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row.
In case of InnoDB-based PARTITION table, we have HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
enabled, so use position() / rnd_pos() calls to fetch the record.
Fixed by adding partition_id calculation to the ha_partition::position()
Faster thr_alarm()
Added 'Opened_files' status variable to track calls to my_open()
Don't give warnings when running mysql_install_db
Added option --source-install to mysql_install_db
I had to do the following renames() as used polymorphism didn't work with Forte compiler on 64 bit systems
index_read() -> index_read_map()
index_read_idx() -> index_read_idx_map()
index_read_last() -> index_read_last_map()
causing update of a different column
For efficiency some storage engines do not read a complete record
for update, but only the columns required for selecting the rows.
When updating a row of a partitioned table, modifying a column
that is part of the partition or subpartition expression, then
the row may need to move from one [sub]partition to another one.
This is done by inserting the new row into the target
[sub]partition and deleting the old row from the originating one.
For the insert we need a complete record.
If an above mentioned engine was used for a partitioned table, we
did not have a complete record in update_row(). The implicitly
executed write_row() got an incomplete record.
This is solved by instructing the engine to read a complete record
if one of the columns of the partition or subpartiton is to be
updated.
No testcase. This can be reproduced with Falcon only. The engines
contained in standard 5.1 do always return complete records on
update.
The following type conversions was done:
- Changed byte to uchar
- Changed gptr to uchar*
- Change my_string to char *
- Change my_size_t to size_t
- Change size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
Following function parameter changes was done:
- All string functions in mysys/strings was changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocoll functions changed to use size_t instead of uint
- Functions that used a pointer to a string length was changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrom(), ha_discover and handler::discover()
the type for argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitely as this conflict was often hided by casting the function to
hash_get_key).
- Changed some buffers to memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed windows specific code to restore cursor position as this
causes slowdown on windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated function comment to
reflect this. Changed function that depended on original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
"Server Variables for Plugins"
Implement support for plugins to declare server variables.
Demonstrate functionality by removing InnoDB specific code from sql/*
New feature for HASH - HASH_UNIQUE flag
New feature for DYNAMIC_ARRAY - initializer accepts preallocated ptr.
Completed support for plugin reference counting.
Before the fix:
ha_partition objects had ha_partition::m_part_info==NULL and that caused
crash
After:
- The new ha_partition::clone() function makes the clones use parent's
m_part_info value.
- The parent ha_partition object remains responsible for deallocation of
m_part_info.
index_read(), index_read_idx(), index_read_last(), and
records_in_range() - instead of 'uint keylen' argument take
'ulonglong keypart_map', a bitmap showing which keyparts are
present in the key value.
Fallback method is provided for handlers that are lagging behind.
- Removed not used variables
- Changed some ulong parameters/variables to ulonglong (possible serious bug)
- Added casts to get rid of safe assignment from longlong to long (and similar)
- Added casts to function parameters
- Fixed signed/unsigned compares
- Added some constructores to structures
- Removed some not portable constructs
Better fix for bug Bug #21428 "skipped 9 bytes from file: socket (3)" on "mysqladmin shutdown"
(Added new parameter to net_clear() to define when we want the communication buffer to be emptied)
setup 'share' struct for all partiton file elements. It's neccessary because we use
m_file[0]->update_create_info(create_info) during ha_partition::update_create_info
and 'share' for m_file[0] should be valid