(Backport)
Problem is that when inserting (ha_start_bulk_insert) into a partitioned table,
it will call ha_start_bulk_insert for every partition, used or not.
Solution is to delay the call to the partition's ha_start_bulk_insert until
the first row is to be inserted into that partition.
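A minimal, self-contained sketch of the lazy approach (simplified; the flag and
type names are assumed, not the actual ha_partition code):

// Sketch: defer start_bulk_insert() for a partition until the first row
// actually lands in it, instead of starting it for every partition up front.
#include <vector>

struct part_handler                       // stand-in for one partition's handler
{
  void start_bulk_insert(unsigned long estimated_rows) { (void)estimated_rows; }
  int  write_row(const unsigned char *row) { (void)row; return 0; }
};

struct partitioned_insert_sketch
{
  std::vector<part_handler> parts;
  std::vector<bool> bulk_started;         // one flag per partition
  unsigned long estimated_rows;

  int write_row(unsigned part_id, const unsigned char *row)
  {
    if (!bulk_started[part_id])           // first row for this partition?
    {
      parts[part_id].start_bulk_insert(estimated_rows);
      bulk_started[part_id]= true;
    }
    return parts[part_id].write_row(row);
  }
};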
Inserting a negative value into the autoincrement column of a
partitioned InnoDB table caused the value of the auto
increment counter to wrap around into a very large positive
value. The consequences are the same as if a very large positive
value had been inserted into the column, e.g. a reduced autoincrement
range and failure to read the autoincrement counter.
The current patch ensures that before calculating the next
auto increment value, the current value is within the positive
maximum allowed limit.
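A rough, self-contained sketch of the guard (simplified; the real InnoDB code
also honours auto_increment_increment/offset):

#include <stdint.h>

/* Sketch only: keep the counter inside the allowed positive range before
   computing the next value. 'max_positive' would be e.g. 2^63 - 1 for a
   signed BIGINT auto_increment column. */
static uint64_t next_auto_increment(uint64_t current, uint64_t max_positive)
{
  if (current > max_positive)     /* wrapped value from a negative insert */
    current= max_positive;
  return current + 1;
}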
INSERT ... SELECT ...
Problem was that when bulk insert is used on an empty
table/partition, it disables the indexes for better
performance, but in this specific case it also tries
to read from that partition using an index, which is
not possible since it has been disabled.
Solution was to allow index reads on disabled indexes
if there are no records.
Also reverted the patch for bug#38005, since that was a workaround
in the partitioning engine instead of a fix in myisam.
column on partitioned table
The assertion 'ASSERT_COLUMN_MARKED_FOR_READ' fails if the query
is executed with an index containing a double column on a partitioned table.
The problem is that the assertion expects all fields which are read
to be in the read_set.
In this query only the field 'a' is in the read_set, as the tables in
the query are joined by the field 'a', so the assertion fails
when it expects the other field 'b'.
Since the function cmp() is just a comparison of the two parameters passed,
the assertion is not required.
Fixed by removing the assertion in the double field comparison
function, and also fixed the index initialization to do an ordered
index scan with RW LOCK, which ensures all fields from a key are in
the read_set.
Note: this bug is not reproducible with other datatypes because the
assertion doesn't exist in the comparison functions for other
datatypes.
We disallow the partitioning of a log table. You could however
partition a table first, and then point logging to it. This is
not only against the docs, it also crashes the server.
We catch this case now.
Problem was that a failing rename just left the partitions in the state
they were in at the failure.
Solution was to try to revert the started rename if a failure occurred.
NOTE: This undoes changes by BUG#42829 in sql_class.cc:binlog_query().
I will revert the change in a post-push fix (the binlog filter should
be checked in sql_base.cc:decide_logging_format()).
General overview:
The logic for switching to row format when binlog_format=MIXED had
numerous flaws. The underlying problem was the lack of a consistent
architecture.
General purpose of this changeset:
This changeset introduces an architecture for switching to row format
when binlog_format=MIXED. It enforces the architecture where it has
to. It leaves some bugs to be fixed later. It adds extensive tests to
verify that unsafe statements work as expected and that appropriate
errors are produced by problems with the selection of binlog format.
It was not practical to split this into smaller pieces of work.
Problem 1:
To determine the logging mode, the code has to take several parameters
into account (namely: (1) the value of binlog_format; (2) the
capabilities of the engines; (3) the type of the current statement:
normal, unsafe, or row injection). These parameters may conflict in
several ways, namely:
- binlog_format=STATEMENT for a row injection
- binlog_format=STATEMENT for an unsafe statement
- binlog_format=STATEMENT for an engine only supporting row logging
- binlog_format=ROW for an engine only supporting statement logging
- statement is unsafe and engine does not support row logging
- row injection in a table that does not support statement logging
- statement modifies one table that does not support row logging and
one that does not support statement logging
Several of these conflicts were not detected, or were detected with
an inappropriate error message. The problem of BUG#39934 was that no
appropriate error message was written for the case when an engine
only supporting row logging executed a row injection with
binlog_format=STATEMENT. However, all of the above cases must be handled.
Fix 1:
Introduce new error codes (sql/share/errmsg.txt). Ensure that all
conditions are detected and handled in decide_logging_format()
Problem 2:
The binlog format shall be determined once per statement, in
decide_logging_format(). It shall not be changed before or after that.
Before decide_logging_format() is called, all information necessary to
determine the logging format must be available. This principle ensures
that all unsafe statements are handled in a consistent way.
However, this principle is not followed:
thd->set_current_stmt_binlog_row_based_if_mixed() is called in several
places, including from code executing UPDATE..LIMIT,
INSERT..SELECT..LIMIT, DELETE..LIMIT, INSERT DELAYED, and
SET @@binlog_format. After Problem 1 was fixed, that caused
inconsistencies where these unsafe statements would not print the
appropriate warnings or errors for some of the conflicts.
Fix 2:
Remove calls to THD::set_current_stmt_binlog_row_based_if_mixed() from
code executed after decide_logging_format(). Compensate by calling
set_current_stmt_unsafe() at parse time. This way, all unsafe statements
are detected by decide_logging_format().
Problem 3:
INSERT DELAYED is not unsafe: it is logged in statement format even if
binlog_format=MIXED, and no warning is printed even if
binlog_format=STATEMENT. This is BUG#45825.
Fix 3:
Made INSERT DELAYED set itself to unsafe at parse time. This allows
decide_logging_format() to detect that a warning should be printed or
the binlog_format changed.
Problem 4:
Statements with a LIMIT clause were not marked as unsafe when executed inside
stored functions/triggers/views/prepared statements. This is
BUG#45785.
Fix 4:
Mark statements containing a LIMIT clause as unsafe at
parse time, instead of at execution time. This allows propagating the
unsafe-ness to the view.
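A schematic, self-contained sketch of the decision described above (the enum,
struct and return codes are invented for illustration; the server's real
decide_logging_format() distinguishes warnings from errors per conflict):

enum binlog_format_t { BF_STATEMENT, BF_MIXED, BF_ROW };

struct stmt_info
{
  bool unsafe;              /* flagged at parse time (LIMIT, INSERT DELAYED, ...) */
  bool row_injection;       /* row event injected e.g. by the slave SQL thread */
  bool engines_allow_row;   /* capabilities of all involved engines */
  bool engines_allow_stmt;
};

/* Returns 0 and sets *use_row, or a non-zero code for one of the conflicts. */
static int decide_logging_format_sketch(binlog_format_t fmt,
                                        const stmt_info &s, bool *use_row)
{
  bool need_row= s.row_injection || s.unsafe || !s.engines_allow_stmt;
  bool row_ok=   s.engines_allow_row  && fmt != BF_STATEMENT;
  bool stmt_ok=  s.engines_allow_stmt && fmt != BF_ROW;

  if (need_row && !row_ok)
    return 1;                   /* conflict: row format required but not usable */
  if (!need_row && !stmt_ok && !row_ok)
    return 2;                   /* no usable logging format at all */
  *use_row= need_row || !stmt_ok;
  return 0;
}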
the auto_increment value
This is an alternative patch that, instead of allowing RECREATE TABLE
on TRUNCATE TABLE, implements reset_auto_increment, which is called
after delete_all_rows.
Note: this bug was fixed by Mattias Jonsson.
Pushing this patch: http://lists.mysql.com/commits/70370
Problem is that DATA_FREE in SHOW TABLE STATUS
is not correct when not using innodb_file_per_table.
The solution is to use I_S.PARTITIONS instead.
This is only a small fix for correcting the mean record length and
always returning 0 if the table is empty.
'lock wait timeout exceeded'
Problem was a bug in the implementation of scan in partitioning
which masked the error code from the partition's handler.
Fixed by returning the value from the underlying handler.
mysql-test/t/partition.test
sql/ha_partition.cc
Bug#40954: Crash in MyISAM index code with concurrency test using partitioned tables
Problem was usage of read_range_first with an empty key.
Solution was to not give a key if it was empty. (real author Mattias Jonsson)
storage/archive/archive_reader.c
client/mysqlslap.c
Aligned the copyright texts output from "--version" of tools, to
let internal tools be able to change them if needed.
storage/ndb/test/tools/connect.cpp
storage/ndb/test/run-test/atrt.hpp
Corrected a few GPL headers not restricted to GPL version 2
Makefile.am
Added missing --report-features to the 'test-bt-fast' target
support-files/mysql.spec.sh
Reversed the removal of the "%define license GPL", as internal
tools depended on it
on tables with partitions
Problem was that the handler function try_semi_consistent_read
was not propagated to the innodb handler.
Solution was to implement that function in the partitioning
handler.
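A rough sketch of the forwarding (the m_file/m_tot_parts member names are
assumed; stand-in types are used so the sketch is self-contained):

struct sub_handler
{
  void try_semi_consistent_read(bool yes) { (void)yes; /* e.g. InnoDB */ }
};

struct partition_forwarding_sketch
{
  sub_handler **m_file;      /* one handler per partition */
  unsigned m_tot_parts;

  void try_semi_consistent_read(bool yes)
  {
    for (unsigned i= 0; i < m_tot_parts; i++)   /* forward to every partition */
      m_file[i]->try_semi_consistent_read(yes);
  }
};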
order by
Problem was that the first index read was unordered,
and the next was ordered, resulting in use of
uninitialized data.
Solution was to use the correct variable to see if
the 'next' call should be ordered or not.
Problem was that partitioning cached the table flags.
These flags could change due to TRANSACTION LEVEL changes.
Solution was to remove the cache and always return the table flags
from the first partition (if the handler was initialized).
breaks auto increment
The auto_increment value was not initialized if
the first statement after opening a table was
an 'UPDATE'.
Solution was to check whether it was initialized, and initialize it if not,
before trying to increase it in update.
on non-partitioned table
Problem was that partitioning-specific commands were accepted
for non-partitioned tables and treated like
ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE after bug#20129 was fixed,
which changed the code path from mysql_alter_table to
mysql_admin_table.
Solution was to check if the table was partitioned before
trying to execute the admin command.
index column
There were actually two problems:
1) with a clustered pk, ordering by a non-pk index should also
compare with the pk as a last resort to tell keys apart from each
other
2) a bug in the index search handling in ha_partition (found
when extending the test case)
Solution to 1 was to include the pk in the key compare when there is a
clustered pk and the search is on another index.
Solution for 2 was to remove the optimization from
ordered scan to unordered scan when there is a clustered pk.
MyISAM blocks index usage for bulk insert into zero-records tables.
See ha_myisam::start_bulk_insert() lines from
...
if (file->state->records == 0 ...
...
That causes problems for the partition engine when some partitions have
records and some do not, as the engine uses the same access method for
all partitions.
Now the partition engine doesn't call index_first/index_last
for empty tables.
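A minimal sketch of the check, with assumed names and a stand-in error code
(not the actual patch):

/* Sketch only: an empty partition may have its indexes disabled for a bulk
   insert, so treat it as having no rows instead of calling index_first(). */
struct part_stub
{
  unsigned long records;                          /* rows in this partition */
  int index_first(unsigned char *buf) { (void)buf; return 0; }
};

static int first_row_or_eof(part_stub *part, unsigned char *buf)
{
  const int END_OF_FILE_SKETCH= -1;               /* stand-in for HA_ERR_END_OF_FILE */
  if (part->records == 0)
    return END_OF_FILE_SKETCH;                    /* skip the empty partition */
  return part->index_first(buf);
}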
per-file comments:
mysql-test/r/partition.result
Bug#38005 Partitions: error with insert select.
test result
mysql-test/t/partition.test
Bug#38005 Partitions: error with insert select.
test case
sql/ha_partition.cc
Bug#38005 Partitions: error with insert select.
ha_engine::index_first and
ha_engine::index_last not called for empty tables.
InnoDB Plugin locks table
The fast/on-line add/drop index handler calls were not implemented
within the partitioning handler.
This implements them in the partitioning handler.
Since this is only used by the InnoDB plugin, which is not included, there
is no test case. (Have tested it manually with the plugin; it
does not allow unique indexes that do not include the partitioning
function, or removal of the pk, which in InnoDB generates a new pk
that is not in the partitioning function.)
NOTE: This introduces a new handler method, and because of that
changes the storage engine API. (One cannot use a handlerton to
see the capabilities of a table's handler if it is partitioned.
So I added a wrapper function in the handler that defaults to
the handlerton function, which the partitioning handler overrides.)
and
Bug#33555: Group By Query does not correctly aggregate partitions
Backport of bug#33257, which is the same bug.
read_range_*() calls were not passed to the partition handlers,
but were translated to index_read/next family calls,
resulting in duplicate rows and wrong aggregations.
Problem was a mutex added in bug#27405 for solving a problem
with auto_increment in partitioned InnoDB tables
(in ha_partition::write_row over the partitions' file->ha_write_row).
Solution is to use the patch for bug#33479, which refines the
usage of mutexes for auto_increment.
Backport of bug-33479 from 6.0:
Bug-33479: auto_increment failures in partitioning
Several problems with auto_increment in partitioning
(with MyISAM and InnoDB: locking issues, not handling
multi-row INSERTs properly, etc.)
Changed the auto_increment handling for partitioning:
Added a ha_data variable in table_share for storage engine specific data
such as auto_increment value handling in partitioning (also see WL 4305),
and using the ha_data->mutex to lock around read + update.
The idea is this:
Store the table's reserved auto_increment value in
the TABLE_SHARE and use a mutex to lock it for reading, updating and
unlocking, in one block. Only access all partitions
when it is not initialized.
Also allow reservations of ranges, and if no one has made a reservation
afterwards, lower the reservation to what was actually used after
the statement is done (via release_auto_increment from WL 3146).
The lock is kept from the first reservation if it is statement based
replication and a multi-row INSERT statement where the number of
candidate rows to insert is not known in advance (like INSERT SELECT,
LOAD DATA, unlike INSERT VALUES (row1), (row2), ..., (rowN)).
This should also lead to better concurrency (no need to have mutex
protection around write_row in all cases)
and work with any local storage engine.
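A self-contained sketch of the reservation idea (struct and field names are
assumed, not the real TABLE_SHARE layout):

#include <pthread.h>
#include <stdint.h>

struct part_share_sketch
{
  pthread_mutex_t mutex;        /* the ha_data->mutex mentioned above */
  uint64_t next_auto_inc;       /* next value to hand out for the whole table */
  bool initialized;             /* false until all partitions were consulted once */
};

/* Reserve 'n' consecutive values in one locked block; returns the first one. */
static uint64_t reserve_auto_inc(part_share_sketch *s, uint64_t n)
{
  pthread_mutex_lock(&s->mutex);
  if (!s->initialized)
  {
    /* ... read the maximum auto_inc from every partition exactly once ... */
    s->initialized= true;
  }
  uint64_t first= s->next_auto_inc;
  s->next_auto_inc+= n;         /* read + update under the same lock */
  pthread_mutex_unlock(&s->mutex);
  return first;
}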
update across partitions.
It's not an InnoDB-specific bug.
ha_partition::update_row() didn't set
table->timestamp_field_type= TIMESTAMP_NO_AUTO_SET when
orig_timestamp_type == TIMESTAMP_AUTO_SET_ON_INSERT,
so a partition would set the timestamp field when a record
was moved to a different partition.
Fixed by doing '= TIMESTAMP_NO_AUTO_SET' unconditionally.
Also ha_partition::write_row() is fixed in the same way, since otherwise
Field_timestamp::set() is called twice in the SET_ON_INSERT case.
(Chad queues this patch on demand by Trudy/Davi.)
partition is corrupt
The main problem was that ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR
PARTITION took another code path (over mysql_alter_table instead of
mysql_admin_table) which differs in two ways:
1) alter table opens the tables in a different way than admin tables do,
resulting in returning with an error before it tried the command
2) alter table does not start to send any diagnostic rows to the client,
which the lower admin functions continue to use -> resulting in an
assertion crash
The fix:
Remapped ALTER TABLE t ANALYZE/CHECK/OPTIMIZE/REPAIR PARTITION to use
the same code path as ANALYZE/CHECK/OPTIMIZE/REPAIR TABLE t.
Added a check in mysql_admin_table to set up the partition list for
which partitions should be used.
Partitioned tables will still not work with
REPAIR TABLE/PARTITION USE_FRM, since that requires moving partitions
to tables, running REPAIR TABLE t USE_FRM, checking that the data still
fulfills the partitioning function and then moving the table back to
being a partition.
NOTE: I have removed the following functions from the handler
interface:
analyze_partitions, check_partitions, optimize_partitions,
repair_partitions
since they are no longer needed.
THIS ALTERS THE STORAGE ENGINE API
Problem was that ha_partition had the HA_FILE_BASED flag set
(since it uses a .par file), but after open it uses the first partition's
flags, which results in different case handling for create and for
open.
Solution was to change the underlying partition names so they were consistent.
(Only happens when lower_case_table_names = 2, i.e. Mac OS X and storage
engines without HA_FILE_BASED, like InnoDB and Memory.)
(Recommit after adding rename of check_lowercase_names to
get_canonical_filename, and moved it from handler.h to mysql_priv.h)
NOTE: if a mixed case name for a partitioned table was created when
lower_case_table_names = 2, it should be renamed or dropped before using
the updated version (See bug#37402 for more info)
Problem was that ha_partition::records was not implemented, thus
using the default handler::records, which is not correct if the engine
does not support HA_STATS_RECORDS_IS_EXACT.
Solution was to implement ha_partition::records as a wrapper around
the underlying partitions' records.
The rows column in EXPLAIN PARTITIONS will now include the total
number of records in the partitioned table.
(recommit after removing out-commented code)
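Roughly, the wrapper just sums what each partition reports (sketch only,
stand-in types, not the actual ha_partition code):

#include <stdint.h>

struct part_records_stub
{
  uint64_t records() const { return 0; /* engine-specific exact count */ }
};

static uint64_t partitioned_records(part_records_stub **parts, unsigned n_parts)
{
  uint64_t total= 0;
  for (unsigned i= 0; i < n_parts; i++)
    total+= parts[i]->records();   /* exact per-partition counts */
  return total;
}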
Problem was that auto_repair, is_crashed and check_and_repair were not
implemented in ha_partition.
Solution: implemented them as a loop over all partitions for is_crashed and
check_and_repair, and using the first partition for auto_repair.
(Recommit after fixing review comments)
returns erroneous results
Used the wrong function when fixing bug#30480, which led to
no stop on end_key, resulting in duplicate results from an index scan.
Includes test cases for the duplicate bugs 37327 and 37329,
Duplicate rows and bad performance/High Handler_read_next values.
Recommit after merge issues
called from a SELECT doesn't cause ROLLBACK of state"
Make private all class handler methods (PSEA API) that may modify
data. Introduce and deploy public ha_* wrappers for these methods in
all of sql/.
This is necessary to keep track of all data modifications in sql/,
which is in turn necessary to be able to optimize two-phase
commit of those transactions that do not modify data.
Ensure that the entire server uses their public ha_* counterparts instead,
since only then can we ensure the proper tracing of these calls that
is necessary for Bug#12713.
A pre-requisite for Bug#12713 "Error in a stored function called from
a SELECT doesn't cause ROLLBACK of statement".
cause ROLLBACK of statement", part 1. Review fixes.
Do not send OK/EOF packets to the client until we have reached the end of
the current statement.
This is a consolidation, to keep the functionality that is shared by all
SQL statements in one place in the server.
Currently this functionality includes:
- close_thread_tables()
- log_slow_statement().
After this patch and the subsequent patch for Bug#12713, it shall also include:
- ha_autocommit_or_rollback()
- net_end_statement()
- query_cache_end_of_result().
In future it may also include:
- mysql_reset_thd_for_next_command().
ha_partition::update_create_info() just calls update_create_info
of the first partition, so it only gets the autoincrement maximum
of the first partition, and SHOW CREATE TABLE can thus show a too
small AUTO_INCREMENT value.
Fixed by implementing ha_partition::update_create_info() the way
other handlers do.
HA_ARCHIVE: stats.auto_increment handling made consistent with other engines
(also fixes the bugs: Bug#29320, Bug#29493 and Bug#30536)
Problem: Partitioning did not handle unordered scans correctly
for engines with unordered read order.
Solution: do not stop scanning if a record is out of range, since
there can be more records within the range afterwards.
Note: this is the patch that fixes the bug, but since there are no such
storage engines shipped with MySQL 5.1 (Falcon comes in 6.0), there
are no test cases (it is a separate patch that only goes into 6.0).
The bug was that for ordered index scans, ha_partition::index_init() did
not put index columns into table->read_set if the underlying storage
engine did not have HA_PARTIAL_COLUMN_READ flag.
This was causing assertion failure when handle_ordered_index_scan() tried
to sort the records according to index order.
Fixed by making ha_partition::index_init() put index columns into table->read_set
for all ordered scans.
Problem was for LINEAR HASH/KEY. Crashes because of a wrong partition id
being returned when creating the new altered partitions (because of a wrong
linear hash mask).
Solution: Update the linear hash mask before using it for the new
altered table.
The problem: ha_partition::read_range_first() could return a record that is
outside of the scanned range. If that record happened to be in the next
subsequent range, it would satisfy the WHERE and appear in the output twice.
(we would get it the second time when scanning the next subsequent range)
Fix:
Made ha_partition::read_range_first() check if the returned record is within
the scanned range, like other read_range_first() implementations do.
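Conceptually the added check boils down to a bound test on the row the
partition returned (sketch only; names and the error constant are stand-ins):

/* Sketch: if the partition handed back a row that sorts past the end key,
   report end-of-range instead of returning the out-of-range row. */
static int read_range_first_checked(int err_from_partition,
                                    int cmp_row_vs_end_key,
                                    bool have_end_key)
{
  const int END_OF_FILE_SKETCH= -1;      /* stand-in for HA_ERR_END_OF_FILE */
  if (err_from_partition)
    return err_from_partition;
  if (have_end_key && cmp_row_vs_end_key > 0)
    return END_OF_FILE_SKETCH;           /* row lies beyond the scanned range */
  return 0;
}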
Partition handler fails updating tables with partitioning
based on a timestamp field, as it calculates the timestamp field
AFTER it calculates the partition number of a record.
Fixed by adding a timestamp_field->set_time() call and disabling
subsequent such calls.
Problem: the table's INDEX and DATA DIR were taken
directly from the table's first partition.
This allowed a rename attack similar to
bug#32111 when doing ALTER TABLE ... REMOVE PARTITIONING.
Solution: Silently ignore the INDEX/DATA DIR
for the table. (Like some other storage engines
do.)
Partitioned tables do not support DATA/INDEX
DIR on the table level, only on their partitions.
table to partitioned
Problem:
Crashed because of usage of an uninitialised mutex when auto_incrementing
a partitioned temporary table.
Fix:
Only lock (using the mutex) if it is not a temporary table.
- Reserve namespace and place in frm for TABLE_CHECKSUM and PAGE_CHECKSUM create options
- Added syncing of directory when creating .frm files
- Portability fixes
- Added missing cast that could cause bugs
- Code cleanups
- Made some bit functions inline
- Moved things out of myisam.h to my_handler.h to make them more accessible
- Renamed some MyISAM variables and defines to make them more globally usable (as they are used outside of MyISAM)
- Fixed bugs in error conditions
- Use compile-time asserts instead of run-time ones
- Fixed indentation
HA_EXTRA_PREPARE_FOR_DELETE -> HA_EXTRA_PREPARE_FOR_DROP as the old name was wrong
(Added a define for the old value to ensure we don't break any old code)
Added HA_EXTRA_PREPARE_FOR_RENAME as a signal for rename (before we used a DROP signal, which was wrong)
- Initialize error messages early to get better errors when mysqld or an engine fails to start
- Fixed a Windows bug where query_performance_frequency was not initialized if the registry code failed
- thread_stack -> my_thread_stack_size
Two cases in ha_partition::extra() were missing
(HA_EXTRA_DELETE_CANNOT_BATCH and HA_EXTRA_UPDATE_CANNOT_BATCH),
which are currently only used by NDB (which does not use ha_partition).
"Rows not deleted from innodb partitioned tables if --innodb_autoinc_lock_mode=0"
Due to a previous bugfix which initializes a previously uninitialized
variable, ha_partition::get_auto_increment() may fail to operate
correctly when the storage engine reports that it is only reserving
one value and one or more partitions have a different 'next-value'.
Currently, this only affects InnoDB's new-style auto-increment code, which
reserves larger blocks of values and has less inter-thread contention.
In ha_partition::position() we don't calculate the partition number
of the record, but use the m_last_part value instead,
relying on it having been set by some other call like ::write_row().
Delete_rows_log_event::do_exec_row() calls find_and_fetch_row(),
where we used position() + rnd_pos() calls for the InnoDB-based partitioned
table, as it has HA_PRIMARY_KEY_REQUIRED_FOR_POSITION enabled.
Fixed by introducing a new handler::rnd_pos_by_record() method to be
used for random record-based positioning.
In ha_partition::position() we didn't calculate the partition number
of the record. We used the m_last_part value instead,
relying on it being set elsewhere, e.g. by a previous call of a method
like ::write_row(). In replication we don't call any of these before
position(). Delete_rows_log_event::do_exec_row calls find_and_fetch_row.
In the case of an InnoDB-based partitioned table, HA_PRIMARY_KEY_REQUIRED_FOR_POSITION
is enabled, so position() / rnd_pos() calls are used to fetch the record.
Fixed by adding the partition_id calculation to ha_partition::position().
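A self-contained sketch of a position that does not depend on m_last_part
(the ref layout is assumed; the real format differs):

#include <string.h>
#include <stdint.h>

/* Sketch only: store the computed partition id in front of the partition-local
   position, so a later rnd_pos()/rnd_pos_by_record() can find the row without
   relying on state left behind by an earlier write_row()/rnd_next(). */
static void store_position(unsigned char *ref, uint32_t part_id,
                           const unsigned char *part_ref, size_t part_ref_len)
{
  memcpy(ref, &part_id, sizeof(part_id));                  /* which partition */
  memcpy(ref + sizeof(part_id), part_ref, part_ref_len);   /* engine-local pos */
}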
Faster thr_alarm()
Added 'Opened_files' status variable to track calls to my_open()
Don't give warnings when running mysql_install_db
Added option --source-install to mysql_install_db
I had to do the following renames as the polymorphism used didn't work with the Forte compiler on 64 bit systems
index_read() -> index_read_map()
index_read_idx() -> index_read_idx_map()
index_read_last() -> index_read_last_map()
Stopping the mysql server could result in an entry in the mysql error
file: "InnoDB: Error: MySQL is freeing a thd".
This happened because InnoDB assumes that the server will never
call external_lock(F_UNLCK) in case external_lock(READ/WRITE)
failed.
Prior to this patch we did not have a strict definition of whether
external_lock(F_UNLCK) must be called in case external_lock(READ/WRITE)
fails.
This patch states that we never call external_lock(F_UNLCK) in case
external_lock(READ/WRITE) fails.
causing update of a different column
For efficiency some storage engines do not read a complete record
for update, but only the columns required for selecting the rows.
When updating a row of a partitioned table and modifying a column
that is part of the partition or subpartition expression,
the row may need to move from one [sub]partition to another one.
This is done by inserting the new row into the target
[sub]partition and deleting the old row from the originating one.
For the insert we need a complete record.
If an above mentioned engine was used for a partitioned table, we
did not have a complete record in update_row(). The implicitly
executed write_row() got an incomplete record.
This is solved by instructing the engine to read a complete record
if one of the columns of the partition or subpartition expression is to be
updated.
No testcase; this can be reproduced with Falcon only. The engines
contained in standard 5.1 always return complete records on
update.
In ha_partition::position() we don't calculate the partition number of
the record. We use m_last_part_value instead, relying on
it being set elsewhere, such as in previous calls of ::write_row().
In replication we make neither of these calls before ::position().
Delete_rows_log_event::do_exec_row calls find_and_fetch_row(), where
we used position() & rnd_pos() calls to find the record for the
PARTITION/InnoDB table as it possesses the InnoDB table flags.
Fixed by removing the HA_PRIMARY_KEY_REQUIRED_FOR_POSITION flag from PARTITION.
An assertion failure may happen with Falcon + partitioning + SELECT FOR
UPDATE. This may affect other engines as well.
Though the assertion failure is fixed with this patch, Falcon still
deadlocks when running falcon_bug_28026.test.
Problem: when getting an autoincrement value for a partitioned table in the ::info() method we call
get_auto_increment() for all partitions. That may cause a problem for at least MyISAM
tables that rely on some table state (in this particular case table->next_number_field is
set to 0 in mysql_insert() and we get a crash).
Moreover, calling get_auto_increment() is superfluous there.
Fix: use ::info(HA_STATUS_AUTO) calls to get autoincrement values for the partitions instead of
get_auto_increment() ones in ha_partition::info().
---
Added casts and fixed wrong type.
---
Added casts and fixed wrong type.
---
Merge jamppa@bk-internal.mysql.com:/home/bk/mysql-5.1-marvel
into a88-113-38-195.elisa-laajakaista.fi:/home/my/bk/mysql-5.1-marvel
---
Don't give a warning that a readonly variable is forced to be readonly
mysql-test-run now fails if we have [Warning] or [ERROR] tags in the .err file
Fixed wrong reference to the mysql manual
Fixed wrong prototype that caused some tests to fail on 64 bit platforms
---
Disabled compiler warnings mainly for Win 64.
---
Added casts to remove compiler warnings on windows
Give warnings also for safe_mutex errors found by test system
Added some warnings from different machines in pushbuild
---
Merge bk-internal.mysql.com:/home/bk/mysql-5.1-marvel
into mysql.com:/home/my/mysql-5.1
---
Added escapes for double quotes and parentheses.
---
Archive db fix plus added non-critical warnings
in ignore list.
---
Fixed previously added patch and added new ignored warning.
The following type conversions were done:
- Changed byte to uchar
- Changed gptr to uchar*
- Changed my_string to char *
- Changed my_size_t to size_t
- Changed size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
The following function parameter changes were done:
- All string functions in mysys/strings were changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocol functions changed to use size_t instead of uint
- Functions that used a pointer to a string length were changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrm(), ha_discover() and handler::discover()
the type of the argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new casts required by other changes
- Added some casts to my_multi_malloc() arguments for safety (as string lengths
need to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitly as this conflict was often hidden by casting the function to
hash_get_key).
- Changed some buffers (memory regions) to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed Windows-specific code to restore cursor position, as this
causes a slowdown on Windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated the function comment to
reflect this. Changed the function that depended on the original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
The InnoDB engine changes its internal auto_increment counter only after
ha_innodb::write_row, so two threads must not simultaneously
operate between ha_innodb::update_autoincrement and
ha_innodb::write_row.
So concurrent execution of ha_partition::write_row is prevented.
mysql_load calls handler::start_bulk_insert(0) to prepare caches
for big inserts. As ha_partition didn't allow these caches
for the underlying tables, the inserts were much slower.
"Server Variables for Plugins"
Implement support for plugins to declare server variables.
Demonstrate functionality by removing InnoDB specific code from sql/*
New feature for HASH - HASH_UNIQUE flag
New feature for DYNAMIC_ARRAY - initializer accepts preallocated ptr.
Completed support for plugin reference counting.
Before the fix:
ha_partition objects had ha_partition::m_part_info==NULL and that caused a
crash.
After:
- The new ha_partition::clone() function makes the clones use the parent's
m_part_info value.
- The parent ha_partition object remains responsible for deallocation of
m_part_info.
index_read(), index_read_idx(), index_read_last(), and
records_in_range() - instead of 'uint keylen' argument take
'ulonglong keypart_map', a bitmap showing which keyparts are
present in the key value.
Fallback method is provided for handlers that are lagging behind.
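For reference, a keypart_map with the first N key parts present is just the low
N bits set (sketch; the helper name is invented):

#include <stdint.h>

/* Sketch only: build a keypart_map covering key parts 0 .. n_parts-1. */
static uint64_t keypart_map_for_prefix(unsigned n_parts)
{
  return (n_parts >= 64) ? ~0ULL : ((1ULL << n_parts) - 1);
}

/* Example: a lookup on the first two key parts would pass
   keypart_map_for_prefix(2) == 0x3 instead of a byte length. */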
Removed a lot of compiler warnings
Removed not used variables, functions and labels
Initialize some variables that could be used uninitialized (fatal bugs)
%ll -> %l
The subselect engine checks the table->status field to determine if the
record was properly found when we use keyread on the table.
The partition engine checks all the partitions for a given key
before returning. So if a matching record was found in the first
partition and no matching records were found in the second,
we have table->status == NOT_FOUND after the function, which
makes subselects skip matching records.
The patch sets table->status= 0 if we actually found something.
Added missing DBUG_RETURN statements (in mysqldump.c)
Added missing enums
Fixed a lot of wrong DBUG_PRINT() statements, some of which could cause crashes
Removed usage of %lld and %p in printf strings as these are not portable or produce different results on different systems.
In practice this means that the handlerton is now created by the server and is passed to the engine. Plugin startups can now also control how plugins are inited (and can optionally pass values). This gives a bit more flexibility to those who want to write plugin interfaces to the database.
Set up the 'share' struct for all partition file elements. This is necessary because we use
m_file[0]->update_create_info(create_info) during ha_partition::update_create_info
and 'share' for m_file[0] should be valid.
set_up_table_before_create can fail due to an erroneous path to the
data directory or index directory.
Added abort handling to ensure created partitions are dropped
if a failure occurs in the middle of the create process.
We now reset the THD members related to auto_increment+binlog in
MYSQL_LOG::write(). This is better than in THD::cleanup_after_query(),
which was not able to distinguish between SELECT myfunc1(),myfunc2()
and INSERT INTO t SELECT myfunc1(),myfunc2() from a binlogging point
of view.
Rows_log_event::exec_event() now calls lex_start() instead of
mysql_init_query() because the latter now does too much (it resets
the binlog format).
before update trigger on NDB table".
Two main changes:
- We use TABLE::read_set/write_set bitmaps for marking fields used by the
statement instead of Field::query_id in 5.1.
- Now when we mark columns used by the statement we take into account columns
used by the table's triggers instead of marking all columns as used if the
table has triggers.
Changes that require code changes in other storage engines:
(Note that all changes are very straightforward and one should find all issues
by compiling a --debug build and fixing all compiler errors and all
asserts in field.cc while running the test suite.)
- New optional handler function introduced: reset()
This is called after every DML statement to make it easy for a handler to do
statement-specific cleanups.
(The only case where it's not called is if we force the file to be closed.)
- handler::extra(HA_EXTRA_RESET) is removed. Code that was there before
should be moved to handler::reset()
- table->read_set contains a bitmap over all columns that are needed
in the query. read_row() and similar functions only need to read these
columns.
- table->write_set contains a bitmap over all columns that will be updated
in the query. write_row() and update_row() only need to update these
columns.
The above bitmaps should now be up to date in all contexts
(including ALTER TABLE, filesort()).
The handler is informed of any changes to the bitmap after
fix_fields() by calling the virtual function
handler::column_bitmaps_signal(). If the handler does caching of
these bitmaps (instead of using table->read_set, table->write_set),
it should redo the caching in this function. As the signal() may be sent
several times, it's probably best to set a variable in the signal
function and redo the caching on read_row() / write_row() if the variable
was set.
- Removed the read_set and write_set bitmap objects from the handler class
- Removed all column bit handling functions from the handler class.
(Now one uses the normal bitmap functions in my_bitmap.c instead
of handler-dedicated bitmap functions)
- field->query_id is removed. One should instead check
table->read_set and table->write_set to see if a field is used in the query.
- handler::extra(HA_EXTRA_RETRIEVE_ALL_COLS) and
handler::extra(HA_EXTRA_RETRIEVE_PRIMARY_KEY) are removed. One should now
instead use table->read_set to check which columns to retrieve.
- If a handler needs to call Field->val() or Field->store() on columns
that are not used in the query, one should install a temporary
all-columns-used map while doing so. For this, we provide the following
functions:
my_bitmap_map *old_map= dbug_tmp_use_all_columns(table, table->read_set);
field->val();
dbug_tmp_restore_column_map(table->read_set, old_map);
and similar for the write map:
my_bitmap_map *old_map= dbug_tmp_use_all_columns(table, table->write_set);
field->val();
dbug_tmp_restore_column_map(table->write_set, old_map);
If this is not done, you will sooner or later hit a DBUG_ASSERT
in the field store() / val() functions.
(For non-DBUG binaries, dbug_tmp_use_all_columns() and
dbug_tmp_restore_column_map() are inline dummy functions and should
be optimized away by the compiler).
- If one needs to temporary set the column map for all binaries (and not
just to avoid the DBUG_ASSERT() in the Field::store() / Field::val()
methods) one should use the functions tmp_use_all_columns() and
tmp_restore_column_map() instead of the above dbug_ variants.
- All 'status' fields in the handler base class (like records,
data_file_length etc) are now stored in a 'stats' struct. This makes
it easier to know what status variables are provided by the base
handler. This requires some trivial variable-name changes in the extra()
function.
- New virtual function handler::records(). This is called to optimize
COUNT(*) if (handler::table_flags() & HA_HAS_RECORDS) is true.
(stats.records is not supposed to be an exact value. It only has to
be 'reasonable enough' for the optimizer to be able to choose a good
optimization path).
- Non virtual handler::init() function added for caching of virtual
constants from engine.
- Removed has_transactions() virtual method. Now one should instead return
HA_NO_TRANSACTIONS in table_flags() if the table handler DOES NOT support
transactions.
- The 'xxxx_create_handler()' function now has a MEM_ROOT argument
that is to be used with 'new handler_name()' to allocate the handler
in the right area. The xxxx_create_handler() function is also
responsible for any initialization of the object before returning.
For example, one should change:
static handler *myisam_create_handler(TABLE_SHARE *table)
{
  return new ha_myisam(table);
}
->
static handler *myisam_create_handler(TABLE_SHARE *table, MEM_ROOT *mem_root)
{
  return new (mem_root) ha_myisam(table);
}
- New optional virtual function: use_hidden_primary_key().
This is called in case of an update/delete when
(table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_DELETE) is set
but we don't have a primary key. This allows the handler to take precautions
so that it can remember any hidden primary key, to be able to update/delete any
found row. The default handler marks all columns to be read.
- handler::table_flags() now returns a ulonglong (to allow for more flags).
- New/changed table_flags()
- HA_HAS_RECORDS Set if ::records() is supported
- HA_NO_TRANSACTIONS Set if engine doesn't support transactions
- HA_PRIMARY_KEY_REQUIRED_FOR_DELETE
Set if we should mark all primary key columns for
read when reading rows as part of a DELETE
statement. If there is no primary key,
all columns are marked for read.
- HA_PARTIAL_COLUMN_READ Set if engine will not read all columns in some
cases (based on table->read_set)
- HA_PRIMARY_KEY_ALLOW_RANDOM_ACCESS
Renamed to HA_PRIMARY_KEY_REQUIRED_FOR_POSITION.
- HA_DUPP_POS Renamed to HA_DUPLICATE_POS
- HA_REQUIRES_KEY_COLUMNS_FOR_DELETE
Set this if we should mark ALL key columns for
read when reading rows as part of a DELETE
statement. In case of an update we will mark
for read all keys for which a key part changed
value.
- HA_STATS_RECORDS_IS_EXACT
Set this if stats.records is exact.
(This saves us some extra records() calls
when optimizing COUNT(*))
- Removed table_flags()
- HA_NOT_EXACT_COUNT Now one should instead use HA_HAS_RECORDS if
handler::records() gives an exact count() and
HA_STATS_RECORDS_IS_EXACT if stats.records is exact.
- HA_READ_RND_SAME Removed (no one supported this one)
- Removed not needed functions ha_retrieve_all_cols() and ha_retrieve_all_pk()
- Renamed handler::dupp_pos to handler::dup_pos
- Removed not used variable handler::sortkey
Upper level handler changes:
- ha_reset() now does some overall checks and calls ::reset()
- ha_table_flags() added. This is a cached version of table_flags(). The
cache is set at engine creation time and updated on open.
MySQL level changes (not obvious from the above):
- DBUG_ASSERT() added to check that column usage matches what is set
in the column usage bit maps. (This found a LOT of bugs in current
column marking code).
- Previously in 5.1, all used columns were marked in read_set and only updated
columns were marked in write_set. Now we only mark columns for which we
need a value in read_set.
- Column bitmaps are created in open_binary_frm() and open_table_from_share().
(Before this was in table.cc)
- handler::table_flags() calls are replaced with handler::ha_table_flags()
- For calling field->val() you must have the corresponding bit set in
table->read_set. For calling field->store() you must have the
corresponding bit set in table->write_set. (There are asserts in
all store()/val() functions to catch wrong usage)
- thd->set_query_id is renamed to thd->mark_used_columns and instead
of setting this to an integer value, it now has the values:
MARK_COLUMNS_NONE, MARK_COLUMNS_READ, MARK_COLUMNS_WRITE
Also changed all variables named 'set_query_id' to mark_used_columns.
- In filesort() we now inform the handler of exactly which columns are needed
for doing the sort and choosing the rows.
- The TABLE_SHARE object has an 'all_set' column bitmap one can use
when one needs a column bitmap with all columns set.
(This is used for table->use_all_columns() and other places)
- The TABLE object has 3 column bitmaps:
- def_read_set Default bitmap for columns to be read
- def_write_set Default bitmap for columns to be written
- tmp_set Can be used as a temporary bitmap when needed.
The table object also has two pointers to bitmaps, read_set and write_set,
that the handler should use to find out which columns are used in which way.
- count() optimization now calls handler::records() instead of using
handler->stats.records (if (table_flags() & HA_HAS_RECORDS) is true).
- Added extra argument to Item::walk() to indicate if we should also
traverse sub queries.
- Added TABLE parameter to cp_buffer_from_ref()
- Don't close tables created with CREATE ... SELECT but keep them in
the table cache. (Faster usage of newly created tables).
New interfaces:
- table->clear_column_bitmaps() to initialize the bitmaps for tables
at start of new statements.
- table->column_bitmaps_set() to set up new column bitmaps and signal
the handler about this.
- table->column_bitmaps_set_no_signal() for the few cases where we need
to set up new column bitmaps but not signal the handler (as the handler
has already been signaled about these before). Used for the moment
only in opt_range.cc when doing ROR scans.
- table->use_all_columns() to install a bitmap where all columns are marked
as used in the read and the write set.
- table->default_column_bitmaps() to install the normal read and write
column bitmaps, but not signaling the handler about this.
This is mainly used when creating TABLE instances.
- table->mark_columns_needed_for_delete(),
table->mark_columns_needed_for_update() and
table->mark_columns_needed_for_insert() to allow us to put additional
columns in the column usage maps if the handler so requires.
(The handler indicates what it needs in handler->table_flags())
- table->prepare_for_position() to allow us to tell the handler that it
needs to read primary key parts to be able to store them in
future table->position() calls.
(This replaces the table->file->ha_retrieve_all_pk function)
- table->mark_auto_increment_column() to tell the handler that we are going to
update columns that are part of an auto_increment key.
- table->mark_columns_used_by_index() to mark all columns that are part of
an index. It will also send extra(HA_EXTRA_KEYREAD) to the handler to allow
it to quickly know that it only needs to read columns that are part
of the key. (The handler can also use the column map for detecting this,
but a simpler/faster handler can just monitor the extra() call).
- table->mark_columns_used_by_index_no_reset() to, in addition to other columns,
also mark all columns that are used by the given key.
- table->restore_column_maps_after_mark_index() to restore to default
column maps after a call to table->mark_columns_used_by_index().
- New item function register_field_in_read_map(), for marking used columns
in table->read_map. Used by filesort() to mark all used columns
- Maintain in TABLE->merge_keys a set of all keys that are used in the query.
(Simplifies some optimization loops)
- Maintain Field->part_of_key_not_clustered, which is like Field->part_of_key
but a field in the clustered key is not assumed to be part of all indexes.
(used in opt_range.cc for faster loops)
- dbug_tmp_use_all_columns(), dbug_tmp_restore_column_map(),
tmp_use_all_columns() and tmp_restore_column_map() functions to temporarily
mark all columns as usable. The 'dbug_' versions are primarily intended
for use inside a handler when it wants to just call Field::store() & Field::val()
functions, but doesn't need the column maps set for any other usage.
(i.e. bitmap_is_set() is never called)
- We can't use compare_records() to skip updates for handlers that return
a partial column set when the read_set doesn't cover all columns in the
write set. The reason for this is that if we have a column marked only for
write, we can't at the MySQL level know if the value changed or not.
The reason this worked before was that MySQL marked all to-be-written
columns as also to be read. The new 'optimal' bitmaps exposed this 'hidden
bug'.
- open_table_from_share() no longer sets up a temporary MEM_ROOT
object as a thread-specific variable for the handler. Instead we
send the to-be-used MEM_ROOT to get_new_handler().
(Simpler, faster code)
Bugs fixed:
- Column marking was not done correctly in a lot of cases.
(ALTER TABLE, when using triggers, auto_increment fields etc)
(Could potentially result in wrong values inserted in table handlers
relying on the old column maps or field->set_query_id being correct)
Especially when it comes to triggers, there may be cases where the
old code would cause lost/wrong values for NDB and/or InnoDB tables.
- Split thd->options flag OPTION_STATUS_NO_TRANS_UPDATE to two flags:
OPTION_STATUS_NO_TRANS_UPDATE and OPTION_KEEP_LOG.
This allowed me to remove some wrong warnings about:
"Some non-transactional changed tables couldn't be rolled back"
- Fixed handling of INSERT .. SELECT and CREATE ... SELECT that wrongly reset
(thd->options & OPTION_STATUS_NO_TRANS_UPDATE), which caused us to lose
some warnings about
"Some non-transactional changed tables couldn't be rolled back"
- Fixed use of uninitialized memory in ha_ndbcluster.cc::delete_table()
which could cause delete_table to report random failures.
- Fixed core dumps for some tests when running with --debug
- Added missing FN_LIBCHAR in mysql_rm_tmp_tables()
(This has probably caused us to not properly remove temporary files after
crash)
- slow_logs was not properly initialized, which could maybe cause
extra/lost entries in slow log.
- If we get a duplicate row on insert, change the column map to read and
write all columns while retrying the operation. This is required by
the definition of REPLACE and also ensures that fields that are only
part of UPDATE are properly handled. This fixed a bug in NDB and
REPLACE where REPLACE wrongly copied some column values from the replaced
row.
- For table handlers that don't support NULL in keys, we would give an error
when creating a primary key with NULL fields, even after the fields had been
automatically converted to NOT NULL.
- Creating a primary key on a SPATIAL key would fail if the field was not
declared as NOT NULL.
Cleanups:
- Removed not used condition argument to setup_tables
- Removed not needed item function reset_query_id_processor().
- Field->add_index is removed. Now this is instead maintained in
(field->flags & FIELD_IN_ADD_INDEX)
- Field->fieldnr is removed (use field->field_index instead)
- New argument to filesort() to indicate that it should return a set of
row pointers (not used columns). This allowed me to remove some references
to sql_command in filesort and should also enable us to return column
results in some cases where we couldn't before.
- Changed column bitmap handling in opt_range.cc to be aligned with TABLE
bitmap, which allowed me to use bitmap functions instead of looping over
all fields to create some needed bitmaps. (Faster and smaller code)
- Broke up too-long lines
- Moved some variable declarations to the start of functions for better code
readability.
- Removed some not used arguments from functions.
(setup_fields(), mysql_prepare_insert_check_table())
- setup_fields() now takes an enum instead of an int for marking columns
usage.
- For internal temporary tables, use handler::write_row(),
handler::delete_row() and handler::update_row() instead of
handler::ha_xxxx() for faster execution.
- Changed some constants to enum's and define's.
- Using separate column read and write sets allows for easier checking
of whether the timestamp field was set by the statement.
- Removed calls to free_io_cache() as this is now done automatically in ha_reset()
- Don't build table->normalized_path as this is now identical to table->path
(after bar's fixes to convert filenames)
- Fixed some missed DBUG_PRINT(.."%lx") to use "0x%lx" to make it easier to
do comparisons with the 'convert-dbug-for-diff' tool.
Things left to do in 5.1:
- We wrongly log failed CREATE TABLE ... SELECT in some cases when using
row based logging (as shown by testcase binlog_row_mix_innodb_myisam.result)
Mats has promised to look into this.
- Test that my fix for CREATE TABLE ... SELECT is indeed correct.
(I added several test cases for this, but in this case it's better that
someone else also tests this thoroughly).
Lars has promised to do this.
New prototype for get_auto_increment() (but new arguments not yet used), to be able
to reserve a finite interval of auto_increment values from cooperating engines.
A hint on how many values to reserve is found in handler::estimation_rows_to_insert,
filled by ha_start_bulk_insert(), a new wrapper around start_bulk_insert().
NOTE: this patch changes nothing for existing engines, but it makes the API
ready for those engines which will want to do reservation.
More csets will come to complete WL#3146.
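The reservation-style interface looks roughly like this (sketch only; names
are paraphrased from the description above, not copied from handler.h):

/* Sketch: instead of returning a single value, the engine is asked for an
   interval and reports back how many values it actually reserved. */
struct auto_inc_interval_sketch
{
  unsigned long long first_value;        /* first reserved value */
  unsigned long long nb_reserved_values; /* size of the reserved interval */
};

/* 'nb_desired_values' is the hint taken from estimation_rows_to_insert. */
static auto_inc_interval_sketch
get_auto_increment_sketch(unsigned long long offset,
                          unsigned long long increment,
                          unsigned long long nb_desired_values)
{
  auto_inc_interval_sketch res;
  res.first_value= offset;                        /* engine-specific in reality */
  res.nb_reserved_values= nb_desired_values ? nb_desired_values : 1;
  (void)increment;
  return res;
}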
duplicate fields removed, st_mysql_storage_engine added to support
run-time handlerton initialization (no compiler warnings), handler API
is now tied to MySQL version, handlerton->plugin mapping added
(slot-based), dummy default_hton removed, plugin-type-specific
initialization generalized, built-in plugins are now initialized too,
--default-storage-engine no longer needs a list of storage engines
in handle_options().
mysql-test-run.pl bugfixes