get_table_share, drop_open_table
In the partition handler code, LOCK_open and share->LOCK_ha_data
are acquired in the wrong order in certain cases. When doing a
multi-row INSERT (i.e a INSERT..SELECT) in a table with auto-
increment column(s). the increments must be in a monotonically
continuous increasing sequence (i.e it can't have "holes"). To
achieve this, a lock is held for the duration of the operation.
share->LOCK_ha_data was used for this purpose.
Whenever there was a need to open a view _during_ the operation
(views are not currently pre-opened the way tables are), and
LOCK_open was grabbed, a deadlock could occur. share->LOCK_ha_data
is other places used _while_ holding LOCK_open.
A new mutex was introduced in the HA_DATA_PARTITION structure,
for exclusive use of the autoincrement data fields, so we don't
need to overload the use of LOCK_ha_data here.
A module test case has not been supplied, since the problem occurs
as a result of a race condition, and testing for this condition
is thus not deterministic. Testing for it could be done by
setting up a test case as described in the bug report.
backport for bug#44059 from mysql-pe to mysql-5.1-bugteam
Using the partition with most rows instead of first partition
to estimate the cardinality of indexes.
is reached
Problem was bad error handling, leaving some new temporary
partitions locked and initialized and some not yet initialized
and locked, leading to a crash when trying to unlock the not
yet initialized and locked partitions
Solution was to unlock the already locked partitions, and not
include any of the new temporary partitions in later unlocks
"insert into.. select * from"
When inserting into a partitioned table using 'insert into
<target> select * from <src>', read_buffer_size bytes of memory
are allocated for each partition in the target table.
This resulted in large memory consumption when the number of
partitions are high.
This patch introduces a new method which tries to estimate the
buffer size required for each partition and limits the maximum
buffer size used to maximum of 10 * read_buffer_size,
11 * read_buffer_size in case of monotonic partition functions.
(Backport)
Problem is that when insert (ha_start_bulk_insert) in i partitioned table,
it will call ha_start_bulk_insert for every partition, used or not.
Solution is to delay the call to the partitions ha_start_bulk_insert until
the first row is to be inserted into that partition
Inserting a negative value in the autoincrement column of a
partitioned innodb table was causing the value of the auto
increment counter to wrap around into a very large positive
value. The consequences are the same as if a very large positive
value was inserted into a column, e.g. reduced autoincrement
range, failure to read autoincrement counter.
The current patch ensures that before calculating the next
auto increment value, the current value is within the positive
maximum allowed limit.
INSERT ... SELECT ...
Problem was that when bulk insert is used on an empty
table/partition, it disables the indexes for better
performance, but in this specific case it also tries
to read from that partition using an index, which is
not possible since it has been disabled.
Solution was to allow index reads on disabled indexes
if there are no records.
Also reverted the patch for bug#38005, since that was a workaround
in the partitioning engine instead of a fix in myisam.
column on partitioned table
An assertion 'ASSERT_COULUMN_MARKED_FOR_READ' is failed if the query
is executed with index containing double column on partitioned table.
The problem is that assertion expects all the fields which are read,
to be in the read_set.
In this query only the field 'a' is in the readset as the tables in
the query are joined by the field 'a' and so the assertion fails
expecting other field 'b'.
Since the function cmp() is just comparison of two parameters passed,
the assertion is not required.
Fixed by removing the assertion in the double fields comparision
function and also fixed the index initialization to do ordered
index scan with RW LOCK which ensures all the fields from a key are in
the read_set.
Note: this bug is not reproducible with other datatypes because the
assertion doesn't exist in comparision function for other
datatypes.
We disallow the partitioning of a log table. You could however
partition a table first, and then point logging to it. This is
not only against the docs, it also crashes the server.
We catch this case now.
Problem was that a failing rename just left the partitions at the state
it was at the failure.
Solution was to try to revert the started rename if a failure occured.
NOTE: This undoes changes by BUG#42829 in sql_class.cc:binlog_query().
I will revert the change in a post-push fix (the binlog filter should
be checked in sql_base.cc:decide_logging_format()).
General overview:
The logic for switching to row format when binlog_format=MIXED had
numerous flaws. The underlying problem was the lack of a consistent
architecture.
General purpose of this changeset:
This changeset introduces an architecture for switching to row format
when binlog_format=MIXED. It enforces the architecture where it has
to. It leaves some bugs to be fixed later. It adds extensive tests to
verify that unsafe statements work as expected and that appropriate
errors are produced by problems with the selection of binlog format.
It was not practical to split this into smaller pieces of work.
Problem 1:
To determine the logging mode, the code has to take several parameters
into account (namely: (1) the value of binlog_format; (2) the
capabilities of the engines; (3) the type of the current statement:
normal, unsafe, or row injection). These parameters may conflict in
several ways, namely:
- binlog_format=STATEMENT for a row injection
- binlog_format=STATEMENT for an unsafe statement
- binlog_format=STATEMENT for an engine only supporting row logging
- binlog_format=ROW for an engine only supporting statement logging
- statement is unsafe and engine does not support row logging
- row injection in a table that does not support statement logging
- statement modifies one table that does not support row logging and
one that does not support statement logging
Several of these conflicts were not detected, or were detected with
an inappropriate error message. The problem of BUG#39934 was that no
appropriate error message was written for the case when an engine
only supporting row logging executed a row injection with
binlog_format=ROW. However, all above cases must be handled.
Fix 1:
Introduce new error codes (sql/share/errmsg.txt). Ensure that all
conditions are detected and handled in decide_logging_format()
Problem 2:
The binlog format shall be determined once per statement, in
decide_logging_format(). It shall not be changed before or after that.
Before decide_logging_format() is called, all information necessary to
determine the logging format must be available. This principle ensures
that all unsafe statements are handled in a consistent way.
However, this principle is not followed:
thd->set_current_stmt_binlog_row_based_if_mixed() is called in several
places, including from code executing UPDATE..LIMIT,
INSERT..SELECT..LIMIT, DELETE..LIMIT, INSERT DELAYED, and
SET @@binlog_format. After Problem 1 was fixed, that caused
inconsistencies where these unsafe statements would not print the
appropriate warnings or errors for some of the conflicts.
Fix 2:
Remove calls to THD::set_current_stmt_binlog_row_based_if_mixed() from
code executed after decide_logging_format(). Compensate by calling the
set_current_stmt_unsafe() at parse time. This way, all unsafe statements
are detected by decide_logging_format().
Problem 3:
INSERT DELAYED is not unsafe: it is logged in statement format even if
binlog_format=MIXED, and no warning is printed even if
binlog_format=STATEMENT. This is BUG#45825.
Fix 3:
Made INSERT DELAYED set itself to unsafe at parse time. This allows
decide_logging_format() to detect that a warning should be printed or
the binlog_format changed.
Problem 4:
LIMIT clause were not marked as unsafe when executed inside stored
functions/triggers/views/prepared statements. This is
BUG#45785.
Fix 4:
Make statements containing the LIMIT clause marked as unsafe at
parse time, instead of at execution time. This allows propagating
unsafe-ness to the view.
the auto_increment value
This is an alternative patch that instead of allowing RECREATE TABLE
on TRUNCATE TABLE it implements reset_auto_increment that is called
after delete_all_rows.
Note: this bug was fixed by Mattias Jonsson:
Pusing this patch: http://lists.mysql.com/commits/70370
Problem is that DATA_FREE in SHOW TABLE STATUS
is not correct when not using innodb_file_per_table.
The solution is to use I_S.PARTITIONS instead.
This is only a small fix for correcting mean record length and
always return 0 if the table is empty.
'lock wait timeout exceeded'
Problem was a bug in the implementation of scan in partitioning
which masked the error code from the partition's handler.
Fixed by returning the value from the underlying handler.
mysql-test/t/partition.test
sql/ha_partition.cc
Bug#40954: Crash in MyISAM index code with concurrency test using partitioned tables
Problem was usage of read_range_first with an empty key.
Solution was to not to give a key if it was empty. (real author Mattias Jonsson)
storage/archive/archive_reader.c
client/mysqlslap.c
Aligned the copyright texts output from "--version" of tools, to
let internal tools be able to change them if needed.
storage/ndb/test/tools/connect.cpp
storage/ndb/test/run-test/atrt.hpp
Corrected a few GPL headers not restricted to GPL version 2
Makefile.am
Added missing --report-features to the 'test-bt-fast' target
support-files/mysql.spec.sh
Reversed the removal of the "%define license GPL" in as internal
tools depended on it
on tables with partitions
Problem was that the handler function try_semi_consistent_read
was not propagated to the innodb handler.
Solution was to implement that function in the partitioning
handler.
order by
Problem was that the first index read was unordered,
and the next was ordered, resulting in use of
uninitialized data.
Solution was to use the correct variable to see if
the 'next' call should be ordered or not.