Exclude SELECT and INSERT SELECT from vers_set_hist_part(). We cannot
likewise exclude REPLACE SELECT because it may REPLACE into itself
(and REPLACE generates history).
INSERT also does not generate history, but we have history
modification setting which might be interfered.
make live checksum to be returned in handler::info(),
and slow table-scan checksum to be calculated in handler::checksum().
part of
MDEV-16249 CHECKSUM TABLE for a spider table is not parallel and saves all data in memory in the spider head by default
For partitioned table, ensure that the AUTO_INCREMENT values will
be assigned from the same sequence. This is based on the following
change in MySQL 5.6.44:
commit aaba359c13d9200747a609730dafafc3b63cd4d6
Author: Rahul Malik <rahul.m.malik@oracle.com>
Date: Mon Feb 4 13:31:41 2019 +0530
Bug#28573894 ALTER PARTITIONED TABLE ADD AUTO_INCREMENT DIFF RESULT DEPENDING ON ALGORITHM
Problem:
When a partition table is in-place altered to add an auto-increment column,
then its values are starting over for each partition.
Analysis:
In the case of in-place alter, InnoDB is creating a new sequence object
for each partition. It is default initialized. So auto-increment columns
start over for each partition.
Fix:
Assign old sequence of the partition to the sequence of next partition
so it won't start over.
RB#21148
Reviewed by Bin Su <bin.x.su@oracle.com>
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
When using buffered sort in `UPDATE`, keyread is used. In this case,
`TABLE::update_virtual_field` should be aborted, but it actually isn't,
because it is called not with a top-level handler, but with the one that
is actually going to access the disk. Here the problemm is issued with
partitioning, so the solution is to recursively mark for keyread all the
underlying partition handlers.
* ha_partition: update keyread state for child partitions
Closes#800
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
in thr_lock / has_old_lock upon FLUSH TABLES
Explicit partition access of partitioned MEMORY table under LOCK TABLES
may cause subsequent statements to crash the server, deadlock, trigger
valgrind warnings or ASAN errors. Freed memory was being used due to
incorrect cleanup.
At least MyISAM and InnoDB don't seem to be affected, since their
THR_LOCK structures don't survive FLUSH TABLES. MEMORY keeps table shared
data (including THR_LOCK) even if there're no open instances.
There's partition_info::lock_partitions bitmap, which holds bits of
partitions allowed to be accessed after pruning. This bitmap is
updated for each individual statement.
This bitmap was abused in ha_partition::store_lock() such that when we
need to unlock a table, locked by LOCK TABLES, only locks for partitions
that were accessed by previous statement were released.
Eventually FLUSH TABLES frees THR_LOCK_DATA objects, which are still
linked into THR_LOCK lists. When such THR_LOCK gets reused we end up with
freed memory access.
Fixed by using ha_partition::m_locked_partitions bitmap similarly to
ha_partition::external_lock().
When using buffered sort in `UPDATE`, keyread is used. In this case,
`TABLE::update_virtual_field` should be aborted, but it actually isn't,
because it is called not with a top-level handler, but with the one that
is actually going to access the disk. Here the problemm is issued with
partitioning, so the solution is to recursively mark for keyread all the
underlying partition handlers.
* ha_partition: update keyread state for child partitions
Closes#800
The problem occurs in 10.2 and earlier releases of MariaDB Server because the
Partition Engine was not pushing the engine conditions to the underlying
storage engine of each partition. This caused Spider to return the first 5
rows in the table with the data provided by the customer. 2 of the 5 rows
did not qualify the WHERE clause, so they were removed from the result set by
the server.
To fix the problem, I have back-ported support for engine condition pushdown
in the Partition Engine from MariaDB Server 10.3 to 10.2 and 10.1. In 10.3
and 10.4 I have merged the comments and the test case.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged:
Commit eb2ca3d on branch bb-10.2-MDEV-16912
The problem occurs in 10.2 and earlier releases of MariaDB Server because the
Partition Engine was not pushing the engine conditions to the underlying
storage engine of each partition. This caused Spider to return the first 5
rows in the table with the data provided by the customer. 2 of the 5 rows
did not qualify the WHERE clause, so they were removed from the result set by
the server.
To fix the problem, I have back-ported support for engine condition pushdown
in the Partition Engine from MariaDB Server 10.3.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit eb2ca3d on branch bb-10.2-MDEV-16912
The problem occurs in 10.2 and earlier releases of MariaDB Server because the
Partition Engine was not pushing the engine conditions to the underlying
storage engine of each partition. This caused Spider to return the first 5
rows in the table with the data provided by the customer. 2 of the 5 rows
did not qualify the WHERE clause, so they were removed from the result set by
the server.
To fix the problem, I have back-ported support for engine condition pushdown
in the Partition Engine from MariaDB Server 10.3.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
The problem occurred because the Spider node was incorrectly handling
timestamp values sent to and received from the data nodes.
The problem has been corrected as follows:
- Added logic to set and maintain the UTC time zone on the data nodes.
To prevent timestamp ambiguity, it is necessary for the data nodes to use
a time zone such as UTC which does not have daylight savings time.
- Removed the spider_sync_time_zone configuration variable, which did not
solve the problem and which interfered with the solution.
- Added logic to convert to the UTC time zone all timestamp values sent to
and received from the data nodes. This is done for both unique and
non-unique timestamp columns. It is done for WHERE clauses, applying to
SELECT, UPDATE and DELETE statements, and for UPDATE columns.
- Disabled Spider's use of direct update when any of the columns to update is
a timestamp column. This is necessary to prevent false duplicate key value
errors.
- Added a new test spider.timestamp to thoroughly test Spider's handling of
timestamp values.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 97cc9d3 on branch bb-10.3-MDEV-16246
Observed and described
partitioned engine execution time difference
between master and slave was caused by excessive invocation
of base_engine::rnd_init which was done also for partitions
uninvolved into Rows-event operation.
The bug's slave slowdown therefore scales with the number of partitions.
Fixed with applying an upstream patch.
References:
----------
https://bugs.mysql.com/bug.php?id=73648
Bug#25687813 REPLICATION REGRESSION WITH RBR AND PARTITIONED TABLES
In a test case Update occurs between Search and Delete/Update. This corrupts rowid
which Search saves for Delete/Update. Patch prevents this by using of
HA_EXTRA_REMEMBER_POS and HA_EXTRA_RESTORE_POS in a partition code.
This situation possibly occurs only with system versioning table and partition.
MyISAM and Aria engines are affected.
fix by midenok
Closes#705
table.cc:
virtual columns must be computed for INSERT, if they're part
of the partitioning expression.
this change broke gcol.gcol_partition_innodb.
fix CHECK TABLE for partitioned tables and vcols.
sql_partition.cc:
mark prerequisite base columns in full_part_field_set
ha_partition.cc
initialize vcol_set accordingly
As thd->alloc() and new automatically calls my_error(ER_OUTOFMEORY)
there is no reason to call mem_alloc_error()
Other things:
- Fixed bug in mysql_unpack_partition() where lex.part_info was
changed even if it would be a null pointer
This is done to get more free flag bits for alter_info->flags
Renamed all ALTER PARTITION defines to start with ALTER_PARTITION_
Renamed ALTER_PARTITION to ALTER_PARTITION_INFO
Renamed ALTER_TABLE_REORG to ALTER_PARTITION_TABLE_REORG
Other things:
- Shifted some ALTER_xxx defines to get empty bits at end
Main reason was to make it easier to print the above structures in
a debugger. Additional benefits is that I was able to use same
defines for both structures, which simplifes some code.
Most of the code is just removing Alter_info:: and Alter_inplace_info::
from alter table flags.
Following renames was done:
HA_ALTER_FLAGS -> alter_table_operations
CHANGE_CREATE_OPTION -> ALTER_CHANGE_CREATE_OPTION
Alter_info::ADD_INDEX -> ALTER_ADD_INDEX
DROP_INDEX -> ALTER_DROP_INDEX
ADD_UNIQUE_INDEX -> ALTER_ADD_UNIQUE_INDEX
DROP_UNIQUE_INDEx -> ALTER_DROP_UNIQUE_INDEX
ADD_PK_INDEX -> ALTER_ADD_PK_INDEX
DROP_PK_INDEX -> ALTER_DROP_PK_INDEX
Alter_info:ALTER_ADD_COLUMN -> ALTER_PARSE_ADD_COLUMN
Alter_info:ALTER_DROP_COLUMN -> ALTER_PARSE_DROP_COLUMN
Alter_inplace_info::ADD_INDEX -> ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX
Alter_inplace_info::DROP_INDEX -> ALTER_DROP_NON_UNIQUE_NON_PRIM_INDEX
Other things:
- Added typedef alter_table_operatons for alter table flags
- DROP CHECK CONSTRAINT can now be done online
- Added checks for Aria tables in alter_table_online.test
- alter_table_flags now takes an ulonglong as argument.
- Don't support online operations if checksum option is used.
- sql_lex.cc doesn't add ALTER_ADD_INDEX if index is not created
Lots of changes:
* calculate the current history partition in ::external_lock(),
not in ::write_row() or ::update_row()
* remove dynamically collected per-partition row_end stats
* no full table scan in open_table_from_share to calculate these
stats, no manual MDL/thr_locks in open_table_from_share
* no shared stats in TABLE_SHARE = no mutexes or condition waits when
calculating current history partition
* always compare timestamps, don't convert them to MYSQL_TIME
(avoid DST ambiguity, and it's faster too)
* correct interval handling, 1 month = 1 month, not 30 * 24 * 3600 seconds
* save/restore first partition start time, and count intervals from there
* only allow to drop first partitions if INTERVAL
* when adding new history partitions, split the data in the last history
parition, if it was overflowed
* show partition boundaries in INFORMATION_SCHEMA.PARTITIONS
partition_info had a bunch of function pointers to avoid if()'s
when invoking part_type specific functionality (like get_part_id, etc).
But check_range_constants() and check_list_constants() were still
invoked conditionally, with if()'s.
Create partition_info::check_constants function pointer, get rid
of if()'s
Also remove alloc argument of check_range_constants(), added
in 26a3ff0a22. Broken system versioning will be fixed in
following commits.