If we have a cluster of two or more nodes replicating from an async master
with binlog_format set to STATEMENT, and multi-row inserts are executed on
a table with an auto_increment column whose values are generated
automatically by MySQL, then the replicating node generates wrong
auto_increment values, different from those generated on the async master.
In the title of MDEV-9519 it was proposed to ban START SLAVE on a Galera
node if the master's binlog_format = STATEMENT and
wsrep_auto_increment_control = 1, but the problem can be solved without
such a restriction.
The causes and fixes:
1. We need to improve the processing of auto-increment value changes
after a change in the cluster size.
2. If wsrep_auto_increment_control is switched on while the node is
running, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting for the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain their initial values if wsrep_auto_increment_control
is switched on while the node is running, which leads to inconsistent
results on different nodes in some scenarios.
3. If wsrep_auto_increment_control is switched off while the node is
running, then we must restore the original values of the
auto_increment_increment and auto_increment_offset global variables, as
set by the user. To make this possible, we need to add "shadow copies" of
these variables that store the latest values set by the user (see the
sketch below).
https://jira.mariadb.org/browse/MDEV-9519
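A minimal C++ sketch of the intended toggle behaviour; the type and
function names (Auto_inc_vars, toggle_auto_increment_control) are
illustrative stand-ins, not the actual server variables and callbacks:

    #include <cstdio>

    // Stand-in for the two global variables plus their user-set
    // "shadow copies" (the values to restore when control is off).
    struct Auto_inc_vars
    {
      unsigned long increment;       // effective auto_increment_increment
      unsigned long offset;          // effective auto_increment_offset
      unsigned long user_increment;  // shadow copy: last user-set value
      unsigned long user_offset;     // shadow copy: last user-set value
    };

    // Apply the new setting immediately, without waiting for the next
    // wsrep_view_handler_cb() invocation.
    void toggle_auto_increment_control(Auto_inc_vars &v, bool enabled,
                                       unsigned long cluster_size,
                                       unsigned long node_index)
    {
      if (enabled)
      {
        // Cluster-controlled values: interleave sequences across nodes.
        v.increment= cluster_size;
        v.offset= node_index + 1;
      }
      else
      {
        // Restore exactly what the user had configured.
        v.increment= v.user_increment;
        v.offset= v.user_offset;
      }
    }

    int main()
    {
      Auto_inc_vars v= {1, 1, 1, 1};
      toggle_auto_increment_control(v, true, 3, 1);   // node 2 of 3
      std::printf("increment=%lu offset=%lu\n", v.increment, v.offset);
      toggle_auto_increment_control(v, false, 3, 1);  // back to user values
      std::printf("increment=%lu offset=%lu\n", v.increment, v.offset);
      return 0;
    }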
When using buffered sort in `UPDATE`, keyread is used. In this case,
`TABLE::update_virtual_field` should be aborted, but it actually isn't,
because it is called not with the top-level handler, but with the one
that is actually going to access the disk. Here the problem shows up
with partitioning, so the solution is to recursively mark all the
underlying partition handlers for keyread.
* ha_partition: update keyread state for child partitions
Closes #800
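A simplified sketch of the idea behind the fix, with stand-in classes
instead of the real handler API: the partition wrapper must forward the
keyread state to every child handler, because a child is what actually
accesses the disk.

    #include <vector>

    // Stand-in for the storage engine handler interface.
    struct Handler
    {
      bool keyread= false;
      virtual void set_keyread(bool on) { keyread= on; }
      virtual ~Handler() = default;
    };

    // Stand-in for ha_partition: toggling keyread on the wrapper alone
    // is not enough, since the underlying partition handlers are the
    // ones that read the index; propagate the state recursively.
    struct Partition_handler : Handler
    {
      std::vector<Handler*> children;
      void set_keyread(bool on) override
      {
        keyread= on;
        for (Handler *child : children)
          child->set_keyread(on);   // the fix: update child partitions
      }
    };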
Crash in thr_lock / has_old_lock upon FLUSH TABLES
Explicit partition access of a partitioned MEMORY table under LOCK TABLES
may cause subsequent statements to crash the server, deadlock, trigger
valgrind warnings or ASAN errors. Freed memory was being used due to
incorrect cleanup.
At least MyISAM and InnoDB don't seem to be affected, since their
THR_LOCK structures don't survive FLUSH TABLES. MEMORY keeps table shared
data (including THR_LOCK) even if there are no open instances.
The partition_info::lock_partitions bitmap holds the bits of the
partitions that are allowed to be accessed after pruning. This bitmap
is updated for each individual statement.
This bitmap was abused in ha_partition::store_lock() such that when we
need to unlock a table locked by LOCK TABLES, only the locks for
partitions that were accessed by the previous statement were released.
Eventually FLUSH TABLES frees THR_LOCK_DATA objects, which are still
linked into THR_LOCK lists. When such THR_LOCK gets reused we end up with
freed memory access.
Fixed by using ha_partition::m_locked_partitions bitmap similarly to
ha_partition::external_lock().
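A schematic sketch of the bug and the fix, with std::bitset standing in
for the server's bitmaps: releasing the locks must use the set of
partitions that were actually locked, not the per-statement pruning
bitmap.

    #include <bitset>
    #include <cstdio>

    const int MAX_PARTS= 8;

    struct Part_table
    {
      // Per-statement pruning result; changes with every statement.
      std::bitset<MAX_PARTS> lock_partitions;
      // Partitions actually locked by LOCK TABLES.
      std::bitset<MAX_PARTS> m_locked_partitions;
    };

    // Buggy unlock: walks the pruning bitmap of the last statement, so
    // locks of partitions it did not touch stay linked into THR_LOCK
    // lists that FLUSH TABLES later frees.
    void unlock_buggy(Part_table &t)
    {
      for (int i= 0; i < MAX_PARTS; i++)
        if (t.lock_partitions[i])
          t.m_locked_partitions[i]= false;
    }

    // Fixed unlock: releases exactly the partitions that were locked.
    void unlock_fixed(Part_table &t)
    {
      t.m_locked_partitions.reset();
    }

    int main()
    {
      Part_table t;
      t.m_locked_partitions= 0b1111;  // LOCK TABLES locked 4 partitions
      t.lock_partitions=     0b0010;  // last statement touched only one
      unlock_buggy(t);
      std::printf("partitions left locked (bug): %zu\n",
                  t.m_locked_partitions.count());
      return 0;
    }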
The problem occurs in 10.2 and earlier releases of MariaDB Server because
the Partition Engine was not pushing the engine conditions down to the
underlying storage engine of each partition. With the data provided by the
customer, this caused Spider to return the first 5 rows in the table. Two
of the 5 rows did not satisfy the WHERE clause, so they were removed from
the result set by the server.
To fix the problem, I have back-ported support for engine condition pushdown
in the Partition Engine from MariaDB Server 10.3.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit eb2ca3d on branch bb-10.2-MDEV-16912
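A stand-in sketch of what engine condition pushdown through the
Partition Engine means: the wrapper forwards the condition to each
underlying handler (for Spider, the remote server can then filter the
rows), and each handler returns the part of the condition it could not
evaluate itself.

    #include <vector>

    struct Cond;  // stand-in for the server's condition (Item) tree

    // Stand-in for the condition pushdown interface: a handler gets a
    // condition and returns the remainder it cannot evaluate itself
    // (nullptr when it handled everything).
    struct Handler
    {
      virtual const Cond *push_condition(const Cond *cond) { return cond; }
      virtual ~Handler() = default;
    };

    struct Partition_handler : Handler
    {
      std::vector<Handler*> children;
      const Cond *push_condition(const Cond *cond) override
      {
        const Cond *remainder= nullptr;
        // The fix: forward the condition to every partition's engine;
        // if any child cannot handle it fully, the server must still
        // re-check the returned rows.
        for (Handler *child : children)
          if (const Cond *rest= child->push_condition(cond))
            remainder= rest;
        return remainder;
      }
    };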
The observed and described execution time difference of the partitioned
engine between master and slave was caused by excessive invocation of
base_engine::rnd_init, which was also done for partitions not involved
in the Rows-event operation.
The slave slowdown caused by this bug therefore scales with the number
of partitions.
Fixed by applying an upstream patch.
References:
----------
https://bugs.mysql.com/bug.php?id=73648
Bug#25687813 REPLICATION REGRESSION WITH RBR AND PARTITIONED TABLES
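A sketch of the upstream fix's effect, with stand-in types: the scan is
initialized only for partitions that the Rows-event can actually touch,
instead of for every partition of the table.

    #include <bitset>
    #include <vector>

    const int MAX_PARTS= 64;

    struct Part_handler
    {
      int rnd_init_calls= 0;
      void rnd_init() { ++rnd_init_calls; }  // costly in some engines
    };

    // Initialize a table scan only for partitions that survived
    // pruning (read_partitions), skipping uninvolved ones.
    void init_scan(std::vector<Part_handler> &parts,
                   const std::bitset<MAX_PARTS> &read_partitions)
    {
      for (size_t i= 0; i < parts.size(); i++)
        if (read_partitions[i])
          parts[i].rnd_init();
    }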
table.cc:
virtual columns must be computed for INSERT if they're part
of the partitioning expression.
This change broke gcol.gcol_partition_innodb; fix CHECK TABLE
for partitioned tables and vcols.
sql_partition.cc:
mark prerequisite base columns in full_part_field_set.
ha_partition.cc:
initialize vcol_set accordingly.
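A schematic sketch of the sql_partition.cc change, with std::bitset
standing in for full_part_field_set: the base columns that a virtual
partitioning column depends on are marked as well, so they are read
before the virtual column is computed for INSERT.

    #include <bitset>
    #include <vector>

    const int MAX_FIELDS= 32;

    // Stand-in field descriptor: a virtual column records the base
    // columns its generation expression reads.
    struct Field_desc
    {
      bool is_virtual= false;
      std::vector<int> base_columns;  // indexes of prerequisite fields
    };

    // For every virtual column already in the partitioning field set,
    // also mark its prerequisite base columns.
    void mark_part_fields(const std::vector<Field_desc> &fields,
                          std::bitset<MAX_FIELDS> &full_part_field_set)
    {
      for (size_t i= 0; i < fields.size(); i++)
        if (full_part_field_set[i] && fields[i].is_virtual)
          for (int base : fields[i].base_columns)
            full_part_field_set[base]= true;
    }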
If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend
a lot of time rolling back writes to the intermediate copy of the table.
To reduce the amount of busy work done, a work-around was introduced in
commit fd069e2bb3 in MySQL 4.1.8 and 5.0.2,
to commit the transaction after every 10,000 inserted rows.
A proper fix would have been to disable the undo logging altogether and
to simply drop the intermediate copy of the table on subsequent server
startup. This is what happens in MariaDB 10.3 with MDEV-14717, MDEV-14585.
In MariaDB 10.2, the intermediate copy of the table would be left behind
with a name starting with the string #sql.
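An illustrative contrast of the two approaches, with stand-in types
(Row, Trx and Table are not the real InnoDB classes):

    #include <vector>

    struct Row {};
    struct Trx { void commit() {} void restart() {} };
    struct Table
    {
      std::vector<Row> rows;
      bool undo_logging= true;
      void insert(const Row &r) { rows.push_back(r); }
      void disable_undo_logging() { undo_logging= false; }
    };

    // Old work-around (now removed): commit and restart the
    // transaction every 10,000 copied rows, so that crash recovery
    // would not have to roll back the whole copy.
    void copy_rows_old(const Table &src, Table &dst, Trx &trx)
    {
      long n= 0;
      for (const Row &r : src.rows)
      {
        dst.insert(r);
        if (++n % 10000 == 0) { trx.commit(); trx.restart(); }
      }
      trx.commit();
    }

    // Backported fix: suppress undo logging for the copy, so a crash
    // leaves nothing to roll back; the leftover #sql table is dropped
    // later. ALTER IGNORE keeps undo logging enabled, because the
    // latest row may need to be rolled back on error.
    void copy_rows_new(const Table &src, Table &dst, Trx &trx)
    {
      dst.disable_undo_logging();
      for (const Row &r : src.rows)
        dst.insert(r);
      trx.commit();
    }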
This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
contributed by jixianliang <271365745@qq.com>.
Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation
InnoDB must for now keep the undo logging enabled, so that the latest
row can be rolled back in case of an error.
In a Galera cluster, the LOAD DATA statement will retain the existing
behaviour and commit the transaction after every 10,000 rows if
the parameter wsrep_load_data_splitting=ON is set. The logic to do
so (the wsrep_load_data_split() function and the call
handler::extra(HA_EXTRA_FAKE_START_STMT)) is joint work
by Ji Xianliang and Marko Mäkelä.
The original fix:
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date: Wed Dec 2 16:09:15 2015 +0530
Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY
Problem:
During ALTER TABLE, we commit and restart the transaction for every
10,000 rows, so that the rollback after recovery would not take so long.
Fix:
Suppress the undo logging during copy alter operation. If fts_index is
present then insert directly into the fts auxiliary table rather
than doing it at commit time.
ha_innobase::num_write_row: Remove the variable.
ha_innobase::write_row(): Remove the hack for committing every 10000 rows.
row_lock_table_for_mysql(): Remove the extra 2 parameters.
lock_get_src_table(), lock_is_table_exclusive(): Remove.
Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
- Fix win64 pointer truncation warnings
  (usually coming from misusing 0x%lx and a long cast in DBUG)
- Also fix printf-format warnings
Make the above-mentioned warnings fatal.
- Fix pthread_join() on Windows to set the return value.
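An example of the warning class being fixed; the bad line is shown as a
comment, because on win64, where long is 32 bits, the cast truncates
the address:

    #include <cstdio>

    int main()
    {
      int x= 0;
      void *p= &x;

      // The pattern being removed (would truncate the pointer on
      // win64 and raise a now-fatal warning):
      //   printf("0x%lx\n", (long) p);

      // Portable replacement: %p takes the pointer directly.
      std::printf("%p\n", p);
      return 0;
    }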
Analysis
========
CREATE TABLE of an InnoDB table with a partition name
that exceeds the path limit can cause the server to exit.
During the preparation of the partition name, there was
no check to identify whether the complete path name for
the partition exceeds the maximum supported path length,
causing the server to exit during subsequent processing.
Fix
===
During the preparation of the partition name, check and
report an error if the partition path name exceeds the
maximum path name limit.
This is a 5.5 patch.
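A sketch of the check under stated assumptions: FN_REFLEN_SKETCH stands
in for the server's real FN_REFLEN limit, and "#P#" is the separator
used in partition file names.

    #include <cstdio>
    #include <cstring>

    const std::size_t FN_REFLEN_SKETCH= 512;  // stand-in for FN_REFLEN

    // Verify that the complete partition path fits, and report an
    // error instead of letting later code run past the limit.
    bool check_partition_path(const char *table_path,
                              const char *part_name)
    {
      std::size_t total= std::strlen(table_path) + std::strlen("#P#") +
                         std::strlen(part_name) + 1;
      if (total > FN_REFLEN_SKETCH)
      {
        // The real code reports a proper path-too-long error to the
        // client here.
        std::fprintf(stderr, "partition path too long (%zu > %zu)\n",
                     total, FN_REFLEN_SKETCH);
        return false;
      }
      return true;
    }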
ha_partition creates temporary ha_XXX objects for its partitions when
performing DDL operations. The objects were created on a MEM_ROOT and
never deleted.
This works as long as ha_XXX objects free all their data in
ha_XXX::close() and don't rely on proper destructor invocation.
Unfortunately, ha_rocksdb includes String members which need to be
deleted properly.
Fixed the bug by having ha_partition::~ha_partition delete these temporary
objects.
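A schematic sketch with stand-in classes; the point is that MEM_ROOT
allocation frees memory in bulk but never runs destructors, so the
wrapper now deletes its temporary handlers itself.

    #include <vector>

    // Stand-in for an engine handler whose destructor matters
    // (ha_rocksdb has String members that must be destroyed).
    struct Handler
    {
      virtual ~Handler() = default;
    };

    struct Partition_handler
    {
      // Temporary per-partition handlers created for DDL operations.
      std::vector<Handler*> temp_handlers;

      ~Partition_handler()
      {
        // The fix: run the destructors explicitly instead of relying
        // on MEM_ROOT cleanup, which only releases memory.
        for (Handler *h : temp_handlers)
          delete h;
      }
    };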
Don't write to a temporary file, use String.
Remove strange one-liner "helpers", use String methods.
Don't use current_thd, don't allocate memory for 1-byte strings, etc.
Also, implement MDEV-11027 a little differently from 5.5 and 10.0:
recv_apply_hashed_log_recs(): Change the return type back to void
(DB_SUCCESS was always returned).
Report progress also via systemd using sd_notifyf().
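For the systemd part, a minimal usage sketch of sd_notifyf(); the
status text is illustrative, and the code must be linked with
libsystemd:

    #include <systemd/sd-daemon.h>

    void report_recovery_progress(unsigned long done, unsigned long total)
    {
      // The first argument (0) keeps $NOTIFY_SOCKET in the environment,
      // so later status updates still reach the service manager.
      sd_notifyf(0, "STATUS=Applying batch of log records: %lu/%lu",
                 done, total);
    }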
* remove a confusing method name - Field::set_default_expression()
* remove handler::register_columns_for_write()
* rename stuff
* add asserts
* remove unlikely unlikely
* remove redundant if() conditions
* fix mark_unsupported_function() to report the most important violation
* don't scan vfield list for default values (vfields don't have defaults)
* move handling of DROP CONSTRAINT IF EXISTS to where it belongs
* don't protect engines from Alter_inplace_info::ALTER_ADD_CONSTRAINT
* comments
MDEV-10134 Add full support for DEFAULT
- Added support for using tables with MySQL 5.7 virtual fields,
including MySQL 5.7 syntax
- Better error messages also for old cases
- CREATE ... SELECT now also updates timestamp columns
- Blob can now have default values
- Added new system variable "check_constraint_checks", to turn off
  CHECK constraint checking if needed.
- Removed some engine-independent tests in suite vcol to only test myisam
- Moved some tests from 'include' to 't'. Should some day be done for all tests.
- FRM version increased to 11 if one uses virtual fields or constraints
- Changed to use a bitmap to check if a field has got a value, instead of
setting HAS_EXPLICIT_VALUE bit in field flags
- Expressions can now be up to 65K in total
- Ensure we are not referring to uninitialized fields when handling virtual fields or defaults
- Changed check_vcol_func_processor() to return a bitmap of used types
- Had to change some functions that calculated a cached value in fix_fields()
  to do this in val() or get_date() instead.
- store_now_in_TIME() now takes a THD argument
- fill_record() now updates default values
- Add a lookahead for NOT NULL, to be able to handle DEFAULT 1+1 NOT NULL
- Automatically generate a name for constraints that don't have a name
- Added support for ALTER TABLE DROP CONSTRAINT
- Ensure that partition functions register virtual fields used. This fixes
some bugs when using virtual fields in a partitioning function
The following is left in a semi-improved state to keep the patch size reasonable:
- Field operator new: left thd_alloc(current_thd)
- Sql_alloc operator new: left thd_alloc(thd_get_current_thd())
- Item_args constructors: left thd_alloc(thd)
- Item_func_interval::fix_length_and_dec(): no THD arg, have to call current_thd
- Item_func_dyncol_exists::val_int(): same
- Item_dyncol_get::val_str(): same
- Item_dyncol_get::val_int(): same
- Item_dyncol_get::val_real(): same
- Item_dyncol_get::val_decimal(): same
- Item_singlerow_subselect::fix_length_and_dec(): same
Crash in ha_partition::init_record_priority_queue()
Cherry-pick rev.6b0ee0c795499cee7f9deb649fb010801e0be4c2 from mysql-5.6.
Bug #18305270 BACKPORT BUG#18694052 FIX
FOR ASSERTION `!M_ORDERED_REC_BUFFER'
FAILED TO 5.6
PROBLEM
-------
The record priority queue was not removed when
init_index failed for a partition, which was
causing the crash.
FIX
---
Remove the priority queue if init_index fails
for a partition.
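A schematic sketch of the fix with stand-in types: the ordered-record
priority queue is destroyed when init_index fails, so no stale buffer
survives to trip the assertion on a later call.

    struct Priority_queue { /* ordered record buffer stand-in */ };

    struct Partition_scan
    {
      Priority_queue *queue= nullptr;

      // Simulate init_index failing for one of the partitions.
      bool init_index_all_parts() { return false; }

      void destroy_record_priority_queue()
      {
        delete queue;
        queue= nullptr;
      }

      bool ordered_scan_init()
      {
        queue= new Priority_queue();
        if (!init_index_all_parts())
        {
          // The fix: remove the priority queue on failure instead of
          // leaving it to trip the !m_ordered_rec_buffer assertion.
          destroy_record_priority_queue();
          return false;
        }
        return true;
      }
    };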