was restored.
Optionally rollback prepared XA's on "mariabackup --prepare".
The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition
between InnoDB transaction commit and the conversion of implicit
locks into explicit ones.
The assertion failure can be triggered with a test that runs
3 concurrent single-statement transactions in a loop on a simple
table:
CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
thread1: INSERT INTO t SET a=1;
thread2: DELETE FROM t;
thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t;
The failure scenarios are like the following:
(1) The INSERT statement is being committed, waiting for lock_sys->mutex.
(2) At the time of the failure, both the DELETE and SELECT transactions
are active but have not logged any changes yet.
(3) The transaction where the !other_lock assertion fails started
lock_rec_convert_impl_to_expl().
(4) After this point, the commit of the INSERT removed the transaction from
trx_sys->rw_trx_set, in trx_erase_lists().
(5) The other transaction consulted trx_sys->rw_trx_set and determined
that there is no implicit lock. Hence, it grabbed the lock.
(6) The !other_lock assertion fails in lock_rec_add_to_queue()
for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'.
This assertion failure looks genuine, because the INSERT transaction
is still active (trx->state=TRX_STATE_ACTIVE).
The problematic step (4) was introduced in
mysql/mysql-server@e27e0e0bb7
which fixed something related to MVCC (covered by the test
innodb.innodb-read-view). Basically, it reintroduced an error
that had been mentioned in an earlier commit
mysql/mysql-server@a17be6963f:
"The active transaction was removed from trx_sys->rw_trx_set prematurely."
Our fix goes along the following lines:
(a) Implicit locks will released by assigning
trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step.
This transition will no longer be protected by lock_sys_t::mutex,
only by trx->mutex. This idea is by Sergey Vojtovich.
(b) We detach the transaction from trx_sys before starting to release
explicit locks.
(c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must
recheck trx->state after acquiring trx->mutex.
(d) Before releasing any explicit locks, we will ensure that any activity
by other threads to convert implicit locks into explicit will have ceased,
by checking !trx_is_referenced(trx). There was a glitch
in this check when it was part of lock_trx_release_locks(); at the end
we would release trx->mutex and acquire lock_sys->mutex and trx->mutex,
and fail to recheck (trx_is_referenced() is protected by trx_t::mutex).
(e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000)
just like we did before.
trx_t::state: Document that the transition to COMMITTED is only
protected by trx_t::mutex, no longer by lock_sys_t::mutex.
trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction
state should be rechecked after acquiring trx_t::mutex.
trx_t::commit_state(): New function to change a transaction to committed
state, to release implicit locks.
trx_t::release_locks(): New function to release the explicit locks
after commit_state().
lock_trx_release_locks(): Move much of the logic to the caller
(which must invoke trx_t::commit_state() and trx_t::release_locks()
as needed), and assert that the transaction will have locks.
trx_get_trx_by_xid(): Make the parameter a pointer to const.
lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring
trx->mutex, and avoid a redundant lookup of the transaction.
lock_rec_queue_validate(): Recheck impl_trx->state while holding
impl_trx->mutex.
row_vers_impl_x_locked(), row_vers_impl_x_locked_low():
Document that the transaction state must be rechecked after
trx_mutex_enter().
trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().
Problem:
========
We have a Master/Master Setup on two servers, but are only writing to one of
those servers (so it is essentially Master/Slave) We upgraded from 10.1.* to
10.2.22 last week and starting with the upgrade, we are getting duplicate key
errors on the slave. BINLOG=mixed.
Analysis:
=========
This issue happens with LOCK TABLES and binlog_format=MIXED combination. When an
UNSAFE statement is encountered in 'MIXED' mode, it is logged in the form of
'ROW' format. For all the tables that are part of LOCK TABLES list their table maps
are written into the binary log. For each table in the list a check is
done to see if 'check_table_binlog_row_based_done' flag is set or not. If it is not set
a check process is initiated to see if table qualifies for row based binary
logging or not and 'check_table_binlog_row_based_done' is set. This flag will be
cleared at the time of closing thread tables.
But there can be special cases where the LOCK TABLES contains more number of
tables but the unsafe query is actually using subset of tables from LOCK TABLES
list.
For example: LOCK TABLES locks t1,t2,t3 but the unsafe statement makes use of
only two tables t1,t3. In this case the 'check_table_binlog_row_based_done' flag
is enabled for table 't2' while writing table map, but 'close_thread_tables'
function call will not reset this flag. Since the flag is not cleared for table
't2' even a safe statement which used t2 will be logged in the form of row based
format.
This leads to an assert on debug builds and causes duplicate entries in release
builds. In release builds a statement is logged in the form of both ROW and
STATEMENT format. This causes the slave to fail with duplicate key error.
Fix:
===
During 'close_thread_tables' when LOCK TABLE modes are active "ha_reset" is done
for all the tables which were part of current statement. As mentioned in the
example 'ha_reset' is called for tables 't1' and 't3'. This will clear the
'check_table_binlog_row_based_done' flag. At this point add a check for the rest
of the tables to see if 'check_table_binlog_row_based_done' is enabled or not.
If enabled clear the flag.
For partitioned table, ensure that the AUTO_INCREMENT values will
be assigned from the same sequence. This is based on the following
change in MySQL 5.6.44:
commit aaba359c13d9200747a609730dafafc3b63cd4d6
Author: Rahul Malik <rahul.m.malik@oracle.com>
Date: Mon Feb 4 13:31:41 2019 +0530
Bug#28573894 ALTER PARTITIONED TABLE ADD AUTO_INCREMENT DIFF RESULT DEPENDING ON ALGORITHM
Problem:
When a partition table is in-place altered to add an auto-increment column,
then its values are starting over for each partition.
Analysis:
In the case of in-place alter, InnoDB is creating a new sequence object
for each partition. It is default initialized. So auto-increment columns
start over for each partition.
Fix:
Assign old sequence of the partition to the sequence of next partition
so it won't start over.
RB#21148
Reviewed by Bin Su <bin.x.su@oracle.com>
There were two newly enabled warnings:
1. cast for a function pointers. Affected sql_analyse.h, mi_write.c
and ma_write.cc, mf_iocache-t.cc, mysqlbinlog.cc, encryption.cc, etc
2. memcpy/memset of nontrivial structures. Fixed as:
* the warning disabled for InnoDB
* TABLE, TABLE_SHARE, and TABLE_LIST got a new method reset() which
does the bzero(), which is safe for these classes, but any other
bzero() will still cause a warning
* Table_scope_and_contents_source_st uses `TABLE_LIST *` (trivial)
instead of `SQL_I_List<TABLE_LIST>` (not trivial) so it's safe to
bzero now.
* added casts in debug_sync.cc and sql_select.cc (for JOIN)
* move assignment method for MDL_request instead of memcpy()
* PARTIAL_INDEX_INTERSECT_INFO::init() instead of bzero()
* remove constructor from READ_RECORD() to make it trivial
* replace some memcpy() with c++ copy assignments
Analysis:
========
Increasing the length of the indexed varchar column is not an instant operation for
innodb.
Fix:
===
- Introduce the new handler flag 'Alter_inplace_info::ALTER_COLUMN_INDEX_LENGTH' to
indicate the index length differs due to change of column length changes.
- InnoDB makes the ALTER_COLUMN_INDEX_LENGTH flag as instant operation.
This is a port of Mysql fix.
commit 913071c0b16cc03e703308250d795bc381627e37
Author: Nisha Gopalakrishnan <nisha.gopalakrishnan@oracle.com>
Date: Wed May 30 14:54:46 2018 +0530
BUG#26848813: INDEXED COLUMN CAN'T BE CHANGED FROM VARCHAR(15)
TO VARCHAR(40) INSTANTANEOUSLY
Implement according to standard SQL specification 2008.
The check_constraints table is used for fetching metadata about
the constraints defined for tables in all databases.
There were some result files which failed after running mtr.
These files are updated with newly create record with mtr --record.
When using buffered sort in `UPDATE`, keyread is used. In this case,
`TABLE::update_virtual_field` should be aborted, but it actually isn't,
because it is called not with a top-level handler, but with the one that
is actually going to access the disk. Here the problemm is issued with
partitioning, so the solution is to recursively mark for keyread all the
underlying partition handlers.
* ha_partition: update keyread state for child partitions
Closes#800
Observed and described
partitioned engine execution time difference
between master and slave was caused by excessive invocation
of base_engine::rnd_init which was done also for partitions
uninvolved into Rows-event operation.
The bug's slave slowdown therefore scales with the number of partitions.
Fixed with applying an upstream patch.
References:
----------
https://bugs.mysql.com/bug.php?id=73648
Bug#25687813 REPLICATION REGRESSION WITH RBR AND PARTITIONED TABLES
MDEV-13851 Always check table options in ALTER TABLE…ALGORITHM=INPLACE
In the merge of MySQL 5.7.9 to MariaDB 10.2.2, some code was included
that prevents ADD SPATIAL INDEX from being executed with ALGORITHM=INPLACE.
Also, the constant ADD_SPATIAL_INDEX was introduced as an alias
to ADD_INDEX. We will remove that alias now, and properly implement
the same ADD SPATIAL INDEX restrictions as MySQL 5.7 does:
1. table-rebuilding operations are not allowed if SPATIAL INDEX survive it
2. ALTER TABLE…ADD SPATIAL INDEX…LOCK=NONE is not allowed
ha_innobase::prepare_inplace_alter_table(): If the ALTER TABLE
requires actions within InnoDB, enforce the table options (MDEV-13851).
In this way, we will keep denying ADD SPATIAL INDEX for tables
that use encryption (MDEV-11974), even if ALGORITHM=INPLACE is used.
CREATE/DROP TEMPORARY TABLE are not safe to optimistically replicate in
parallel with other transactions, so they need to be marked as "ddl" in the
binlog.
This was already done for stand-alone CREATE/DROP TEMPORARY. But temporary
tables can also be created and dropped inside a BEGIN...END transaction, and
such transactions were not marked as ddl. Nor was the DROP TEMPORARY TABLE
statement emitted implicitly when a client connection is closed.
So this patch adds such ddl mark for the missing cases.
The difference to Kristian's original patch is mainly a fix in
mysql_trans_commit_alter_copy_data() to remember the unsafe_rollback_flags
over the temporary commit.
The sole purpose of handlerton::release_temporary_latches and its wrapper
function was to release the InnoDB adaptive hash index latch
(btr_search_latch).
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed. As part of that,
the function trx_search_latch_release_if_reserved() was changed to an
assertion and the function trx_reserve_search_latch_if_not_reserved()
was removed, and handlerton::release_temporary_latches() practically
became a no-op.
Note: MDEV-12121 replaced the function
trx_search_latch_release_if_reserved()
with the more appropriately named macro trx_assert_no_search_latch().
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
* rename to "keyread" (to avoid conflicts with tokudb),
* change from bool to uint and store the keyread index number there
* provide a bool accessor to check if keyread is enabled
move TABLE::key_read into handler. Because in index merge and DS-MRR
there can be many handlers per table, and some of them use
key read while others don't. "keyread" is really per handler,
not per TABLE property.
- It turns out, ha_rocksdb::table_flags() can return
HA_PRIMARY_KEY_IN_READ_INDEX for all kinds of tables (as its meaning
is "if there is a PK, PK columns contribute to the secondary index
tuple". There is no assumption that a certain PK column can be decoded
from the secondary index.
(Should probably be fixed in the upstream, too, but I was unable to
construct a testcase showing this is necessary).
- Following the above, we can undo the init_with_fields() changes in
table.cc. MyRocks calls init_with_fields() from ha_rocksdb::open()
which sets index-only read capabilities properly.
* remove old 5.2+ InnoDB support for virtual columns
* enable corresponding parts of the innodb-5.7 sources
* copy corresponding test cases from 5.7
* copy detailed Alter_inplace_info::HA_ALTER_FLAGS flags from 5.7
- and more detailed detection of changes in fill_alter_inplace_info()
* more "innodb compatibility hooks" in sql_class.cc to
- create/destroy/reset a THD (used by background purge threads)
- find a prelocked table by name
- open a table (from a background purge thread)
* different from 5.7:
- new service thread "thd_destructor_proxy" to make sure all THDs are
destroyed at the correct point in time during the server shutdown
- proper opening/closing of tables for vcol evaluations in
+ FK checks (use already opened prelocked tables)
+ purge threads (open the table, MDLock it, add it to tdc, close
when not needed)
- cache open tables in vc_templ
- avoid unnecessary allocations, reuse table->record[0] and table->s->default_values
- not needed in 5.7, because it overcalculates:
+ tell the server to calculate vcols for an on-going inline ADD INDEX
+ calculate vcols for correct error messages
* update other engines (mroonga/tokudb) accordingly
This cset just re-uses the approach from facebook/mysql-5.6 (Perhaps we
will have something different for MariaDB in the end).
For now this is:
Port this fix
dd7eeae69503cb8ab6ddc8fd9e2fef451cc31a32
Issue#250: MyRocks/Innodb different output from query with order by on table with index and decimal type
Summary:
Make open_binary_frm() set TABLE_SHARE::primary_key before it computes
Also add the patch for
https://github.com/facebook/mysql-5.6/issues/376
MySQL has each storage engine to increment Handler_XXX counters,
while MariaDB has handler::ha_XXX() methods to do the increments.
MariaDB's solution doesn't work for storage engines that implement
handler::read_range_first(), though.
Make ha_rocksdb::read_range_first increment the counter (when it is
calling handler::ha_XXX() function that will)