MDEV-20578 Got error 126 when executing undo undo_key_delete
upon Aria crash recovery
The crash happens in this scenario:
- Table with unique keys and non unique keys
- Batch insert (LOAD DATA or INSERT ... SELECT) with REPLACE
- Some insert succeeds followed by duplicate key error
In the above scenario the table gets corrupted.
The bug was that we don't generate any undo entry for the
failed insert as the whole insert can be ignored by undo.
The code did however not take into account that when bulk
insert is used, we would write cached keys to the file on
failure and undo would wrongly ignore these.
Fixed by moving the writing of the cache keys after we write
the aborted-insert event to the log.
Rowid Filter check is just like Index Condition Pushdown check: before
we check the filter, we must check if we have walked out of the range
we are scanning. (If we did, we should return, and not continue the scan).
Consequences of this:
- Rowid filtering doesn't work for keys that have partially-covered
blob columns (just like Index Condition Pushdown)
- The rowid filter function has three return values: CHECK_POS (passed)
CHECK_NEG (filtered out), CHECK_OUT_OF_RANGE.
All of the above is implemented in this patch
- Inplace alter shouldn't set default date column as '0000-00-00' when
table is not empty. So mysql_inplace_alter_table() copied
alter_ctx.error_if_not_empty to a new field of Alter_inplace_info.
In ha_innobase::check_if_supported_inplace_alter() should check the
error_if_not_empty flag and return INPLACE_NOT_SUPPORTED if the table
is not empty
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
was restored.
Optionally rollback prepared XA's on "mariabackup --prepare".
The fix MUST NOT be ported on 10.5+, as MDEV-742 fix solves the issue for
slaves.
This was to remove a performance regression between 10.3 and 10.4
In 10.5 we will have a better implementation of records_in_range
that will enable us to get more statistics.
This change was not done in 10.4 because the 10.5 will be part of
a larger change that is not suitable for the GA 10.4 version
Other things:
- Changed default handler block_size to 8192 to fix things statistics
for engines that doesn't set the block size.
- Fixed a bug in spider when using multiple part const ranges
(Patch from Kentoku)
* do not allow versioned table to be without versioned (non-system) fields
* prohibit changing field versioning, when removing table versioning
* handle CREATE...SELECT as well
MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition
between InnoDB transaction commit and the conversion of implicit
locks into explicit ones.
The assertion failure can be triggered with a test that runs
3 concurrent single-statement transactions in a loop on a simple
table:
CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
thread1: INSERT INTO t SET a=1;
thread2: DELETE FROM t;
thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t;
The failure scenarios are like the following:
(1) The INSERT statement is being committed, waiting for lock_sys->mutex.
(2) At the time of the failure, both the DELETE and SELECT transactions
are active but have not logged any changes yet.
(3) The transaction where the !other_lock assertion fails started
lock_rec_convert_impl_to_expl().
(4) After this point, the commit of the INSERT removed the transaction from
trx_sys->rw_trx_set, in trx_erase_lists().
(5) The other transaction consulted trx_sys->rw_trx_set and determined
that there is no implicit lock. Hence, it grabbed the lock.
(6) The !other_lock assertion fails in lock_rec_add_to_queue()
for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'.
This assertion failure looks genuine, because the INSERT transaction
is still active (trx->state=TRX_STATE_ACTIVE).
The problematic step (4) was introduced in
mysql/mysql-server@e27e0e0bb7
which fixed something related to MVCC (covered by the test
innodb.innodb-read-view). Basically, it reintroduced an error
that had been mentioned in an earlier commit
mysql/mysql-server@a17be6963f:
"The active transaction was removed from trx_sys->rw_trx_set prematurely."
Our fix goes along the following lines:
(a) Implicit locks will released by assigning
trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step.
This transition will no longer be protected by lock_sys_t::mutex,
only by trx->mutex. This idea is by Sergey Vojtovich.
(b) We detach the transaction from trx_sys before starting to release
explicit locks.
(c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must
recheck trx->state after acquiring trx->mutex.
(d) Before releasing any explicit locks, we will ensure that any activity
by other threads to convert implicit locks into explicit will have ceased,
by checking !trx_is_referenced(trx). There was a glitch
in this check when it was part of lock_trx_release_locks(); at the end
we would release trx->mutex and acquire lock_sys->mutex and trx->mutex,
and fail to recheck (trx_is_referenced() is protected by trx_t::mutex).
(e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000)
just like we did before.
trx_t::state: Document that the transition to COMMITTED is only
protected by trx_t::mutex, no longer by lock_sys_t::mutex.
trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction
state should be rechecked after acquiring trx_t::mutex.
trx_t::commit_state(): New function to change a transaction to committed
state, to release implicit locks.
trx_t::release_locks(): New function to release the explicit locks
after commit_state().
lock_trx_release_locks(): Move much of the logic to the caller
(which must invoke trx_t::commit_state() and trx_t::release_locks()
as needed), and assert that the transaction will have locks.
trx_get_trx_by_xid(): Make the parameter a pointer to const.
lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring
trx->mutex, and avoid a redundant lookup of the transaction.
lock_rec_queue_validate(): Recheck impl_trx->state while holding
impl_trx->mutex.
row_vers_impl_x_locked(), row_vers_impl_x_locked_low():
Document that the transaction state must be rechecked after
trx_mutex_enter().
trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().
MDEV-19486 and one more similar bug appeared because handler::write_row() interface
welcomes to modify buffer by storage engine. But callers are not ready for that
thus bugs are possible in future.
handler::write_row():
handler::ha_write_row(): make argument const
Make Field::is_equal() const and return bool as it's a naturally fitting
type for it. Also it's agrument was narrowed to Column_definition.
InnoDB can change type of some columns by itself. InnoDB-specific code used to
reside in Field_xxx:is_equal() methods. Now engine-specific stuff was
moved to a virtual methods of handler::can_convert{string,varstring,blob,geom}.
These methods are called by Field::can_be_converted_by_engine() which is a
double dispatch pattern.
Some InnoDB-specific code still resides in compare_keys_but_name(). It should
be moved from here someday to handler::compare_key_parts(...) or similar.
IS_EQUAL_WITH_REINTERPRET_COMPATIBLE_CHARSET
IS_EQUAL_WITH_REINTERPRET_COMPATIBLE_CHARSET_BUT_COLLATE: both was removed
IS_EQUAL_NO, IS_EQUAL_YES are not needed now and should be removed
along with deprecated handler::check_if_incompatible_data().
HA_EXTENDED_TYPES_CONVERSION: was removed as such logic is not needed now by
server code.
ALTER_COLUMN_EQUAL_PACK_LENGTH: was renamed to a more generic
ALTER_COLUMN_TYPE_CHANGE_BY_ENGINE
Problem:
========
We have a Master/Master Setup on two servers, but are only writing to one of
those servers (so it is essentially Master/Slave) We upgraded from 10.1.* to
10.2.22 last week and starting with the upgrade, we are getting duplicate key
errors on the slave. BINLOG=mixed.
Analysis:
=========
This issue happens with LOCK TABLES and binlog_format=MIXED combination. When an
UNSAFE statement is encountered in 'MIXED' mode, it is logged in the form of
'ROW' format. For all the tables that are part of LOCK TABLES list their table maps
are written into the binary log. For each table in the list a check is
done to see if 'check_table_binlog_row_based_done' flag is set or not. If it is not set
a check process is initiated to see if table qualifies for row based binary
logging or not and 'check_table_binlog_row_based_done' is set. This flag will be
cleared at the time of closing thread tables.
But there can be special cases where the LOCK TABLES contains more number of
tables but the unsafe query is actually using subset of tables from LOCK TABLES
list.
For example: LOCK TABLES locks t1,t2,t3 but the unsafe statement makes use of
only two tables t1,t3. In this case the 'check_table_binlog_row_based_done' flag
is enabled for table 't2' while writing table map, but 'close_thread_tables'
function call will not reset this flag. Since the flag is not cleared for table
't2' even a safe statement which used t2 will be logged in the form of row based
format.
This leads to an assert on debug builds and causes duplicate entries in release
builds. In release builds a statement is logged in the form of both ROW and
STATEMENT format. This causes the slave to fail with duplicate key error.
Fix:
===
During 'close_thread_tables' when LOCK TABLE modes are active "ha_reset" is done
for all the tables which were part of current statement. As mentioned in the
example 'ha_reset' is called for tables 't1' and 't3'. This will clear the
'check_table_binlog_row_based_done' flag. At this point add a check for the rest
of the tables to see if 'check_table_binlog_row_based_done' is enabled or not.
If enabled clear the flag.
make live checksum to be returned in handler::info(),
and slow table-scan checksum to be calculated in handler::checksum().
part of
MDEV-16249 CHECKSUM TABLE for a spider table is not parallel and saves all data in memory in the spider head by default
For partitioned table, ensure that the AUTO_INCREMENT values will
be assigned from the same sequence. This is based on the following
change in MySQL 5.6.44:
commit aaba359c13d9200747a609730dafafc3b63cd4d6
Author: Rahul Malik <rahul.m.malik@oracle.com>
Date: Mon Feb 4 13:31:41 2019 +0530
Bug#28573894 ALTER PARTITIONED TABLE ADD AUTO_INCREMENT DIFF RESULT DEPENDING ON ALGORITHM
Problem:
When a partition table is in-place altered to add an auto-increment column,
then its values are starting over for each partition.
Analysis:
In the case of in-place alter, InnoDB is creating a new sequence object
for each partition. It is default initialized. So auto-increment columns
start over for each partition.
Fix:
Assign old sequence of the partition to the sequence of next partition
so it won't start over.
RB#21148
Reviewed by Bin Su <bin.x.su@oracle.com>
Just rename index in data dictionary and in InnoDB cache when it's possible.
Introduce ALTER_INDEX_RENAME for that purpose so that engines can optimize
such operation.
Unused code between macro MYSQL_RENAME_INDEX was removed.
compare_keys_but_name(): compare index definitions except for index names
Alter_inplace_info::rename_keys:
ha_innobase_inplace_ctx::rename_keys: vector of rename indexes
fill_alter_inplace_info():: fills Alter_inplace_info::rename_keys
Fix partitioning for trx_id-versioned tables.
`partition by hash`, `range` and others now work.
`partition by system_time` is forbidden.
Currently we cannot use row_start and row_end in `partition by`, because
insertion of versioned field is done by engine's handler, as well as
row_start/row_end's value set up, which is a transaction id -- so it's
also forbidden.
The drawback is that it's now impossible to use `partition by key()`
without parameters for such tables, because it references row_start and
row_end implicitly.
* add handler::vers_can_native()
* drop Table_scope_and_contents_source_st::vers_native()
* drop partition_element::find_engine_flag as unused
* forbid versioning partitioning for trx_id as not supported
* adopt vers tests for trx_id partitioning
* forbid any row_end referencing in `partition by` clauses,
including implicit `by key()`
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.
In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.