An INSERT into a temporary table would fail to mark the
index page as modified. If there were no other write operations
(such as UPDATE or DELETE) to the page, and the page was evicted,
we would read back the old contents of the page, causing
corruption or loss of data.
page_cur_insert_rec_write_log(): Call mtr_t::set_modified()
for temporary tables. Normally this happens as part of the mlog_open()
call, but for temporary tables the mlog_open() call was only present
in debug builds.
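A minimal sketch of the fix, assuming the surrounding code of
page_cur_insert_rec_write_log() (simplified, not the literal patch):

    if (dict_table_is_temporary(index->table)) {
        /* No redo log is written for temporary tables, so the
        mlog_open() call that would normally mark the page as
        modified is never reached; mark it explicitly. */
        mtr->set_modified();
        return;
    }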
This regression was caused by
commit 48192f963a
which was preparation for MDEV-11369 and supposed to affect
debug builds only.
Thanks to Thirunarayanan Balathandayuthapani for debugging.
When a table is renamed to an internal #sql2 or #sql-ib name during
a table-rebuilding DDL operation such as OPTIMIZE TABLE or ALTER TABLE,
and shortly afterwards a purge operation is attempted in an index on
virtual columns, the operation could fail, and purge would furthermore
fail to release the table reference.
innodb_acquire_mdl(): Release the reference if the table name is not
valid for acquiring a meta-data lock (MDL).
innodb_find_table_for_vc(): Add a debug assertion if the table name
is not valid. This code path is for DML execution. The table
should have a valid name for executing DML, and furthermore an MDL
will prevent the table from being renamed.
row_vers_build_clust_v_col(): Add a debug assertion that both indexes
must belong to the same table.
trx_purge_add_update_undo_to_history(): Relax the too strict assertion
by removing the condition on srv_fast_shutdown (innodb_fast_shutdown).
Rollback is allowed during any form of shutdown.
buf_dump(): Only generate the output when shutdown is in progress.
log_write_up_to(): Only generate the output before actually writing
to the redo log files.
srv_purge_should_exit(): Rate-limit the output, and instead of
displaying the work done, indicate the work that remains to be done
until the completion of the slow shutdown.
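A sketch of the rate limiting; the interval and the message wording
are illustrative, not the literal patch:

    static time_t last_report;
    const time_t now = time(NULL);
    if (now - last_report >= 15) {  /* at most one message per 15s */
        last_report = now;
        ib::info() << "History list contains "
                   << trx_sys->rseg_history_len
                   << " transactions to purge before"
                      " slow shutdown can complete";
    }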
The MySQL 5.7 TRUNCATE TABLE is inherently incompatible
with hot backup, because it creates and deletes a separate
log file and does not write redo log for all changes of the
InnoDB data dictionary tables. Refuse to create a corrupted backup
if the unsafe form of TRUNCATE was executed.
Note: Undo log tablespace truncation is incompatible with backup
for similar reasons, but it cannot be detected easily.
xtrabackup_backup_func(): "Subscribe to" the log events before
the first invocation of xtrabackup_copy_logfile().
recv_parse_or_apply_log_rec_body(): If the function pointer
log_truncate is set, invoke it to report MLOG_TRUNCATE.
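A sketch of the hook; the exact declaration and call site are
assumptions based on the description above:

    /* set by Mariabackup before copying the redo log */
    extern void (*log_truncate)();

    /* in recv_parse_or_apply_log_rec_body(): */
    case MLOG_TRUNCATE:
        if (log_truncate != NULL) {
            /* let the backup refuse to continue */
            log_truncate();
        }
        break;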
Amend commit b853b4fd88
that was reverted in commit 29150e2391.
recv_parse_log_recs(): Do check for corrupted redo log or file
system before checking for len==0, but only read *ptr if
it is not past the end of the buffer (end_ptr).
recv_parse_log_rec(): Report incorrect redo log type
in a consistent way with recv_parse_or_apply_log_rec_body().
This is a follow-up to commit f30c5af42e.
The Pool poisoning that was introduced in MDEV-15030 caused
race conditions in AddressSanitizer builds, because concurrent
poisoning and unpoisoning were not prevented by any synchronization
primitive.
Pool::get(): Protect the unpoisoning by m_lock_strategy.
Pool::mem_free(): Protect the poisoning by m_lock_strategy.
Pool::putl(): Renamed from put(), because now the caller is
responsible for invoking m_lock_strategy.
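A sketch of the resulting pattern; popl() is a hypothetical unlocked
helper, and the bodies are simplified:

    Type* get() {
        m_lock_strategy.enter();
        Type* obj = popl();             /* hypothetical unlocked pop */
        if (obj != NULL) {
            /* Unpoison while holding the lock, so a concurrent
            mem_free() cannot poison the object at the same time. */
            MEM_UNDEFINED(obj, sizeof *obj);
        }
        m_lock_strategy.exit();
        return obj;
    }

    void mem_free(Type* obj) {
        m_lock_strategy.enter();
        putl(obj);                      /* the lock is already held */
        MEM_NOACCESS(obj, sizeof *obj); /* poison under the same lock */
        m_lock_strategy.exit();
    }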
If trx_free() and trx_create_low() were called while a call to
trx_reference() was pending, we could get a reference to a wrong
transaction object.
trx_reference(): Return NULL if the trx->id no longer matches.
lock_trx_release_locks(): Assign trx->id = 0, so that trx_reference()
will not return a reference to this object.
trx_cleanup_at_db_startup(): Assign trx->id = 0.
assert_trx_is_free(): Assert !trx->n_ref. Assert trx->id == 0,
now that it will be cleared as part of a transaction commit.
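A sketch of the revalidation, with a simplified signature and
reference counting:

    trx_t* trx_reference(trx_t* trx, trx_id_t id) {
        trx_mutex_enter(trx);
        if (trx->id != id) {
            /* The object was freed and reused for another
            transaction between the lookup and this check. */
            trx_mutex_exit(trx);
            return NULL;
        }
        ++trx->n_ref;
        trx_mutex_exit(trx);
        return trx;
    }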
Allocate trx->lock.rec_pool and trx->lock.table_pool directly from trx_t.
Remove unnecessary use of std::vector.
In order to do this, move some definitions from lock0priv.h to
lock0types.h, so that ib_lock_t will not be an opaque type.
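A sketch of the resulting layout; the array sizes are illustrative:

    struct trx_lock_t {
        /* ...existing members... */
        ib_lock_t rec_pool[8];    /* pre-allocated record locks */
        ib_lock_t table_pool[8];  /* pre-allocated table locks */
        ulint rec_cached;         /* rec_pool[] entries in use */
        ulint table_cached;       /* table_pool[] entries in use */
    };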
If a cluster of two or more nodes is replicating from an asynchronous
master with binlog_format=STATEMENT, and multi-row inserts are executed
on a table with an AUTO_INCREMENT column such that the values are
generated automatically by MySQL, then the replicating node generates
wrong auto_increment values, different from those generated on the
async master.
The causes and fixes:
1. We need to improve the processing of auto-increment value changes
after the cluster size changes.
2. If wsrep_auto_increment_control is switched on while the node is
running, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting for the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain their initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on different nodes in some scenarios.
3. If wsrep_auto_increment_control is switched off while the node is
running, then we must restore the original values of the
auto_increment_increment and auto_increment_offset global variables, as
set by the user. To make this possible, we need to add "shadow copies"
of these variables that store the latest values set by the user (see
the sketch below).
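An illustration of the shadow-copy idea; the variable and function
names here are assumptions, not the actual patch:

    static ulong user_auto_increment_increment; /* last user-set value */
    static ulong user_auto_increment_offset;

    void wsrep_auto_increment_control_update(bool enabled) {
        if (enabled) {
            /* derive the values from the cluster membership */
            global_system_variables.auto_increment_increment =
                wsrep_cluster_size;
            global_system_variables.auto_increment_offset =
                wsrep_local_index + 1;
        } else {
            /* restore exactly what the user had configured */
            global_system_variables.auto_increment_increment =
                user_auto_increment_increment;
            global_system_variables.auto_increment_offset =
                user_auto_increment_offset;
        }
    }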
The test causes simulated server crashes with DBUG_SUICIDE().
It also relies on transactions that were committed right before the
crash to be visible after the crash (that is, it requires durability).
Run the test with transaction durability enabled: set
rocksdb-flush-log-at-trx-commit=1.
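For example, in an MTR test this can be arranged with an option file
(the file name is hypothetical):

    # rocksdb_crash_durability-master.opt
    --rocksdb-flush-log-at-trx-commit=1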
recv_parse_log_recs(): Do not check for corruption before
checking for end-of-log-buffer. For some reason, adding the
check to the logical-looking place would cause intermittent
recovery failures in the tests innodb.innodb-index and
innodb_gis.rtree_compress2.
recv_parse_log_recs(): Check for corruption before checking for
end-of-log-buffer.
mlog_parse_initial_log_record(), page_cur_parse_delete_rec():
Flag corruption for out-of-bounds values, and let the caller
dump the corrupted redo log extract.
If recv_sys_justify_left_parsing_buf() has been invoked, it is possible
that recv_previous_parsed_rec_offset is after the current offset.
In this case, we must not dump any bytes before the current record.
If the LOG_BLOCK_HDR_DATA_LEN field is corrupted, scanning the
log records could fail in strange ways. It is better to validate
the field as part of validating each log block.
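A sketch of the per-block check, using the existing field accessors
but simplified control flow:

    const ulint data_len = log_block_get_data_len(log_block);
    if (data_len < LOG_BLOCK_HDR_SIZE
        || data_len > OS_FILE_LOG_BLOCK_SIZE) {
        /* The length field cannot be trusted; reject the block
        instead of letting the record parser read garbage. */
        recv_sys->found_corrupt_log = true;
    }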
filamtxt.cpp: DOSFAM::RenameTempFile: Change sprintf to snprintf
(see the sketch after this list).
filamvct.cpp: VECFAM::RenameTempFile: Change sprintf to snprintf.
javaconn.cpp:
Add JAVAConn::GetUTFString function.
Use it instead of env->GetStringUTFChars.
Fix wrong indentation.
javaconn.h: Add GetUTFString declaration.
jdbconn.cpp:
Use GetUTFString function instead of env->GetStringUTFChars.
jmgoconn.cpp:
Use GetUTFString function instead of env->GetStringUTFChars.
Fix wrong indentation.
jsonudf.cpp: Replace the magic number 139 with the constant BMX
(line 4631).
tabjmg.cpp:
Add ReleaseStringUTF.
Fix wrong indentation.
tabpivot.cpp: Fix wrong indentation.
tabutil.cpp: TDBPRX::GetSubTable: Change sprintf to snprintf.
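The sprintf-to-snprintf pattern, sketched with an illustrative buffer
and format string:

    char filename[_MAX_PATH];
    /* before: sprintf(filename, "%s%s", path, name); */
    snprintf(filename, sizeof filename, "%s%s", path, name);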
- Fix MDEV-16895 CONNECT engine's get_error_message can cause buffer
overflow and server crash with long queries
ha_connect.cc: Update version.
get_error_message: Remove charset conversion.
- Fix a server crash on inserting bigint to a JDBC table
JDBConn::SetUUID:
Suppress the check on ctyp that caused a server crash: ctyp can be
negative, which triggered a DBUG_ASSERT on return.
- Update jdbc.result
mysql-test/connect/r/jdbc.result: Recorded to reflect a message change.
InnoDB executed code that is meant to execute only when Galera
is used, and with bad luck one of the transactions could be selected
incorrectly as the deadlock victim. Fixed by adding a wsrep_on_trx()
condition before entering the actual Galera transaction handling.
No reliably repeatable test case for this issue is known.
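A sketch of the shape of the fix; the body of the branch is only
indicative, and wsrep_handle_deadlock() is a hypothetical stand-in:

    if (wsrep_on_trx(trx)) {
        /* Only run the Galera-specific deadlock-victim logic for
        transactions that wsrep actually replicates. */
        wsrep_handle_deadlock(trx);
    }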
rw_lock_get_debug_info(): Remove. This function is inherently unsafe
to use, because the copied pointers can become stale between
rw_lock_debug_mutex_exit() and the dereferencing of the pointer in
the caller.
fts_query(): Remove a redundant condition (result will never be NULL),
and instead check if *result is NULL, to prevent SIGSEGV in
fts_query_free_result().
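A sketch of the corrected check (simplified; the error code is
illustrative):

    if (*result == NULL) {
        /* The result was never allocated; passing it on would make
        fts_query_free_result() dereference a NULL pointer. */
        return DB_OUT_OF_MEMORY;
    }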
This concludes the merge of all applicable InnoDB changes from
MySQL 5.7.23, with the exception of a performance fix, which we
plan to rewrite in MariaDB later in such a way that it does not
involve changing the storage engine API:
MDEV-16849 Extending indexed VARCHAR column should be instantaneous
This is a port of an Oracle fix.
No test case was provided by Oracle. It seems that to exploit this
bug, one would have to SET foreign_key_checks=0 before TRUNCATE,
and to concurrently run some DML statement that causes a foreign key
constraint to be checked.
commit 1f24c5aa2843fa548aa5c4b29c00f955e03e9f5b
Author: Aditya A <aditya.a@oracle.com>
Date: Fri May 18 12:32:37 2018 +0530
Bug #27208858 CONCURRENT DDL/DML ON FOREIGN KEYS CRASH IN
PAGE_CUR_SEARCH_WITH_MATCH_BYTES
Similar to the tables SYS_FOREIGN and SYS_FOREIGN_COLS,
the tables mysql.innodb_table_stats and mysql.innodb_index_stats
are updated by the InnoDB internal SQL parser, which fails to
enforce the size limits of the data. Due to this, it is possible
for InnoDB to hang when there are persistent statistics defined on
partitioned tables where the total length of table name,
partition name and subpartition name exceeds the incorrectly
defined limit VARCHAR(64). That column should have been defined
as VARCHAR(199).
btr_node_ptr_max_size(): Interpret the VARCHAR(64) as VARCHAR(199),
to prevent a hang in the case that the upgrade script has not been
run.
dict_table_schema_check(): Ignore difference in the length of the
table_name column.
ha_innobase::max_supported_key_length(): For innodb_page_size=4k,
return a larger value so that the table mysql.innodb_index_stats
can be created. This could allow "impossible" tables to be created,
such that it is not possible to insert anything into a secondary
index when both the secondary key and the primary key are long,
but this is the easiest and most consistent way. The Oracle fix
would only ignore the maximum length violation for the two
statistics tables.
os_file_get_status_posix(), os_file_get_status_win32(): Handle
ENAMETOOLONG as well.
This patch is based on the following change in MySQL 5.7.23.
Not all changes were applied, and our variant allows persistent
statistics to work without hangs even if the table definitions
were not upgraded.
From fdbdce701ab8145ae234c9d401109dff4e4106cb Mon Sep 17 00:00:00 2001
From: Aditya A <aditya.a@oracle.com>
Date: Thu, 17 May 2018 16:11:43 +0530
Subject: [PATCH] Bug #26390736 THE FIELD TABLE_NAME (VARCHAR(64)) FROM
MYSQL.INNODB_TABLE_STATS CAN OVERFLOW.
In mysql.innodb_index_stats and mysql.innodb_table_stats
tables the table name column didn't take into consideration
partition names which can be more than varchar(64).
When MySQL 5.7.1 introduced WL#6326 to reduce contention on the
non-leaf levels of B-trees, it introduced a new rw-lock mode SX
(not conflicting with S, but conflicting with SX and X) and
new rules to go with it.
A thread that is holding a dict_index_t::lock aka index->lock
in SX mode is permitted to acquire non-leaf buf_block_t::lock
aka block->lock in X or SX mode, in monotonically descending order.
That is, once the thread has acquired a block->lock, it is not
allowed to acquire a lock on its parent or grandparent pages.
Such arbitrary-order access is only allowed when the thread
acquired the index->lock in X mode upfront.
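An illustration of the two allowed patterns, using the
mini-transaction latching primitives (simplified, not actual code):

    /* Pattern 1: SX on index->lock; block latches may only be
    acquired while descending, never on a parent afterwards. */
    mtr_sx_lock(dict_index_get_lock(index), &mtr);

    /* Pattern 2: X on index->lock; the pages of the tree may
    then be latched in arbitrary order. */
    mtr_x_lock(dict_index_get_lock(index), &mtr);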
A customer encountered a repeatable hang when loading a dump into
InnoDB while using multiple innodb_purge_threads (default: 4).
The dump makes very heavy use of FOREIGN KEY constraints.
By luck, it happened so that two purge worker threads (srv_worker_thread)
deadlocked with each other. Both were operating on the index FOR_REF
of the InnoDB internal table SYS_FOREIGN. One of them was legitimately
holding index->lock S-latch and the root block->lock S-latch. The other
had acquired index->lock SX-latch, root block->lock SX-latch, and a bunch
of other latches, including the fil_space_t::latch for freeing some blocks
and some leaf page latches. This other thread was inside 2 nested calls
to btr_compress() and it was trying to reacquire the root block->lock
in X mode, violating the WL#6326 protocol.
This violation led to a deadlock, because while S is compatible with SX
and a thread can upgrade an SX lock to X when there are no conflicting
requests, in this case there was a conflicting S lock held by the other
purge worker thread.
During this deadlock, both threads are holding dict_operation_lock S-latch,
which would block any subsequent DDL statements, such as CREATE TABLE.
The tables SYS_FOREIGN and SYS_FOREIGN_COLS are special in that they
define key columns of the type VARCHAR(0), created using the InnoDB
internal SQL parser. Because InnoDB does not internally enforce the
maximum length of columns, it would happily write more than 0 bytes
to these columns. This caused a miscalculation of node_ptr_max_size.
btr_cur_will_modify_tree(): Clean up some code. (No functional change.)
btr_node_ptr_max_size(): Renamed from dict_index_node_ptr_max_size().
Use a more realistic maximum size for SYS_FOREIGN and SYS_FOREIGN_COLS.
btr_cur_pessimistic_delete(): Refrain from merging pages if it is
not safe.
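A sketch of the special case in btr_node_ptr_max_size(); the bound
shown here is illustrative, not the value used in the patch:

    if (!strcmp(index->table->name.m_name, "SYS_FOREIGN")
        || !strcmp(index->table->name.m_name, "SYS_FOREIGN_COLS")) {
        /* The key columns are declared VARCHAR(0), but the internal
        SQL parser does not enforce the limit, so assume a realistic
        maximum instead of trusting the data dictionary. */
        return(256);
    }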
This work is based on the following MySQL 5.7.23 fix:
commit 58dcf0b4a4165ed59de94a9a1e7d8c954f733726
Author: Aakanksha Verma <aakanksha.verma@oracle.com>
Date: Wed May 9 18:54:03 2018 +0530
BUG#26225783 MYSQL CRASH ON CREATE TABLE (REPRODUCEABLE) -> INNODB: A
LONG SEMAPHORE WAIT
fsync() will return EIO only once when an I/O error happens, so it is
wrong to keep calling it until it returns success.
When fsync() returns EIO, it should be treated as a hard error, and
InnoDB must abort immediately.
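A sketch of the hardened error handling (simplified):

    if (fsync(fd) == -1) {
        if (errno == EIO) {
            /* EIO is reported only once; the dirty pages may
            already be lost, so retrying would hide data loss. */
            ib::fatal() << "fsync() returned EIO, aborting";
        }
        /* other errors, such as EINTR, may still be retried */
    }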
trx_set_rw_mode() is never called for read-only transactions; this is
guaranteed by the callers.
Removing this condition from the critical section immediately gives a 5%
scalability improvement in an OLTP index updates benchmark.
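A sketch of the simplification; trx_sys.register_rw() stands in for
the actual registration code and is an assumption:

    void trx_set_rw_mode(trx_t* trx) {
        /* Guaranteed by every caller, so assert it instead of
        testing it inside the critical section. */
        ut_ad(!trx->read_only);
        trx_sys.register_rw(trx);
    }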
INNOBASE_SHARE: Remove.
check_index_consistency(): Iterate through the keys and look for
mismatches between InnoDB and the .frm file.
ha_innobase::innobase_get_index(): Now uses dict_table_get_index_on_name().
dict_table_get_index_on_name(): Use strcmp() instead of
innobase_casestrcmp(), as we only need to know whether the strings are
equal or not.
Compile on Windows MSVC with -DHAVE_SSE2 and -DHAVE_PCLMUL.
This is safe, since the code will also perform runtime checks via
cpuid() before using the instructions, and will fall back to slower
versions if the instructions are not available.
The functions fts_ast_visit() and fts_query() inside
InnoDB FULLTEXT INDEX query processing do not check
for THD::killed (trx_is_interrupted()), as anything
that potentially takes a long time should.
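A sketch of the kind of check that such loops should contain
(simplified):

    if (trx_is_interrupted(trx)) {
        error = DB_INTERRUPTED;  /* the query was killed */
        break;
    }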
This is a port of the following change from MySQL 5.7.23,
with a completely rewritten test case.
commit c58c6f8f66ddd0357ecd0c99646aa6bf1dae49c8
Author: Aakanksha Verma <aakanksha.verma@oracle.com>
Date: Fri May 4 15:53:13 2018 +0530
Bug #27155294 MAX_EXECUTION_TIME NOT INTERUPTED WITH FULLTEXT SEARCH USING MECAB