Commit graph

1002 commits

Author SHA1 Message Date
Nikita Malyavin
b393e2cb0c add innodb_debug_sync var to support DEBUG_SYNC from purge threads 2019-10-11 17:02:39 +10:00
Marko Mäkelä
5a92ccbaea Merge 10.3 into 10.4
Disable MDEV-20576 assertions until MDEV-20595 has been fixed.
2019-09-23 17:35:29 +03:00
Marko Mäkelä
c997af7d1f Remove reference to dict_sys->mutex 2019-09-23 10:33:10 +03:00
Marko Mäkelä
c016ea660e Merge 10.2 into 10.3 2019-09-23 10:25:34 +03:00
Thirunarayanan Balathandayuthapani
f94d9ab9f8 MDEV-20483 Follow-up fix
At commit, trx->lock.table_locks (which is a cache of trx_locks) can
consist of NULL pointers. Add a debug assertion for that, and clear
the vector.
2019-09-18 20:20:04 +05:30
Marko Mäkelä
bb4214272a Merge 10.1 into 10.2 2019-09-18 16:24:48 +03:00
Thirunarayanan Balathandayuthapani
8a79fa0e4d MDEV-19529 InnoDB hang on DROP FULLTEXT INDEX
Problem:
=======
  While dropping an FTS index, InnoDB waits in fts_optimize_remove_table()
while holding dict_sys->mutex and dict_operation_lock, even though the
table id is not present in the queue. Meanwhile, fts_optimize_thread waits
for dict_sys->mutex in order to process an unrelated table id from the slot.

Solution:
========
  Whenever a table is added to fts_optimize_wq, update the fts_status
of the in-memory fts subsystem to TABLE_IN_QUEUE. Whenever DROP INDEX
wants to remove the table from the queue, it can check the fts_status
to decide whether it should send MSG_DELETE_TABLE to the queue.
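
The status check described in the solution can be sketched roughly as follows. This is only an illustration under stated assumptions, not the InnoDB code: FtsTable, Msg and OptimizeQueue are hypothetical stand-ins, and clearing the flag when the delete message is sent is an assumption. Only the names TABLE_IN_QUEUE and MSG_DELETE_TABLE come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-ins for the in-memory FTS state and the optimize queue.
enum MsgType { MSG_ADD_TABLE, MSG_DELETE_TABLE };
constexpr uint32_t TABLE_IN_QUEUE = 1U << 0;

struct FtsTable {
  uint64_t id = 0;
  uint32_t fts_status = 0;  // bit flags; TABLE_IN_QUEUE is set while queued
};

struct Msg {
  MsgType type;
  uint64_t table_id;
};

struct OptimizeQueue {
  std::deque<Msg> msgs;

  // Adding a table to the queue also marks it as queued.
  void add_table(FtsTable& t) {
    msgs.push_back(Msg{MSG_ADD_TABLE, t.id});
    t.fts_status |= TABLE_IN_QUEUE;
  }

  // Returns true if a delete message was sent. A table that was never
  // queued needs no message, so the caller has nothing to wait for.
  bool remove_table(FtsTable& t) {
    if (!(t.fts_status & TABLE_IN_QUEUE)) {
      return false;
    }
    msgs.push_back(Msg{MSG_DELETE_TABLE, t.id});
    t.fts_status &= ~TABLE_IN_QUEUE;  // assumption: flag cleared here
    return true;
  }
};
```

With this shape, a caller that drops an index on a table that was never queued returns immediately instead of waiting on the background thread.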

Removed the following functions because they are all dead code:
dict_table_wait_for_bg_threads_to_exit(),
fts_wait_for_background_thread_to_start(), fts_start_shutdown(), fts_shutdown().
2019-09-18 13:22:08 +05:30
Thirunarayanan Balathandayuthapani
fb3e3a6a3d MDEV-20483 trx_lock_t::table_locks is not a subset of trx_lock_t::trx_locks
Problem:
=======
  A transaction is left with a nonempty table locks list. As a result,
table_locks is no longer a subset of trx_locks. The problem is that
lock_wait_timeout_thread() doesn't remove the table lock from
table_locks of the transaction.

Solution:
========
  In lock_wait_timeout_thread(), remove the lock from the table lock
vector of the transaction.
2019-09-17 19:54:55 +05:30
Marko Mäkelä
60c04be659 Merge 10.3 into 10.4 2019-09-12 12:16:40 +03:00
Marko Mäkelä
0fa5ad3acf Merge 10.2 into 10.3 2019-09-11 16:42:01 +03:00
Thirunarayanan Balathandayuthapani
df4dee4b84 MDEV-17939 Assertion `++loop_count < 2' failed in trx_undo_report_rename
- During trx_undo_report_rename(), InnoDB can fail to write the undo log
record if it doesn't fit in the undo page. In that case, InnoDB
adds one more undo log page and retries writing the rename undo log record.
But the assertion is wrong: it doesn't allow the write to fail even once.
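
The retry behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: UndoLogSketch and its members are hypothetical, and the placement of the counter in the real trx_undo_report_rename() may differ. The point is that one failed attempt is legitimate, while a second failure on a fresh page would indicate a bug.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an undo log made of fixed-size pages.
struct UndoLogSketch {
  std::size_t page_size;
  std::vector<std::size_t> used;  // bytes used in each page

  explicit UndoLogSketch(std::size_t ps) : page_size(ps), used(1, 0) {}

  // Try to append a record of len bytes to the last page.
  bool try_write(std::size_t len) {
    if (used.back() + len > page_size) {
      return false;
    }
    used.back() += len;
    return true;
  }

  // Write a record, adding one page and retrying if the last page is full.
  // Returns the number of write attempts that were made.
  int report_rename(std::size_t len) {
    assert(len <= page_size);
    int loop_count = 0;
    while (!try_write(len)) {
      ++loop_count;
      // The first failure is allowed; failing again on an empty page is not.
      assert(loop_count < 2);
      used.push_back(0);
    }
    return loop_count + 1;
  }
};
```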
2019-09-11 16:02:41 +05:30
Sergei Golubchik
244f0e6dd8 Merge branch '10.3' into 10.4 2019-09-06 11:53:10 +02:00
Marko Mäkelä
2c9e75ccfe MDEV-15326 after-merge fixes
trx_t::is_recovered: Revert most of the changes that were made by the
merge of MDEV-15326 from 10.2. The trx_sys.rw_trx_hash and the recovery
of transactions at startup is quite different in 10.3.

trx_free_at_shutdown(): Avoid excessive mutex protection. Reading fields
that can only be modified by the current thread (owning the transaction)
can be done outside mutex.

trx_t::commit_state(): Restore a tighter assertion.

trx_rollback_recovered(): Clarify why there is no potential race condition
with other transactions.

lock_trx_release_locks(): Merge with trx_t::release_locks(),
and avoid holding lock_sys.mutex unnecessarily long.

rw_trx_hash_t::find(): Remove redundant code, and avoid starving the
committer by checking trx_t::state before trx_t::reference().
2019-09-05 15:58:31 +03:00
Marko Mäkelä
537f8594a6 Merge 10.2 into 10.3 2019-09-04 17:52:04 +03:00
Marko Mäkelä
dae1b3b04c MDEV-15326: Backport trx_t::is_referenced()
Backport the applicable part of Sergey Vojtovich's commit
0ca2ea1a65 from MariaDB Server 10.3.

trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
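
A minimal sketch of an atomically maintained reference counter, in the spirit of the change described above. The class RefCountedTrx and its member names are hypothetical, and the default (sequentially consistent) memory ordering is used for simplicity; the real code may choose different names and orderings.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical transaction object whose reference count is atomic,
// so that both updates and reads are safe without holding a mutex.
class RefCountedTrx {
  std::atomic<int32_t> n_ref{0};

public:
  void reference() { n_ref.fetch_add(1); }

  void release_reference() {
    const int32_t old_value = n_ref.fetch_sub(1);
    assert(old_value > 0);  // must not release more often than referenced
    (void) old_value;       // avoid an unused-variable warning with NDEBUG
  }

  bool is_referenced() const { return n_ref.load() > 0; }
};
```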
2019-09-04 09:42:38 +03:00
Marko Mäkelä
b07beff894 MDEV-15326: InnoDB: Failing assertion: !other_lock
MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition
between InnoDB transaction commit and the conversion of implicit
locks into explicit ones.

The assertion failure can be triggered with a test that runs
3 concurrent single-statement transactions in a loop on a simple
table:

CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
thread1: INSERT INTO t SET a=1;
thread2: DELETE FROM t;
thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t;

The failure scenarios are like the following:
(1) The INSERT statement is being committed, waiting for lock_sys->mutex.
(2) At the time of the failure, both the DELETE and SELECT transactions
are active but have not logged any changes yet.
(3) The transaction where the !other_lock assertion fails started
lock_rec_convert_impl_to_expl().
(4) After this point, the commit of the INSERT removed the transaction from
trx_sys->rw_trx_set, in trx_erase_lists().
(5) The other transaction consulted trx_sys->rw_trx_set and determined
that there is no implicit lock. Hence, it grabbed the lock.
(6) The !other_lock assertion fails in lock_rec_add_to_queue()
for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'.
This assertion failure looks genuine, because the INSERT transaction
is still active (trx->state=TRX_STATE_ACTIVE).

The problematic step (4) was introduced in
mysql/mysql-server@e27e0e0bb7
which fixed something related to MVCC (covered by the test
innodb.innodb-read-view). Basically, it reintroduced an error
that had been mentioned in an earlier commit
mysql/mysql-server@a17be6963f:
"The active transaction was removed from trx_sys->rw_trx_set prematurely."

Our fix goes along the following lines:

(a) Implicit locks will be released by assigning
trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step.
This transition will no longer be protected by lock_sys_t::mutex,
only by trx->mutex. This idea is by Sergey Vojtovich.
(b) We detach the transaction from trx_sys before starting to release
explicit locks.
(c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must
recheck trx->state after acquiring trx->mutex.
(d) Before releasing any explicit locks, we will ensure that any activity
by other threads to convert implicit locks into explicit will have ceased,
by checking !trx_is_referenced(trx). There was a glitch
in this check when it was part of lock_trx_release_locks(); at the end
we would release trx->mutex and acquire lock_sys->mutex and trx->mutex,
and fail to recheck (trx_is_referenced() is protected by trx_t::mutex).
(e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000)
just like we did before.
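
Steps (a) and (c) can be illustrated with a small sketch. It assumes a hypothetical TrxSketch type with a std::mutex; only the state names are taken from the text above. The idea is that a transaction found by an earlier lookup may have committed in the meantime, so its state is checked again while holding its mutex.

```cpp
#include <cassert>
#include <mutex>

enum TrxState { TRX_STATE_ACTIVE, TRX_STATE_COMMITTED_IN_MEMORY };

// Hypothetical transaction object; state is protected by mutex.
struct TrxSketch {
  std::mutex mutex;
  TrxState state = TRX_STATE_ACTIVE;

  // Step (a): the state transition releases the implicit locks
  // and is protected only by the transaction's own mutex.
  void commit_state() {
    std::lock_guard<std::mutex> guard(mutex);
    state = TRX_STATE_COMMITTED_IN_MEMORY;
  }
};

// Step (c): a caller that found trx through an earlier "is active" lookup
// rechecks the state under trx.mutex before treating it as a lock holder.
bool still_holds_implicit_lock(TrxSketch& trx) {
  std::lock_guard<std::mutex> guard(trx.mutex);
  return trx.state == TRX_STATE_ACTIVE;
}
```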

trx_t::state: Document that the transition to COMMITTED is only
protected by trx_t::mutex, no longer by lock_sys_t::mutex.

trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction
state should be rechecked after acquiring trx_t::mutex.

trx_t::commit_state(): New function to change a transaction to committed
state, to release implicit locks.

trx_t::release_locks(): New function to release the explicit locks
after commit_state().

lock_trx_release_locks(): Move much of the logic to the caller
(which must invoke trx_t::commit_state() and trx_t::release_locks()
as needed), and assert that the transaction will have locks.

trx_get_trx_by_xid(): Make the parameter a pointer to const.

lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring
trx->mutex, and avoid a redundant lookup of the transaction.

lock_rec_queue_validate(): Recheck impl_trx->state while holding
impl_trx->mutex.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low():
Document that the transaction state must be rechecked after
trx_mutex_enter().

trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().
2019-09-04 09:42:38 +03:00
Marko Mäkelä
7c79c12784 MDEV-15326 preparation: Remove trx_sys_t::n_prepared_trx
This is a backport of 900b07908b
from MariaDB Server 10.3.
2019-09-04 09:42:38 +03:00
Marko Mäkelä
db4a27ab73 Merge 10.3 into 10.4 2019-08-31 06:53:45 +03:00
Marko Mäkelä
e41eb044f1 Merge 10.2 into 10.3 2019-08-28 10:18:41 +03:00
Marko Mäkelä
947b0b5722 Implement innodb_evict_tables_on_commit_debug
Some bugs are detected only after a table definition has been evicted
and then reloaded to the InnoDB data dictionary cache.

For debug builds, introduce the settable Boolean configuration parameter
innodb_evict_tables_on_commit_debug that can be set to request InnoDB
to attempt to evict table definitions from the data dictionary cache
whenever a transaction is committed.

This has been tested on 10.3 and 10.4 with the following:

./mysql-test-run.pl --mysqld=--loose-innodb-evict-tables-on-commit-debug

You can also use the following:

SET GLOBAL innodb_evict_tables_on_commit_debug=ON;
SET GLOBAL innodb_evict_tables_on_commit_debug=OFF;

The parameter affects the commit (or rollback or abort) of
transactions that have modified persistent InnoDB tables.
2019-08-28 10:11:39 +03:00
Marko Mäkelä
25af2a183b MDEV-15326/MDEV-16136 dead code removal
Revert part of fa2a74e08d.

trx_reference(): Remove, and merge the relevant part to the only caller
trx_rw_is_active(). If the statements trx = NULL; were ever executed,
the function would have dereferenced a NULL pointer and crashed in
trx_mutex_exit(trx). Hence, those statements must have been unreachable,
and they can be replaced with debug assertions.

trx_rw_is_active(): Avoid unnecessary acquisition and release of trx->mutex
when do_ref_count=false.

lock_trx_release_locks(): Do not reset trx->id=0. Had the statement been
necessary, we would have experienced crashes in trx_reference().
2019-08-27 16:38:57 +03:00
Marko Mäkelä
ae1d17f52d MDEV-20316 InnoDB writes uninitialised tail of XID buffer
Starting with commit 210855ce5d
Valgrind became aware that the unused tail of the buffer that
is returned by thd_get_xid() is actually uninitialized.

The problem should exist already in MySQL 5.0. I was able to
repeat it on MariaDB Server 5.5 with some additional instrumentation.
InnoDB is allocating 128+4+4 bytes for the XID and the lengths of
its components, even when the XID is shorter than 64+64 bytes.
In MariaDB Server 10.3, while running the test main.xa_binlog,
in the xid_t::set() that is called by sql_yacc.yy, the 128-byte data
buffer was uninitialized according to Valgrind, and only the first bytes
were initialized. When the xid_t::data was copied to
thd.transaction.xid_state.xid.data, it happened so that the entire
target buffer was considered initialized. With MariaDB Server 10.4 since
the said commit, Valgrind will correctly detect the tail of the buffer
as uninitialized.

The impact of this bug is as follows:

(1) InnoDB will write unnecessarily much redo log for XA PREPARE.
(2) InnoDB will write garbage bytes to the redo log and undo log pages.
(3) The garbage should be 'harmless', because on recovery, only the
actual payload of the XID will be used, based on the written length.

trx_rseg_write_wsrep_checkpoint(), trx_undo_write_xid(): Write only
the actually used length of xid->data to the data page, and
zero out the rest of the buffer by mlog_memset().
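
The idea of writing only the used part of the XID and zero-filling the rest can be sketched like this. XidSketch and write_xid_data() are hypothetical, and plain memcpy()/memset() stand in for the redo-logged page writes (such as mlog_memset()) that the real functions use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t XID_DATA_SIZE = 128;  // 64 + 64 bytes

// Hypothetical XID layout: two component lengths followed by the data.
struct XidSketch {
  long gtrid_length;
  long bqual_length;
  char data[XID_DATA_SIZE];  // only the first gtrid+bqual bytes are valid
};

// Copy only the initialized payload to the page buffer and zero the tail,
// so that no uninitialized bytes ever reach the page.
void write_xid_data(unsigned char* dest, const XidSketch& xid) {
  const std::size_t used =
      static_cast<std::size_t>(xid.gtrid_length + xid.bqual_length);
  assert(used <= XID_DATA_SIZE);
  std::memcpy(dest, xid.data, used);
  std::memset(dest + used, 0, XID_DATA_SIZE - used);
}
```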
2019-08-12 19:37:24 +03:00
Marko Mäkelä
e9c1701e11 Merge 10.3 into 10.4 2019-07-25 18:42:06 +03:00
Marko Mäkelä
fdef9f9b89 Merge 10.2 into 10.3 2019-07-25 15:31:11 +03:00
Marko Mäkelä
b6ac67389d Merge 10.1 into 10.2 2019-07-25 12:14:27 +03:00
Marko Mäkelä
0c7c61019d Remove the wrappers ut_time(), ut_difftime(), ib_time_t 2019-07-24 21:59:26 +03:00
Marko Mäkelä
10727b6953 Always initialize trx_t::start_time_micro
This affects the function has_higher_priority() for internal or
recovered transactions.
2019-07-24 21:59:26 +03:00
Marko Mäkelä
ab6dd77408 MDEV-14154: Remove ut_time_us()
Use microsecond_interval_timer()
or my_interval_timer() [in nanoseconds] instead.
2019-07-24 21:21:54 +03:00
Marko Mäkelä
b951fc4e7f Merge 10.2 into 10.3 2019-07-24 15:34:24 +03:00
Marko Mäkelä
97055e6b11 MDEV-14154: Remove ut_time_us()
Use microsecond_interval_timer()
or my_interval_timer() [in nanoseconds] instead.
2019-07-23 17:25:02 +03:00
Marko Mäkelä
09e9f884f1 MDEV-20048 Assertion 'n < tuple->n_fields' on ROLLBACK after DROP COLUMN
btr_push_update_extern_fields(): Add a parameter for the original number
of fields in the record before btr_cur_trim(). Assume that this function
will only be called for the clustered index, which is the only index
that can contain off-page columns.

trx_undo_prev_version_build(), btr_cur_pessimistic_update():
Only invoke btr_push_update_extern_fields() for the clustered index.
2019-07-19 18:13:36 +03:00
Marko Mäkelä
24773bf380 MDEV-19606: dict_v_col_t: Encapsulate v_indexes
Remove the separate allocation and pointer indirection of
dict_v_col_t::v_indexes.
2019-05-28 08:01:50 +03:00
Marko Mäkelä
0274ab1de3 MDEV-19606: Replace most std::list with std::forward_list
C++11 defines the singly-linked std::forward_list. Prefer it to
the doubly-linked std::list in cases where we do not really need it.
Also, clean up some code.

dict_index_remove_from_v_col_list(): Remove.
Obsoleted by dict_index_t::detach_columns().

There is no std::forward_list::push_back(). Use push_front() instead.
The ordering does not really matter.

dict_v_col_t::n_v_indexes: Added. There is no std::forward_list::size(),
and trx_undo_log_v_idx() needs to know the size.
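
A small sketch of the pattern described above: std::forward_list offers neither push_back() nor size(), so elements are added with push_front() and the count is tracked in a separate member. VirtualColumnSketch and the use of int as the element type are hypothetical simplifications.

```cpp
#include <cassert>
#include <forward_list>

// Hypothetical virtual column that remembers which indexes refer to it.
struct VirtualColumnSketch {
  std::forward_list<int> v_indexes;  // singly-linked, no size() member
  unsigned n_v_indexes = 0;          // element count, maintained by hand

  void add_index(int index_id) {
    v_indexes.push_front(index_id);  // ordering does not matter here
    ++n_v_indexes;
  }

  void detach_indexes() {
    v_indexes.clear();
    n_v_indexes = 0;
  }
};
```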

rtr_info_track_t::rtr_active: Encapsulate. There really was no justification
for the pointer indirection.
2019-05-28 08:01:50 +03:00
Marko Mäkelä
5d2619b693 MDEV-19584 Allocate recv_sys statically
There is only one InnoDB crash recovery subsystem.
Allocating recv_sys statically removes one level of pointer indirection
and makes code more readable, and removes the awkward initialization of
recv_sys->dblwr.

recv_sys_t::create(): Replaces recv_sys_init().

recv_sys_t::debug_free(): Replaces recv_sys_debug_free().

recv_sys_t::close(): Replaces recv_sys_close().

recv_sys_t::add(): Replaces recv_add_to_hash_table().

recv_sys_t::empty(): Replaces recv_sys_empty_hash().
2019-05-24 16:19:38 +03:00
Marko Mäkelä
b40c99a82c MDEV-17458: Clear more of the TRX_SYS page
trx_rseg_array_init(): Using the 10.4 specific MLOG_MEMSET record,
clear the entire TRX_SYS_WSREP_XID_INFO field.
2019-05-22 08:42:41 +03:00
Marko Mäkelä
cf77951fb6 Merge 10.3 into 10.4 2019-05-22 08:42:31 +03:00
Daniele Sciascia
592dc59d7a MDEV-17458 Unable to start galera node
Bootstrapping a new cluster from a backup created from a MariaDB
version prior to 10.3.5 may result in error "SST position can't be
set in past" when attempting to join additional nodes.
The problem stems from the fact that when reading the wsrep position
from InnoDB, the position is looked up in two places:
the TRX_SYS page, where versions prior to 10.3.5 used to store
WSREP's position; and rollback segments, this is where newer versions
store the position.
When starting a new cluster, the starting seqno is 0 and a new cluster
UUID is generated. This is persisted in rollback segments, but the old
UUID and seqno are not cleared from TRX_SYS page.
Subsequently, when reading back the position,
trx_rseg_read_wsrep_checkpoint() is going to return the maximum seqno
found in both TRX_SYS page and rollback segments. So in the case of a
newly bootstrapped cluster, it's always going to return the old
cluster information.
The fix consists of changing trx_rseg_read_wsrep_checkpoint() so that
only rollback segments are looked up. On startup, position is read
from the TRX_SYS page, and if present, it is copied to rollback
segments (unless a newer position is already present in the rollback
segments).
Finally the position stored in TRX_SYS page is cleared.
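
The startup logic described above can be modelled roughly as follows. This is a sketch under stated assumptions, not the actual implementation: WsrepPositionSketch, StorageSketch and both functions are hypothetical, a seqno of -1 is assumed to mean "no position stored", and "newer" is assumed to mean a seqno that is greater than or equal to the one in the TRX_SYS page.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical wsrep position: cluster UUID plus sequence number.
struct WsrepPositionSketch {
  std::string uuid;
  int64_t seqno = -1;  // assumption: -1 means no position is stored

  bool present() const { return seqno >= 0; }
  void clear() { uuid.clear(); seqno = -1; }
};

// Hypothetical model of the two places where a position may be stored.
struct StorageSketch {
  WsrepPositionSketch trx_sys_page;       // legacy location (before 10.3.5)
  WsrepPositionSketch rollback_segments;  // current location
};

// On startup: copy a legacy position to the rollback segments unless they
// already hold a newer one, then clear the legacy copy.
void migrate_on_startup(StorageSketch& s) {
  if (s.trx_sys_page.present()) {
    const bool rseg_is_newer =
        s.rollback_segments.present() &&
        s.rollback_segments.seqno >= s.trx_sys_page.seqno;
    if (!rseg_is_newer) {
      s.rollback_segments = s.trx_sys_page;
    }
    s.trx_sys_page.clear();
  }
}

// After the fix, reads consult only the rollback segments.
const WsrepPositionSketch& read_checkpoint(const StorageSketch& s) {
  return s.rollback_segments;
}
```

Because the legacy copy is cleared at startup, a cluster that is bootstrapped afterwards with a new UUID and seqno 0 is no longer shadowed by the old, higher seqno.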
2019-05-21 15:06:44 +03:00
Marko Mäkelä
d0ef948d70 Non-functional change: Remove #ifdef UNIV_DEBUG 2019-05-20 16:52:26 +03:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
b390447e71 MDEV-19513: Remove rw_lock_t::magic_n
The magic_n only complicated object destruction and did not serve
any useful purpose.
2019-05-17 15:25:31 +03:00
Marko Mäkelä
5fd7502e77 MDEV-19513: Allocate dict_sys statically
dict_sys_t::create(): Renamed from dict_init().

dict_sys_t::close(): Renamed from dict_close().

dict_sys_t::add(): Sliced from dict_table_t::add_to_cache().

dict_sys_t::remove(): Renamed from dict_table_remove_from_cache().

dict_sys_t::prevent_eviction(): Renamed from
dict_table_move_from_lru_to_non_lru().

dict_sys_t::acquire(): Replaces dict_move_to_mru() and some more logic.

dict_sys_t::resize(): Renamed from dict_resize().

dict_sys_t::find(): Replaces dict_lru_find_table() and
dict_non_lru_find_table().
2019-05-17 14:32:53 +03:00
Marko Mäkelä
874f8f30f2 Merge 10.2 into 10.3 2019-05-14 17:25:25 +03:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
b93ecea65c Remove unnecessary pointer indirection for rw_lock_t
In MySQL 5.7.8 an extra level of pointer indirection was added to
dict_operation_lock and some other rw_lock_t without solid justification,
in mysql/mysql-server@52720f1772.

Let us revert that change and remove the rather useless rw_lock_t
constructor and destructor and the magic_n field. In this way,
some unnecessary pointer dereferences and heap allocation will be avoided
and debugging might be a little easier.
2019-05-13 18:46:12 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
c0ac0b8860 Update FSF address 2019-05-11 19:25:02 +03:00
Vicențiu Ciorbaru
f177f125d4 Merge branch '5.5' into 10.1 2019-05-11 19:15:57 +03:00
Vicențiu Ciorbaru
15f1e03d46 Follow-up to changing FSF address
Some places didn't match the previous rules, making the Floor
address wrong.

Additional sed rules:

sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
2019-05-11 18:30:45 +03:00
Marko Mäkelä
d3dcec5d65 Merge 10.3 into 10.4 2019-05-05 15:06:44 +03:00
Marko Mäkelä
b132b8895e Merge 10.3 into 10.4 2019-05-05 10:23:14 +03:00