Commit graph

19842 commits

Author SHA1 Message Date
Sergey Vojtovich
02270b44d0 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Let lock_validate_table_locks(), lock_rec_other_trx_holds_expl(),
lock_table_locks_lookup(), trx_recover_for_mysql(), trx_get_trx_by_xid(),
trx_roll_must_shutdown(), fetch_data_into_cache() iterate rw_trx_hash
instead of rw_trx_list.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
d8c0caad32 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Removed trx_sys_validate_trx_list(): with rw_trx_hash elements are not
required to be ordered by transaction id. Transaction state is now guarded
by asserts in rw_trx_hash_t.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
900b07908b MDEV-14756 - Remove trx_sys_t::rw_trx_list
Removed trx_sys_t::n_prepared_recovered_trx: never used.

Removed trx_sys_t::n_prepared_trx: used only at shutdown, we can perfectly
get this value from rw_trx_hash.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
a0b385ea2b MDEV-14756 - Remove trx_sys_t::rw_trx_list
Determine minimum transaction id by iterating rw_trx_hash, not rw_trx_list.

It is more expensive than previous implementation since it does linear
search, especially if there're many concurrent transactions running. But in
such case mutex is much bigger evil. And since it doesn't require
trx_sys->mutex protection it scales better.

For low concurrency performance difference is neglible.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
868c77df3e MDEV-14756 - Remove trx_sys_t::rw_trx_list
Replaced UT_LIST_GET_LEN(trx_sys->rw_trx_list) with
trx_sys->rw_trx_hash.size().
Moved freeing of trx objects at shutdown to rw_trx_hash destructor.
Small clean-up in trx_rollback_recovered().
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
d09f146934 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Reduce divergence between trx_sys_t::rw_trx_hash and trx_sys_t::rw_trx_list
by not adding recovered COMMITTED transactions to trx_sys_t::rw_trx_list.

Such transactions are discarded immediately without creating trx object.

This also required to split rollback and cleanup phases of recovery. To
reflect these updates the following renames happened:
trx_rollback_or_clean_all_recovered() -> trx_rollback_all_recovered()
trx_rollback_or_clean_is_active -> trx_rollback_is_active
trx_rollback_or_clean_recovered() -> trx_rollback_recovered()
trx_cleanup_at_db_startup() -> trx_cleanup_recovered()

Also removed a hack from lock_trx_release_locks(). Instead let recovery
rollback thread to skip committed XA transactions.
2018-01-20 16:09:26 +04:00
Marko Mäkelä
4ef2e43080 Merge bb-10.2-ext into 10.3 2018-01-17 16:33:40 +02:00
Marko Mäkelä
c6cd64f3cb Merge 10.2 into bb-10.2-ext 2018-01-17 16:22:27 +02:00
Marko Mäkelä
656f66def2 Follow-up fix to MDEV-14585 Automatically remove #sql- tables in InnoDB dictionary during recovery
If InnoDB is killed while ALTER TABLE...ALGORITHM=COPY is in progress,
after recovery there could be undo log records some records that were
inserted into an intermediate copy of the table. Due to these undo log
records, InnoDB would resurrect locks at recovery, and the intermediate
table would be locked while we are trying to drop it. This would cause
a call to row_rename_table_for_mysql(), either from
row_mysql_drop_garbage_tables() or from the rollback of a RENAME
operation that was part of the ALTER TABLE.

row_rename_table_for_mysql(): Do not attempt to parse FOREIGN KEY
constraints when renaming from #sql-something to #sql-something-else,
because it does not make any sense.

row_drop_table_for_mysql(): When deferring DROP TABLE due to locks,
do not rename the table if its name already starts with the #sql-
prefix, which is what row_mysql_drop_garbage_tables() uses.
Previously, the too strict prefix #sql-ib was used, and some
tables were renamed unnecessarily.
2018-01-17 16:21:56 +02:00
Sergei Golubchik
8f102b584d Merge branch 'github/10.3' into bb-10.3-temporal 2018-01-17 00:45:02 +01:00
Sergei Golubchik
edb6375910 compilation warning on windows 2018-01-17 00:44:11 +01:00
Marko Mäkelä
f44017384a MDEV-14968 On upgrade, InnoDB reports "started; log sequence number 0"
srv_prepare_to_delete_redo_log_files(): Initialize srv_start_lsn.
2018-01-16 20:02:38 +02:00
Marko Mäkelä
d87531a6a0 Follow-up to MDEV-14952: Remove some more btr_get_search_latch()
Replace some !rw_lock_own() assertions with the stronger
!btr_search_own_any(). Remove some redundant btr_get_search_latch()
calls.

btr_search_update_hash_ref(): Remove a duplicated assertion.

btr_search_build_page_hash_index(): Remove a duplicated assertion.
rw_lock_s_lock() asserts that the latch is not being held.

btr_search_disable_ref_count(): Remove an assertion. The only caller
is acquiring all adaptive hash index latches.
2018-01-16 14:08:48 +02:00
Marko Mäkelä
2281fcf38a Follow-up fix to MDEV-14952 for Mariabackup
innodb_init_param(): Initialize btr_ahi_parts=1 for Mariabackup.

btr_search_enabled: Let the adaptive hash index be disabled
in Mariabackup. This would potentially only matter during --export,
and --export performs a table scan, not many index lookups.
2018-01-16 14:08:48 +02:00
Marko Mäkelä
be85c2dc88 Mariabackup --prepare: Do not access transactions or data dictionary
innobase_start_or_create_for_mysql(): Only start the data dictionary
and transaction subsystems in normal server startup and during
mariabackup --export.
2018-01-16 13:57:30 +02:00
Marko Mäkelä
33ecf8345d Follow-up fix to MDEV-14441: Fix a potential race condition
btr_cur_update_in_place(): Read block->index only once,
so that it cannot change to NULL after the first read.
When block->index != NULL, it must be equal to index.
2018-01-16 13:55:45 +02:00
Marko Mäkelä
822f4e6c10 Merge 10.2 into bb-10.2-ext 2018-01-16 07:51:02 +02:00
Marko Mäkelä
f5e158183c Follow-up fix to MDEV-14441: Correct a misplaced condition
btr_cur_update_in_place(): The call rw_lock_x_lock(ahi_latch) must
of course be inside the if (ahi_latch) condition. This is a mistake
that I made when backporting the fix-under-development from 10.3.
2018-01-16 07:50:15 +02:00
Sergei Petrunia
0292cd0a27 Better explanation why rpl_row_triggers is disabled. 2018-01-15 21:08:00 +03:00
Marko Mäkelä
0664d633e4 MDEV-14952 Avoid repeated calls to btr_get_search_latch()
btr_cur_search_to_nth_level(), btr_search_guess_on_hash(),
btr_pcur_open_with_no_init_func(), row_sel_open_pcur():
Replace the parameter has_search_latch with the ahi_latch
(passed as NULL if the caller does not hold the latch).

btr_search_update_hash_node_on_insert(),
btr_search_update_hash_on_insert(),
btr_search_build_page_hash_index(): Add the parameter ahi_latch.

btr_search_x_lock(), btr_search_x_unlock(),
btr_search_s_lock(), btr_search_s_unlock(): Remove.
2018-01-15 19:51:09 +02:00
Marko Mäkelä
4beb699a36 MDEV-14952 Avoid repeated calls to btr_get_search_latch()
btr_cur_search_to_nth_level(), row_sel(): Do not bother to yield to
waiting exclusive lock requests on the adaptive hash index latch.
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed. Thus,
X-lock requests should have a good chance of becoming served, and
starvation should not be possible.

btr_search_guess_on_hash(): Clean up a debug assertion.
2018-01-15 19:32:48 +02:00
Marko Mäkelä
542ad0fa3f btr_search_check_guess(): Remove the parameter 'mode'
Also, use 32-bit native reads to read the 32-bit aligned
FIL_PAGE_PREV and FIL_PAGE_NEXT reads, to compare them to the
byte order agnostic pattern FIL_NULL (0xffffffff).
2018-01-15 19:32:36 +02:00
Marko Mäkelä
12f804acfa MDEV-14441 Deadlock due to InnoDB adaptive hash index
This is mere code clean-up; the reported problem was already fixed
in commit 3fdd390791.

row_sel(): Remove the variable search_latch_locked.

row_sel_try_search_shortcut(): Remove the parameter
search_latch_locked, which was always passed as nonzero.

row_sel_try_search_shortcut(), row_sel_try_search_shortcut_for_mysql():
Do not expect the caller to acquire the AHI latch. Instead,
acquire and release it inside this function.

row_search_mvcc(): Remove a bogus condition on mysql_n_tables_locked.
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed, and
thus it is unnecessary to avoid adaptive hash index searches
during INSERT...SELECT.
2018-01-15 19:18:47 +02:00
Marko Mäkelä
458e33cfbc MDEV-14441 Deadlock due to InnoDB adaptive hash index
This is not fixing the reported problem, but a potential problem that was
introduced in MDEV-11369.

row_sel_try_search_shortcut(), row_sel_try_search_shortcut_for_mysql():
When an adaptive hash index search lands on top of rec_is_default_row(),
we must skip the candidate and perform a normal search. This is because
the adaptive hash index latch only protects the record from being deleted
but does not prevent concurrent inserts into the page. Therefore, it is not
safe to dereference the next-record pointer.
2018-01-15 19:12:30 +02:00
Marko Mäkelä
4ef25dbfd8 Merge bb-10.2-ext into 10.3 2018-01-15 19:11:28 +02:00
Marko Mäkelä
e2e740030d Merge 10.2 into bb-10.2-ext 2018-01-15 19:07:02 +02:00
Marko Mäkelä
3fdd390791 MDEV-14441 InnoDB hangs when setting innodb_adaptive_hash_index=OFF during UPDATE
This race condition is a regression caused by MDEV-12121.

btr_cur_update_in_place(): Determine block->index!=NULL only once
in order to determine whether an adaptive hash index bucket needs
to be exclusively locked and unlocked.

If we evaluated block->index multiple times, and the adaptive hash
index was disabled before we locked the adaptive hash index, then
we would never release the adaptive hash index bucket latch, which
would eventually lead to InnoDB hanging.
2018-01-15 19:02:38 +02:00
Marko Mäkelä
39f236a2f5 Merge 10.2 into bb-10.2-ext 2018-01-15 16:41:10 +02:00
Sergei Petrunia
85aea5a12b Update .result for rocksdb.rpl_row_triggers (not the whole test works yet) 2018-01-15 16:50:18 +03:00
Marko Mäkelä
ec062c6181 MDEV-12121 follow-up: Unbreak the WITH_INNODB_AHI=OFF build 2018-01-15 15:40:28 +02:00
Marko Mäkelä
3d798be1d4 MDEV-14655 Assertion `!fts_index' failed in prepare_inplace_alter_table_dict
MariaDB inherits the MySQL limitation that ALGORITHM=INPLACE cannot
create more than one FULLTEXT INDEX at a time. As part of the MDEV-11369
Instant ADD COLUMN refactoring, MariaDB 10.3.2 accidentally stopped
enforcing the restriction.

Actually, it is a bug in MySQL 5.6 and MariaDB 10.0 that an ALTER TABLE
statement with multiple ADD FULLTEXT INDEX but without explicit
ALGORITHM=INPLACE would return in an error message, rather than
executing the operation with ALGORITHM=COPY.

ha_innobase::check_if_supported_inplace_alter(): Enforce the restriction
on multiple FULLTEXT INDEX.

prepare_inplace_alter_table_dict(): Replace some code with debug
assertions. A "goto error_handled" at this point would result in
another error, because the reference count of ctx->new_table would be 0.
2018-01-15 10:57:16 +02:00
Eugene Kosov
72136ae75c Compilation speed (#546)
Speed up compilation

Standard C++ headers contribute a lot to compilation time. Avoid algorithm
and sstream in frequently used headers.
2018-01-14 20:50:45 +04:00
Marko Mäkelä
70fff3688d Merge bb-10.2-ext into 10.3 2018-01-13 18:25:24 +02:00
Marko Mäkelä
bec2712775 Merge 10.2 into bb-10.2-ext 2018-01-13 18:18:28 +02:00
Marko Mäkelä
fc65577873 MDEV-14887 On a 32-bit system, MariaDB 10.2 mishandles data file sizes exceeding 4GiB
This is a regression that was introduced in MySQL 5.7.6 in
19855664de

fil_node_open_file(): Use proper 64-bit arithmetics for truncating
size_bytes to a multiple of a file extent size.
2018-01-13 18:15:04 +02:00
Sergey Vojtovich
0a63b50c7a Cleanup UT_LOW_PRIORITY_CPU/UT_RESUME_PRIORITY_CPU
Server already has HMT_low/HMT_medium.
2018-01-13 13:08:59 +04:00
Sergei Petrunia
1eea7966f3 Merge branch 'bb-10.2-mariarocks' into 10.2 2018-01-13 01:27:35 +03:00
Sergei Petrunia
4cafd8e66f rocksdb.information_schema testcase is not stable 2018-01-13 01:26:06 +03:00
Marko Mäkelä
3e6fcb6ac8 MDEV-14935 Remove bogus conditions related to not redo-logging PAGE_MAX_TRX_ID changes
InnoDB originally skipped the redo logging of PAGE_MAX_TRX_ID changes
until I enabled it in commit e76b873f24
that was part of MySQL 5.5.5 already.

Later, when a more complete history of the InnoDB Plugin for MySQL 5.1
(aka branches/zip in the InnoDB subversion repository) and of the
planned-to-be closed-source branches/innodb+ that became the basis of
InnoDB in MySQL 5.5 was pushed to the MySQL source repository, the
change was part of commit 509e761f06:

 ------------------------------------------------------------------------
 r5038 | marko | 2009-05-19 22:59:07 +0300 (Tue, 19 May 2009) | 30 lines

 branches/zip: Write PAGE_MAX_TRX_ID to the redo log. Otherwise,
 transactions that are started before the rollback of incomplete
 transactions has finished may have an inconsistent view of the
 secondary indexes.

 dict_index_is_sec_or_ibuf(): Auxiliary function for controlling
 updates and checks of PAGE_MAX_TRX_ID: check whether an index is a
 secondary index or the insert buffer tree.

 page_set_max_trx_id(), page_update_max_trx_id(),
 lock_rec_insert_check_and_lock(),
 lock_sec_rec_modify_check_and_lock(), btr_cur_ins_lock_and_undo(),
 btr_cur_upd_lock_and_undo(): Add the parameter mtr.

 page_set_max_trx_id(): Allow mtr to be NULL.  When mtr==NULL, do not
 attempt to write to the redo log.  This only occurs when creating a
 page or reorganizing a compressed page.  In these cases, the
 PAGE_MAX_TRX_ID will be set correctly during the application of redo
 log records, even though there is no explicit log record about it.

 btr_discard_only_page_on_level(): Preserve PAGE_MAX_TRX_ID.  This
 function should be unreachable, though.

 btr_cur_pessimistic_update(): Update PAGE_MAX_TRX_ID.

 Add some assertions for checking that PAGE_MAX_TRX_ID is set on all
 secondary index leaf pages.

 rb://115 tested by Michael, fixes Issue #211
 ------------------------------------------------------------------------

After this fix, some bogus references to recv_recovery_is_on()
remained. Also, some references could be replaced with
references to index->is_dummy to prepare us for MDEV-14481
(background redo log apply).
2018-01-12 18:31:03 +02:00
Sergei Petrunia
2da1917912 Attempt to eliminate race conditions in rocksdb.information_schema 2018-01-12 16:04:29 +00:00
Varun Gupta
028e2ddc54 Added a missing result file to the rocksdb_sys_vars result suite 2018-01-12 19:16:36 +05:30
Sergei Petrunia
c481fc9ca7 Change MyRocks maturity from Alpha to Beta 2018-01-12 15:58:34 +03:00
Sergei Petrunia
d32f5be307 MDEV-14372: Fix and enable rocksdb.information_schema test
- Make Rdb_binlog_manager::unpack_value to not have a stack overrun
  when it is reading invalid data (which it currently does as we in
  MariaDB do not store binlog coordinates under BINLOG_INFO_INDEX_NUMBER,
  see comments in MDEV-14892 for details).
- We may need to store these coordinates in the future, so instead of
  removing the call of this function, let's make it work properly for
  all possible inputs.
2018-01-12 15:58:34 +03:00
Marko Mäkelä
6dd302d164 Merge bb-10.2-ext into 10.3 2018-01-11 19:44:41 +02:00
Marko Mäkelä
cca611d1c0 Merge 10.2 into bb-10.2-ext 2018-01-11 18:00:31 +02:00
Marko Mäkelä
773c3ceb57 MDEV-14824 Assertion `!trx_is_started(trx)' failed in innobase_start_trx_and_assign_read_view
In CREATE SEQUENCE or CREATE TEMPORARY SEQUENCE, we should not start
an InnoDB transaction for inserting the sequence status record into
the underlying no-rollback table. Because we did this, a debug assertion
failure would fail in START TRANSACTION WITH CONSISTENT SNAPSHOT after
CREATE TEMPORARY SEQUENCE was executed.

row_ins_step(): Do not start the transaction. Let the caller do that.

que_thr_step(): Start the transaction before calling row_ins_step().

row_ins_clust_index_entry(): Skip locking and undo logging for no-rollback
tables, even for temporary no-rollback tables.

row_ins_index_entry(): Allow trx->id==0 for no-rollback tables.

row_insert_for_mysql(): Do not start a transaction for no-rollback tables.
2018-01-11 16:34:31 +02:00
Marko Mäkelä
e9842de20c Merge 10.1 into 10.2 2018-01-11 12:05:57 +02:00
Marko Mäkelä
c15b3d2d41 Merge 10.0 into 10.1 2018-01-11 10:44:05 +02:00
Sergey Vojtovich
0ca2ea1a65 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
2018-01-11 12:30:53 +04:00
Sergey Vojtovich
380069c235 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite
expensive operations under trx_sys_t::mutex protection: e.g. malloc/free
when adding/removing elements. Traversing b-tree is not that cheap either.

This has negative scalability impact, which is especially visible when running
oltp_update_index.lua benchmark on a ramdisk.

To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None
of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex)
protection.

Another interesting issue observed with std::set is reproducible ~2% performance
decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable.

All in all this patch optimises away one of three trx_sys->mutex locks per
oltp_update_index.lua query. The other two critical sections became smaller.

Relevant clean-ups:

Replaced rw_trx_set iteration at startup with local set. The latter is needed
because values inserted to rw_trx_list must be ordered by trx->id.

Removed redundant conditions from trx_reference(): it is (and even was) never
called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY.
do_ref_count doesn't (and probably even didn't) make any sense: now it is called
only when reference counter increment is actually requested.

Moved condition out of mutex in trx_erase_lists().

trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were
greatly simplified and replaced by appropriate trx_rw_hash_t methods.

Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or
ACTIVE states. Transactions in COMMITTED state were required to be found
at InnoDB startup only. They are now looked up in the local set.

Removed unused trx_assert_recovered().

Removed unused innobase_get_trx() declaration.

Removed rather semantically incorrect trx_sys_rw_trx_add().

Moved information printout from trx_sys_init_at_db_start() to
trx_lists_init_at_db_start().
2018-01-11 12:30:53 +04:00