Commit graph

585 commits

Author SHA1 Message Date
Sergey Vojtovich
4575ae70da Plug a memory leak 2018-01-24 14:00:42 +02:00
Marko Mäkelä
9875d5c3e1 Merge bb-10.2-ext into 10.3 2018-01-24 14:00:33 +02:00
Alexander Barkov
ec6b8c546a Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext 2018-01-23 17:43:12 +04:00
Marko Mäkelä
d04e1d4bdc MDEV-15029 XA COMMIT and XA ROLLBACK operate on freed transaction object
innobase_commit_by_xid(), innobase_rollback_by_xid(): Decrement
the reference count before freeing the transaction object to the pool.
Failure to do so might corrupt the transaction bookkeeping
if trx_create_low() returns the same object to another thread
before we are done with it.

trx_sys_close(): Detach the recovered XA PREPARE transactions from
trx_sys->rw_trx_list before freeing them.
2018-01-22 16:25:37 +02:00
Sergey Vojtovich
4dc30f3c17 MDEV-15019 - InnoDB: store ReadView on trx
This will allow us to reduce critical section protected by
trx_sys.mutex:
- no need to maintain global m_free list
- eliminate if (trx->read_view == NULL) condition.

On x86_64 sizeof(Readview) is 144 mostly due to padding, sizeof(trx_t)
with ReadView is 1200.

Also don't close ReadView for read-write transactions, just mark it
closed similarly to read-only.

Clean-up: removed n_prepared_recovered_trx and n_prepared_trx, which
accidentally re-appeared after some rebase.
2018-01-22 16:23:15 +04:00
Marko Mäkelä
c425dcd8f2 Merge 10.2 into bb-10.2-ext 2018-01-22 09:04:32 +02:00
Marko Mäkelä
4f8555f1f6 MDEV-14941 Timeouts on persistent statistics tables caused by MDEV-14511
MDEV-14511 tried to avoid some consistency problems related to InnoDB
persistent statistics. The persistent statistics are being written by
an InnoDB internal SQL interpreter that requires the InnoDB data dictionary
cache to be locked.

Before MDEV-14511, the statistics were written during DDL in separate
transactions, which could unnecessarily reduce performance (each commit
would require a redo log flush) and break atomicity, because the statistics
would be updated separately from the dictionary transaction.

However, because it is unacceptable to hold the InnoDB data dictionary
cache locked while suspending the execution for waiting for a
transactional lock (in the mysql.innodb_index_stats or
mysql.innodb_table_stats tables) to be released, any lock conflict
was immediately be reported as "lock wait timeout".

To fix MDEV-14941, an attempt to reduce these lock conflicts by acquiring
transactional locks on the user tables in both the statistics and DDL
operations was made, but it would still not entirely prevent lock conflicts
on the mysql.innodb_index_stats and mysql.innodb_table_stats tables.

Fixing the remaining problems would require a change that is too intrusive
for a GA release series, such as MariaDB 10.2.

Thefefore, we revert the change MDEV-14511. To silence the
MDEV-13201 assertion, we use the pre-existing flag trx_t::internal.
2018-01-22 08:58:47 +02:00
Monty
27a5d96bcb Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	sql/sp_rcontext.cc
2018-01-21 20:32:48 +02:00
Sergey Vojtovich
ec32c05072 Get rid of trx->read_view pointer juggling
trx->read_view|= 1 was done in a silly attempt to fix race condition
where trx->read_view was closed without trx_sys.mutex lock by read-only
trasnactions.

This just made the problem less likely to happen. In fact there was race
condition in const version of trx_get_read_view(): pointer may change to
garbage any moment after MVCC::is_view_active(trx->read_view) check and
before this function returns.

This patch doesn't fix this race condition, but rather makes it's
consequences less destructive.
2018-01-20 16:10:38 +04:00
Sergey Vojtovich
90bf55673e Misc trx_sys scalability fixes
trx_erase_lists(): trx->read_view is owned by current thread and thus
doesn't need trx_sys.mutex protection for reading it's value. Move
trx->read_view check out of mutex

trx_start_low(): moved assertion out of mutex.

Call ReadView::creator_trx_id() directly: allows to inline this one-line
method.
2018-01-20 16:10:37 +04:00
Sergey Vojtovich
64048bafe0 Removed purge_trx_id_age and purge_view_trx_id_age
These were unused status variables available in debug builds only.
Also removed trx_sys.rw_max_trx_id: not used anymore.
2018-01-20 16:10:37 +04:00
Sergey Vojtovich
db5bb785f9 Allocate trx_sys.mvcc at link time
trx_sys.mvcc was allocated dynamically for no good reason.
2018-01-20 16:10:36 +04:00
Marko Mäkelä
f8882cce93 Replace trx_sys_t* trx_sys with trx_sys_t trx_sys
There is only one transaction system object in InnoDB.
Allocate the storage for it at link time, not at runtime.

lock_rec_fetch_page(): Use the correct fetch mode BUF_GET.
Pages may never be deallocated from a tablespace while
record locks are pointing to them.
2018-01-20 16:10:36 +04:00
Sergey Vojtovich
7078203389 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Use atomic operations when accessing trx_sys_t::max_trx_id. We can't yet
move trx_sys_t::get_new_trx_id() out of mutex because it must be updated
atomically along with trx_sys_t::rw_trx_ids.
2018-01-20 16:10:35 +04:00
Sergey Vojtovich
c6d2842d9a MDEV-14756 - Remove trx_sys_t::rw_trx_list
Remove rw_trx_list.
2018-01-20 16:10:35 +04:00
Sergey Vojtovich
a447980ff3 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Let lock_print_info_all_transactions() iterate rw_trx_hash instead of
rw_trx_list.

When printing info of locks for transactions, InnoDB monitor doesn't
attempt to read relevant page from disk anymore. The code was prone
to race conditions.

Note that TrxListIterator didn't work as advertised: it iterated
rw_trx_list only.
2018-01-20 16:10:34 +04:00
Sergey Vojtovich
886af392d3 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Let trx_rollback_recovered() iterate rw_trx_hash instead of rw_trx_list.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
02270b44d0 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Let lock_validate_table_locks(), lock_rec_other_trx_holds_expl(),
lock_table_locks_lookup(), trx_recover_for_mysql(), trx_get_trx_by_xid(),
trx_roll_must_shutdown(), fetch_data_into_cache() iterate rw_trx_hash
instead of rw_trx_list.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
d8c0caad32 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Removed trx_sys_validate_trx_list(): with rw_trx_hash elements are not
required to be ordered by transaction id. Transaction state is now guarded
by asserts in rw_trx_hash_t.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
900b07908b MDEV-14756 - Remove trx_sys_t::rw_trx_list
Removed trx_sys_t::n_prepared_recovered_trx: never used.

Removed trx_sys_t::n_prepared_trx: used only at shutdown, we can perfectly
get this value from rw_trx_hash.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
868c77df3e MDEV-14756 - Remove trx_sys_t::rw_trx_list
Replaced UT_LIST_GET_LEN(trx_sys->rw_trx_list) with
trx_sys->rw_trx_hash.size().
Moved freeing of trx objects at shutdown to rw_trx_hash destructor.
Small clean-up in trx_rollback_recovered().
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
d09f146934 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Reduce divergence between trx_sys_t::rw_trx_hash and trx_sys_t::rw_trx_list
by not adding recovered COMMITTED transactions to trx_sys_t::rw_trx_list.

Such transactions are discarded immediately without creating trx object.

This also required to split rollback and cleanup phases of recovery. To
reflect these updates the following renames happened:
trx_rollback_or_clean_all_recovered() -> trx_rollback_all_recovered()
trx_rollback_or_clean_is_active -> trx_rollback_is_active
trx_rollback_or_clean_recovered() -> trx_rollback_recovered()
trx_cleanup_at_db_startup() -> trx_cleanup_recovered()

Also removed a hack from lock_trx_release_locks(). Instead let recovery
rollback thread to skip committed XA transactions.
2018-01-20 16:09:26 +04:00
Marko Mäkelä
6c09a6542e MDEV-14985 innodb_undo_log_truncate may be blocked if transactions were recovered at startup
The field trx_rseg_t::trx_ref_count that was added in WL#6965 in
MySQL 5.7.5 is being incremented twice if a recovered transaction
includes both undo log partitions insert_undo and update_undo.

This reference count is being used in trx_purge(), which invokes
trx_purge_initiate_truncate() to try to truncate an undo tablespace
file. Because of the double-increment, the trx_ref_count would never
reach 0.

It is possible that after the failed truncation attempt, the undo
tablespace would be disabled for logging any new transactions until
the server is restarted (hopefully after committing or rolling back
all transactions, so that no transactions would be recovered
on the next startup).

trx_resurrect_insert(), trx_resurrect_update(): Do not increment
trx_ref_count. Instead, let the caller do that.

trx_lists_init_at_db_start(): Increment rseg->trx_ref_count only
once for each recovered transaction. Adjust comments.
Finally, if innodb_force_recovery prevents the undo log scan,
do not bother iterating the empty lists.
2018-01-18 16:26:09 +02:00
Sergei Golubchik
8f102b584d Merge branch 'github/10.3' into bb-10.3-temporal 2018-01-17 00:45:02 +01:00
Marko Mäkelä
6dd302d164 Merge bb-10.2-ext into 10.3 2018-01-11 19:44:41 +02:00
Marko Mäkelä
cca611d1c0 Merge 10.2 into bb-10.2-ext 2018-01-11 18:00:31 +02:00
Marko Mäkelä
e9842de20c Merge 10.1 into 10.2 2018-01-11 12:05:57 +02:00
Marko Mäkelä
c15b3d2d41 Merge 10.0 into 10.1 2018-01-11 10:44:05 +02:00
Sergey Vojtovich
380069c235 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite
expensive operations under trx_sys_t::mutex protection: e.g. malloc/free
when adding/removing elements. Traversing b-tree is not that cheap either.

This has negative scalability impact, which is especially visible when running
oltp_update_index.lua benchmark on a ramdisk.

To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None
of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex)
protection.

Another interesting issue observed with std::set is reproducible ~2% performance
decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable.

All in all this patch optimises away one of three trx_sys->mutex locks per
oltp_update_index.lua query. The other two critical sections became smaller.

Relevant clean-ups:

Replaced rw_trx_set iteration at startup with local set. The latter is needed
because values inserted to rw_trx_list must be ordered by trx->id.

Removed redundant conditions from trx_reference(): it is (and even was) never
called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY.
do_ref_count doesn't (and probably even didn't) make any sense: now it is called
only when reference counter increment is actually requested.

Moved condition out of mutex in trx_erase_lists().

trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were
greatly simplified and replaced by appropriate trx_rw_hash_t methods.

Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or
ACTIVE states. Transactions in COMMITTED state were required to be found
at InnoDB startup only. They are now looked up in the local set.

Removed unused trx_assert_recovered().

Removed unused innobase_get_trx() declaration.

Removed rather semantically incorrect trx_sys_rw_trx_add().

Moved information printout from trx_sys_init_at_db_start() to
trx_lists_init_at_db_start().
2018-01-11 12:30:53 +04:00
Marko Mäkelä
4c1479545d Merge 5.5 into 10.0 2018-01-11 10:16:52 +02:00
Marko Mäkelä
bdcd7f79e4 MDEV-14916 InnoDB reports warning for "Purge reached the head of the history list"
The warning was originally added in
commit c67663054a
(MySQL 4.1.12, 5.0.3) to trace claimed undo log corruption that
was analyzed in https://lists.mysql.com/mysql/176250
on November 9, 2004.

Originally, the limit was 20,000 undo log headers or transactions,
but in commit 9d6d1902e0
in MySQL 5.5.11 it was increased to 2,000,000.

The message can be triggered when the progress of purge is prevented
by a long-running transaction (or just an idle transaction whose
read view was started a long time ago), by running many transactions
that UPDATE or DELETE some records, then starting another transaction
with a read view, and finally by executing more than 2,000,000
transactions that UPDATE or DELETE records in InnoDB tables. Finally,
when the oldest long-running transaction is completed, purge would
run up to the next-oldest transaction, and there would still be more
than 2,000,000 transactions to purge.

Because the message can be triggered when the database is obviously
not corrupted, it should be removed. Heavy users of InnoDB should be
monitoring the "History list length" in SHOW ENGINE INNODB STATUS;
there is no need to spam the error log.
2018-01-11 09:55:10 +02:00
Aleksey Midenkov
c59c1a0736 System Versioning 1.0 pre8
Merge branch '10.3' into trunk
2018-01-10 12:36:55 +03:00
Marko Mäkelä
fa7d85bb87 Merge bb-10.2-ext into 10.3 2018-01-05 22:52:06 +02:00
Monty
e9a2082634 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	mysql-test/r/cte_nonrecursive.result
	mysql-test/suite/galera/r/galera_bf_abort.result
	mysql-test/suite/galera/r/galera_bf_abort_get_lock.result
	mysql-test/suite/galera/r/galera_bf_abort_sleep.result
	mysql-test/suite/galera/r/galera_enum.result
	mysql-test/suite/galera/r/galera_fk_conflict.result
	mysql-test/suite/galera/r/galera_insert_multi.result
	mysql-test/suite/galera/r/galera_many_indexes.result
	mysql-test/suite/galera/r/galera_mdl_race.result
	mysql-test/suite/galera/r/galera_nopk_bit.result
	mysql-test/suite/galera/r/galera_nopk_blob.result
	mysql-test/suite/galera/r/galera_nopk_large_varchar.result
	mysql-test/suite/galera/r/galera_nopk_unicode.result
	mysql-test/suite/galera/r/galera_pk_bigint_signed.result
	mysql-test/suite/galera/r/galera_pk_bigint_unsigned.result
	mysql-test/suite/galera/r/galera_serializable.result
	mysql-test/suite/galera/r/galera_toi_drop_database.result
	mysql-test/suite/galera/r/galera_toi_lock_exclusive.result
	mysql-test/suite/galera/r/galera_toi_truncate.result
	mysql-test/suite/galera/r/galera_unicode_pk.result
	mysql-test/suite/galera/r/galera_var_auto_inc_control_off.result
	mysql-test/suite/galera/r/galera_wsrep_log_conficts.result
	sql/field.cc
	sql/rpl_gtid.cc
	sql/share/errmsg-utf8.txt
	sql/sql_acl.cc
	sql/sql_parse.cc
	sql/sql_partition_admin.cc
	sql/sql_prepare.cc
	sql/sql_repl.cc
	sql/sql_table.cc
	sql/sql_yacc.yy
2018-01-05 16:52:40 +02:00
Marko Mäkelä
c8e6364407 Merge branch 10.1 into 10.2 2018-01-04 20:47:34 +02:00
Marko Mäkelä
21470de148 Merge 10.0 into 10.1 2018-01-04 20:42:29 +02:00
Marko Mäkelä
4496fd71f4 Fix a truncation warning introduced in MDEV-12323 2018-01-04 20:39:00 +02:00
Marko Mäkelä
145ae15a33 Merge bb-10.2-ext into 10.3 2018-01-04 09:22:59 +02:00
Marko Mäkelä
acd2862e65 MDEV-14848 MariaDB 10.3 refuses InnoDB crash-upgrade from MariaDB 10.2
While the redo log format was changed in MariaDB 10.3.2 and 10.3.3
due to MDEV-12288 and MDEV-11369, it should be technically possible
to upgrade from a crashed MariaDB 10.2 instance.

On a related note, it should be possible for Mariabackup 10.3
to create a backup from a running MariaDB Server 10.2.

mlog_id_t: Put back the 10.2 specific redo log record types
MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT,
MLOG_UNDO_HDR_REUSE.

trx_undo_parse_add_undo_rec(): Parse or apply MLOG_UNDO_INSERT.

trx_undo_erase_page_end(): Apply MLOG_UNDO_ERASE_END.

trx_undo_parse_page_init(): Parse or apply MLOG_UNDO_INIT.

trx_undo_parse_page_header_reuse(): Parse or apply MLOG_UNDO_HDR_REUSE.

recv_log_recover_10_2(): Remove. Always parse the redo log from 10.2.

recv_find_max_checkpoint(), recv_recovery_from_checkpoint_start():
Always parse the redo log from MariaDB 10.2.

recv_parse_or_apply_log_rec_body(): Parse or apply
MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT.

srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Upgrade from a previous (supported)
redo log format.
2018-01-03 19:08:50 +02:00
Marko Mäkelä
f7fd6ace18 Merge 10.2 into bb-10.2-ext 2018-01-03 15:48:47 +02:00
Marko Mäkelä
9eb3fcc9fb Follow-up fix of MDEV-14717 RENAME TABLE in InnoDB is not crash-safe
trx_undo_page_report_rename(): Return a pointer to the start of the
undo log record, not to the start of the (not yet written) next free
record. The wrong return value would sometimes cause ROLLBACK to crash
in an assertion failure (trying to parse garbage from the free area at
the end of the insert_undo log page) if the TRX_UNDO_RENAME_TABLE record
was the very last thing that was written to the insert_undo log. This
would occasionally happen when an ALTER TABLE operation is rolled
back due to invalid FOREIGN KEY constraints in the innodb.innodb test.
In these tests, the error ER_ERROR_ON_RENAME (1025) would be returned
at the end of the ALGORITHM=COPY operation of ALTER TABLE.
2018-01-03 14:54:15 +02:00
Marko Mäkelä
d361401bc2 Merge 10.1 into 10.2, with some MDEV-14799 fixups
trx_undo_page_report_modify(): For SPATIAL INDEX, keep logging
updated off-page columns twice, so that
the minimum bounding rectangle (MBR) will be logged.
Avoiding the redundant logging would require larger changes
to the undo log format.

row_build_index_entry_low(): Handle SPATIAL_UNKNOWN more robustly,
by refusing to purge the record from the spatial index.
We can get this code when processing old undo log from 10.2.10 or
10.2.11 (the releases affected by MDEV-14799, which was a regression
from MDEV-14051).
2018-01-03 11:56:24 +02:00
Marko Mäkelä
016caa3d20 Merge 10.0 into 10.1 2018-01-02 21:57:22 +02:00
Marko Mäkelä
51e4650ed0 Merge 5.5 into 10.0 2018-01-02 21:52:46 +02:00
Marko Mäkelä
20fab71b14 Follow-up to MDEV-14799: Remove bogus debug assertions
trx_undo_rec_get_partial_row(): When the PRIMARY KEY includes a
column prefix of an externally stored column, the already parsed
part of the undo log record may contain a reference to
an off-page column. This is the case in the bug58912 test in
innodb.innodb.
2018-01-02 21:41:39 +02:00
Marko Mäkelä
d384ead0f0 MDEV-14799 After UPDATE of indexed columns, old values will not be purged from secondary indexes
This is a regression caused by MDEV-14051 'Undo log record is too big.'

Purge in the secondary index is wrongly skipped in
row_purge_upd_exist_or_extern() because node->row only does not contain all
indexed columns.

trx_undo_rec_get_partial_row(): Add the parameter for node->update
so that the updated columns will be copied from the initial part
of the undo log record.
2018-01-02 19:11:10 +02:00
Monty
fbab79c9b8 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	cmake/make_dist.cmake.in
	mysql-test/r/func_json.result
	mysql-test/r/ps.result
	mysql-test/t/func_json.test
	mysql-test/t/ps.test
	sql/item_cmpfunc.h
2018-01-01 19:39:59 +02:00
Vicențiu Ciorbaru
985d2d393c Merge remote-tracking branch 'origin/10.1' into 10.2 2017-12-22 12:23:39 +02:00
Sergey Vojtovich
1464f4808c MDEV-14477 InnoDB update_time is wrongly updated after partial rollback or internal COMMIT
This is partial revert of original patch.

Read-only transactions that modified temporary tables are added to
trx_sys_t::rw_trx_ids and trx_sys_t::rw_trx_set. However with patch for
MDEV-14477 they were not removed.

Restore old behaviour in this regard.
2017-12-22 14:09:24 +04:00
Marko Mäkelä
1cf28964f5 MDEV-6247 post-fix: Re-enable some debug assertions
These assertions were disabled in MariaDB 10.1.1 in
commit df4dd593f2
with a bogus comment referring to the function wsrep_fake_trx_id()
that was introduced in the very same commit.
2017-12-21 10:19:49 +02:00