Commit graph

305 commits

Author SHA1 Message Date
Marko Mäkelä
27ea2963fc Dead code removal: sess_t
The session object is not really needed for anything.
We can directly create and free the dummy purge_sys->query->trx.
2018-02-15 10:01:05 +02:00
Marko Mäkelä
5fe9b4a7ae MDEV-14648 Restore fix for MySQL BUG#39053 - UNINSTALL PLUGIN does not allow the storage engine to cleanup open connections
Also, allow the MariaDB 10.2 server to link InnoDB dynamically
against ha_innodb.so (which is what mysql-test-run.pl expects
to exist, instead of the default name ha_innobase.so).

wsrep_load_data_split(): Instead of referring to innodb_hton_ptr,
check the handlerton::db_type. This was recently broken by me in
MDEV-11415.

innodb_lock_schedule_algorithm: Define as a weak global symbol,
so that WITH_WSREP will not depend on InnoDB being linked statically.
I tested this manually. Notably, running a test that only does
	SET GLOBAL wsrep_on=1;
with a static or dynamic InnoDB and
	./mtr --mysqld=--loose-innodb-lock-schedule-algorithm=fcfs
will crash with SIGSEGV at shutdown. With the default VATS
combination the wsrep_on is properly refused for both the
static and dynamic InnoDB.

ha_close_connection(): Do invoke the method also for plugins
for which UNINSTALL PLUGIN was deferred due to open connections.
Thanks to @svoj for pointing this out.

thd_to_trx(): Return a pointer, not a reference to a pointer.

check_trx_exists(): Invoke thd_set_ha_data() for assigning a transaction.
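
The two accessor changes above fit together: once the read accessor returns a plain pointer, callers can no longer assign through it, so every assignment has to go through a setter. A minimal sketch with hypothetical stand-in types (Thd, Trx, the ha_data slot and the *_sketch functions are illustrative, not the server's real definitions):

```cpp
#include <cassert>

// Hypothetical stand-ins for THD and trx_t; the real types are far larger.
struct Trx { int id; };
struct Thd { void* ha_data = nullptr; };  // per-connection slot for the engine

// Read accessor: returns the pointer by value, so callers cannot
// overwrite the slot through the return value.
inline Trx* thd_to_trx_sketch(const Thd& thd) {
    return static_cast<Trx*>(thd.ha_data);
}

// All assignments go through one setter (the commit uses thd_set_ha_data()).
inline void thd_set_trx_sketch(Thd& thd, Trx* trx) {
    thd.ha_data = trx;
}

// Look up the connection's transaction, registering `fresh` on first use.
inline Trx* check_trx_exists_sketch(Thd& thd, Trx& fresh) {
    Trx* trx = thd_to_trx_sketch(thd);
    if (trx == nullptr) {
        trx = &fresh;
        thd_set_trx_sketch(thd, trx);
    }
    return trx;
}
```

With a reference-to-pointer return type, the assignment in check_trx_exists_sketch() could have been written directly against the accessor, which is the pattern this commit moves away from.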

log_write_checkpoint_info(): Remove an unused DEBUG_SYNC point
that would cause an assertion failure on shutdown after deferred
UNINSTALL PLUGIN.

This was tested as follows:

cmake -DWITH_WSREP=1 -DPLUGIN_INNOBASE:STRING=DYNAMIC \
-DWITH_MARIABACKUP:BOOL=OFF ...
make
cd mysql-test
./mtr innodb.innodb_uninstall
2018-02-15 09:59:03 +02:00
Marko Mäkelä
d9955b22e9 Merge 10.1 into 10.2 2018-02-13 14:49:47 +02:00
Marko Mäkelä
2202afd541 Merge 10.0 into 10.1 2018-02-13 14:32:17 +02:00
Marko Mäkelä
c051eaba46 MDEV-14988 innodb_read_only tries to modify files if transactions were recovered in COMMITTED state
lock_trx_release_locks(): Relax a debug assertion to allow
recovered TRX_STATE_COMMITTED_IN_MEMORY transactions.

trx_commit_in_memory(): Add DEBUG_SYNC instrumentation.

trx_undo_insert_cleanup(): Skip persistent changes if innodb_read_only
is set. This should only happen when a recovered committed transaction
would be cleaned up at shutdown.
2018-02-13 14:29:32 +02:00
Marko Mäkelä
7660d8c94e Remove dict_table_t::is_clust()
Replace all occurrences of the is_clust() method with is_primary(),
because that is what is actually meant. (Also the change buffer
tree would count as a clustered index.)
2018-02-08 12:18:07 +02:00
Vladislav Vaintroub
d995dd2865 Windows : reenable warning C4805 (unsafe mix of types in bool operations) 2018-02-07 20:12:12 +00:00
Sergei Golubchik
4771ae4b22 Merge branch 'github/10.1' into 10.2 2018-02-06 14:50:50 +01:00
Marko Mäkelä
d6ed077fc8 Clarify a comment after MDEV-15061 2018-02-04 13:11:49 +02:00
Marko Mäkelä
f69a3b2e92 After-merge fix for commit d4df7bc9b1
The merge omitted some InnoDB and XtraDB conflict resolutions,
most notably, failing to merge the fix of MDEV-12173.

ibuf_merge_or_delete_for_page(), lock_rec_block_validate():
Invoke fil_space_acquire_silent() instead of fil_space_acquire().
This fixes MDEV-12173.
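
The difference between the two acquire calls can be pictured as one lookup with a flag that suppresses the message. This is only a sketch: SpaceSketch, SpaceMap and the function names are hypothetical, and a std::unordered_map stands in for the real tablespace cache.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Hypothetical, heavily reduced tablespace object.
struct SpaceSketch { unsigned pending_ops = 0; };

using SpaceMap = std::unordered_map<std::uint32_t, SpaceSketch>;

// Core lookup: `silent` suppresses the message for a missing tablespace.
inline SpaceSketch* space_acquire_low(SpaceMap& spaces, std::uint32_t id,
                                      bool silent) {
    auto it = spaces.find(id);
    if (it == spaces.end()) {
        if (!silent) {
            std::cerr << "operation on a dropped tablespace " << id << '\n';
        }
        return nullptr;
    }
    ++it->second.pending_ops;  // pin the tablespace while the caller uses it
    return &it->second;
}

inline SpaceSketch* space_acquire(SpaceMap& s, std::uint32_t id) {
    return space_acquire_low(s, id, false);
}

inline SpaceSketch* space_acquire_silent(SpaceMap& s, std::uint32_t id) {
    return space_acquire_low(s, id, true);
}
```

Callers that already handle a missing tablespace, such as a pending change buffer merge, would use the silent variant and simply check for a null result.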

wsrep_debug, wsrep_trx_is_aborting(): Removed unused declarations.

_fil_io(): Remove. Instead, declare default parameters for the XtraDB
fil_io().

buf_read_page_low(): Declare default parameters, and clean up some
callers.

os_aio(): Correct the macro that is defined when !UNIV_PFS_IO.
2018-02-02 19:57:59 +02:00
Jimmy Yang
6266493fc3 Bug #25729649 LOCK0LOCK.CC:NNN:ADD_POSITION != __NULL
Reviewed-by: Sunny Bains <sunny.bains@oracle.com>
2018-02-02 16:15:30 +02:00
Sergei Golubchik
d4df7bc9b1 Merge branch 'github/10.0' into 10.1 2018-02-02 10:09:44 +01:00
Marko Mäkelä
c7d0448797 MDEV-15132 Avoid accessing the TRX_SYS page
InnoDB maintains an internal persistent sequence of transaction
identifiers. This sequence is used for assigning both transaction
start identifiers (DB_TRX_ID=trx->id) and end identifiers (trx->no)
as well as end identifiers for the mysql.transaction_registry table
that was introduced in MDEV-12894.

TRX_SYS_TRX_ID_WRITE_MARGIN: Remove. After this many updates of
the sequence we used to update the TRX_SYS page. We can avoid accessing
the TRX_SYS page if we modify the InnoDB startup so that it resurrects
the sequence from other pages of the transaction system.

TRX_SYS_TRX_ID_STORE: Deprecate. The field only exists for the purpose
of upgrading from an earlier version of MySQL or MariaDB.

Starting with this fix, MariaDB will rely on the fields
TRX_UNDO_TRX_ID, TRX_UNDO_TRX_NO in the undo log header page of
each non-committed transaction, and on the new field
TRX_RSEG_MAX_TRX_ID in rollback segment header pages.

Because of this change, setting innodb_force_recovery=5 or 6 may cause
the system to recover with trx_sys.get_max_trx_id()==0. We must adjust
checks for invalid DB_TRX_ID and PAGE_MAX_TRX_ID accordingly.

We will change the startup and shutdown messages to display the
trx_sys.get_max_trx_id() in addition to the log sequence number.

trx_sys_t::flush_max_trx_id(): Remove.

trx_undo_mem_create_at_db_start(), trx_undo_lists_init():
Add an output parameter max_trx_id, to be updated from
TRX_UNDO_TRX_ID, TRX_UNDO_TRX_NO.

TRX_RSEG_MAX_TRX_ID: New field, for persisting
trx_sys.get_max_trx_id() at the time of the latest transaction commit.
Startup is not reading the undo log pages of committed transactions.
We want to avoid additional page accesses on startup, as well as
trouble when all undo logs have been emptied.
On startup, we will simply determine the maximum value from all pages
that are being read anyway.

TRX_RSEG_FORMAT: Redefined from TRX_RSEG_MAX_SIZE.

Old versions of InnoDB wrote uninitialized garbage to unused data fields.
Because of this, we cannot simply introduce a new field in the
rollback segment pages and expect it to be always zero, like it would
if the database was created by a recent enough InnoDB version.

Luckily, it looks like the field TRX_RSEG_MAX_SIZE was always written
as 0xfffffffe. We will indicate a new subformat of the page by writing
0 to this field. This has the nice side effect that after a downgrade
to older versions of InnoDB, transactions should fail to allocate any
undo log, that is, writes will be blocked. So, there is no problem of
getting corrupted transaction identifiers after downgrading.
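
The format check described above can be sketched as follows. The byte offset and the helper names are assumptions for illustration; only the two field values (0xfffffffe for old pages, 0 for the new subformat) come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical offset of the 4-byte field within a rollback segment header.
constexpr std::size_t RSEG_FORMAT_OFFSET = 0;

// Read a 4-byte big-endian integer, the byte order InnoDB uses on pages.
inline std::uint32_t read_u32_be(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

enum class RsegFormat { Old, New, Unknown };

// Classify a header by the value found in the former TRX_RSEG_MAX_SIZE field.
inline RsegFormat rseg_format(const unsigned char* header) {
    switch (read_u32_be(header + RSEG_FORMAT_OFFSET)) {
    case 0xfffffffeU: return RsegFormat::Old;  // value written by old versions
    case 0U:          return RsegFormat::New;  // marker of the new subformat
    default:          return RsegFormat::Unknown;
    }
}
```

Only a header classified as New would be trusted to carry the newly introduced field; an Old header may contain garbage in bytes that used to be unused.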

trx_rseg_t::max_size: Remove.

trx_rseg_header_create(): Remove the parameter max_size=ULINT_MAX.

trx_purge_add_undo_to_history(): Update TRX_RSEG_MAX_TRX_ID
(and TRX_RSEG_FORMAT if needed). This is invoked on transaction commit.

trx_rseg_mem_restore(): If TRX_RSEG_FORMAT contains 0,
read TRX_RSEG_MAX_TRX_ID.

trx_rseg_array_init(): Invoke trx_sys.init_max_trx_id(max_trx_id + 1)
where max_trx_id was the maximum that was encountered in the rollback
segment pages and the undo log pages of recovered active, XA PREPARE,
or some committed transactions. (See trx_purge_add_undo_to_history()
which invokes trx_rsegf_set_nth_undo(..., FIL_NULL, ...);
not all committed transactions will be immediately detached from the
rollback segment header.)
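
As a rough sketch of the startup logic described above: collect the largest identifier seen on the pages that are read anyway, and start the sequence one past it. RsegSummary and next_trx_id_after_startup() are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using trx_id_t = std::uint64_t;

// Hypothetical summary of what startup reads from one rollback segment:
// the persisted TRX_RSEG_MAX_TRX_ID plus the TRX_UNDO_TRX_ID and
// TRX_UNDO_TRX_NO values of undo logs still attached to the header.
struct RsegSummary {
    trx_id_t rseg_max_trx_id;
    std::vector<trx_id_t> undo_ids;
};

// Returns the first identifier that may be handed out after startup.
inline trx_id_t next_trx_id_after_startup(
    const std::vector<RsegSummary>& rsegs) {
    trx_id_t max_trx_id = 0;
    for (const RsegSummary& r : rsegs) {
        max_trx_id = std::max(max_trx_id, r.rseg_max_trx_id);
        for (trx_id_t id : r.undo_ids) {
            max_trx_id = std::max(max_trx_id, id);
        }
    }
    return max_trx_id + 1;
}
```

The returned value corresponds to the argument of trx_sys.init_max_trx_id(max_trx_id + 1) in the commit message.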
2018-01-31 10:24:19 +02:00
Marko Mäkelä
921c5e9314 Merge bb-10.2-ext into 10.3
MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY

Move a test from innodb.rename_table_debug to innodb.alter_copy.

ha_innobase::extra(HA_EXTRA_BEGIN_ALTER_COPY): Register id-versioned
tables so that mysql.transaction_registry will be updated, even for
empty tables that are subjected to ALTER TABLE…ALGORITHM=COPY.
2018-01-30 21:26:53 +02:00
Marko Mäkelä
0ba6aaf030 MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY
If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend
a lot of time rolling back writes to the intermediate copy of the table.
To reduce the amount of busy work done, a work-around was introduced in
commit fd069e2bb3 in MySQL 4.1.8 and 5.0.2,
to commit the transaction after every 10,000 inserted rows.
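
For illustration, the work-around behaved roughly like the following sketch. The class and its members are hypothetical; only the 10,000-row interval comes from the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a copying ALTER TABLE that commits periodically.
class CopyWriter {
public:
    static constexpr std::uint64_t COMMIT_INTERVAL = 10000;

    // Insert one row into the intermediate copy of the table.
    void write_row() {
        ++rows_in_trx_;
        if (rows_in_trx_ == COMMIT_INTERVAL) {
            commit();  // bounds the rollback work needed after a crash
        }
    }

    std::uint64_t commits() const { return commits_; }
    std::uint64_t uncommitted_rows() const { return rows_in_trx_; }

private:
    void commit() {
        ++commits_;
        rows_in_trx_ = 0;
    }

    std::uint64_t rows_in_trx_ = 0;
    std::uint64_t commits_ = 0;
};
```

After a crash, at most COMMIT_INTERVAL - 1 rows would have to be rolled back in this model, which is the cost the work-around was meant to cap.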

A proper fix would have been to disable the undo logging altogether and
to simply drop the intermediate copy of the table on subsequent server
startup. This is what happens in MariaDB 10.3 with MDEV-14717 and MDEV-14585.
In MariaDB 10.2, the intermediate copy of the table would be left behind
with a name starting with the string #sql.

This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
contributed by jixianliang <271365745@qq.com>.

Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation
InnoDB must for now keep the undo logging enabled, so that the latest
row can be rolled back in case of an error.

In Galera cluster, the LOAD DATA statement will retain the existing
behaviour and commit the transaction after every 10,000 rows if
the parameter wsrep_load_data_splitting=ON is set. The logic to do
so (the wsrep_load_data_split() function and the call
handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work
by Ji Xianliang and Marko Mäkelä.

The original fix:

Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date:   Wed Dec 2 16:09:15 2015 +0530

Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY

Problem:

During ALTER TABLE, we commit and restart the transaction for every
10,000 rows, so that the rollback after recovery would not take so long.

Fix:

Suppress the undo logging during copy alter operation. If fts_index is
present then insert directly into fts auxiliary table rather
than doing at commit time.

ha_innobase::num_write_row: Remove the variable.

ha_innobase::write_row(): Remove the hack for committing every 10000 rows.

row_lock_table_for_mysql(): Remove the extra 2 parameters.

lock_get_src_table(), lock_is_table_exclusive(): Remove.

Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
2018-01-30 20:24:23 +02:00
Sergey Vojtovich
55277e8840 MDEV-15059 - Misc small InnoDB scalability fixes
Form better trx_sys API.
2018-01-26 10:25:33 +04:00
Sergey Vojtovich
0499693910 MDEV-15059 - Misc small InnoDB scalability fixes
Moved lock_rec_lock_slow() inside lock_rec_lock().
2018-01-26 10:25:33 +04:00
Sergey Vojtovich
8389b45b7f MDEV-15059 - Misc small InnoDB scalability fixes
Moved mutex locking inside lock_rec_lock().
Moved monitor increment out of mutex.
Moved assertions that don't require protection out of mutex.
Removed duplicate assertions.
Moved duplicate debug injections into lock_rec_lock().
Let monitor updates use relaxed memory order.
Return directly without maintaining variables in lock_rec_lock_slow().
Moved lock_rec_lock_fast() body into lock_rec_lock(): saves at least one
trx_mutex_enter(), one switch() plus some code was moved out of mutex.
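
The idea of keeping monitor updates outside the critical section can be sketched like this. LockSysSketch and its members are hypothetical stand-ins, and std::mutex replaces the InnoDB mutex types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for the lock system mutex and one monitor counter.
struct LockSysSketch {
    std::mutex mutex;
    std::atomic<std::uint64_t> rec_lock_requests{0};
    std::uint64_t granted = 0;  // protected by mutex
};

inline void lock_rec_lock_sketch(LockSysSketch& sys) {
    // Statistics only: no ordering with other memory is required, so the
    // increment uses relaxed ordering and happens before taking the mutex.
    sys.rec_lock_requests.fetch_add(1, std::memory_order_relaxed);

    std::lock_guard<std::mutex> guard(sys.mutex);
    ++sys.granted;  // the part that really needs mutual exclusion
}
```

The counter stays exact because the increment is still atomic; only the ordering guarantee relative to other memory accesses is dropped.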
2018-01-26 10:25:33 +04:00
Sergey Vojtovich
ce04790065 MDEV-14482 - Cache line contention on ut_rnd_ulint_counter()
InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse, this is cross-mutex traffic; that is, different
mutexes suffer from contention.

A fixed delay of 4 was verified to give the best throughput in OLTP update
index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).
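
A sketch of why a constant delay helps, assuming the spin delay was previously derived from the shared random state: a fixed value lets each waiter spin without reading or writing any global memory. The scale factor and the function names are assumptions made for this illustration.

```cpp
#include <atomic>
#include <cassert>

// Assumed scale factor: how many loop iterations one delay unit costs.
constexpr unsigned SPIN_ITERATIONS_PER_UNIT = 50;
// The fixed delay value reported in the commit message.
constexpr unsigned FIXED_SPIN_DELAY = 4;

// Busy-wait for `delay` units without touching any shared memory.
// Returns the number of iterations, which makes the sketch testable.
inline unsigned spin_delay(unsigned delay) {
    unsigned spins = 0;
    for (unsigned i = 0; i < delay * SPIN_ITERATIONS_PER_UNIT; ++i) {
        // Stand-in for a CPU pause/relax instruction.
        std::atomic_signal_fence(std::memory_order_seq_cst);
        ++spins;
    }
    return spins;
}

// One spin round with the constant delay: no global counter is involved.
inline unsigned spin_wait_round() {
    return spin_delay(FIXED_SPIN_DELAY);
}
```

Every thread calling spin_wait_round() works purely on local variables, so spinning on one mutex cannot disturb the cache lines used by waiters on another.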
2018-01-26 10:25:33 +04:00
Marko Mäkelä
92d233a512 MDEV-15061 TRUNCATE must honor InnoDB table locks
Traditionally, DROP TABLE and TRUNCATE TABLE discarded any locks that
may have been held on the table. This feels like an ACID violation.
Probably most occurrences of it were prevented by meta-data locks (MDL)
which were introduced in MySQL 5.5.

dict_table_t::n_foreign_key_checks_running: Reduce the number of
non-debug checks.

lock_remove_all_on_table(), lock_remove_all_on_table_for_trx(): Remove.

ha_innobase::truncate(): Acquire an exclusive InnoDB table lock
before proceeding. DROP TABLE and DISCARD/IMPORT were already doing
this.

row_truncate_table_for_mysql(): Convert the already started transaction
into a dictionary operation, and do not invoke lock_remove_all_on_table().

row_mysql_table_id_reassign(): Do not call lock_remove_all_on_table().
This function is only used in ALTER TABLE...DISCARD/IMPORT TABLESPACE,
which is already holding an exclusive InnoDB table lock.

TODO: Make n_foreign_key_checks_running a debug-only variable.
This would require two fixes:
(1) DROP TABLE: Exclusively lock the table beforehand, to prevent
the possibility of concurrently running foreign key checks (which
would acquire a table IS lock and then record S locks).
(2) RENAME TABLE: Find out if n_foreign_key_checks_running>0 actually
constitutes a potential problem.
2018-01-25 22:43:43 +02:00
Marko Mäkelä
9875d5c3e1 Merge bb-10.2-ext into 10.3 2018-01-24 14:00:33 +02:00
Marko Mäkelä
431607237d MDEV-12173 "Error: trying to do an operation on a dropped tablespace"
InnoDB is issuing a 'noise' message that is not a sign of abnormal
operation. The only issuers of it are the debug function
lock_rec_block_validate() and the change buffer merge.
While the error should ideally never occur in transactional locking,
we happen to know that DISCARD TABLESPACE and TRUNCATE TABLE and
possibly DROP TABLE are breaking InnoDB table locks.

When it comes to the change buffer merge, the message simply is useless
noise. We know perfectly well that a tablespace can be dropped while a
change buffer merge is pending. And the code is prepared to handle that,
which is demonstrated by the fact that whenever the message was issued,
InnoDB did not crash.

fil_inc_pending_ops(): Remove the parameter print_err.
2018-01-22 16:58:13 +02:00
Sergey Vojtovich
4dc30f3c17 MDEV-15019 - InnoDB: store ReadView on trx
This will allow us to reduce critical section protected by
trx_sys.mutex:
- no need to maintain global m_free list
- eliminate if (trx->read_view == NULL) condition.

On x86_64, sizeof(ReadView) is 144, mostly due to padding; sizeof(trx_t)
with ReadView is 1200.

Also don't close ReadView for read-write transactions, just mark it
closed similarly to read-only.

Clean-up: removed n_prepared_recovered_trx and n_prepared_trx, which
accidentally re-appeared after some rebase.
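
The structural change can be sketched as follows, using hypothetical and heavily reduced types: the view is a member of the transaction object, so there is no null pointer to test and no global free list to maintain, and closing a view only flips a flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily reduced read view.
class ReadViewSketch {
public:
    void open(std::uint64_t low_limit_id) {
        low_limit_id_ = low_limit_id;
        open_ = true;
    }
    // Closing only marks the view; the object stays with its transaction.
    void close() { open_ = false; }
    bool is_open() const { return open_; }
    std::uint64_t low_limit_id() const { return low_limit_id_; }

private:
    std::uint64_t low_limit_id_ = 0;
    bool open_ = false;
};

// Hypothetical transaction object with the view embedded by value.
struct TrxSketch {
    std::uint64_t id = 0;
    ReadViewSketch read_view;  // never null, never taken from a free list
};
```

The price is a larger transaction object, which is what the sizeof figures above quantify.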
2018-01-22 16:23:15 +04:00
Marko Mäkelä
4f8555f1f6 MDEV-14941 Timeouts on persistent statistics tables caused by MDEV-14511
MDEV-14511 tried to avoid some consistency problems related to InnoDB
persistent statistics. The persistent statistics are being written by
an InnoDB internal SQL interpreter that requires the InnoDB data dictionary
cache to be locked.

Before MDEV-14511, the statistics were written during DDL in separate
transactions, which could unnecessarily reduce performance (each commit
would require a redo log flush) and break atomicity, because the statistics
would be updated separately from the dictionary transaction.

However, because it is unacceptable to hold the InnoDB data dictionary
cache locked while suspending the execution for waiting for a
transactional lock (in the mysql.innodb_index_stats or
mysql.innodb_table_stats tables) to be released, any lock conflict
would immediately be reported as "lock wait timeout".

To fix MDEV-14941, an attempt to reduce these lock conflicts by acquiring
transactional locks on the user tables in both the statistics and DDL
operations was made, but it would still not entirely prevent lock conflicts
on the mysql.innodb_index_stats and mysql.innodb_table_stats tables.

Fixing the remaining problems would require a change that is too intrusive
for a GA release series, such as MariaDB 10.2.

Therefore, we revert the change MDEV-14511. To silence the
MDEV-13201 assertion, we use the pre-existing flag trx_t::internal.
2018-01-22 08:58:47 +02:00
Sergey Vojtovich
ec32c05072 Get rid of trx->read_view pointer juggling
trx->read_view|= 1 was done in a silly attempt to fix a race condition
where trx->read_view was closed without the trx_sys.mutex lock by read-only
transactions.

This just made the problem less likely to happen. In fact there was a race
condition in the const version of trx_get_read_view(): the pointer may change to
garbage at any moment after the MVCC::is_view_active(trx->read_view) check and
before this function returns.

This patch doesn't fix this race condition, but rather makes its
consequences less destructive.
2018-01-20 16:10:38 +04:00
Marko Mäkelä
f8882cce93 Replace trx_sys_t* trx_sys with trx_sys_t trx_sys
There is only one transaction system object in InnoDB.
Allocate the storage for it at link time, not at runtime.

lock_rec_fetch_page(): Use the correct fetch mode BUF_GET.
Pages may never be deallocated from a tablespace while
record locks are pointing to them.
2018-01-20 16:10:36 +04:00
Sergey Vojtovich
7078203389 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Use atomic operations when accessing trx_sys_t::max_trx_id. We can't yet
move trx_sys_t::get_new_trx_id() out of mutex because it must be updated
atomically along with trx_sys_t::rw_trx_ids.
2018-01-20 16:10:35 +04:00
Sergey Vojtovich
a447980ff3 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Let lock_print_info_all_transactions() iterate rw_trx_hash instead of
rw_trx_list.

When printing info of locks for transactions, InnoDB monitor doesn't
attempt to read relevant page from disk anymore. The code was prone
to race conditions.

Note that TrxListIterator didn't work as advertised: it iterated
rw_trx_list only.
2018-01-20 16:10:34 +04:00
Sergey Vojtovich
02270b44d0 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Let lock_validate_table_locks(), lock_rec_other_trx_holds_expl(),
lock_table_locks_lookup(), trx_recover_for_mysql(), trx_get_trx_by_xid(),
trx_roll_must_shutdown(), fetch_data_into_cache() iterate rw_trx_hash
instead of rw_trx_list.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
900b07908b MDEV-14756 - Remove trx_sys_t::rw_trx_list
Removed trx_sys_t::n_prepared_recovered_trx: never used.

Removed trx_sys_t::n_prepared_trx: used only at shutdown, we can perfectly
get this value from rw_trx_hash.
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
a0b385ea2b MDEV-14756 - Remove trx_sys_t::rw_trx_list
Determine minimum transaction id by iterating rw_trx_hash, not rw_trx_list.

It is more expensive than the previous implementation because it does a linear
search, especially if there are many concurrent transactions running. But in
such a case the mutex is a much bigger evil. And since it doesn't require
trx_sys->mutex protection, it scales better.

For low concurrency, the performance difference is negligible.
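
The linear scan described above might look like the following sketch. A std::unordered_map stands in for rw_trx_hash purely for illustration (it is not thread-safe, unlike the real structure), and the fallback to the next identifier when nothing is active is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>

using trx_id_t = std::uint64_t;

// Hypothetical hash element; the real one refers to a transaction object.
struct TrxEntry { trx_id_t id; };

// Linear scan: O(n) in the number of active read-write transactions,
// compared with reading the head of an ordered list.
inline trx_id_t min_trx_id(
    const std::unordered_map<trx_id_t, TrxEntry>& rw_trx_hash,
    trx_id_t next_id) {
    trx_id_t min_id = next_id;  // assumed result when nothing is active
    for (const auto& kv : rw_trx_hash) {
        min_id = std::min(min_id, kv.first);
    }
    return min_id;
}
```

The scan visits every element, but it needs no global mutex, which is the trade-off the message argues for.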
2018-01-20 16:09:26 +04:00
Sergey Vojtovich
d09f146934 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Reduce divergence between trx_sys_t::rw_trx_hash and trx_sys_t::rw_trx_list
by not adding recovered COMMITTED transactions to trx_sys_t::rw_trx_list.

Such transactions are discarded immediately without creating a trx object.

This also required splitting the rollback and cleanup phases of recovery. To
reflect these updates, the following renames happened:
trx_rollback_or_clean_all_recovered() -> trx_rollback_all_recovered()
trx_rollback_or_clean_is_active -> trx_rollback_is_active
trx_rollback_or_clean_recovered() -> trx_rollback_recovered()
trx_cleanup_at_db_startup() -> trx_cleanup_recovered()

Also removed a hack from lock_trx_release_locks(). Instead, let the recovery
rollback thread skip committed XA transactions.
2018-01-20 16:09:26 +04:00
Marko Mäkelä
3e6fcb6ac8 MDEV-14935 Remove bogus conditions related to not redo-logging PAGE_MAX_TRX_ID changes
InnoDB originally skipped the redo logging of PAGE_MAX_TRX_ID changes
until I enabled it in commit e76b873f24
that was part of MySQL 5.5.5 already.

Later, when a more complete history of the InnoDB Plugin for MySQL 5.1
(aka branches/zip in the InnoDB subversion repository) and of the
planned-to-be closed-source branches/innodb+ that became the basis of
InnoDB in MySQL 5.5 was pushed to the MySQL source repository, the
change was part of commit 509e761f06:

 ------------------------------------------------------------------------
 r5038 | marko | 2009-05-19 22:59:07 +0300 (Tue, 19 May 2009) | 30 lines

 branches/zip: Write PAGE_MAX_TRX_ID to the redo log. Otherwise,
 transactions that are started before the rollback of incomplete
 transactions has finished may have an inconsistent view of the
 secondary indexes.

 dict_index_is_sec_or_ibuf(): Auxiliary function for controlling
 updates and checks of PAGE_MAX_TRX_ID: check whether an index is a
 secondary index or the insert buffer tree.

 page_set_max_trx_id(), page_update_max_trx_id(),
 lock_rec_insert_check_and_lock(),
 lock_sec_rec_modify_check_and_lock(), btr_cur_ins_lock_and_undo(),
 btr_cur_upd_lock_and_undo(): Add the parameter mtr.

 page_set_max_trx_id(): Allow mtr to be NULL.  When mtr==NULL, do not
 attempt to write to the redo log.  This only occurs when creating a
 page or reorganizing a compressed page.  In these cases, the
 PAGE_MAX_TRX_ID will be set correctly during the application of redo
 log records, even though there is no explicit log record about it.

 btr_discard_only_page_on_level(): Preserve PAGE_MAX_TRX_ID.  This
 function should be unreachable, though.

 btr_cur_pessimistic_update(): Update PAGE_MAX_TRX_ID.

 Add some assertions for checking that PAGE_MAX_TRX_ID is set on all
 secondary index leaf pages.

 rb://115 tested by Michael, fixes Issue #211
 ------------------------------------------------------------------------

After this fix, some bogus references to recv_recovery_is_on()
remained. Also, some references could be replaced with
references to index->is_dummy to prepare us for MDEV-14481
(background redo log apply).
2018-01-12 18:31:03 +02:00
Sergey Vojtovich
0ca2ea1a65 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
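
A minimal sketch of an atomically maintained reference counter. The struct name and the chosen memory orders are assumptions of this illustration; the commit message does not state which orders the real code uses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reduced transaction object carrying only the counter.
struct TrxRefSketch {
    std::atomic<std::int32_t> n_ref{0};

    // Take a reference without holding any mutex.
    void reference() { n_ref.fetch_add(1, std::memory_order_relaxed); }

    // Drop a reference; release ordering publishes the holder's prior writes.
    void release() {
        std::int32_t old = n_ref.fetch_sub(1, std::memory_order_release);
        assert(old > 0);  // releasing more often than referencing is a bug
        (void) old;
    }

    // Readers get a well-defined value instead of an unprotected read.
    bool is_referenced() const {
        return n_ref.load(std::memory_order_acquire) > 0;
    }
};
```

Both the writers and the readers go through the atomic, which addresses the two complaints in the message: the mutex cost on updates and the unprotected reads.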
2018-01-11 12:30:53 +04:00
Sergey Vojtovich
380069c235 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite
expensive operations under trx_sys_t::mutex protection: e.g. malloc/free
when adding/removing elements. Traversing b-tree is not that cheap either.

This has negative scalability impact, which is especially visible when running
oltp_update_index.lua benchmark on a ramdisk.

To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None
of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex)
protection.

Another interesting issue observed with std::set is a reproducible ~2% performance
decline after the benchmark has run for ~60 seconds. With LF_HASH results are stable.

All in all this patch optimises away one of three trx_sys->mutex locks per
oltp_update_index.lua query. The other two critical sections became smaller.

Relevant clean-ups:

Replaced rw_trx_set iteration at startup with local set. The latter is needed
because values inserted to rw_trx_list must be ordered by trx->id.

Removed redundant conditions from trx_reference(): it is (and even was) never
called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY.
do_ref_count doesn't (and probably even didn't) make any sense: now it is called
only when reference counter increment is actually requested.

Moved condition out of mutex in trx_erase_lists().

trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were
greatly simplified and replaced by appropriate trx_rw_hash_t methods.

Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or
ACTIVE states. Transactions in COMMITTED state were required to be found
at InnoDB startup only. They are now looked up in the local set.

Removed unused trx_assert_recovered().

Removed unused innobase_get_trx() declaration.

Removed rather semantically incorrect trx_sys_rw_trx_add().

Moved information printout from trx_sys_init_at_db_start() to
trx_lists_init_at_db_start().
2018-01-11 12:30:53 +04:00
Marko Mäkelä
34f2f4fa43 MDEV-14660 Assertion failure in lock_move_rec_list_start() after instant ADD COLUMN
lock_move_rec_list_start(): Relax a too strict assertion.
This function can be invoked on the leftmost leaf page, after all.
So, the first record of each page can be a 'default row' record,
but the 'default row' record must never be locked.
2017-12-15 13:52:27 +02:00
Marko Mäkelä
e4efbfd904 Remove dead code lock_remove_recovered_trx_record_locks()
Contrary to what the comment said, trx_resurrect_table_locks()
does associate table locks with every recovered transaction that
modified any records, ever since this bug fix in MySQL 5.6.12:

Bug#16593427 ROLLBACK OF RECOVERED TRANSACTION CORRUPTS NON-ONLINE ADD INDEX
2017-12-15 13:52:27 +02:00
Marko Mäkelä
34841d2305 Merge bb-10.2-ext into 10.3 2017-12-12 09:57:17 +02:00
Jan Lindström
e66bb57267 MDEV-12837: WSREP: BF lock wait long
This is 10.1 version where no merge error exists.

wsrep_on_check
        New check function. Galera can't be enabled
        if innodb-lock-schedule-algorithm=VATS.

innobase_kill_query
        In Galera async kill we could own lock mutex.

innobase_init
        If Variance-Aware-Transaction-Scheduling Algorithm (VATS) is
        used on Galera we refuse to start InnoDB.

Changed innodb-lock-schedule-algorithm to a read-only parameter,
as it was designed to be.

lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
lock_rec_lock_slow
lock_table_other_has_incompatible
lock_rec_insert_check_and_lock

        Change the pointer to the conflicting lock to a normal pointer, as the
        contents of this pointer could be changed later.
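
The rule enforced by wsrep_on_check can be sketched as a small predicate. The enum, the function name and the convention that true means "reject" are assumptions of this illustration.

```cpp
#include <cassert>

// The two scheduling algorithms named in this log.
enum class LockScheduleAlgorithm { FCFS, VATS };

// Returns true when the requested wsrep_on value must be rejected:
// Galera cannot be enabled while VATS is the lock scheduling algorithm.
inline bool wsrep_on_check_sketch(bool new_wsrep_on,
                                  LockScheduleAlgorithm algorithm) {
    return new_wsrep_on && algorithm == LockScheduleAlgorithm::VATS;
}
```

Turning Galera off is always accepted in this sketch; only enabling it under VATS is refused.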
2017-12-09 11:20:46 +02:00
Jan Lindström
da3a3a68df MDEV-12837: WSREP: BF lock wait long
Problem was a merge error from MySQL wsrep i.e. Galera.

wsrep_on_check
	New check function. Galera can't be enabled
	if innodb-lock-schedule-algorithm=VATS.

innobase_kill_query
	In Galera async kill we could own lock mutex.

innobase_init
	If Variance-Aware-Transaction-Scheduling Algorithm (VATS) is
	used on Galera we fall back to First-Come-First-Served (FCFS)
	with notice to user.

Changed innodb-lock-schedule-algorithm to a read-only parameter,
as it was designed to be.

lock_reset_lock_and_trx_wait
	Use ib::hex() to print out transaction ID.

lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
RecLock::add_to_waitq
lock_rec_lock_slow
lock_table_other_has_incompatible
lock_rec_insert_check_and_lock
lock_prdt_other_has_conflicting

	Change the pointer to the conflicting lock to a normal pointer, as the
	contents of this pointer could be changed later.

RecLock::create
	Conflicting lock pointer is moved to last parameter with
	default value NULL. This conflicting transaction could
	be selected as victim in Galera if requesting transaction
	is BF (brute force) transaction. In this case contents
	of conflicting lock pointer will be changed. Use ib::hex() to print
	transaction ids.
2017-12-07 13:08:41 +02:00
Marko Mäkelä
976f6fb1b6 Merge bb-10.2-ext into 10.3 2017-12-06 19:36:33 +02:00
Marko Mäkelä
7dc6066dea MDEV-14511 Use fewer transactions for updating InnoDB persistent statistics
dict_stats_exec_sql(): Expect the caller to always provide a transaction.
Remove some redundant assertions. The caller must hold dict_sys->mutex,
but holding dict_operation_lock is only necessary for accessing
data dictionary tables, which we are not accessing.

dict_stats_save_index_stat(): Acquire dict_sys->mutex
for invoking dict_stats_exec_sql().

dict_stats_save(), dict_stats_update_for_index(), dict_stats_update(),
dict_stats_drop_index(), dict_stats_delete_from_table_stats(),
dict_stats_delete_from_index_stats(), dict_stats_drop_table(),
dict_stats_rename_in_table_stats(), dict_stats_rename_in_index_stats(),
dict_stats_rename_table(): Use a single caller-provided
transaction that is started and committed or rolled back by the caller.
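
The calling convention described above, where helpers never start or commit a transaction themselves, can be sketched as follows. All types and function names here are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical transaction handle that records the statements run in it.
struct TrxSketch {
    std::vector<std::string> statements;
    bool committed = false;
    void commit() { committed = true; }
};

// Helpers only add work to the transaction they are given.
inline void stats_save_table(TrxSketch& trx, const std::string& table) {
    trx.statements.push_back("save table stats: " + table);
}

inline void stats_save_index(TrxSketch& trx, const std::string& index) {
    trx.statements.push_back("save index stats: " + index);
}

// The caller owns the transaction boundary: one commit covers all helpers.
inline void stats_update_all(TrxSketch& trx, const std::string& table,
                             const std::vector<std::string>& indexes) {
    stats_save_table(trx, table);
    for (const std::string& index : indexes) {
        stats_save_index(trx, index);
    }
}
```

Because nothing commits until the caller decides to, the statistics for a table and all of its indexes become visible together or not at all.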

dict_stats_process_entry_from_recalc_pool(): Let the caller provide
a transaction object.

ha_innobase::open(): Pass a transaction to dict_stats_init().

ha_innobase::create(), ha_innobase::discard_or_import_tablespace():
Pass a transaction to dict_stats_update().

ha_innobase::rename_table(): Pass a transaction to
dict_stats_rename_table(). We do not use the same transaction
as the one that updated the data dictionary tables, because
we already released the dict_operation_lock. (FIXME: there is
a race condition; a lock wait on SYS_* tables could occur
in another DDL transaction until the data dictionary transaction
is committed.)

ha_innobase::info_low(): Pass a transaction to dict_stats_update()
when calculating persistent statistics.

alter_stats_norebuild(), alter_stats_rebuild(): Update the
persistent statistics as well. In this way, a single transaction
will be used for updating the statistics of a whole table, even
for partitioned tables.

ha_innobase::commit_inplace_alter_table(): Drop statistics for
all partitions when adding or dropping virtual columns, so that
the statistics will be recalculated on the next handler::open().
This is a refactored version of Oracle Bug#22469660 fix.

RecLock::add_to_waitq(), lock_table_enqueue_waiting():
Do not allow a lock wait to occur for updating statistics
in a data dictionary transaction, such as DROP TABLE. Instead,
return the previously unused error code DB_QUE_THR_SUSPENDED.

row_merge_lock_table(), row_mysql_lock_table(): Remove dead code
for handling DB_QUE_THR_SUSPENDED.

row_drop_table_for_mysql(), row_truncate_table_for_mysql():
Drop the statistics as part of the data dictionary transaction.
After TRUNCATE TABLE, the statistics will be recalculated on
subsequent ha_innobase::open(), similar to how the logic after
the above-mentioned Oracle Bug#22469660 fix in
ha_innobase::commit_inplace_alter_table() works.

btr_defragment_thread(): Use a single transaction object for
updating defragmentation statistics.

dict_stats_save_defrag_stats(), dict_stats_save_defrag_stats(),
dict_stats_process_entry_from_defrag_pool(),
dict_defrag_process_entries_from_defrag_pool(),
dict_stats_save_defrag_summary(), dict_stats_save_defrag_stats():
Add a parameter for the transaction.

dict_stats_empty_table(): Make public. This will be called by
row_truncate_table_for_mysql() after dropping persistent statistics,
to clear the memory-based statistics as well.
2017-12-06 18:52:28 +02:00
Marko Mäkelä
bd8fd3b7c3 Remove references to UNIV_SYNC_DEBUG which was merged with UNIV_DEBUG 2017-12-04 11:48:12 +02:00
Marko Mäkelä
f830314fd5 Remove dead code for non-debug builds 2017-11-06 22:35:03 +02:00
Alexander Barkov
835cbbcc7b Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3
TODO: enable MDEV-13049 optimization for 10.3
2017-10-30 20:47:39 +04:00
Marko Mäkelä
38e12db478 Merge 10.0 into 10.1 2017-10-26 13:36:38 +03:00
Marko Mäkelä
b933a8c354 MDEV-12569 InnoDB suggests filing bugs at MySQL bug tracker
Replace all references in InnoDB and XtraDB error log messages
to bugs.mysql.com with references to https://jira.mariadb.org/.

The original merge commit 4274d0bf57 was accidentally reverted
by the subsequent merge commit 3b35d745c3.
2017-10-26 13:29:28 +03:00
Vicențiu Ciorbaru
3b35d745c3 Merge branch 'merge-innodb-5.6' into 10.0 2017-10-26 12:46:47 +03:00
Marko Mäkelä
4274d0bf57 Merge 5.5 into 10.0 2017-10-26 11:13:07 +03:00
Marko Mäkelä
ad46ce658a MDEV-14055 Assertion `page_rec_is_leaf(rec)' failed in lock_rec_validate_page
This was a false alarm in a debug check that was introduced in
commit 48192f963a which was a
10.2 code refactoring in preparation for
MDEV-11369 (instant ADD COLUMN) in 10.3.2. The code refactoring
only affected debug builds.

InnoDB B-tree record locks are only supposed to exist on leaf page
records. An assertion failed, because the debug function lock_validate()
was invoking lock_rec_block_validate() on a page for which there were
no locks set in the record lock bitmap. This could happen on a page split.
Especially when the index size grows from a single page to multiple pages,
the root page would transform from a leaf node into an internal node,
and its record lock bitmap would be emptied.

lock_validate(): Skip empty lock bitmaps.
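
A sketch of the fix, using hypothetical reduced types: validation skips any record lock whose bitmap has no bits set, so a page that has turned from a leaf into an internal node is never checked for leaf records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reduced record lock: a page number plus a lock bitmap.
struct RecLockSketch {
    unsigned page_no;
    std::vector<bool> bitmap;  // one bit per record position on the page
};

inline bool bitmap_is_empty(const RecLockSketch& lock) {
    for (bool bit : lock.bitmap) {
        if (bit) {
            return false;
        }
    }
    return true;
}

// Returns the pages that validation would actually visit.
inline std::vector<unsigned> pages_to_validate(
    const std::vector<RecLockSketch>& locks) {
    std::vector<unsigned> pages;
    for (const RecLockSketch& lock : locks) {
        if (bitmap_is_empty(lock)) {
            continue;  // e.g. a root page whose bitmap was emptied by a split
        }
        pages.push_back(lock.page_no);
    }
    return pages;
}
```

Only pages with at least one locked record are handed to the per-page check, which is where the leaf-page assertion lives.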
2017-10-14 14:28:11 +03:00