Commit graph

1168 commits

Author SHA1 Message Date
Marko Mäkelä
458e33cfbc MDEV-14441 Deadlock due to InnoDB adaptive hash index
This is not fixing the reported problem, but a potential problem that was
introduced in MDEV-11369.

row_sel_try_search_shortcut(), row_sel_try_search_shortcut_for_mysql():
When an adaptive hash index search lands on top of rec_is_default_row(),
we must skip the candidate and perform a normal search. This is because
the adaptive hash index latch only protects the record from being deleted
but does not prevent concurrent inserts into the page. Therefore, it is not
safe to dereference the next-record pointer.
2018-01-15 19:12:30 +02:00
Marko Mäkelä
3e6fcb6ac8 MDEV-14935 Remove bogus conditions related to not redo-logging PAGE_MAX_TRX_ID changes
InnoDB originally skipped the redo logging of PAGE_MAX_TRX_ID changes
until I enabled it in commit e76b873f24
that was part of MySQL 5.5.5 already.

Later, when a more complete history of the InnoDB Plugin for MySQL 5.1
(aka branches/zip in the InnoDB subversion repository) and of the
planned-to-be closed-source branches/innodb+ that became the basis of
InnoDB in MySQL 5.5 was pushed to the MySQL source repository, the
change was part of commit 509e761f06:

 ------------------------------------------------------------------------
 r5038 | marko | 2009-05-19 22:59:07 +0300 (Tue, 19 May 2009) | 30 lines

 branches/zip: Write PAGE_MAX_TRX_ID to the redo log. Otherwise,
 transactions that are started before the rollback of incomplete
 transactions has finished may have an inconsistent view of the
 secondary indexes.

 dict_index_is_sec_or_ibuf(): Auxiliary function for controlling
 updates and checks of PAGE_MAX_TRX_ID: check whether an index is a
 secondary index or the insert buffer tree.

 page_set_max_trx_id(), page_update_max_trx_id(),
 lock_rec_insert_check_and_lock(),
 lock_sec_rec_modify_check_and_lock(), btr_cur_ins_lock_and_undo(),
 btr_cur_upd_lock_and_undo(): Add the parameter mtr.

 page_set_max_trx_id(): Allow mtr to be NULL.  When mtr==NULL, do not
 attempt to write to the redo log.  This only occurs when creating a
 page or reorganizing a compressed page.  In these cases, the
 PAGE_MAX_TRX_ID will be set correctly during the application of redo
 log records, even though there is no explicit log record about it.

 btr_discard_only_page_on_level(): Preserve PAGE_MAX_TRX_ID.  This
 function should be unreachable, though.

 btr_cur_pessimistic_update(): Update PAGE_MAX_TRX_ID.

 Add some assertions for checking that PAGE_MAX_TRX_ID is set on all
 secondary index leaf pages.

 rb://115 tested by Michael, fixes Issue #211
 ------------------------------------------------------------------------

After this fix, some bogus references to recv_recovery_is_on()
remained. Also, some references could be replaced with
references to index->is_dummy to prepare us for MDEV-14481
(background redo log apply).
2018-01-12 18:31:03 +02:00
Marko Mäkelä
6dd302d164 Merge bb-10.2-ext into 10.3 2018-01-11 19:44:41 +02:00
Marko Mäkelä
cca611d1c0 Merge 10.2 into bb-10.2-ext 2018-01-11 18:00:31 +02:00
Marko Mäkelä
773c3ceb57 MDEV-14824 Assertion `!trx_is_started(trx)' failed in innobase_start_trx_and_assign_read_view
In CREATE SEQUENCE or CREATE TEMPORARY SEQUENCE, we should not start
an InnoDB transaction for inserting the sequence status record into
the underlying no-rollback table. Because we did this, a debug assertion
failure would fail in START TRANSACTION WITH CONSISTENT SNAPSHOT after
CREATE TEMPORARY SEQUENCE was executed.

row_ins_step(): Do not start the transaction. Let the caller do that.

que_thr_step(): Start the transaction before calling row_ins_step().

row_ins_clust_index_entry(): Skip locking and undo logging for no-rollback
tables, even for temporary no-rollback tables.

row_ins_index_entry(): Allow trx->id==0 for no-rollback tables.

row_insert_for_mysql(): Do not start a transaction for no-rollback tables.
2018-01-11 16:34:31 +02:00
Sergey Vojtovich
0ca2ea1a65 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
2018-01-11 12:30:53 +04:00
Sergey Vojtovich
380069c235 MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH
trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite
expensive operations under trx_sys_t::mutex protection: e.g. malloc/free
when adding/removing elements. Traversing b-tree is not that cheap either.

This has negative scalability impact, which is especially visible when running
oltp_update_index.lua benchmark on a ramdisk.

To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None
of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex)
protection.

Another interesting issue observed with std::set is reproducible ~2% performance
decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable.

All in all this patch optimises away one of three trx_sys->mutex locks per
oltp_update_index.lua query. The other two critical sections became smaller.

Relevant clean-ups:

Replaced rw_trx_set iteration at startup with local set. The latter is needed
because values inserted to rw_trx_list must be ordered by trx->id.

Removed redundant conditions from trx_reference(): it is (and even was) never
called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY.
do_ref_count doesn't (and probably even didn't) make any sense: now it is called
only when reference counter increment is actually requested.

Moved condition out of mutex in trx_erase_lists().

trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were
greatly simplified and replaced by appropriate trx_rw_hash_t methods.

Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or
ACTIVE states. Transactions in COMMITTED state were required to be found
at InnoDB startup only. They are now looked up in the local set.

Removed unused trx_assert_recovered().

Removed unused innobase_get_trx() declaration.

Removed rather semantically incorrect trx_sys_rw_trx_add().

Moved information printout from trx_sys_init_at_db_start() to
trx_lists_init_at_db_start().
2018-01-11 12:30:53 +04:00
Marko Mäkelä
5a1283a4fa Follow-up to MDEV-12288: Add --debug=d,purge diagnostics
row_purge_reset_trx_id(): Display a DBUG message about resetting the
DB_TRX_ID.
2018-01-09 13:48:41 +02:00
Marko Mäkelä
075f61a1d4 Revert part of commit fec844aca8
row_insert_for_mysql(): Remove some duplicated code
2018-01-09 11:30:36 +02:00
Marko Mäkelä
fa7d85bb87 Merge bb-10.2-ext into 10.3 2018-01-05 22:52:06 +02:00
Marko Mäkelä
6feb74c4b2 row_upd_rec_in_place(): Relax a debug assertion 2018-01-05 22:50:28 +02:00
Monty
e9a2082634 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	mysql-test/r/cte_nonrecursive.result
	mysql-test/suite/galera/r/galera_bf_abort.result
	mysql-test/suite/galera/r/galera_bf_abort_get_lock.result
	mysql-test/suite/galera/r/galera_bf_abort_sleep.result
	mysql-test/suite/galera/r/galera_enum.result
	mysql-test/suite/galera/r/galera_fk_conflict.result
	mysql-test/suite/galera/r/galera_insert_multi.result
	mysql-test/suite/galera/r/galera_many_indexes.result
	mysql-test/suite/galera/r/galera_mdl_race.result
	mysql-test/suite/galera/r/galera_nopk_bit.result
	mysql-test/suite/galera/r/galera_nopk_blob.result
	mysql-test/suite/galera/r/galera_nopk_large_varchar.result
	mysql-test/suite/galera/r/galera_nopk_unicode.result
	mysql-test/suite/galera/r/galera_pk_bigint_signed.result
	mysql-test/suite/galera/r/galera_pk_bigint_unsigned.result
	mysql-test/suite/galera/r/galera_serializable.result
	mysql-test/suite/galera/r/galera_toi_drop_database.result
	mysql-test/suite/galera/r/galera_toi_lock_exclusive.result
	mysql-test/suite/galera/r/galera_toi_truncate.result
	mysql-test/suite/galera/r/galera_unicode_pk.result
	mysql-test/suite/galera/r/galera_var_auto_inc_control_off.result
	mysql-test/suite/galera/r/galera_wsrep_log_conficts.result
	sql/field.cc
	sql/rpl_gtid.cc
	sql/share/errmsg-utf8.txt
	sql/sql_acl.cc
	sql/sql_parse.cc
	sql/sql_partition_admin.cc
	sql/sql_prepare.cc
	sql/sql_repl.cc
	sql/sql_table.cc
	sql/sql_yacc.yy
2018-01-05 16:52:40 +02:00
Marko Mäkelä
9c9db1cbe2 MDEV-14059 Work around a problem exposed by InnoDB GIS debug check
row_sel_get_clust_rec_for_mysql(): Look up the page from the
buffer pool, similar to how MySQL 5.7 does it.
2018-01-05 12:10:16 +02:00
Marko Mäkelä
145ae15a33 Merge bb-10.2-ext into 10.3 2018-01-04 09:22:59 +02:00
Marko Mäkelä
f7fd6ace18 Merge 10.2 into bb-10.2-ext 2018-01-03 15:48:47 +02:00
Marko Mäkelä
d361401bc2 Merge 10.1 into 10.2, with some MDEV-14799 fixups
trx_undo_page_report_modify(): For SPATIAL INDEX, keep logging
updated off-page columns twice, so that
the minimum bounding rectangle (MBR) will be logged.
Avoiding the redundant logging would require larger changes
to the undo log format.

row_build_index_entry_low(): Handle SPATIAL_UNKNOWN more robustly,
by refusing to purge the record from the spatial index.
We can get this code when processing old undo log from 10.2.10 or
10.2.11 (the releases affected by MDEV-14799, which was a regression
from MDEV-14051).
2018-01-03 11:56:24 +02:00
Marko Mäkelä
016caa3d20 Merge 10.0 into 10.1 2018-01-02 21:57:22 +02:00
Marko Mäkelä
51e4650ed0 Merge 5.5 into 10.0 2018-01-02 21:52:46 +02:00
Marko Mäkelä
d384ead0f0 MDEV-14799 After UPDATE of indexed columns, old values will not be purged from secondary indexes
This is a regression caused by MDEV-14051 'Undo log record is too big.'

Purge in the secondary index is wrongly skipped in
row_purge_upd_exist_or_extern() because node->row only does not contain all
indexed columns.

trx_undo_rec_get_partial_row(): Add the parameter for node->update
so that the updated columns will be copied from the initial part
of the undo log record.
2018-01-02 19:11:10 +02:00
Monty
fbab79c9b8 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	cmake/make_dist.cmake.in
	mysql-test/r/func_json.result
	mysql-test/r/ps.result
	mysql-test/t/func_json.test
	mysql-test/t/ps.test
	sql/item_cmpfunc.h
2018-01-01 19:39:59 +02:00
Vicențiu Ciorbaru
985d2d393c Merge remote-tracking branch 'origin/10.1' into 10.2 2017-12-22 12:23:39 +02:00
Marko Mäkelä
9d7c0882fa Merge 10.0 into 10.1 2017-12-21 17:25:51 +02:00
Marko Mäkelä
0202e47274 MDEV-12827 Assertion failure when reporting duplicate key error in online table rebuild
row_log_table_apply_insert_low(), row_log_table_apply_update():
When reporting the error_key_num, only count the clustered index
if it corresponds to a key in the SQL layer.

The assertion failure was probably introduced by the (incomplete)
MySQL 5.6.28 bug fix
Bug #21364096 THE BOGUS DUPLICATE KEY ERROR IN ONLINE DDL
WITH INCORRECT KEY NAME
which we are improving.

Side note: the fix was incorrectly merged to MySQL 5.7.10;
incorrect key names will continue to be reported in MySQL 5.7.
2017-12-21 17:19:13 +02:00
Marko Mäkelä
1ec8d45c4d Merge bb-10.2-ext into 10.3 2017-12-20 23:18:41 +02:00
Marko Mäkelä
b4165985c9 MDEV-14585 Automatically remove #sql- tables in InnoDB dictionary during recovery
Now that MDEV-14717 made RENAME TABLE crash-safe within InnoDB,
it should be safe to drop the #sql- tables within InnoDB during
crash recovery. These tables can be one of two things:

(1) #sql-ib related to deferred DROP TABLE (follow-up to MDEV-13407)
or to table-rebuilding ALTER TABLE...ALGORITHM=INPLACE
(since MDEV-14378, only related to the intermediate copy of a table),

(2) #sql- related to the intermediate copy of a table during
ALTER TABLE...ALGORITHM=COPY

We will not drop tables whose name starts with #sql2, because
the server can be killed during an ALGORITHM=COPY operation at
a point where the original table was renamed to #sql2 but the
finished intermediate copy was not yet renamed from #sql-
to the original table name.
2017-12-20 22:47:01 +02:00
Marko Mäkelä
2534b5cb99 Merge bb-10.2-ext into 10.3 2017-12-20 22:37:24 +02:00
Marko Mäkelä
0bc36758ba MDEV-14717 RENAME TABLE in InnoDB is not crash-safe
InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.

If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.

fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.

fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.

dict_table_rename_in_cache(): Invoke fil_name_write_rename().

trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.

trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.

row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.

row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.
2017-12-20 22:21:03 +02:00
Marko Mäkelä
0436a0ff3c Merge bb-10.2-ext into 10.3 2017-12-19 17:28:22 +02:00
Marko Mäkelä
88aff5f471 Follow-up to MDEV-13407 innodb.drop_table_background failed in buildbot with "Tablespace for table exists"
The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.

Because the queue is not persistent, it is not crash-safe. We should
in some way ensure that the deferred-dropped tables will be dropped
after server restart.

The existence of two separate transactions complicates the error handling
of CREATE TABLE...SELECT. We should really not break locks in DROP TABLE.

Our solution to these problems is to rename the table to a temporary
name, and to drop such-named tables on InnoDB startup. Also, the
queue will use table IDs instead of names from now on.

check-testcase.test: Ignore #sql-ib*.ibd files, because tables may enter
the background DROP TABLE queue shortly before the test finishes.

innodb.drop_table_background: Test CREATE...SELECT and the creation of
tables whose file name starts with #sql-ib.

innodb.alter_crash: Adjust the recovery, now that the #sql-ib tables
will be dropped on InnoDB startup.

row_mysql_drop_garbage_tables(): New function, to drop all #sql-ib tables
on InnoDB startup.

row_drop_table_for_mysql_in_background(): Remove an unnecessary and
misplaced call to log_buffer_flush_to_disk(). (The call should have been
after the transaction commit. We do not care about flushing the redo log
here, because the table would be dropped again at server startup.)

Remove the entry from the list after the table no longer exists.

If server shutdown has been initiated, empty the list without actually
dropping any tables. They will be dropped again on startup.

row_drop_table_for_mysql(): Do not call lock_remove_all_on_table().
Instead, if locks exist, defer the DROP TABLE until they do not exist.
If the table name does not start with #sql-ib, rename it to that prefix
before adding it to the background DROP TABLE queue.
2017-12-19 17:12:39 +02:00
Marko Mäkelä
028e91f380 Merge 10.2 into bb-10.2-ext 2017-12-19 17:12:14 +02:00
Marko Mäkelä
8d70097c21 Merge 10.1 to 10.2
Follow-up fix to MDEV-14008: Let Field_double::val_uint() silently
return 0 on error
2017-12-19 16:48:28 +02:00
Marko Mäkelä
09c5bbf471 Merge 10.0 into 10.1 2017-12-18 20:05:50 +02:00
Marko Mäkelä
40088bfc7e MDEV-13407 innodb.drop_table_background failed in buildbot with "Tablespace for table exists"
The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.

Because the queue is not persistent, it is not crash-safe. In stable
versions of MariaDB, we can only try harder to drop all enqueued
tables before server shutdown.

row_mysql_drop_t::table_id: Replaces table_name.

row_drop_tables_for_mysql_in_background():
Do not remove the entry from the list as long as the table exists.
In this way, the table should eventually be dropped.
2017-12-18 19:51:44 +02:00
Marko Mäkelä
866ccc8890 Merge bb-10.2-ext into 10.3 2017-12-14 11:34:30 +02:00
Marko Mäkelä
1b5f0cbd46 Merge 10.2 into bb-10.2-ext 2017-12-14 09:53:19 +02:00
Marko Mäkelä
ece9c54e10 Merge 10.1 into 10.2 2017-12-14 08:40:01 +02:00
Marko Mäkelä
46305b006b Merge 10.0 into 10.1 2017-12-13 19:12:16 +02:00
Marko Mäkelä
b1977a39de MDEV-12323 Rollback progress log messages during crash recovery are intermixed with unrelated log messages
trx_roll_must_shutdown(): During the rollback of recovered transactions,
report progress and check if the rollback should be interrupted because
of a pending shutdown.

trx_roll_max_undo_no, trx_roll_progress_printed_pct: Remove, along with
the messages that were interleaved with other messages.
2017-12-13 18:56:22 +02:00
Marko Mäkelä
b46fa627ca MDEV-12352 InnoDB shutdown should not be blocked by a large transaction rollback
row_undo_step(), trx_rollback_active(): Abort the rollback of a
recovered ordinary transaction if fast shutdown has been initiated.

trx_rollback_resurrected(): Convert an aborted-rollback transaction
into a fake XA PREPARE transaction, so that fast shutdown can proceed.
2017-12-13 18:02:09 +02:00
Marko Mäkelä
58eb4e5db9 MDEV-14422 Assertion failure in trx_purge_run() on shutdown
row_quiesce_table_start(), row_quiesce_table_complete():
Use the more appropriate predicate srv_undo_sources for skipping
purge control. (This change alone is insufficient; it is possible
that this predicate will change during the call to trx_purge_stop()
or trx_purge_run().)

trx_purge_stop(), trx_purge_run(): Tolerate PURGE_STATE_EXIT.
It is very well possible to initiate shutdown soon after the statement
FLUSH TABLES FOR EXPORT has been submitted to execution.

srv_purge_coordinator_thread(): Ensure that the wait for purge_sys->event
in trx_purge_stop() will terminate when the coordinator thread exits.
2017-12-13 14:17:26 +02:00
Marko Mäkelä
34841d2305 Merge bb-10.2-ext into 10.3 2017-12-12 09:57:17 +02:00
Marko Mäkelä
e312a407b8 Merge 10.2 into bb-10.2-ext 2017-12-11 15:06:11 +02:00
Marko Mäkelä
13b9ec651a MDEV-14589 InnoDB should not lock a delete-marked record
When the transaction isolation level is SERIALIZABLE, or when
a locking read is performed in the REPEATABLE READ isolation level,
InnoDB must lock delete-marked records in order to prevent another
transaction from inserting something.

However, at READ UNCOMMITTED or READ COMMITTED isolation level or
when the parameter innodb_locks_unsafe_for_binlog is set, the
repeatability of the reads does not matter, and there is no need
to lock any records.

row_search_mvcc(): Skip locks on delete-marked committed records upfront,
instead of invoking row_unlock_for_mysql() afterwards. The unlocking
never worked for secondary index records.
2017-12-11 14:39:53 +02:00
Marko Mäkelä
1e6ac94451 Correct the comment of row_vers_impl_x_locked() 2017-12-11 13:56:36 +02:00
Marko Mäkelä
0af52734a7 Remove the unused function row_is_magic_monitor_table()
Before MySQL 5.7 or MariaDB 10.2.2, there used to be some
magic InnoDB table names that would assign some InnoDB flags
on CREATE TABLE or DROP TABLE.
2017-12-08 16:36:19 +02:00
Sergey Vojtovich
5b624f00fc MDEV-14529 - InnoDB rw-locks: optimize memory barriers
Remove volatile modifier from waiters: it's not supposed for inter-thread
communication, use appropriate atomic operations instead.

Changed waiters to int32_t, my_atomic friendly type.
2017-12-08 17:55:41 +04:00
Marko Mäkelä
07e9ff1fe1 MDEV-14378 In ALGORITHM=INPLACE, use a common name for the intermediate tables or partitions
Allow DROP TABLE `#mysql50##sql-...._.` to drop tables that were
being rebuilt by ALGORITHM=INPLACE

NOTE: If the server is killed after the table-rebuilding ALGORITHM=INPLACE
commits inside InnoDB but before the .frm file has been replaced, then
the recovery will involve something else than DROP TABLE.

NOTE: If the server is killed in a true inplace ALTER TABLE commits
inside InnoDB but before the .frm file has been replaced, then we
are really out of luck. To properly handle that situation, we would
need a transactional mysql.ddl_fixup table that directs recovery to
rename or remove files.

prepare_inplace_alter_table_dict(): Use the altered_table->s->table_name
for generating the new_table_name.

table_name_t::part_suffix: The start of the partition name suffix.

table_name_t::dbend(): Return the end of the schema name.

table_name_t::dblen(): Return the length of the schema name, in bytes.

table_name_t::basename(): Return the name without the schema name.

table_name_t::part(): Return the partition name, or NULL if none.

row_drop_table_for_mysql(): Assert for #sql, not #sql-ib.
2017-12-08 10:07:51 +02:00
Marko Mäkelä
3aa618a969 MDEV-13820 trx_id_check() fails during row_log_table_apply()
When logging ROW_T_INSERT or ROW_T_UPDATE records, we did not normalize
the DB_TRX_ID of the current transaction into 0 if the current transaction
had started (modifying other tables) before the ALTER TABLE started.

MDEV-13654 introduced this normalization for ROW_T_DELETE
and for all operations with ADD PRIMARY KEY, in row_log_table_get_pk().
2017-12-07 14:35:32 +02:00
Marko Mäkelä
976f6fb1b6 Merge bb-10.2-ext into 10.3 2017-12-06 19:36:33 +02:00
Marko Mäkelä
ce07676502 Merge 10.2 into bb-10.2-ext 2017-12-06 19:34:03 +02:00