mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-15 19:37:16 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	1ad1d78981	MDEV-26779: Enable adaptive spinning on ARMv8 for lock_sys.wait_mutex Similar to commit `f7684f0ca5` (MDEV-26855) we will try to enable the adaptive spinloop for lock_sys.wait_mutex on ARMv8. Enabling any form of spinloop for lock_sys.wait_mutex did not show a significant improvement in our tests on AMD64. Spinning can be argued to be a hack to reduce the impact on mutex contention. It would be better to adjust the code to reduce contention in the first place.	2021-10-27 17:21:19 +03:00
Marko Mäkelä	58fe6b47d4	MDEV-26903: Assertion ctx->trx->state == TRX_STATE_ACTIVE on DROP INDEX rollback_inplace_alter_table(): Tolerate a case where the transaction is not in an active state. If ha_innobase::commit_inplace_alter_table() failed with a deadlock, the transaction would already have been rolled back. This omission of error handling was introduced in commit `1bd681c8b3` (MDEV-25506 part 3). After commit `c3c53926c4` (MDEV-26554) it became easier to trigger DB_DEADLOCK during exclusive table lock acquisition in ha_innobase::commit_inplace_alter_table(). lock_table_low(): Add DBUG injection "innodb_table_deadlock".	2021-10-26 09:54:37 +03:00
Marko Mäkelä	1f02280904	MDEV-26769 InnoDB does not support hardware lock elision This implements memory transaction support for: * Intel Restricted Transactional Memory (RTM), also known as TSX-NI (Transactional Synchronization Extensions New Instructions) * POWER v2.09 Hardware Trace Monitor (HTM) on GNU/Linux transactional_lock_guard, transactional_shared_lock_guard: RAII lock guards that try to elide the lock acquisition when transactional memory is available. buf_pool.page_hash: Try to elide latches whenever feasible. Related to the InnoDB change buffer and ROW_FORMAT=COMPRESSED tables, this is not always possible. In buf_page_get_low(), memory transactions only work reasonably well for validating a guessed block address. TMLockGuard, TMLockTrxGuard, TMLockMutexGuard: RAII lock guards that try to elide lock_sys.latch and related latches.	2021-10-22 12:38:45 +03:00
Marko Mäkelä	5caff20216	MDEV-26883 InnoDB hang due to table lock conflict In a stress test campaign of a 10.6-based branch by Matthias Leich, a deadlock between two InnoDB threads occurred, involving lock_sys.wait_mutex and a dict_table_t::lock_mutex. The cause of the hang is a latching order violation in lock_sys_t::cancel(). That function and the latching order violation were originally introduced in commit `8d16da1487` (MDEV-24789). lock_sys_t::cancel(): Invoke table->lock_mutex_trylock() in order to avoid a deadlock. If that fails, release lock_sys.wait_mutex, and acquire both latches. In that way, we will be obeying the latching order and no hangs will occur. This hang should mostly affect DDL operations. DML operations will acquire only IX or IS table locks, which are compatible with each other.	2021-10-22 11:59:44 +03:00
Marko Mäkelä	d6a3f425ee	After-merge fix: Remove unused variable	2021-10-19 20:38:07 +03:00
Marko Mäkelä	9c5835e067	Merge 10.5 into 10.6	2021-10-18 16:36:24 +03:00
Marko Mäkelä	18eab4a832	MDEV-26682 Replication timeouts with XA PREPARE The purpose of non-exclusive locks in a transaction is to guarantee that the records covered by those locks must remain in that way until the transaction is committed. (The purpose of gap locks is to ensure that a record that was nonexistent will remain that way.) Once a transaction has reached the XA PREPARE state, the only allowed further actions are XA ROLLBACK or XA COMMIT. Therefore, it can be argued that only the exclusive locks that the XA PREPARE transaction is holding are essential. Furthermore, InnoDB never preserved explicit locks across server restart. For XA PREPARE transations, we will only recover implicit exclusive locks for records that had been modified. Because of the fact that XA PREPARE followed by a server restart will cause some locks to be lost, we might as well always release all non-exclusive locks during the execution of an XA PREPARE statement. lock_release_on_prepare(): Release non-exclusive locks on XA PREPARE. trx_prepare(): Invoke lock_release_on_prepare() unless the isolation level is SERIALIZABLE or this is an internal distributed transaction with the binlog (not actual XA PREPARE statement). This has been discussed with Sergei Golubchik and Andrei Elkin. Reviewed by: Sergei Golubchik	2021-10-18 12:49:10 +03:00
Marko Mäkelä	668a5f3d12	MDEV-26720: Optimize single-bit atomic operations on IA-32 and AMD64 This is mostly working around a bad compiler optimization. The Intel 80386 processor introduced some bit operations that would be the perfect translation for atomic single-bit read-modify-and-write operations. Alas, even the latest compilers as of today (GCC 11, clang 13, Microsoft Visual C 19.29) would generate a loop around LOCK CMPXCHG instead of emitting the instructions LOCK BTS (fetch_or()), LOCK BTR (fetch_and()), LOCK BTC (fetch_xor()). fil_space_t::clear_closing(): Clear the CLOSING flag. fil_space_t::set_stopping_check(): Special variant of fil_space_t::set_stopping() that will return the old value of the STOPPING flag after atomically setting it. fil_space_t::clear_stopping(): Use fetch_sub() to toggle the STOPPING flag. The flag is guaranteed to be set upon calling this function, hence we will toggle it to clear it. On IA-32 and AMD64, this will translate into the 80486 LOCK XADD instruction. fil_space_t::check_pending_operations(): Replace a Boolean variable with a goto label, to allow more compact code generation for fil_space_t::set_stopping_check(). trx_rseg_t: Define private accessors ref_set() and ref_reset() for setting and clearing the flags. trx_lock_t::clear_deadlock_victim(), trx_lock_t::set_wsrep_victim(): Accessors for clearing and setting the flags.	2021-10-03 13:49:40 +03:00
Marko Mäkelä	c5fd9aa562	MDEV-25919: Lock tables before acquiring dict_sys.latch In commit `1bd681c8b3` (MDEV-25506 part 3) we introduced a "fake instant timeout" when a transaction would wait for a table or record lock while holding dict_sys.latch. This prevented a deadlock of the server but could cause bogus errors for operations on the InnoDB persistent statistics tables. A better fix is to ensure that whenever a transaction is being executed in the InnoDB internal SQL parser (which will for now require dict_sys.latch to be held), it will already have acquired all locks that could be required for the execution. So, we will acquire the following locks upfront, before acquiring dict_sys.latch: (1) MDL on the affected user table (acquired by the SQL layer) (2) If applicable (not for RENAME TABLE): InnoDB table lock (3) If persistent statistics are going to be modified: (3.a) MDL_SHARED on mysql.innodb_table_stats, mysql.innodb_index_stats (3.b) exclusive table locks on the statistics tables (4) Exclusive table locks on the InnoDB data dictionary tables (not needed in ANALYZE TABLE and the like) Note: Acquiring exclusive locks on the statistics tables may cause more locking conflicts between concurrent DDL operations. Notably, RENAME TABLE will lock the statistics tables even if no persistent statistics are enabled for the table. DROP DATABASE will only acquire locks on statistics tables if persistent statistics are enabled for the tables on which the SQL layer is invoking ha_innobase::delete_table(). For any "garbage collection" in innodb_drop_database(), a timeout while acquiring locks on the statistics tables will result in any statistics not being deleted for any tables that the SQL layer did not know about. If innodb_defragment=ON, information may be written to the statistics tables even for tables for which InnoDB persistent statistics are disabled. But, DROP TABLE will no longer attempt to delete that information if persistent statistics are not enabled for the table. This change should also fix the hangs related to InnoDB persistent statistics and STATS_AUTO_RECALC (MDEV-15020) as well as a bug that running ALTER TABLE on the statistics tables concurrently with running ALTER TABLE on InnoDB tables could cause trouble. lock_rec_enqueue_waiting(), lock_table_enqueue_waiting(): Do not issue a fake instant timeout error when the transaction is holding dict_sys.latch. Instead, assert that the dict_sys.latch is never being held here. lock_sys_tables(): A new function to acquire exclusive locks on all dictionary tables, in case DROP TABLE or similar operation is being executed. Locking non-hard-coded tables is optional to avoid a crash in row_merge_drop_temp_indexes(). The SYS_VIRTUAL table was introduced in MySQL 5.7 and MariaDB Server 10.2. Normally, we require all these dictionary tables to exist before executing any DDL, but the function row_merge_drop_temp_indexes() is an exception. When upgrading from MariaDB Server 10.1 or MySQL 5.6 or earlier, the table SYS_VIRTUAL would not exist at this point. ha_innobase::commit_inplace_alter_table(): Invoke log_write_up_to() while not holding dict_sys.latch. dict_sys_t::remove(), dict_table_close(): No longer try to drop index stubs that were left behind by aborted online ADD INDEX. Such indexes should be dropped from the InnoDB data dictionary by row_merge_drop_indexes() as part of the failed DDL operation. Stubs for aborted indexes may only be left behind in the data dictionary cache. dict_stats_fetch_from_ps(): Use a normal read-only transaction. ha_innobase::delete_table(), ha_innobase::truncate(), fts_lock_table(): While waiting for purge to stop using the table, do not hold dict_sys.latch. ha_innobase::delete_table(): Implement a work-around for the rollback of ALTER TABLE...ADD PARTITION. MDL_EXCLUSIVE would not be held if ALTER TABLE hits lock_wait_timeout while trying to upgrade the MDL due to a conflicting LOCK TABLES, such as in the first ALTER TABLE in the test case of Bug#53676 in parts.partition_special_innodb. Therefore, we must explicitly stop purge, because it would not be stopped by MDL. dict_stats_func(), btr_defragment_chunk(): Allocate a THD so that we can acquire MDL on the InnoDB persistent statistics tables. mysqltest_embedded: Invoke ha_pre_shutdown() before free_used_memory() in order to avoid ASAN heap-use-after-free related to acquire_thd(). trx_t::dict_operation_lock_mode: Changed the type to bool. row_mysql_lock_data_dictionary(), row_mysql_unlock_data_dictionary(): Implemented as macros. rollback_inplace_alter_table(): Apply an infinite timeout to lock waits. innodb_thd_increment_pending_ops(): Wrapper for thd_increment_pending_ops(). Never attempt async operation for InnoDB background threads, such as the trx_t::commit() in dict_stats_process_entry_from_recalc_pool(). lock_sys_t::cancel(trx_t*): Make dictionary transactions immune to KILL. lock_wait(): Make dictionary transactions immune to KILL, and to lock wait timeout when waiting for locks on dictionary tables. parts.partition_special_innodb: Use lock_wait_timeout=0 to instantly get ER_LOCK_WAIT_TIMEOUT. main.mdl: Filter out MDL on InnoDB persistent statistics tables Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:54:44 +03:00
Marko Mäkelä	82b7c561b7	MDEV-24258 Merge dict_sys.mutex into dict_sys.latch In the parent commit, dict_sys.latch could theoretically have been replaced with a mutex. But, we can do better and merge dict_sys.mutex into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock() will be replaced with dict_sys.lock(). The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex will be removed along with dict_sys.mutex. The dict_sys.latch will remain instrumented as dict_operation_lock. Some use of dict_sys.lock() will be replaced with dict_sys.freeze(), which we will reintroduce for the new shared mode. Most notably, concurrent table lookups are possible as long as the tables are present in the dict_sys cache. In particular, this will allow more concurrency among InnoDB purge workers. Because dict_sys.mutex will no longer 'throttle' the threads that purge InnoDB transaction history, a performance degradation may be observed unless innodb_purge_threads=1. The table cache eviction policy will become FIFO-like, similar to what happened to fil_system.LRU in commit `45ed9dd957`. The name of the list dict_sys.table_LRU will become somewhat misleading; that list contains tables that may be evicted, even though the eviction policy no longer is least-recently-used but first-in-first-out. (Note: Tables can never be evicted as long as locks exist on them or the tables are in use by some thread.) As demonstrated by the test perfschema.sxlock_func, there will be less contention on dict_sys.latch, because some previous use of exclusive latches will be replaced with shared latches. fts_parse_sql_no_dict_lock(): Replaced with pars_sql(). fts_get_table_name_prefix(): Merged to fts_optimize_create(). dict_stats_update_transient_for_index(): Deduplicated some code. ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination of dict_sys.latch and table->stats_mutex_lock() to cover the changes of BG_STAT_SHOULD_QUIT, because the flag is being read in dict_stats_update_persistent() while not holding dict_sys.latch. row_discard_tablespace_for_mysql(): Protect stats_bg_flag by exclusive dict_sys.latch, like most other code does. row_quiesce_table_has_fts_index(): Remove unnecessary mutex acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL. row_import::set_root_by_heuristic(): Remove unnecessary mutex acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL. row_ins_sec_index_entry_low(): Replace a call to dict_set_corrupted_index_cache_only(). Reads of index->type were not really protected by dict_sys.mutex, and writes (flagging an index corrupted) should be extremely rare. dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary, do not lock it exclusively. dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx. We can simply invoke dict_sys.unlock() and dict_sys.lock() directly. dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is only held in shared more, not exclusive mode. Only acquire it in exclusive mode if the table needs to be loaded to the cache. dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU would require holding an exclusive latch, which we want to avoid for performance reasons. dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU, to compensate for the removal of dict_sys_t::acquire(). This function is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS. dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false, try to acquire dict_sys.latch in shared mode. Only acquire the latch in exclusive mode if the table is not found in the cache. Reviewed by: Thirunarayanan Balathandayuthapani	2021-08-31 13:51:35 +03:00
Marko Mäkelä	f3fcf5f45c	Merge 10.5 to 10.6	2021-08-19 12:25:00 +03:00
Vlad Lesin	a2a0ac7c4c	MDEV-26206 gap lock is not set if implicit lock exists Post-push fix for 10.5+. The fix influence MDEV-14479. Before the fix lock_rec_convert_impl_to_expl() did not create explicit lock if caller's transaction owns found implicit lock(see MDEV-14479 for details). After the fix lock_rec_convert_impl_to_expl() can create explicit lock under the above conditions if the requested lock mode is not LOCK_REC_NOT_GAP. And that is why we need to check if the table is X-locked before lock_rec_convert_impl_to_expl() call.	2021-08-19 11:51:21 +03:00
Marko Mäkelä	4a25957274	Merge 10.4 into 10.5	2021-08-18 18:22:35 +03:00
Marko Mäkelä	f84e28c119	Merge 10.3 into 10.4	2021-08-18 16:51:52 +03:00
Vlad Lesin	2d259187a2	MDEV-26206 gap lock is not set if implicit lock exists If lock type is LOCK_GAP or LOCK_ORDINARY, and the transaction holds implicit lock for the record, then explicit gap-lock will not be set for the record, as lock_rec_convert_impl_to_expl() returns true and lock_rec_convert_impl_to_expl() bypasses lock_rec_lock() call. The fix converts explicit lock to implicit one if requested lock type is not LOCK_REC_NOT_GAP. innodb_information_schema test result is also changed as after the fix the following statements execution: SET autocommit=0; INSERT INTO t1 VALUES (5,10); SELECT * FROM t1 FOR UPDATE; leads to additional gap lock requests.	2021-08-17 16:09:55 +03:00
Oleksandr Byelkin	7841a7eb09	Merge branch '10.3' into 10.4	2021-07-31 22:59:58 +02:00
Marko Mäkelä	e305493b1c	MDEV-21175 follow-up: Remove redundant locking; rely on MDL Before entering DML or DDL execution in the storage engine, the SQL layer will have acquired metadata lock (MDL) on the current table name as well as the names of FOREIGN KEY (grand)child tables (that is, tables whose REFERENCES clauses point to the current table). The MDL prevents any metadata changes to these tables, such as RENAME, TRUNCATE, DROP, ALTER. While the MDL on the current table prevents dict_table_t::foreign_set from being modified, it does not prevent the table metadata that the stored pointers are pointing to from being modified. The MDL on the child tables will prevent both dict_table_t::referenced_set as well as the pointed child table metadata from being modified. wsrep_row_upd_index_is_foreign(): Do not unnecessarily acquire the data dictionary latch if Galera replication is not enabled. ha_innobase::can_switch_engines(): Rely on MDL. We are not dereferencing any pointers stored in the sets. row_mysql_freeze_data_dictionary(), row_mysql_unfreeze_data_dictionary(): Remove. row_update_for_mysql(): Call init_fts_doc_id_for_ref() only once. In ALTER TABLE...IMPORT TABLESPACE and FLUSH TABLES...FOR EXPORT the SQL layer is protecting the current table with MDL. We do not need InnoDB latches.	2021-07-29 16:38:24 +03:00
Marko Mäkelä	72764b2f29	Cleanup: Remove a workaround Since commit `1bd681c8b3` dict_stats_exec_sql() is no longer playing dirty tricks. We can remove the workaround in lock_release().	2021-07-29 15:27:47 +03:00
Marko Mäkelä	f50eb0d398	Merge 10.2 into 10.3	2021-07-27 10:47:17 +03:00
Marko Mäkelä	cf1fc59856	MDEV-25594: Improve debug checks trx_t::will_lock: Changed the type to bool. trx_t::is_autocommit_non_locking(): Replaces trx_is_autocommit_non_locking(). trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions). assert_trx_nonlocking_or_in_list(): Remove. Replaced with at least as strict checks in each place. check_trx_state(): Moved to a static function; partially replaced with individual debug assertions implementing equivalent or stricter checks. This is a backport of commit `7b51d11cca` from 10.5.	2021-07-27 08:52:01 +03:00
Marko Mäkelä	83234719f1	MDEV-24671 fixup: Fix an off-by-one error In commit `e71e613353` we accidentally made innodb_lock_wait_timeout=100000000 a "literal" value, not the smallest special value that would mean "infinite" timeout.	2021-07-01 16:37:01 +03:00
Marko Mäkelä	6e12ebd4a7	MDEV-25062: Reduce trx_rseg_t::mutex contention redo_rseg_mutex, noredo_rseg_mutex: Remove the PERFORMANCE_SCHEMA keys. The rollback segment mutex will be uninstrumented. trx_sys_t: Remove pointer indirection for rseg_array, temp_rseg. Align each element to the cache line. trx_sys_t::rseg_id(): Replaces trx_rseg_t::id. trx_rseg_t::ref: Replaces needs_purge, trx_ref_count, skip_allocation in a single std::atomic<uint32_t>. trx_rseg_t::latch: Replaces trx_rseg_t::mutex. trx_rseg_t::history_size: Replaces trx_sys_t::rseg_history_len trx_sys_t::history_size_approx(): Replaces trx_sys.rseg_history_len in those places where the exact count does not matter. We must not acquire any trx_rseg_t::latch while holding index page latches, because normally the trx_rseg_t::latch is acquired before any page latches. trx_sys_t::history_exists(): Replaces trx_sys.rseg_history_len!=0 with an approximation. We remove some unnecessary trx_rseg_t::latch acquisition around trx_undo_set_state_at_prepare() and trx_undo_set_state_at_finish(). Those operations will only access fields that remain constant after trx_rseg_t::init().	2021-06-23 13:42:11 +03:00
Marko Mäkelä	4dfec8b230	Merge 10.5 into 10.6	2021-06-21 17:49:33 +03:00
Marko Mäkelä	a42c80bd48	Merge 10.4 into 10.5	2021-06-21 14:22:22 +03:00
Marko Mäkelä	d3e4fae797	Merge 10.3 into 10.4	2021-06-21 12:38:25 +03:00
Marko Mäkelä	e46f76c974	MDEV-15912: Remove traces of insert_undo Let us simply refuse an upgrade from earlier versions if the upgrade procedure was not followed. This simplifies the purge, commit, and rollback of transactions. Before upgrading to MariaDB 10.3 or later, a clean shutdown of the server (with innodb_fast_shutdown=1 or 0) is necessary, to ensure that any incomplete transactions are rolled back. The undo log format was changed in MDEV-12288. There is only one persistent undo log for each transaction.	2021-06-21 12:34:07 +03:00
Marko Mäkelä	74a0a9877b	MDEV-24731 fixup: Another bogus assertion In commit `de407e7cb4` a debug assertion was added that would not always hold: We could have TRX_STATE_PREPARED here. But, in that case, the transaction should not have been chosen as a deadlock victim.	2021-06-14 12:38:56 +03:00
Marko Mäkelä	6fbf978eec	MDEV-25750: Assertion wait_lock->is_waiting() in lock_wait_rpl_report lock_wait_rpl_report(): Tolerate the loss of a lock wait. commit `c68007d958` (MDEV-24738) had introduced this rare failure.	2021-06-09 22:06:42 +03:00
Marko Mäkelä	1bd681c8b3	MDEV-25506 (3 of 3): Do not delete .ibd files before commit This is a complete rewrite of DROP TABLE, also as part of other DDL, such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE. The background DROP TABLE queue hack is removed. If a transaction needs to drop and create a table by the same name (like TRUNCATE TABLE does), it must first rename the table to an internal #sql-ib name. No committed version of the data dictionary will include any #sql-ib tables, because whenever a transaction renames a table to a #sql-ib name, it will also drop that table. Either the rename will be rolled back, or the drop will be committed. Data files will be unlinked after the transaction has been committed and a FILE_RENAME record has been durably written. The file will actually be deleted when the detached file handle returned by fil_delete_tablespace() will be closed, after the latches have been released. It is possible that a purge of the delete of the SYS_INDEXES record for the clustered index will execute fil_delete_tablespace() concurrently with the DDL transaction. In that case, the thread that arrives later will wait for the other thread to finish. HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag. ha_innobase::truncate() now requires that all other references to the table be released in advance. This was implemented by Monty. ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected, we will "hijack" the current transaction, drop the table in the current transaction and commit the current transaction. This essentially fixes MDEV-21602. There is a FIXME comment about making the check less failure-prone. ha_innobase::truncate(), ha_innobase::delete_table(): Implement a fast path for temporary tables. We will no longer allow temporary tables to use the adaptive hash index. dict_table_t::mdl_name: The original table name for the purpose of acquiring MDL in purge, to prevent a race condition between a DDL transaction that is dropping a table, and purge processing undo log records of DML that had executed before the DDL operation. For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the dict_table_t::mdl_name will differ from dict_table_t::name. dict_table_t::parse_name(): Use mdl_name instead of name. dict_table_rename_in_cache(): Update mdl_name. For the internal FTS_ tables of FULLTEXT INDEX, purge would acquire MDL on the FTS_ table name, but not on the main table, and therefore it would be able to run concurrently with a DDL transaction that is dropping the table. Previously, the DROP TABLE queue hack prevented a race between purge and DDL. For now, we introduce purge_sys.stop_FTS() to prevent purge from opening any table, while a DDL transaction that may drop FTS_ tables is in progress. The function fts_lock_table(), which will be invoked before the dictionary is locked, will wait for purge to release any table handles. trx_t::drop_table_statistics(): Drop statistics for the table. This replaces dict_stats_drop_index(). We will drop or rename persistent statistics atomically as part of DDL transactions. On lock conflict for dropping statistics, we will fail instantly with DB_LOCK_WAIT_TIMEOUT, because we will be holding the exclusive data dictionary latch. trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory(). Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT in addition to DB_DUPLICATE_KEY. The call to fts_commit() is entirely misplaced here and may obviously break the consistency of transactions that affect FULLTEXT INDEX. It needs to be fixed separately. dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175). The counter was a work-around for missing meta-data locking (MDL) on the SQL layer, and not really needed in MariaDB. ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28. HA_ERR_TABLE_IN_FK_CHECK: Remove. row_ins_check_foreign_constraints(): Do not acquire dict_sys.latch either. The SQL-layer MDL will protect us. This was reviewed by Thirunarayanan Balathandayuthapani and tested by Matthias Leich.	2021-06-09 17:06:07 +03:00
Marko Mäkelä	a7d68e7a0f	MDEV-25791: Remove UNIV_INTERN Back in 2006 or 2007, when MySQL AB and Innobase Oy existed as separately controlled entities (Innobase had been acquired by Oracle Corporation), MySQL 5.1 introduced a storage engine plugin interface and Oracle made use of it by distributing a separate InnoDB Plugin, which would contain some more bug fixes and improvements, compared to the version of InnoDB that was statically linked with the mysqld server that was distributed by MySQL AB. The built-in InnoDB would export global symbols, which would clash with the symbols of the dynamic InnoDB Plugin (which was supposed to override the built-in one when present). The solution to this problem was to declare all global symbols with UNIV_INTERN, so that they would get the GCC function attribute that specifies hidden visibility. Later, in MariaDB Server, something based on Percona XtraDB (a fork of MySQL InnoDB) became the statically linked implementation, and something closer to MySQL InnoDB was available as a dynamic plugin. Starting with version 10.2, MariaDB Server includes only one InnoDB implementation, and hence any reason to have the UNIV_INTERN definition was lost. btr_get_size_and_reserved(): Move to the same compilation unit with the only caller. innodb_set_buf_pool_size(): Remove. Modify innobase_buffer_pool_size directly. fil_crypt_calculate_checksum(): Merge to the only caller. ha_innobase::innobase_reset_autoinc(): Merge to the only caller. thd_query_start_micro(): Remove. Call thd_start_utime() directly.	2021-05-27 13:28:08 +03:00
Marko Mäkelä	49e2c8f0a6	MDEV-25743: Unnecessary copying of table names in InnoDB dictionary Many InnoDB data dictionary cache operations require that the table name be copied so that it will be NUL terminated. (For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.) dict_table_t::is_garbage_name(): Check if a name belongs to the background drop table queue. dict_check_if_system_table_exists(): Remove. dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup. dict_sys_t::create_or_check_sys_tables(): Replaces dict_create_or_check_foreign_constraint_tables() and dict_create_or_check_sys_virtual(). dict_sys_t::load_table(): Replaces dict_table_get_low() and dict_load_table(). dict_sys_t::find_table(): Renamed from get_table(). dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist. trx_t::has_stats_table_lock(): Moved to dict0stats.cc. Some error messages will now report table names in the internal databasename/tablename format, instead of `databasename`.`tablename`.	2021-05-21 18:03:40 +03:00
Marko Mäkelä	c366845a0b	MDEV-25691: Simplify handlerton::drop_database for InnoDB The implementation of handlerton::drop_database in InnoDB is unnecessarily complex. The minimal implementation should check that no conflicting locks or references exist on the tables, delete all table metadata in a single transaction, and finally delete the tablespaces. Note: DROP DATABASE will delete each individual table that the SQL layer knows about, one table per transaction. The handlerton::drop_database is basically a final cleanup step for removing any garbage that could have been left behind in InnoDB due to some bug, or not having atomic DDL in the past. hash_node_t: Remove. Use the proper data type name in pointers. dict_drop_index_tree(): Do not take the table as a parameter. Instead, return the tablespace ID if the tablespace should be dropped (we are dropping a clustered index tree). fil_delete_tablespace(), fil_system_t::detach(): Return a single detached file handle. Multi-file tablespaces cannot be deleted via this interface. ha_innobase::delete_table(): Remove a work-around for non-atomic DDL and do not try to drop tables with similar-looking name. innodb_drop_database(): Complete rewrite. innobase_drop_database(), dict_get_first_table_name_in_db(), row_drop_database_for_mysql(), drop_all_foreign_keys_in_db(): Remove. row_purge_remove_clust_if_poss_low(), row_undo_ins_remove_clust_rec(): If the tablespace is to be deleted, try to evict the table definition from the cache. Failing that, set dict_table_t::space to nullptr. lock_release_on_rollback(): On the rollback of CREATE TABLE, release all locks that the transaction had on the table, to avoid heap-use-after-free.	2021-05-18 12:53:40 +03:00
Marko Mäkelä	f09d33f521	Merge 10.5 into 10.6	2021-05-18 11:13:45 +03:00
Marko Mäkelä	7b51d11cca	MDEV-25594: Improve debug checks trx_t::will_lock: Changed the type to bool. trx_t::is_autocommit_non_locking(): Replaces trx_is_autocommit_non_locking(). trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions). assert_trx_nonlocking_or_in_list(): Remove. Replaced with at least as strict checks in each place. check_trx_state(): Moved to a static function; partially replaced with individual debug assertions implementing equivalent or stricter checks.	2021-05-18 09:27:59 +03:00
Marko Mäkelä	916b237b3f	Merge 10.5 into 10.6	2021-05-07 15:00:27 +03:00
Marko Mäkelä	52aac131e3	MDEV-18518 Multi-table CREATE and DROP transactions for InnoDB InnoDB used to support at most one CREATE TABLE or DROP TABLE per transaction. This caused complications for DDL operations on partitioned tables (where each partition is treated as a separate table by InnoDB) and FULLTEXT INDEX (where each index is maintained in a number of internal InnoDB tables). dict_drop_index_tree(): Extend the MDEV-24589 logic and treat the purge or rollback of SYS_INDEXES records of clustered indexes specially: by dropping the tablespace if it exists. This is the only form of recovery that we will need. trx_undo_ddl_type: Document the DDL undo log record types better. trx_t::dict_operation: Change the type to bool. trx_t::ddl: Remove. trx_t::table_id, trx_undo_t::table_id: Remove. dict_build_table_def_step(): Remove trx_t::table_id logging. dict_table_close_and_drop(), row_merge_drop_table(): Remove. row_merge_lock_table(): Merged to the only callers, which can call lock_table_for_trx() directly. fts_aux_table_t, fts_aux_id, fts_space_set_t: Remove. fts_drop_orphaned_tables(): Remove. row_merge_rename_index_to_drop(): Remove. Thanks to MDEV-24589, we can simply delete the to-be-dropped indexes from SYS_INDEXES, while still being able to roll back the operation. ha_innobase_inplace_ctx: Make a few data members const. Preallocate trx. prepare_inplace_alter_table_dict(): Simplify the logic. Let the normal rollback take care of some cleanup. row_undo_ins_remove_clust_rec(): Simplify the parsing of SYS_COLUMNS. trx_rollback_active(): Remove the special DROP TABLE logic. trx_undo_mem_create_at_db_start(), trx_undo_reuse_cached(): Always write TRX_UNDO_TABLE_ID as 0.	2021-05-04 13:48:55 +03:00
Marko Mäkelä	a29618f3bd	MDEV-25522: Purge of aborted ADD INDEX leaves orphan locks behind lock_discard_for_index(): New function, to discard locks for an index whose index tree has been purged. By definition, such indexes must be ones for which the MDL upgrade failed in inplace ALTER TABLE and the ADD INDEX operation was never committed. Note: Because we do not support online ADD SPATIAL INDEX, we only have to traverse the lock_sys.rec_hash for B-trees and not the hash tables for R-trees. row_purge_remove_clust_if_poss_low(): Invoke lock_discard_for_index() if necessary before dropping a B-tree for a SYS_INDEXES record.	2021-04-28 17:24:44 +03:00
Marko Mäkelä	8751aa7397	MDEV-25404: ssux_lock_low: Introduce a separate writer mutex Having both readers and writers use a single lock word in futex system calls caused performance regression compared to SRW_LOCK_DUMMY (mutex and 2 condition variables). A contributing factor is that we did not accurately keep track of the number of waiting threads and thus had to invoke system calls to wake up any waiting threads. SUX_LOCK_GENERIC: Renamed from SRW_LOCK_DUMMY. This is the original implementation, with rw_lock (std::atomic<uint32_t>), a mutex and two condition variables. Using a separate writer mutex (as described below) is not possible, because the mutex ownership in a buf_block_t::lock must be able to transfer from a write submitter thread to an I/O completion thread, and pthread_mutex_lock() may assume that the submitter thread is recursively acquiring the mutex that it already holds, while in reality the I/O completion thread is the real owner. POSIX does not define an interface for requesting a mutex to be non-recursive. On Microsoft Windows, srw_lock_low will remain a simple wrapper of SRWLOCK. On 32-bit Microsoft Windows, sizeof(SRWLOCK)=4 while sizeof(srw_lock_low)=8. On other platforms, srw_lock_low is an alias of ssux_lock_low, the Simple (non-recursive) Shared/Update/eXclusive lock. In the futex-based implementation of ssux_lock_low (Linux, OpenBSD, Microsoft Windows), we shall use a dedicated mutex for exclusive requests (writer), and have a WRITER flag in the 'readers' lock word to inform that a writer is holding the lock or waiting for the lock to be granted. When the WRITER flag is set, all lock requests must acquire the writer mutex. Normally, shared (S) lock requests simply perform a compare-and-swap on the 'readers' word. Update locks are implemented as a combination of writer mutex and a normal counter in the 'readers' lock word. The conflict between U and X locks is guaranteed by the writer mutex. Unlike SUX_LOCK_GENERIC, wr_u_downgrade() will not wake up any pending rd_lock() waits. They will wait until u_unlock() releases the writer mutex. The ssux_lock_low is always wrapped by sux_lock (with a recursion count of U and X locks), used for dict_index_t::lock and buf_block_t::lock. Their memory footprint for the futex-based implementation will increase by sizeof(srw_mutex), or 4 bytes. This change addresses a performance regression in read-only benchmarks, such as sysbench oltp_read_only. Also write performance was improved. On 32-bit Linux and OpenBSD, lock_sys_t::hash_table will allocate two hash table elements for each srw_lock (14 instead of 15 hash table cells per 64-byte cache line on IA-32). On Microsoft Windows, sizeof(SRWLOCK)==sizeof(void*) and there is no change. Reviewed by: Vladislav Vaintroub Tested by: Axel Schwenke and Vladislav Vaintroub	2021-04-19 18:15:49 +03:00
Marko Mäkelä	d2e2d32933	Merge 10.5 into 10.6	2021-04-14 12:32:27 +03:00
Marko Mäkelä	6c3e860cbf	Merge 10.4 into 10.5	2021-04-14 11:35:39 +03:00
Marko Mäkelä	5008171b05	Merge 10.3 into 10.4	2021-04-14 10:33:59 +03:00
sjaakola	a1e70388c4	MDEV-24966 Galera multi-master regression After the merging of MDEV-24915, 10.6 branch has regressions with handling of concurrent write load against two or more cluster nodes. These regressions may surface as cluster hanging, node crashes or data inconsistency. With some test scenarios, the only visible symptom could be that the BF victim aborting happens only by innodb lock wait timeout expiration. This would result only to poor performance (by default 50 sec hang for each BF conflict), and could be somewhat difficult to diagnose. This pull request has following fixes to handle concurrent write load from multiple nodes: In lock_wait_wsrep_kill(), the victim trx was expected to be only in TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen that victim has advanced into pre commit state. This was fixed by choosing victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states. Victim transaction may be in several different states at the time of detected lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim may advance further before the actual BF aborting takes place. The BF aborting in MDEV-24915 did not wake the victim, if it was in the state of waiting for some other lock (than the one that was blocking the high priority thread). This anomaly caused the innodb lock wait timeout expiration delays and poor performance symptom. To fix this, lock_wait_wsrep_kill() now looks if victim is in lock waiting state, and uses lock_cancel_waiting_and_release() to cancel this lock wait. wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib), and starts a new transaction if there was no active transaction before. Due to late BF aborting, the victim may have e.g. failed in certification and is already aborting or has aborted at this stage. This has caused problems in testing where BF aborter tries to BF abort himself. The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting or has aborted. Victim may not have started transaction yet in wsrep context, but it may have acquired MDL locks (due to DDL execution), and this has caused BF conflict. Such case does not require aborting in wsrep or replication provider state. BF aborting could cause BF-BF conflict scenario, if victim was already aborted and changed to replayer having high priority as well. This BF-BF conflict scenario is now avoided in lock_wait_wsrep() where we now check if blocking lock holder is also high priority and is ordered before, caller should wait for the lock in this situation. The natural innodb deadlock resolving algorithm could pick BF thread as deadlock victim. This is fixed by giving max weigh to BF threads in Deadlock::report(). MDEV-24341 has changed excution paths in do_command() and this affects BF aborted victim execution. This PR fixes one assert in do_command(): DBUG_ASSERT(!thd->async_state.pending_ops()) Which fired if the thd was BF aborted earlier. This assert is now changed to allow pending_ops() if thd was BF aborted before. With these fixes, long term highly conflicting write load could be run against to node cluster. If binlogging is configured, log_slave_updates should be also set.	2021-04-13 14:58:54 +03:00
Marko Mäkelä	b8c8692fd9	MDEV-24620 ASAN heap-buffer-overflow in btr_pcur_restore_position() Between btr_pcur_store_position() and btr_pcur_restore_position() it is possible that purge empties a table and enlarges index->n_core_fields and index->n_core_null_bytes. Therefore, we must cache index->n_core_fields in btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be parsed correctly. Unfortunately, this is a huge change, because we will replace "bool leaf" parameters with "ulint n_core" (passing index->n_core_fields, or 0 for non-leaf pages). For special cases where we know that index->is_instant() cannot hold, we may also pass index->n_fields.	2021-04-13 10:28:13 +03:00
Marko Mäkelä	7c524d4414	MDEV-25371 Potential hang in wsrep_is_BF_lock_timeout() In commit `e71e613353` (MDEV-24671), lock_sys.wait_mutex was moved above lock_sys.mutex (which was later replaced with lock_sys.latch) in the latching order. In commit `7cf4419fc4` (MDEV-24789), a potential hang was introduced to Galera. The function lock_wait() would hold lock_sys.wait_mutex while invoking wsrep_is_BF_lock_timeout(), which in turn could acquire LockMutexGuard for some diagnostic printout. wsrep_is_BF_lock_timeout(): Do not invoke trx_print_latched() or LockMutexGuard. lock_sys_t::wr_lock(), lock_sys_t::rd_lock(): Assert that the current thread is not holding lock_sys.wait_mutex. Unfortunately, RW-locks are not covered by SAFE_MUTEX. Reviewed by: Jan Lindström	2021-04-08 13:32:16 +03:00
Marko Mäkelä	cf552f5886	MDEV-25312 Replace fil_space_t::name with fil_space_t::name() A consistency check for fil_space_t::name is causing recovery failures in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field altogether. fil_space_t::name was more or less a copy of dict_table_t::name (except for some special cases), and it was not being used for anything useful. There used to be a name_hash, but it had been removed already in commit `a75dbfd718` (MDEV-12266). We will also remove os_normalize_path(), OS_PATH_SEPARATOR, OS_PATH_SEPATOR_ALT. On Microsoft Windows, we will treat \ and / roughly in the same way. The intention is that for per-table tablespaces, the filenames will always follow the pattern prefix/databasename/tablename.ibd. (Any \ in the prefix must not be converted.) ut_basename_noext(): Remove (unused function). read_link_file(): Replaces RemoteDatafile::read_link_file(). We will ensure that the last two path component separators are forward slashes (converting up to 2 trailing backslashes on Microsoft Windows), so that everywhere else we can assume that data file names end in "/databasename/tablename.ibd". Note: On Microsoft Windows, path names that start with \\?\ must not contain / as path component separators. Previously, such paths did work in the DATA DIRECTORY argument of InnoDB tables. Reviewed by: Vladislav Vaintroub	2021-04-07 18:01:13 +03:00
Marko Mäkelä	1d41e9df4a	MDEV-25314 Assertion `trx.is_wsrep()' failed in wsrep_is_BF_lock_timeout() In commit `1bd4115841` the intention was to move the trx_t::is_wsrep() check to the caller of the function wsrep_is_BF_lock_timeout().	2021-04-01 09:19:21 +03:00
Marko Mäkelä	1bd4115841	After-merge fix: WITH_WSREP=ON CMAKE_BUILD_TYPE=RelWithDebInfo Merge commit `176aaf93d1` accidentally broke the WITH_WSREP release build.	2021-03-31 22:15:54 +03:00
Marko Mäkelä	176aaf93d1	Merge 10.5 into 10.6	2021-03-31 12:04:50 +03:00
Marko Mäkelä	5eae8c2742	Merge 10.4 into 10.5	2021-03-31 11:05:21 +03:00
Marko Mäkelä	50de71b026	Merge 10.3 into 10.4	2021-03-31 09:47:14 +03:00

1 2 3 4 5 ...

507 commits