mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-06 23:25:34 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	0ff90b3b94	MDEV-25506: Kill during DDL leaves orphan .ibd file Before we create an InnoDB data file, we must have persistently started a DDL transaction and written a record in SYS_INDEXES as well as a FILE_CREATE record for creating the file. In that way, if InnoDB is killed before the DDL transaction is committed, the rollback will be able to delete the file in dict_drop_index_tree(). dict_build_table_def_step(): Do not create the tablespace. At this point, we have not written any log, not even for inserting the SYS_TABLES record. dict_create_sys_indexes_tuple(): Relax an assertion to tolerate a missing tablespace before the first index has been created in dict_create_index_step(). dict_build_index_def_step(): Relax the dict_table_open_on_name() parameter, because no tablespace may be available yet. tab_create_graph_create(), row_create_table_for_mysql(), tab_node_t: Remove key_id, mode. ind_create_graph_create(), row_create_index_for_mysql(), ind_node_t: Add key_id, mode. dict_create_index_space(): New function, to create the tablespace during clustered index creation. dict_create_index_step(): After the SYS_INDEXES record has been written, invoke dict_create_index_space() to create the tablespace if needed. fil_ibd_create(): Before creating the file, persistently write a FILE_CREATE record. This will also ensure that an incomplete DDL transaction will be recovered. After creating the file, invoke fsp_header_init().	2021-05-04 13:48:55 +03:00
Marko Mäkelä	52aac131e3	MDEV-18518 Multi-table CREATE and DROP transactions for InnoDB InnoDB used to support at most one CREATE TABLE or DROP TABLE per transaction. This caused complications for DDL operations on partitioned tables (where each partition is treated as a separate table by InnoDB) and FULLTEXT INDEX (where each index is maintained in a number of internal InnoDB tables). dict_drop_index_tree(): Extend the MDEV-24589 logic and treat the purge or rollback of SYS_INDEXES records of clustered indexes specially: by dropping the tablespace if it exists. This is the only form of recovery that we will need. trx_undo_ddl_type: Document the DDL undo log record types better. trx_t::dict_operation: Change the type to bool. trx_t::ddl: Remove. trx_t::table_id, trx_undo_t::table_id: Remove. dict_build_table_def_step(): Remove trx_t::table_id logging. dict_table_close_and_drop(), row_merge_drop_table(): Remove. row_merge_lock_table(): Merged to the only callers, which can call lock_table_for_trx() directly. fts_aux_table_t, fts_aux_id, fts_space_set_t: Remove. fts_drop_orphaned_tables(): Remove. row_merge_rename_index_to_drop(): Remove. Thanks to MDEV-24589, we can simply delete the to-be-dropped indexes from SYS_INDEXES, while still being able to roll back the operation. ha_innobase_inplace_ctx: Make a few data members const. Preallocate trx. prepare_inplace_alter_table_dict(): Simplify the logic. Let the normal rollback take care of some cleanup. row_undo_ins_remove_clust_rec(): Simplify the parsing of SYS_COLUMNS. trx_rollback_active(): Remove the special DROP TABLE logic. trx_undo_mem_create_at_db_start(), trx_undo_reuse_cached(): Always write TRX_UNDO_TABLE_ID as 0.	2021-05-04 13:48:55 +03:00
Marko Mäkelä	54e2e70194	MDEV-25524 heap-use-after-free in fil_space_t::rename() In commit `91599701d0` (MDEV-25312) some recovery code for TRUNCATE TABLE was broken causing a regression in a case where undo log for a RENAME TABLE operation had been durably written but the tablespace had not been renamed yet. row_rename_table_for_mysql(): Add a DEBUG_SYNC point for the test case, and simplify the logic and trim the error messages. fil_space_t::rename(): Simplify the operation. Merge the necessary part of fil_rename_tablespace_check(). If there is no change to the file name, do nothing. dict_table_t::rename_tablespace(): Refactored from dict_table_rename_in_cache(). row_undo_ins_parse_undo_rec(): On rolling back TRX_UNDO_RENAME_TABLE, invoke dict_table_t::rename_tablespace() even if the table name matches. os_file_rename_func(): Temporarily relax an assertion that would fail during the recovery in the test innodb.truncate_crash.	2021-04-29 18:31:59 +03:00
Marko Mäkelä	a29618f3bd	MDEV-25522: Purge of aborted ADD INDEX leaves orphan locks behind lock_discard_for_index(): New function, to discard locks for an index whose index tree has been purged. By definition, such indexes must be ones for which the MDL upgrade failed in inplace ALTER TABLE and the ADD INDEX operation was never committed. Note: Because we do not support online ADD SPATIAL INDEX, we only have to traverse the lock_sys.rec_hash for B-trees and not the hash tables for R-trees. row_purge_remove_clust_if_poss_low(): Invoke lock_discard_for_index() if necessary before dropping a B-tree for a SYS_INDEXES record.	2021-04-28 17:24:44 +03:00
Marko Mäkelä	f619d79bda	MDEV-25491: Race condition between DROP TABLE and purge of SYS_INDEXES record btr_free_if_exists(): Always use the BUF_GET_POSSIBLY_FREED mode when accessing pages, because due to MDEV-24589 the function fil_space_t::set_stopping(true) can be called at any time during the execution of this function. mtr_t::m_freeing_tree: New data member for debugging purposes. buf_page_get_low(): Assert that the BUF_GET mode is not being used anywhere during the execution of btr_free_if_exists(). In all code related to freeing or allocating pages, we will add some robustness, by making more use of BUF_GET_POSSIBLY_FREED and by reporting an error instead of crashing in some cases of corruption.	2021-04-28 17:24:44 +03:00
Marko Mäkelä	a81aec1505	MDEV-25491 preparation: Clean up tablespace destruction fil_check_pending_ops(), fil_check_pending_io(): Remove. These functions were actually duplicating each other ever since commit `118e258aaa` (MDEV-23855). fil_space_t::check_pending_operations(): Replaces fil_check_pending_operations() and incorporates the logic of fil_check_pending_ops(). Avoid unnecessary lookups for the tablespace. Just wait for the reference count to drop to zero. fil_space_t::io(): Remove an unnecessary condition. We can (and probably better should) refuse asynchronous reads of undo tablespaces that are being truncated. fil_truncate_prepare(): Remove. trx_purge_truncate_history(): Implement the necessary steps that used to be in fil_truncate_prepare().	2021-04-28 17:24:44 +03:00
Thirunarayanan Balathandayuthapani	b862377c3e	MDEV-25503 InnoDB hangs on startup during recovery InnoDB startup hangs if a DDL transaction needs to be rolled back and a recovered transaction on statistics tables exists. In that case, InnoDB should rollback the transaction which holds locks on innodb_table_stats or innodb_index_stats during trx_rollback_or_clean_recovered().	2021-04-27 17:07:37 +05:30
Nikita Malyavin	300253acf1	revive innodb_debug_sync innodb_debug_sync was introduced in commit `b393e2cb0c` and reverted in commit `fc58c17216` due to memory leak reported by valgrind, see MDEV-21336. The leak is now fixed by adding `rw_lock_free(&slot->debug_sync_lock)` after background thread working loop is finished, and the patch is reapplied, with respect to c++98 fixes by Marko. The missing DEBUG_SYNC for MDEV-18546 in row0vers.cc is also reapplied.	2021-04-27 11:51:17 +03:00
Marko Mäkelä	b81382887c	MDEV-25512 Deadlock between sux_lock::u_x_upgrade() and sux_lock::u_lock() In the SUX_LOCK_GENERIC implementation, we can remember at most one pending exclusive lock request. If multiple exclusive lock requests are pending, the WRITER_WAITING flag will be cleared when the first waiting writer acquires the exclusive lock. ssux_lock_low::update_lock(): If WRITER_WAITING is set, wake up the writer even if the UPDATER flag is set, because the waiting writer may be in the process of upgrading its U lock to X. rw_lock::read_unlock(): Also indicate that an X lock waiter must be woken up if an U lock exists. This fix may cause unnecessary wake-ups and system calls, but this is the best that we can do. Ideally we would use the MDEV-25404 idea of a separate 'writer' mutex, but there is no portable way to request that a non-recursive mutex be created, and InnoDB requires the ability to transfer buf_block_t::lock ownership to an I/O thread. To allow problems like this to be caught more reliably in the future, we add a unit test for srw_mutex, srw_lock, ssux_lock, sux_lock.	2021-04-25 12:58:16 +03:00
Marko Mäkelä	8121d03fd4	MDEV-25404 fixup: Fix ssux_lock_low::u_wr_upgrade() The U-to-X upgrade turned out to be incorrect. A debug assertion failed in wr_wait(), called from mtr_defer_drop_ahi() in a stress test with innodb_adaptive_hash_index=ON. A correct upgrade procedure ought to be readers.fetch_add(WRITER-1) to register ourselves as a WRITER (or waiting writer) and to release the reference that was being held for the U lock. Thanks to Matthias Leich for catching the problem.	2021-04-22 15:43:30 +03:00
Marko Mäkelä	1636db5443	Revert "MDEV-24589 DROP TABLE is not crash-safe" This reverts commit `e731a28394`. A crash occurred during the test stress.ddl_innodb when fil_delete_tablespace() for DROP TABLE was waiting in fil_check_pending_operations() and a purge thread for handling an earlier DROP INDEX was attempting to load the index root page in btr_free_if_exists() and btr_free_root_check(). The function buf_page_get_gen() would write out several times "trying to read...being-dropped tablespace" before giving up and committing suicide. It turns out that during any page access in btr_free_if_exists(), fil_space_t::set_stopping() could have been invoked by fil_check_pending_operations(), as part of dropping the tablespace. Preventing this race condition would require extensive changes to the allocation code or some locking mechanism that would ensure that we only set the flag if btr_free_if_exists() is not in progress. Either way, that could be a too risky change in a GA release. Because MDEV-24589 is not strictly necessary in the 10.5 release series and it only is a requirement for MDEV-25180 in a later major release, we will revert the change from 10.5.	2021-04-22 14:57:23 +03:00
Marko Mäkelä	4930f9c94b	Merge 10.5 into 10.6	2021-04-21 11:45:00 +03:00
Marko Mäkelä	80ed136e6d	Merge 10.4 into 10.5	2021-04-21 09:01:01 +03:00
Marko Mäkelä	a0588d54a2	Merge 10.3 into 10.4	2021-04-21 07:58:42 +03:00
Marko Mäkelä	75c01f39b1	Merge 10.2 into 10.3	2021-04-21 07:25:48 +03:00
Marko Mäkelä	922e676b43	MDEV-25466 Merge new release of InnoDB 5.7.34 to 10.2	2021-04-20 17:33:36 +03:00
Monty	031f11717d	Fix all warnings given by UBSAN The easiest way to compile and test the server with UBSAN is to run: ./BUILD/compile-pentium64-ubsan and then run mysql-test-run. After this commit, one should be able to run this without any UBSAN warnings. There is still a few compiler warnings that should be fixed at some point, but these do not expose any real bugs. The 'special' cases where we disable, suppress or circumvent UBSAN are: - ref10 source (as here we intentionally do some shifts that UBSAN complains about. - x86 version of optimized int#korr() methods. UBSAN do not like unaligned memory access of integers. Fixed by using byte_order_generic.h when compiling with UBSAN - We use smaller thread stack with ASAN and UBSAN, which forced me to disable a few tests that prints the thread stack size. - Verifying class types does not work for shared libraries. I added suppression in mysql-test-run.pl for this case. - Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is safe to have overflows (two cases, in item_func.cc). Things fixed: - Don't left shift signed values (byte_order_generic.h, mysqltest.c, item_sum.cc and many more) - Don't assign not non existing values to enum variables. - Ensure that bool and enum values are properly initialized in constructors. This was needed as UBSAN checks that these types has correct values when one copies an object. (gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...) - Ensure we do not called handler functions on unallocated objects or deleted objects. (events.cc, sql_acl.cc). - Fixed bugs in Item_sp::Item_sp() where we did not call constructor on Query_arena object. - Fixed several cast of objects to an incompatible class! (Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc, sql_select.cc ...) - Ensure we do not do integer arithmetic that causes over or underflows. This includes also ++ and -- of integers. (Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...) - Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that value_type is initialized to this instead of to -1, which is not a valid enum value for json_value_types. - Ensure we do not call memcpy() when second argument could be null. - Fixed that Item_func_str::make_empty_result() creates an empty string instead of a null string (safer as it ensures we do not do arithmetic on null strings). Other things: - Changed struct st_position to an OBJECT and added an initialization function to it to ensure that we do not copy or use uninitialized members. The change to a class was also motived that we used "struct st_position" and POSITION randomly trough the code which was confusing. - Notably big rewrite in sql_acl.cc to avoid using deleted objects. - Changed in sql_partition to use '^' instead of '-'. This is safe as the operator is either 0 or 0x8000000000000000ULL. - Added check for select_nr < INT_MAX in JOIN::build_explain() to avoid bug when get_select() could return NULL. - Reordered elements in POSITION for better alignment. - Changed sql_test.cc::print_plan() to use pointers instead of objects. - Fixed bug in find_set() where could could execute '1 << -1'. - Added variable have_sanitizer, used by mtr. (This variable was before only in 10.5 and up). It can now have one of two values: ASAN or UBSAN. - Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked it virtual. This was an effort to get UBSAN to work with loaded storage engines. I kept the change as the new place is better. - Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast in tabutil.cpp. - Added HAVE_REPLICATION around usage of rgi_slave, to get embedded server to compile with UBSAN. (Patch from Marko). - Added #ifdef for powerpc64 to avoid a bug in old gcc versions related to integer arithmetic. Changes that should not be needed but had to be done to suppress warnings from UBSAN: - Added static_cast<<uint16_t>> around shift to get rid of a LOT of compiler warnings when using UBSAN. - Had to change some '/' of 2 base integers to shift to get rid of some compile time warnings. Reviewed by: - Json changes: Alexey Botchkov - Charset changes in ctype-uca.c: Alexander Barkov - InnoDB changes & Embedded server: Marko Mäkelä - sql_acl.cc changes: Vicențiu Ciorbaru - build_explain() changes: Sergey Petrunia	2021-04-20 12:30:09 +03:00
Marko Mäkelä	8751aa7397	MDEV-25404: ssux_lock_low: Introduce a separate writer mutex Having both readers and writers use a single lock word in futex system calls caused performance regression compared to SRW_LOCK_DUMMY (mutex and 2 condition variables). A contributing factor is that we did not accurately keep track of the number of waiting threads and thus had to invoke system calls to wake up any waiting threads. SUX_LOCK_GENERIC: Renamed from SRW_LOCK_DUMMY. This is the original implementation, with rw_lock (std::atomic<uint32_t>), a mutex and two condition variables. Using a separate writer mutex (as described below) is not possible, because the mutex ownership in a buf_block_t::lock must be able to transfer from a write submitter thread to an I/O completion thread, and pthread_mutex_lock() may assume that the submitter thread is recursively acquiring the mutex that it already holds, while in reality the I/O completion thread is the real owner. POSIX does not define an interface for requesting a mutex to be non-recursive. On Microsoft Windows, srw_lock_low will remain a simple wrapper of SRWLOCK. On 32-bit Microsoft Windows, sizeof(SRWLOCK)=4 while sizeof(srw_lock_low)=8. On other platforms, srw_lock_low is an alias of ssux_lock_low, the Simple (non-recursive) Shared/Update/eXclusive lock. In the futex-based implementation of ssux_lock_low (Linux, OpenBSD, Microsoft Windows), we shall use a dedicated mutex for exclusive requests (writer), and have a WRITER flag in the 'readers' lock word to inform that a writer is holding the lock or waiting for the lock to be granted. When the WRITER flag is set, all lock requests must acquire the writer mutex. Normally, shared (S) lock requests simply perform a compare-and-swap on the 'readers' word. Update locks are implemented as a combination of writer mutex and a normal counter in the 'readers' lock word. The conflict between U and X locks is guaranteed by the writer mutex. Unlike SUX_LOCK_GENERIC, wr_u_downgrade() will not wake up any pending rd_lock() waits. They will wait until u_unlock() releases the writer mutex. The ssux_lock_low is always wrapped by sux_lock (with a recursion count of U and X locks), used for dict_index_t::lock and buf_block_t::lock. Their memory footprint for the futex-based implementation will increase by sizeof(srw_mutex), or 4 bytes. This change addresses a performance regression in read-only benchmarks, such as sysbench oltp_read_only. Also write performance was improved. On 32-bit Linux and OpenBSD, lock_sys_t::hash_table will allocate two hash table elements for each srw_lock (14 instead of 15 hash table cells per 64-byte cache line on IA-32). On Microsoft Windows, sizeof(SRWLOCK)==sizeof(void*) and there is no change. Reviewed by: Vladislav Vaintroub Tested by: Axel Schwenke and Vladislav Vaintroub	2021-04-19 18:15:49 +03:00
Marko Mäkelä	040c16ab8b	MDEV-25404: Optimize srw_mutex on Linux, OpenBSD, Windows On Linux, OpenBSD and Microsoft Windows, srw_mutex was an alias for a rw-lock while we only need mutex functionality. Let us implement a futex-based mutex with one bit for HOLDER and 31 bits for counting waiting requests. srw_lock::wr_unlock() can avoid waking up a waiter when no waiting requests exist. (Previously, we only had 1-bit rw_lock::WRITER_WAITING flag that could be wrongly cleared if multiple waiting wr_lock() exist. Now we have no problem with up to 2,147,483,648 conflicting threads.) On 64-bit Microsoft Windows, the advantage is that sizeof(srw_mutex) is 4, while sizeof(SRWLOCK) would be 8. Reviewed by: Vladislav Vaintroub	2021-04-19 18:03:17 +03:00
Eugene Kosov	a3871cd283	MDEV-22255 SIGABRT: Assertion `id' failed in trx_write_trx_id on INSERT \| Assertion` id > 0' failed in trx_write_trx_id \| Assertion `val > 0' failed in row_upd_index_entry_sys_field \| Assertion` thr_get_trx(thr)->id \|\| index->table->no_rollback()' failed.	2021-04-15 17:53:33 +03:00
Marko Mäkelä	d2e2d32933	Merge 10.5 into 10.6	2021-04-14 12:32:27 +03:00
Marko Mäkelä	6c3e860cbf	Merge 10.4 into 10.5	2021-04-14 11:35:39 +03:00
Marko Mäkelä	5008171b05	Merge 10.3 into 10.4	2021-04-14 10:33:59 +03:00
sjaakola	a1e70388c4	MDEV-24966 Galera multi-master regression After the merging of MDEV-24915, 10.6 branch has regressions with handling of concurrent write load against two or more cluster nodes. These regressions may surface as cluster hanging, node crashes or data inconsistency. With some test scenarios, the only visible symptom could be that the BF victim aborting happens only by innodb lock wait timeout expiration. This would result only to poor performance (by default 50 sec hang for each BF conflict), and could be somewhat difficult to diagnose. This pull request has following fixes to handle concurrent write load from multiple nodes: In lock_wait_wsrep_kill(), the victim trx was expected to be only in TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen that victim has advanced into pre commit state. This was fixed by choosing victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states. Victim transaction may be in several different states at the time of detected lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim may advance further before the actual BF aborting takes place. The BF aborting in MDEV-24915 did not wake the victim, if it was in the state of waiting for some other lock (than the one that was blocking the high priority thread). This anomaly caused the innodb lock wait timeout expiration delays and poor performance symptom. To fix this, lock_wait_wsrep_kill() now looks if victim is in lock waiting state, and uses lock_cancel_waiting_and_release() to cancel this lock wait. wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib), and starts a new transaction if there was no active transaction before. Due to late BF aborting, the victim may have e.g. failed in certification and is already aborting or has aborted at this stage. This has caused problems in testing where BF aborter tries to BF abort himself. The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting or has aborted. Victim may not have started transaction yet in wsrep context, but it may have acquired MDL locks (due to DDL execution), and this has caused BF conflict. Such case does not require aborting in wsrep or replication provider state. BF aborting could cause BF-BF conflict scenario, if victim was already aborted and changed to replayer having high priority as well. This BF-BF conflict scenario is now avoided in lock_wait_wsrep() where we now check if blocking lock holder is also high priority and is ordered before, caller should wait for the lock in this situation. The natural innodb deadlock resolving algorithm could pick BF thread as deadlock victim. This is fixed by giving max weigh to BF threads in Deadlock::report(). MDEV-24341 has changed excution paths in do_command() and this affects BF aborted victim execution. This PR fixes one assert in do_command(): DBUG_ASSERT(!thd->async_state.pending_ops()) Which fired if the thd was BF aborted earlier. This assert is now changed to allow pending_ops() if thd was BF aborted before. With these fixes, long term highly conflicting write load could be run against to node cluster. If binlogging is configured, log_slave_updates should be also set.	2021-04-13 14:58:54 +03:00
Marko Mäkelä	b8c8692fd9	MDEV-24620 ASAN heap-buffer-overflow in btr_pcur_restore_position() Between btr_pcur_store_position() and btr_pcur_restore_position() it is possible that purge empties a table and enlarges index->n_core_fields and index->n_core_null_bytes. Therefore, we must cache index->n_core_fields in btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be parsed correctly. Unfortunately, this is a huge change, because we will replace "bool leaf" parameters with "ulint n_core" (passing index->n_core_fields, or 0 for non-leaf pages). For special cases where we know that index->is_instant() cannot hold, we may also pass index->n_fields.	2021-04-13 10:28:13 +03:00
Marko Mäkelä	6e6318b29b	Merge 10.2 into 10.3	2021-04-13 10:26:01 +03:00
Thirunarayanan Balathandayuthapani	cf2c6b7f8d	MDEV-24971 InnoDB access freed virtual column after rollback of secondary index Problem: ======== InnoDB fails to clean the index stub if it fails to add the virtual index which contains new virtual column. But it clears the newly virtual column from index in clear_added_indexes() during inplace_alter_table. On commit, InnoDB evicts and reload the table. In case of rollback, it doesn't happen. InnoDB clears the ABORTED index while opening the table or doing the DDL. In the mean time, InnoDB can access the dropped virtual index columns while creating prebuilt or rollback of concurrent DML. Solution: ========== (1) InnoDB should maintain newly added virtual column while rollbacking the newly added virtual index. (2) InnoDB must not defer the index removal if the alter table is executed with LOCK=EXCLUSIVE. (3) For LOCK=SHARED, InnoDB should check whether the table has any other transaction lock other than alter transaction before deferring the index stub. Replaced has_new_v_col with dict_add_vcol_info in dict_index_t to indicate whether the index has any new virtual column. dict_index_t::has_new_v_col(): Returns whether the index has newly added virtual column, it doesn't say which columns are newly added virtual column ha_innobase_inplace_ctx::is_new_vcol(): Return whether the given column is added as a part of the current alter. ha_innobase_inplace_ctx::clean_new_vcol_index(): Copy the newly added virtual column to new_vcol_info in dict_index_t. Replace the column in the index fields with virtual column stored in new_vcol_info. dict_index_t::assign_new_v_col(): Store the number of virtual column added in index as a part of alter table. dict_index_t::get_n_new_vcol(): Get the number of newly added virtual column dict_index_t::assign_drop_v_col(): Allocate the memory for adding new virtual column in new_vcol_info. dict_index_t::add_drop_v_col(): Add the newly added virtual column in new_vcol_info. dict_table_t::has_lock_for_other_trx(): Whether the table has any other transaction lock than given transaction. row_merge_drop_indexes(): Add parameter alter_trx and check whether the table has any other lock than alter transaction.	2021-04-12 16:06:06 +05:30
Thirunarayanan Balathandayuthapani	1ac4d0c168	BtrBulk::table_name(): Return the table name while displaying table name for fts diagnostics	2021-04-09 17:38:21 +05:30
Marko Mäkelä	450c017c2d	Merge 10.2 into 10.3	2021-04-09 14:32:06 +03:00
Thirunarayanan Balathandayuthapani	5a3151bcda	Improve diagnostics in order to catch MDEV-18868 and similar bugs	2021-04-09 12:01:42 +05:30
Marko Mäkelä	de119fa2b6	MDEV-25297 Assertion: trx->roll_limit <= trx->undo_no in ROLLBACK TO SAVEPOINT In commit `8ea923f55b` (MDEV-24818) when we optimized multi-statement INSERT transactions into empty tables, we would roll back the entire transaction on any error. But, we would fail to invalidate any SAVEPOINT that had been requested in the past. trx_t::savepoints_discard(): Renamed from trx_roll_savepoints_free(). row_mysql_handle_errors(): If we were in bulk insert, invoke trx_t::savepoints_discard(). In this way, a future attempt of ROLLBACK TO SAVEPOINT will return an error.	2021-04-09 09:18:07 +03:00
Marko Mäkelä	1fde581237	MDEV-25315 Crash in SHOW ENGINE INNODB STATUS In commit `8ea923f55b` (MDEV-24818) when we optimized multi-statement INSERT into an empty table, we would sometimes wrongly enable bulk insert into a table that is actually already using row-level locking and undo logging. trx_has_lock_x(): New predicate, to check if the transaction of the current thread is holding an exclusive lock on a table. trx_undo_report_row_operation(): Only invoke trx_mod_table_time_t::start_bulk_insert() if trx_has_lock_x() holds.	2021-04-08 18:01:27 +03:00
Marko Mäkelä	7c524d4414	MDEV-25371 Potential hang in wsrep_is_BF_lock_timeout() In commit `e71e613353` (MDEV-24671), lock_sys.wait_mutex was moved above lock_sys.mutex (which was later replaced with lock_sys.latch) in the latching order. In commit `7cf4419fc4` (MDEV-24789), a potential hang was introduced to Galera. The function lock_wait() would hold lock_sys.wait_mutex while invoking wsrep_is_BF_lock_timeout(), which in turn could acquire LockMutexGuard for some diagnostic printout. wsrep_is_BF_lock_timeout(): Do not invoke trx_print_latched() or LockMutexGuard. lock_sys_t::wr_lock(), lock_sys_t::rd_lock(): Assert that the current thread is not holding lock_sys.wait_mutex. Unfortunately, RW-locks are not covered by SAFE_MUTEX. Reviewed by: Jan Lindström	2021-04-08 13:32:16 +03:00
Daniel Black	553ef1a78b	MDEV-13115: Implement SELECT SKIP LOCKED Adds an implementation for SELECT ... FOR UPDATE SKIP LOCKED / SELECT ... LOCK IN SHARED MODE SKIP LOCKED This is implemented only InnoDB at the moment, not in RockDB yet. This adds a new hander flag HA_CAN_SKIP_LOCKED than will be used when the storage engine advertises the flag. When a storage engine indicates this flag it will get TL_WRITE_SKIP_LOCKED and TL_READ_SKIP_LOCKED transaction types. The Lex structure has been updated to store both the FOR UPDATE/LOCK IN SHARE as well as the SKIP LOCKED so the SHOW CREATE VIEW implementation is simplier. "SELECT FOR UPDATE ... SKIP LOCKED" combined with CREATE TABLE AS or INSERT.. SELECT on the result set is not safe for STATEMENT based replication. MIXED replication will replicate this as row based events." Thanks to guidance from Facebook commit `193896c466` This helped verify basic test case, and components that need implementing (even though every part was implemented differently). Thanks Marko for guidance on simplier InnoDB implementation. Reviewers: Marko, Monty	2021-04-08 16:51:36 +10:00
Marko Mäkelä	cf552f5886	MDEV-25312 Replace fil_space_t::name with fil_space_t::name() A consistency check for fil_space_t::name is causing recovery failures in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field altogether. fil_space_t::name was more or less a copy of dict_table_t::name (except for some special cases), and it was not being used for anything useful. There used to be a name_hash, but it had been removed already in commit `a75dbfd718` (MDEV-12266). We will also remove os_normalize_path(), OS_PATH_SEPARATOR, OS_PATH_SEPATOR_ALT. On Microsoft Windows, we will treat \ and / roughly in the same way. The intention is that for per-table tablespaces, the filenames will always follow the pattern prefix/databasename/tablename.ibd. (Any \ in the prefix must not be converted.) ut_basename_noext(): Remove (unused function). read_link_file(): Replaces RemoteDatafile::read_link_file(). We will ensure that the last two path component separators are forward slashes (converting up to 2 trailing backslashes on Microsoft Windows), so that everywhere else we can assume that data file names end in "/databasename/tablename.ibd". Note: On Microsoft Windows, path names that start with \\?\ must not contain / as path component separators. Previously, such paths did work in the DATA DIRECTORY argument of InnoDB tables. Reviewed by: Vladislav Vaintroub	2021-04-07 18:01:13 +03:00
Marko Mäkelä	75db05c599	Merge 10.5 into 10.6 (except MDEV-24630) Commit `76d2846a71` was for 10.5 only. It caused some performance regression on 10.6 in some cases, likely related to the removal of ib_mutex_t in MDEV-21452.	2021-03-30 13:28:05 +03:00
Marko Mäkelä	8c2e3259c1	MDEV-24302 follow-up: RESET MASTER hangs As pointed out by Andrei Elkin, the previous fix did not fix one race condition that may have caused the observed hang. innodb_log_flush_request(): If we are enqueueing the very first request at the same time the log write is being completed, we must ensure that a near-concurrent call to log_flush_notify() will not result in a missed notification. We guarantee this by release-acquire operations on log_requests.start and log_sys.flushed_to_disk_lsn. log_flush_notify_and_unlock(): Cleanup: Always release the mutex. log_sys_t::get_flushed_lsn(): Use acquire memory order. log_sys_t::set_flushed_lsn(): Use release memory order. log_sys_t::set_lsn(): Use release memory order. log_sys_t::get_lsn(): Use relaxed memory order by default, and allow the caller to specify acquire memory order explicitly. Whenever the log_sys.mutex is being held or when log writes are prohibited during startup, we can use a relaxed load. Likewise, in some assertions where reading a stale value of log_sys.lsn should not matter, we can use a relaxed load. This will cause some additional instructions to be emitted on architectures that do not implement Total Store Ordering (TSO), such as POWER, ARM, and RISC-V Weak Memory Ordering (RVWMO).	2021-03-30 10:29:11 +03:00
Marko Mäkelä	2ad61c6782	Merge 10.5 into 10.6	2021-03-29 16:16:12 +03:00
Krunal Bauskar	0f6f72965b	MDEV-25281: Switch to use non-atomic (vs atomic) distributed counter to track page-access counter As part of MDEV-21212, n_page_gets that is meant to track page access, is ported to use distributed counter that default uses atomic sub-counters. n_page_gets originally was a non-atomic counter that represented an approximate value of pages tracked. Using the said analogy it doesn't need to be an atomic distributed counter. This patch introduces an interface that allows distributed counter to be used with atomic and non-atomic sub-counter (through template) and also port n_page_gets to use non-atomic distributed counter using the said updated interface.	2021-03-29 16:15:28 +03:00
Marko Mäkelä	e8b7fceb82	MDEV-24302: RESET MASTER hangs Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed, we are observing sporadic hangs during the execution of the RESET MASTER statement. We are hoping to fix the hangs with these changes, but due to the rather infrequent occurrence of the hangs and our inability to reliably reproduce the hangs, we cannot be sure of this. What we do know is that innodb_force_recovery=2 (or a larger setting) will prevent srv_master_callback (the former srv_master_thread) from running. In that mode, periodic log flushes would never occur and RESET MASTER could hang indefinitely. That is demonstrated by the new test case that was developed by Andrei Elkin. We fix this case by implementing a special case for it. This also includes some code cleanup and renames of misleadingly named code. The interface has nothing to do with log checkpoints in the storage engine; it is only about requesting log writes to be persistent. handlerton::commit_checkpoint_request, commit_checkpoint_notify_ha(): Remove the unused parameter hton. log_requests.start: Replaces pending_checkpoint_list. log_requests.end: Replaces pending_checkpoint_list_end. log_requests.mutex: Replaces pending_checkpoint_mutex. log_flush_notify_and_unlock(), log_flush_notify(): Replaces innobase_mysql_log_notify(). The new implementation should be functionally equivalent to the old one. innodb_log_flush_request(): Replaces innobase_checkpoint_request(). Implement a fast path for common cases, and reduce the mutex hold time. POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha() for the current request if it is already satisfied, as well as invoke log_flush_notify_and_unlock() for any satisfied requests. log_write(): Invoke log_flush_notify() when the write is already durable. This was missing WITH_PMEM when the log is in persistent memory. Reviewed by: Vladislav Vaintroub	2021-03-29 15:16:23 +03:00
Marko Mäkelä	e538cb095f	Merge 10.5 into 10.6	2021-03-27 18:03:03 +02:00
Marko Mäkelä	80459bcbd4	Merge 10.4 into 10.5	2021-03-27 17:37:42 +02:00
Marko Mäkelä	7ae37ff74f	Merge 10.3 into 10.4	2021-03-27 17:12:28 +02:00
Marko Mäkelä	3157fa182a	Merge 10.2 into 10.3	2021-03-27 16:11:26 +02:00
Marko Mäkelä	356c149603	Merge 10.5 into 10.6	2021-03-26 11:50:32 +02:00
Daniel Black	bcb9ca4105	MEM_CHECK_DEFINED: replace HAVE_valgrind HAVE_valgrind_or_MSAN to HAVE_valgrind was incorrect in `af784385b4`. In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer), and -DWITH_VALGRIND=1, but without memcheck.h, we end up with a MEM_CHECK_DEFINED being empty. If we are also doing a CMAKE_BUILD_TYPE=Debug this results a number of [-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty. With MEM_CHECK_DEFINED empty, there becomes no uses of this of the fixed field and innodb variables in this patch. So we stop using HAVE_valgrind as catchall and use the name HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists. Reviewer: Monty Corrects: `af784385b4`	2021-03-26 07:58:49 +11:00
Eugene Kosov	1274bb7729	MDEV-25223 change fil_system_t::space_list and fil_system_t::named_spaces from UT_LIST to ilist Mostly a refactoring. Also, debug functions were added for ease of life while debugging	2021-03-24 15:15:18 +03:00
Marko Mäkelä	e731a28394	MDEV-24589 DROP TABLE is not crash-safe row_upd_clust_step(): Remove the "trigger" on DELETE SYS_INDEXES that would invoke dict_drop_index_tree(). Let us do it on purge. row_purge_remove_clust_if_poss_low(): Invoke dict_drop_index_tree() when purging a delete-marked SYS_INDEXES record.	2021-03-23 16:20:15 +02:00
Marko Mäkelä	8f23aab068	MDEV-15528 fixup: Remove dict_table_open_on_index_id() dict_table_open_on_index_id(): Remove. This function was used by the background scrubbing, which was removed in commit `a5584b13d1`.	2021-03-23 14:47:04 +02:00
Marko Mäkelä	0f8caadc96	MDEV-22653: Remove the useless parameter innodb_simulate_comp_failures The debug parameter innodb_simulate_comp_failures injected compression failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent compressed page overflows. A much better check is already achieved by defining UNIV_ZIP_COPY at the compilation time. (Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)	2021-03-22 18:12:44 +02:00

1 2 3 4 5 ...

3,652 commits