mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-30 10:31:54 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	691f923906	Merge 10.5 into 10.6	2024-02-13 20:42:59 +02:00
Marko Mäkelä	81f3e97bc8	MDEV-33383: Corrupted red-black tree due to incorrect comparison fts_doc_id_cmp(): Replaces several duplicated functions for comparing two doc_id_t. On IA-32, AMD64, ARMv7, ARMv8, RISC-V this should make use of some conditional ALU instructions. On POWER there will be conditional jumps. Unlike the original functions, these will return the correct result even if the difference of the two doc_id does not fit in the int data type. We use static_assert() and offsetof() to check at compilation time that this function is compatible with the rbt_create() calls. fts_query_compare_rank(): As documented, return -1 and not 1 when the rank are equal and r1->doc_id < r2->doc_id. This will affect the result of ha_innobase::ft_read(). fts_ptr2_cmp(), fts_ptr1_ptr2_cmp(): These replace fts_trx_table_cmp(), fts_trx_table_id_cmp(). The fts_savepoint_t::tables will be sorted by dict_table_t rather than dict_table_t::id. There was no correctness bug in the previous comparison predicates. We can avoid one level of unnecessary pointer dereferencing in this way. Actually, fts_savepoint_t is duplicating trx_t::mod_tables. MDEV-33401 was filed about removing it. The added unit test innodb_rbt-t covers both the previous buggy comparison predicate and the revised fts_doc_id_cmp(), using keys which led to finding the bug. Thanks to Shaohua Wang from Alibaba for providing the example and the revised comparison predicate. Reviewed by: Thirunarayanan Balathandayuthapani	2024-02-12 17:01:45 +02:00
Marko Mäkelä	47122a6112	MDEV-33383: Replace fts_doc_id_cmp, ib_vector_sort fts_doc_ids_sort(): Sort an array of doc_id_t by C++11 std::sort(). fts_doc_id_cmp(), ib_vector_sort(): Remove. The comparison was returning an incorrect result when the difference exceeded the int range. Reviewed by: Thirunarayanan Balathandayuthapani	2024-02-12 17:01:17 +02:00
Oleksandr Byelkin	b83c379420	Merge branch '10.5' into 10.6	2023-11-08 15:57:05 +01:00
Oleksandr Byelkin	6cfd2ba397	Merge branch '10.4' into 10.5	2023-11-08 12:59:00 +01:00
Marko Mäkelä	88733282fb	MDEV-32050: Look up tables in the purge coordinator The InnoDB table lookup in purge worker threads is a bottleneck that can degrade a slow shutdown to utilize less than 2 threads. Let us fix that bottleneck by constructing a local lookup table that does not require any synchronization while the undo log records of the current batch are being processed. TRX_PURGE_TABLE_BUCKETS: The initial number of std::unordered_map hash buckets used during a purge batch. This could avoid some resizing and rehashing in trx_purge_attach_undo_recs(). purge_node_t::tables: A lookup table from table ID to an already looked up and locked table. Replaces many fields. trx_purge_attach_undo_recs(): Look up each table in the purge batch only once. trx_purge(): Close all tables and release MDL at the end of the batch. trx_purge_table_open(), trx_purge_table_acquire(): Open a table in purge and acquire a metadata lock on it. This replaces dict_table_open_on_id<true>() and dict_acquire_mdl_shared(). purge_sys_t::close_and_reopen(): In case of an MDL conflict, close and reopen all tables that are covered by the current purge batch. It may be that some of the tables have been dropped meanwhile and can be ignored. This replaces wait_SYS() and wait_FTS(). row_purge_parse_undo_rec(): Make purge_coordinator_task issue a MDL warrant to any purge_worker_task which might need it when innodb_purge_threads>1. purge_node_t::end(): Clear the MDL warrant. Reviewed by: Vladislav Lesin and Vladislav Vaintroub	2023-10-25 10:08:20 +03:00
Thirunarayanan Balathandayuthapani	a2312b6fb2	MDEV-32017 Auto-increment no longer works for explicit FTS_DOC_ID - InnoDB should avoid the sync commit operation when there is nothing in fulltext cache. This is caused by commit `1248fe7277` (MDEV-27582)	2023-10-12 14:48:43 +05:30
Marko Mäkelä	3f1a256234	MDEV-31890: Remove COMPILE_FLAGS The cmake configuration step is single-threaded and already consuming too much time. We should not make it worse by adding invocations like MY_CHECK_CXX_COMPILER_FLAG(). Let us prefer something that works on any supported version of GCC (4.8.5 or later) or clang, as well as recent versions of the Intel C compiler. This replaces commit `1fde785315`	2023-10-11 15:59:56 +03:00
Oleksandr Byelkin	1d74927c58	Merge branch '10.4' into 10.5	2023-04-24 12:43:47 +02:00
Thirunarayanan Balathandayuthapani	2c567b2fa3	MDEV-30996 insert.. select in presence of full text index freezes all other commits at commit time - This patch does the following: git revert --no-commit `673243c893` git revert --no-commit `6c669b9586` git revert --no-commit `bacaf2d4f4` git checkout HEAD mysql-test git revert --no-commit `1fd7d3a9ad` Above command reverts MDEV-29277, MDEV-25581, MDEV-29342. When binlog is enabled, trasaction takes a lot of time to do sync operation on innodb fts table. This leads to block of other transaction commit. To avoid this failure, remove the fulltext sync operation during transaction commit. So reverted MDEV-25581 related patches. We filed MDEV-31105 to avoid the memory consumption problem during fulltext sync operation.	2023-04-24 11:06:56 +05:30
Thirunarayanan Balathandayuthapani	b2bbc66a41	MDEV-24011 InnoDB: Failing assertion: index_cache->words == NULL in fts0fts.cc line 551 This issue happens when race condition happens when DDL and fts optimize thread. DDL adds the new index to fts cache. At the same time, fts optimize thread clears the cache and reinitialize it. Take cache init lock before reinitializing the cache. fts_sync_commit() should take dict_sys mutex to avoid the deadlock with create index.	2023-04-19 17:11:14 +05:30
Marko Mäkelä	0760ad3336	Merge 10.5 into 10.6	2023-03-28 15:25:52 +03:00
Marko Mäkelä	dfa90257f6	MDEV-30936 clang 15.0.7 -fsanitize=memory fails massively handle_slave_io(), handle_slave_sql(), os_thread_exit(): Remove a redundant pthread_exit(nullptr) call, because it would cause SIGSEGV. mysql_print_status(): Add MEM_MAKE_DEFINED() to work around some missing instrumentation around mallinfo2(). que_graph_free_stat_list(): Invoke que_node_get_next(node) before que_graph_free_recursive(node). That is the logical and MSAN_OPTIONS=poison_in_dtor=1 compatible way of freeing memory. ins_node_t::~ins_node_t(): Invoke mem_heap_free(entry_sys_heap). que_graph_free_recursive(): Rely on ins_node_t::~ins_node_t(). fts_t::~fts_t(): Invoke mem_heap_free(fts_heap). fts_free(): Replace with direct calls to fts_t::~fts_t(). The failures in free_root() due to MSAN_OPTIONS=poison_in_dtor=1 will be covered in MDEV-30942.	2023-03-28 11:44:24 +03:00
Thirunarayanan Balathandayuthapani	db245e1140	MDEV-25984 Assertion `max_doc_id > 0' failed in fts_init_doc_id() - rollback_inplace_alter_table() locks the fts internal tables. At the time, insert tries to fetch the doc id from config table, fails to lock the config table and returns doc id as 0. fts_cmp_set_sync_doc_id(): Retry to fetch the doc id again if it encounter DB_LOCK_WAIT_TIMEOUT error	2023-02-22 18:54:00 +05:30
Marko Mäkelä	e441c32a0b	Merge 10.5 into 10.6	2023-01-03 18:13:11 +02:00
Marko Mäkelä	8b9b4ab3f5	Merge 10.4 into 10.5	2023-01-03 17:08:42 +02:00
Marko Mäkelä	fb0808c450	Merge 10.3 into 10.4	2023-01-03 16:10:02 +02:00
Aleksey Midenkov	e056efdd6c	MDEV-25004 Missing row in FTS_DOC_ID_INDEX during DELETE HISTORY 1. In case of system-versioned table add row_end into FTS_DOC_ID index in fts_create_common_tables() and innobase_create_key_defs(). fts_n_uniq() returns 1 or 2 depending on whether the table is system-versioned. After this patch recreate of FTS_DOC_ID index is required for existing system-versioned tables. If you see this message in error log or server warnings: "InnoDB: Table db/t1 contains 2 indexes inside InnoDB, which is different from the number of indexes 1 defined in the MariaDB" use this command to fix the table: ALTER TABLE db.t1 FORCE; 2. Fix duplicate history for secondary unique index like it was done in MDEV-23644 for clustered index (`932ec586aa`). In case of existing history row which conflicts with currently inseted row we check in row_ins_scan_sec_index_for_duplicate() whether that row was inserted as part of current transaction. In that case we indicate with DB_FOREIGN_DUPLICATE_KEY that new history row is not needed and should be silently skipped. 3. Some parts of MDEV-21138 (`7410ff436e`) reverted. Skipping of FTS_DOC_ID index for history rows made problems with purge system. Now this is fixed differently by p.2. 4. wait_all_purged.inc checks that we didn't affect non-history rows so they are deleted and purged correctly. Additional FTS fixes fts_init_get_doc_id(): exclude history rows from max_doc_id calculation. fts_init_get_doc_id() callback is used only for crash recovery. fts_add_doc_by_id(): set max value for row_end field. fts_read_stopword(): stopwords table can be system-versioned too. We now read stopwords only for current data. row_insert_for_mysql(): exclude history rows from doc_id validation. row_merge_read_clustered_index(): exclude history_rows from doc_id processing. fts_load_user_stopword(): for versioned table retrieve row_end field and skip history rows. For non-versioned table we retrieve 'value' field twice (just for uniformity). FTS tests for System Versioning now include maybe_versioning.inc which adds 3 combinations: 'vers' for debug build sets sysvers_force and sysvers_hide. sysvers_force makes every created table system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING for SHOW CREATE. Note: basic.test, stopword.test and versioning.test do not require debug for 'vers' combination. This is controlled by $modify_create_table in maybe_versioning.inc and these tests run WITH SYSTEM VERSIONING explicitly which allows to test 'vers' combination on non-debug builds. 'vers_trx' like 'vers' sets sysvers_force_trx and sysvers_hide. That tests FTS with trx_id-based System Versioning. 'orig' works like before: no System Versioning is added, no debug is required. Upgrade/downgrade test for System Versioning is done by innodb_fts.versioning. It has 2 combinations: 'prepare' makes binaries in std_data (requires old server and OLD_BINDIR). It tests upgrade/downgrade against old server as well. 'upgrade' tests upgrade against binaries in std_data. Cleanups: Removed innodb-fts-stopword.test as it duplicates stopword.test	2022-12-27 00:02:02 +03:00
Marko Mäkelä	24fe53477c	MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin	2022-11-17 08:19:01 +02:00
Marko Mäkelä	89ec4b53ac	MDEV-29603: Implement btr_cur_t::open_leaf() btr_cur_t::open_leaf(): Replaces btr_cur_open_at_index_side() for most calls, except dict_stats_analyze_index(), which is the only place where we need to open a page at the non-leaf level. Use btr_block_get() for better error handling. Also, use the enumeration type btr_latch_mode wherever possible. Reviewed by: Vladislav Lesin	2022-11-16 09:43:48 +02:00
Marko Mäkelä	6b2d6a81d4	Cleanup: Remove btr_pcur_t::old_stored The Boolean field btr_pcur_t::old_stored mostly duplicates old_rec. Let us remove it.	2022-11-16 09:36:36 +02:00
Thirunarayanan Balathandayuthapani	db85d8b093	MDEV-29853 Assertion `!strstr(table->name.m_name, "/FTS_") \|\| purge_sys.must_wait_FTS()' failed in trx_t::commit - Failing debug assertion is to indicate whether the purge thread is waiting when fts auxilary table is being dropped. But assertion fails if the table name contains FTS_. So in fts_drop_table(), InnoDB sets the auxilary table flag in transaction modified table list.	2022-11-08 15:27:29 +05:30
Marko Mäkelä	ab0190101b	MDEV-24402: InnoDB CHECK TABLE ... EXTENDED Until now, the attribute EXTENDED of CHECK TABLE was ignored by InnoDB, and InnoDB only counted the records in each index according to the current read view. Unless the attribute QUICK was specified, the function btr_validate_index() would be invoked to validate the B-tree structure (the sibling and child links between index pages). The EXTENDED check will not only count all index records according to the current read view, but also ensure that any delete-marked records in the clustered index are waiting for the purge of history, and that all secondary index records point to a version of the clustered index record that is waiting for the purge of history. In other words, no index may contain orphan records. Normal MVCC reads and the non-EXTENDED version of CHECK TABLE would ignore these orphans. Unpurged records merely result in warnings (at most one per index), not errors, and no indexes will be flagged as corrupted due to such garbage. It will remain possible to SELECT data from such indexes or tables (which will skip such records) or to rebuild the table to reclaim some space. We introduce purge_sys.end_view that will be (almost) a copy of purge_sys.view at the end of a batch of purging committed transaction history. It is not an exact copy, because if the size of a purge batch is limited by innodb_purge_batch_size, some records that purge_sys.view would allow to be purged will be left over for subsequent batches. The purge_sys.view is relevant in the purge of committed transaction history, to determine if records are safe to remove. The new purge_sys.end_view is relevant in MVCC operations and in CHECK TABLE ... EXTENDED. It tells which undo log records are safe to access (have not been discarded at the end of a purge batch). purge_sys.clone_oldest_view<true>(): In trx_lists_init_at_db_start(), clone the oldest read view similar to purge_sys_t::clone_end_view() so that CHECK TABLE ... EXTENDED will not report bogus failures between InnoDB restart and the completed purge of committed transaction history. purge_sys_t::is_purgeable(): Replaces purge_sys_t::changes_visible() in the case that purge_sys.latch will not be held by the caller. Among other things, this guards access to BLOBs. It is not safe to dereference any BLOBs of a delete-marked purgeable record, because they may have already been freed. purge_sys_t::view_guard::view(): Return a reference to purge_sys.view that will be protected by purge_sys.latch, held by purge_sys_t::view_guard. purge_sys_t::end_view_guard::view(): Return a reference to purge_sys.end_view while it is protected by purge_sys.end_latch. Whenever a thread needs to retrieve an older version of a clustered index record, it will hold a page latch on the clustered index page and potentially also on a secondary index page that points to the clustered index page. If these pages contain purgeable records that would be accessed by a currently running purge batch, the progress of the purge batch would be blocked by the page latches. Hence, it is safe to make a copy of purge_sys.end_view while holding an index page latch, and consult the copy of the view to determine whether a record should already have been purged. btr_validate_index(): Remove a redundant check. row_check_index_match(): Check if a secondary index record and a version of a clustered index record match each other. row_check_index(): Replaces row_scan_index_for_mysql(). Count the records in each index directly, duplicating the relevant logic from row_search_mvcc(). Initialize check_table_extended_view for CHECK ... EXTENDED while holding an index leaf page latch. If we encounter an orphan record, the copy of purge_sys.end_view that we make is safe for visibility checks, and trx_undo_get_undo_rec() will check for the safety to access each undo log record. Should that check fail, we should return DB_MISSING_HISTORY to report a corrupted index. The EXTENDED check tries to match each secondary index record with every available clustered index record version, by duplicating the logic of row_vers_build_for_consistent_read() and invoking trx_undo_prev_version_build() directly. Before invoking row_check_index_match() on delete-marked clustered index record versions, we will consult purge_sys.is_purgeable() in order to avoid accessing freed BLOBs. We will always check that the DB_TRX_ID or PAGE_MAX_TRX_ID does not exceed the global maximum. Orphan secondary index records will be flagged only if everything up to PAGE_MAX_TRX_ID has been purged. We warn also about clustered index records whose nonzero DB_TRX_ID should have been reset in purge or rollback. trx_set_rw_mode(): Move an assertion from ReadView::set_creator_trx_id(). trx_undo_prev_version_build(): Remove two debug-only parameters, and return an error code instead of a Boolean. trx_undo_get_undo_rec(): Return a pointer to the undo log record, or nullptr if one cannot be retrieved. Instead of consulting the purge_sys.view, consult the purge_sys.end_view to determine which records can be accessed. trx_undo_get_rec_if_purgeable(): A variant of trx_undo_get_undo_rec() that will consult purge_sys.view instead of purge_sys.end_view. TRX_UNDO_CHECK_PURGEABILITY: A new parameter to trx_undo_prev_version_build(), passed by row_vers_old_has_index_entry() so that purge_sys.view instead of purge_sys.end_view will be consulted to determine whether a secondary index record may be safely purged. row_upd_changes_disowned_external(): Remove. This should be more expensive than briefly latching purge_sys in trx_undo_prev_version_build() (which may make use of transactional memory). row_sel_reset_old_vers_heap(): New function, split from row_sel_build_prev_vers_for_mysql(). row_sel_build_prev_vers_for_mysql(): Reorder some parameters to simplify the call to row_sel_reset_old_vers_heap(). row_search_for_mysql(): Replaced with direct calls to row_search_mvcc(). sel_node_get_nth_plan(): Define inline in row0sel.h open_step(): Define at the call site, in simplified form. sel_node_reset_cursor(): Merged with the only caller open_step(). --- ReadViewBase::check_trx_id_sanity(): Remove. Let us handle "future" DB_TRX_ID in a more meaningful way: row_sel_clust_sees(): Return DB_SUCCESS if the record is visible, DB_SUCCESS_LOCKED_REC if it is invisible, and DB_CORRUPTION if the DB_TRX_ID is in the future. row_undo_mod_must_purge(), row_undo_mod_clust(): Silently ignore corrupted DB_TRX_ID. We are in ROLLBACK, and we should have noticed that corruption when we were about to modify the record in the first place (leading us to refuse the operation). row_vers_build_for_consistent_read(): Return DB_CORRUPTION if DB_TRX_ID is in the future. Tested by: Matthias Leich Reviewed by: Vladislav Lesin	2022-10-21 10:02:54 +03:00
Marko Mäkelä	6dc157f8a6	Merge 10.5 into 10.6	2022-10-06 09:22:39 +03:00
Marko Mäkelä	de078e060e	Merge 10.4 into 10.5	2022-10-06 08:29:56 +03:00
Marko Mäkelä	65d0c57c1a	Merge 10.3 into 10.4	2022-10-05 20:30:57 +03:00
Vlad Lesin	c0eda62aec	MDEV-27927 row_sel_try_search_shortcut_for_mysql() does not latch a page, violating read view isolation btr_search_guess_on_hash() would only acquire an index page latch if it is invoked with ahi_latch=NULL. If it's invoked from row_sel_try_search_shortcut_for_mysql() with ahi_latch!=NULL, a page will not be latched, and row_search_mvcc() will get a pointer to the record, which can be changed by some other transaction before the record was stored in result buffer with row_sel_store_mysql_rec() call. ahi_latch argument of btr_cur_search_to_nth_level_func() and btr_pcur_open_with_no_init_func() is used only for row_sel_try_search_shortcut_for_mysql(). btr_cur_search_to_nth_level_func(..., ahi_latch !=0, ...) is invoked only from btr_pcur_open_with_no_init_func(..., ahi_latch !=0, ...), which, in turns, is invoked only from row_sel_try_search_shortcut_for_mysql(). I suppose that separate case with ahi_latch!=0 was intentionally implemented to protect row_sel_store_mysql_rec() call in row_search_mvcc() just after row_sel_try_search_shortcut_for_mysql() call. After the ahi_latch was moved from row_seach_mvcc() to row_sel_try_search_shortcut_for_mysql(), there is no need in it at all if btr_search_guess_on_hash() latches a page unconditionally. And if btr_search_guess_on_hash() latched the page, any access to the record in row_sel_try_search_shortcut_for_mysql() after btr_pcur_open_with_no_init() call will be protected with the page latch. The fix is to remove ahi_latch argument from btr_pcur_open_with_no_init_func(), btr_cur_search_to_nth_level_func() and btr_search_guess_on_hash(). There will not be test, as to test it we need to freeze some SELECT execution in the point between row_sel_try_search_shortcut_for_mysql() and row_sel_store_mysql_rec() calls in row_search_mvcc(), and to change the record in some other transaction to let row_sel_store_mysql_rec() to store changed record in result buffer. Buf we can't do this with the fix, as the page will be latched in btr_search_guess_on_hash() call.	2022-10-05 17:35:21 +03:00
Sergei Golubchik	194cc36805	Merge branch '10.5' into 10.6	2022-09-30 12:29:24 +02:00
Marko Mäkelä	e29fb95614	Cleanup: Remove innobase_destroy_background_thd() We do not need a non-inline wrapper for the function destroy_background_thd().	2022-09-30 08:25:00 +03:00
Thirunarayanan Balathandayuthapani	673243c893	MDEV-29277 On error, fts_sync_table() fails to release a table handle fts_sync_commit() fails to release the auxiliary table handle when it encounters error. This issue is caused by commit 1fd7d3a9adac50de37e40e92188077e3515de505(MDEV-25581). fts_cache_clear() releases the auxiliary table handles. MDEV-25581's patch clear the cache only if fts_sync_commit was successful.	2022-09-23 11:42:00 +05:30
Marko Mäkelä	bacaf2d4f4	MDEV-29342 Assertion failure in file que0que.cc line 728 Additional fixes for 10.6: fts_sync_commit(): Release cache->lock also on rollback. fts_sync_write_words(): Avoid a crash if an error occurs, by stopping at the first error. fts_add_doc_by_id(): Sync the doc id only after adding the doc id to the cache.	2022-09-07 12:57:53 +03:00
Marko Mäkelä	1985204044	Merge 10.5 into 10.6	2022-09-07 08:47:20 +03:00
Marko Mäkelä	38d36b59f9	Merge 10.4 into 10.5	2022-09-07 08:26:21 +03:00
Marko Mäkelä	c7ba237793	Merge 10.3 into 10.4	2022-09-07 08:08:59 +03:00
Thirunarayanan Balathandayuthapani	ac49b7a845	MDEV-29342 Assertion failure in file que0que.cc line 728 - During shutdown, InnoDB fts fails to update synced doc id when there is only one doc id about to sync. While starting the server, InnoDB fetches the already synced doc id from config table. In the subsequent sync operation, InnoDB fails with DB_DUPLICATE_KEY error.	2022-09-06 17:23:31 +05:30
Oleksandr Byelkin	ee620a7416	Merge branch '10.5' into 10.6	2022-08-04 16:58:42 +02:00
Oleksandr Byelkin	1e71ea806b	Merge branch '10.4' into 10.5	2022-08-04 08:30:03 +02:00
Oleksandr Byelkin	e509065247	Merge branch '10.3' into 10.4	2022-08-03 19:51:44 +02:00
Thirunarayanan Balathandayuthapani	f9ec9b6abb	MDEV-27282 InnoDB: Failing assertion: !query->intersection - query->intersection fails to get freed if the query exceeds innodb_ft_result_cache_limit - errors from init_ftfuncs were not propogated by delete command This is taken from percona/percona-server@ef2c0bcb9a	2022-08-03 20:35:12 +05:30
Thirunarayanan Balathandayuthapani	1fd7d3a9ad	MDEV-25581 Allow user thread to do InnoDB fts cache sync Problem: ======== InnoDB FTS requesting the fts sync of the table once the fts cache size reaches 1/10 of innodb_ft_cache_size. But fts_sync() releases cache lock when writing the word. By doing this, InnoDB insert thread increases the innodb fts cache memory and SYNC operation will take more time to complete. Solution: ========= Remove the fts sync operation(FTS_MSG_SYNC_TABLE) from the fts optimize background thread. Instead of that, allow user thread to sync the InnoDB fts cache when the cache size exceeds 512 kb. User thread holds cache lock while doing cache syncing, it make sure that other threads doesn't add the docs into the cache. Removed FTS_MSG_SYNC_TABLE and its related function because we do remove the FTS_MSG_SYNC_TABLE message itself. Removed fts_sync_index_check() and all related function because other threads doesn't add while cache operation going on.	2022-06-10 13:39:07 +05:30
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Marko Mäkelä	bc113b873f	MDEV-28465 Some calls to btr_pcur_close() are unnecessary The function btr_pcur_close() is being invoked on local variables even when no cleanup needs to be done. In particular, for B-tree indexes (not SPATIAL INDEX), unless btr_pcur_store_position() was invoked in the past, there is no need to invoke btr_pcur_close(). On purge and rollback, we will retain btr_pcur_close(&pcur) because otherwise some ./mtr --suite=innodb_gis tests would leak memory.	2022-05-03 17:49:26 +03:00
Thirunarayanan Balathandayuthapani	27b0030b9d	MDEV-27783 InnoDB: Failing assertion: table->get_ref_count() == 0 upon ALTER TABLE ... MODIFY COLUMN - There is a race condition occurs between purge thread and DDL. So purge thread can increment n_ref_count even after DDL does purge_sys_t::stop_FTS(). - dict_table_open_on_id for purge thread should check purge_sys.must_wait_FTS() before acquring the table. - purge_sys.stop_FTS() does acquire dict_sys.latch for setting the purge system flag and check table ref count on auxilary tables.	2022-04-07 10:32:51 +05:30
Marko Mäkelä	ff99413804	MDEV-25975: Merge 10.5 into 10.6	2022-04-06 12:45:14 +03:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	cbdf62ae90	MDEV-25975 merge fixup	2022-04-06 10:13:21 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	b242c3141f	Merge 10.5 into 10.6	2022-03-29 16:16:21 +03:00
Marko Mäkelä	d62b0368ca	Merge 10.4 into 10.5	2022-03-29 12:59:18 +03:00

1 2 3 4 5 ...

433 commits