mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-04-13 02:35:32 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	cc1363071a	MDEV-34455 innodb_read_only=ON fails to imply innodb_doublewrite=OFF innodb_doublewrite_update(): Disallow any change if srv_read_only_mode holds, that is, the server was started with innodb_read_only=ON or innodb_force_recovery=6. This fixes up commit `1122ac978e` (MDEV-33545).	2024-06-26 08:23:54 +03:00
Marko Mäkelä	fa8a46eb68	MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool By design, InnoDB has always hung when permanently running out of buffer pool, for example when several threads are waiting to allocate a block, and all of the buffer pool is buffer-fixed by the active threads. The hang that we are fixing here occurs when the buffer pool is only temporarily running out and the situation could be rescued by writing out some dirty pages or evicting some clean pages. buf_LRU_get_free_block(): Simplify the way how we wait for the buf_flush_page_cleaner thread. This fixes occasional hangs of the test encryption.innochecksum that were introduced by commit `a55b951e60` (MDEV-26827). To play it safe, we use a timed wait when waiting for the buf_flush_page_cleaner() thread to perform its job. Should that thread get stuck, we will invoke buf_pool.LRU_warn() in order to display a message that pages could not be freed, and keep trying to wake up the buf_flush_page_cleaner() thread. The INFORMATION_SCHEMA.INNODB_METRICS counters buffer_LRU_single_flush_failure_count and buffer_LRU_get_free_waits will be removed. The latter is represented by buffer_pool_wait_free. Also removed will be the message "InnoDB: Difficult to find free blocks in the buffer pool" because in `d34479dc66` we introduced a more precise message "InnoDB: Could not free any blocks in the buffer pool" in the buf_flush_page_cleaner thread. buf_pool_t::LRU_warn(): Issue the warning message that we could not free any blocks in the buffer pool. This may also be invoked by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears to be stuck. buf_pool_t::n_flush_dec(): Remove. buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec(). buf_flush_LRU_list_batch(): Increment the eviction counter for blocks of temporary, discarded or dropped tablespaces. buf_flush_LRU(): Make static, and remove the constant parameter evict=false. The only caller will be the buf_flush_page_cleaner() thread. IORequest::is_LRU(): Remove. The only case of evicting pages on write completion will be when we are writing out pages of the temporary tablespace. Those pages are not in buf_pool.flush_list, only in buf_pool.LRU. buf_page_t::flush(): Remove the parameter evict. buf_page_t::write_complete(): Change the parameter "bool temporary" to "bool persistent" and add a parameter for an already read state(). Reviewed by: Debarun Banerjee	2024-03-22 14:17:39 +02:00
Marko Mäkelä	96a3b11d13	Merge 10.5 into 10.6	2023-02-14 15:23:23 +02:00
Thirunarayanan Balathandayuthapani	3eea2e8e10	MDEV-30551 InnoDB recovery hangs when buffer pool ran out of memory - During non-last batch of multi-batch recovery, InnoDB holds log_sys.mutex and preallocates the block which may intiate page flush, which may initiate log flush, which requires log_sys.mutex to acquire again. This leads to assert failure. So InnoDB recovery should release log_sys.mutex before preallocating the block.	2023-02-14 14:35:35 +05:30
Marko Mäkelä	62a20f8047	Merge 10.5 into 10.6	2022-07-01 15:24:50 +03:00
Marko Mäkelä	b546913ba2	Valgrind: Disable tests that would often time out Starting with 10.5, InnoDB crash recovery tests seem to time out more easily under Valgrind, which emulates multiple threads by interleaving them in a single operating system thread. These tests will still be covered by AddressSanitizer and MemorySanitizer.	2022-07-01 14:42:13 +03:00
Monty	7762ee5dbe	MDEV-25180 Atomic ALTER TABLE MDEV-25604 Atomic DDL: Binlog event written upon recovery does not have default database The purpose of this task is to ensure that ALTER TABLE is atomic even if the MariaDB server would be killed at any point of the alter table. This means that either the ALTER TABLE succeeds (including that triggers, the status tables and the binary log are updated) or things should be reverted to their original state. If the server crashes before the new version is fully up to date and commited, it will revert to the original table and remove all temporary files and tables. If the new version is commited, crash recovery will use the new version, and update triggers, the status tables and the binary log. The one execption is ALTER TABLE .. RENAME .. where no changes are done to table definition. This one will work as RENAME and roll back unless the whole statement completed, including updating the binary log (if enabled). Other changes: - Added handlerton->check_version() function to allow the ddl recovery code to check, in case of inplace alter table, if the table in the storage engine is of the new or old version. - Added handler->table_version() so that an engine can report the current version of the table. This should be changed each time the table definition changes. - Added ha_signal_ddl_recovery_done() and handlerton::signal_ddl_recovery_done() to inform all handlers when ddl recovery has been done. (Needed by InnoDB). - Added handlerton call inplace_alter_table_committed, to signal engine that ddl_log has been closed for the alter table query. - Added new handerton flag HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we should call hton->notify_tabledef_changed() during mysql_inplace_alter_table. This was required as MyRocks and InnoDB needed the call at different times. - Added function server_uuid_value() to be able to generate a temporary xid when ddl recovery writes the query to the binary log. This is needed to be able to handle crashes during ddl log recovery. - Moved freeing of the frm definition to end of mysql_alter_table() to remove duplicate code and have a common exit strategy. ------- InnoDB part of atomic ALTER TABLE (Implemented by Marko Mäkelä) innodb_check_version(): Compare the saved dict_table_t::def_trx_id to determine whether an ALTER TABLE operation was committed. We must correctly recover dict_table_t::def_trx_id for this to work. Before purge removes any trace of DB_TRX_ID from system tables, it will make an effort to load the user table into the cache, so that the dict_table_t::def_trx_id can be recovered. ha_innobase::table_version(): return garbage, or the trx_id that would be used for committing an ALTER TABLE operation. In InnoDB, table names starting with #sql-ib will remain special: they will be dropped on startup. This may be revisited later in MDEV-18518 when we implement proper undo logging and rollback for creating or dropping multiple tables in a transaction. Table names starting with #sql will retain some special meaning: dict_table_t::parse_name() will not consider such names for MDL acquisition, and dict_table_rename_in_cache() will treat such names specially when handling FOREIGN KEY constraints. Simplify InnoDB DROP INDEX. Prevent purge wakeup To ensure that dict_table_t::def_trx_id will be recovered correctly in case the server is killed before ddl_log_complete(), we will block the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS between ha_innobase::commit_inplace_alter_table(commit=true) (purge_sys.stop_SYS()) and purge_sys.resume_SYS(). The completion callback purge_sys.resume_SYS() must be between ddl_log_complete() and MDL release. -------- MyRocks support for atomic ALTER TABLE (Implemented by Sergui Petrunia) Implement these SE API functions: - ha_rocksdb::table_version() - hton->check_version = rocksdb_check_versionMyRocks data dictionary now stores table version for each table. (Absence of table version record is interpreted as table_version=0, that is, which means no upgrade changes are needed) - For inplace alter table of a partitioned table, call the underlying handlerton when checking if the table is ok. This assumes that the partition engine commits all changes at once.	2021-05-19 22:54:13 +02:00
Monty	f40ca33bbc	Make all #sql temporary table names uniform The reason for this is to make all temporary file names similar and also to be able to figure out from where a #sql-xxx name orginates. New format is for most cases: '#sql-name-current_pid-thread_id[-increment]' Where name is one of subselect, alter, exchange, temptable or backup The exceptions are: ALTER PARTITION shadow files: '#sql-shadow-thread_id-'original_table_name' Names used with temp pool: '#sql-name-current_pid-pool_number'	2020-04-19 17:33:51 +03:00
Monty	97dd057702	Fixed issues when running mtr with --valgrind - Note that some issues was also fixed in 10.2 and 10.4. I also fixed them here to be able to continue with making 10.5 valgrind safe again - Disable connection threads warnings when doing shutdown	2019-08-23 22:03:54 +02:00
Marko Mäkelä	f25e9aa4ba	MDEV-20310: Make InnoDB crash tests Valgrind-friendly Use DEBUG_SYNC to hang the execution at the interesting point, and then kill and restart the server externally. This will work also with Valgrind. DBUG_SUICIDE() causes Valgrind to hang, and it could also cause uninteresting reports about memory leaks. While we are at it, let us clean up innodb.innodb_bulk_create_index_debug so that it will actually test the desired functionality also in future versions (with instant ADD COLUMN and DROP COLUMN) and avoid some unnecessary restarts. We are adding two DEBUG_SYNC points for ALTER TABLE, because there were none that would be executed right before ha_commit_trans().	2019-08-13 13:32:27 +03:00
Marko Mäkelä	73ed19e44f	MDEV-14585 Automatically remove #sql- tables in InnoDB dictionary during recovery This is a backport of the following commits: commit `b4165985c9` commit `69e88de0fe` commit `40f4525f43` commit `656f66def2` Now that MDEV-14717 made RENAME TABLE crash-safe within InnoDB, it should be safe to drop the #sql- tables within InnoDB during crash recovery. These tables can be one of two things: (1) #sql-ib related to deferred DROP TABLE (follow-up to MDEV-13407) or to table-rebuilding ALTER TABLE...ALGORITHM=INPLACE (since MDEV-14378, only related to the intermediate copy of a table), (2) #sql- related to the intermediate copy of a table during ALTER TABLE...ALGORITHM=COPY We will not drop tables whose name starts with #sql2, because the server can be killed during an ALGORITHM=COPY operation at a point where the original table was renamed to #sql2 but the finished intermediate copy was not yet renamed from #sql- to the original table name. If an old version of MariaDB Server before 10.2.13 (MDEV-11415) was killed while ALTER TABLE...ALGORITHM=COPY was in progress, after recovery there could be undo log records for some records that were inserted into an intermediate copy of the table. Due to these undo log records, InnoDB would resurrect locks at recovery, and the intermediate table would be locked while we are trying to drop it. This would cause a call to row_rename_table_for_mysql(), either from row_mysql_drop_garbage_tables() or from the rollback of a RENAME operation that was part of the ALTER TABLE. row_rename_table_for_mysql(): Do not attempt to parse FOREIGN KEY constraints when renaming from #sql-something to #sql-something-else, because it does not make any sense. row_drop_table_for_mysql(): When deferring DROP TABLE due to locks, do not rename the table if its name already starts with the #sql- prefix, which is what row_mysql_drop_garbage_tables() uses. Previously, the too strict prefix #sql-ib was used, and some tables were renamed unnecessarily.	2018-09-07 22:10:03 +03:00
Marko Mäkelä	0ba6aaf030	MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend a lot of time rolling back writes to the intermediate copy of the table. To reduce the amount of busy work done, a work-around was introduced in commit `fd069e2bb3` in MySQL 4.1.8 and 5.0.2, to commit the transaction after every 10,000 inserted rows. A proper fix would have been to disable the undo logging altogether and to simply drop the intermediate copy of the table on subsequent server startup. This is what happens in MariaDB 10.3 with MDEV-14717,MDEV-14585. In MariaDB 10.2, the intermediate copy of the table would be left behind with a name starting with the string #sql. This is a backport of a bug fix from MySQL 8.0.0 to MariaDB, contributed by jixianliang <271365745@qq.com>. Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation InnoDB must for now keep the undo logging enabled, so that the latest row can be rolled back in case of an error. In Galera cluster, the LOAD DATA statement will retain the existing behaviour and commit the transaction after every 10,000 rows if the parameter wsrep_load_data_splitting=ON is set. The logic to do so (the wsrep_load_data_split() function and the call handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work by Ji Xianliang and Marko Mäkelä. The original fix: Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com> Date: Wed Dec 2 16:09:15 2015 +0530 Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY Problem: During ALTER TABLE, we commit and restart the transaction for every 10,000 rows, so that the rollback after recovery would not take so long. Fix: Suppress the undo logging during copy alter operation. If fts_index is present then insert directly into fts auxiliary table rather than doing at commit time. ha_innobase::num_write_row: Remove the variable. ha_innobase::write_row(): Remove the hack for committing every 10000 rows. row_lock_table_for_mysql(): Remove the extra 2 parameters. lock_get_src_table(), lock_is_table_exclusive(): Remove. Reviewed-by: Marko Mäkelä <marko.makela@oracle.com> Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com> Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>	2018-01-30 20:24:23 +02:00

12 commits