mariadb/mysql-test/suite/innodb/t/log_file.test

232 lines
8.7 KiB
Text
Raw Normal View History

--echo # Testcase for the following bugs
--echo # Bug#16691130 - ASSERT WHEN INNODB_LOG_GROUP_HOME_DIR DOES NOT EXIST
--echo # Bug#16418661 - CHANGING NAME IN FOR INNODB_DATA_FILE_PATH SHOULD NOT SUCCEED WITH LOG FILES
--source include/have_innodb.inc
--source include/no_valgrind_without_big.inc
--disable_query_log
call mtr.add_suppression("InnoDB: Could not create undo tablespace.*undo002");
call mtr.add_suppression("InnoDB: InnoDB Database creation was aborted");
call mtr.add_suppression("Plugin 'InnoDB' init function returned error");
call mtr.add_suppression("Plugin 'InnoDB' registration as a STORAGE ENGINE failed");
call mtr.add_suppression("InnoDB: Operating system error number \d+ in a file operation");
call mtr.add_suppression("InnoDB: The error means the system cannot find the path specified");
MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.
2022-01-21 16:03:47 +02:00
call mtr.add_suppression("InnoDB: File /path/to/non-existent/ib_logfile101 was not found");
call mtr.add_suppression("InnoDB: Cannot create .path.to.non-existent.ib_logfile101");
call mtr.add_suppression("InnoDB: The data file '.*ibdata1' was not found but one of the other data files '.*ibdata2' exists");
call mtr.add_suppression("InnoDB: Tablespace size stored in header is \d+ pages, but the sum of data file sizes is \d+ pages");
call mtr.add_suppression("InnoDB: Cannot start InnoDB. The tail of the system tablespace is missing");
call mtr.add_suppression("InnoDB: undo tablespace '.*undo001' exists\. Creating system tablespace with existing undo tablespaces is not supported\. Please delete all undo tablespaces before creating new system tablespace\.");
call mtr.add_suppression("");
call mtr.add_suppression("");
--enable_query_log
let bugdir= $MYSQLTEST_VARDIR/tmp/log_file;
--mkdir $bugdir
let SEARCH_FILE= $MYSQLTEST_VARDIR/log/mysqld.1.err;
let $check_no_innodb=SELECT * FROM INFORMATION_SCHEMA.ENGINES
WHERE engine = 'innodb'
AND support IN ('YES', 'DEFAULT', 'ENABLED');
let $check_yes_innodb=SELECT COUNT(*) `1` FROM INFORMATION_SCHEMA.ENGINES
WHERE engine='innodb'
AND support IN ('YES', 'DEFAULT', 'ENABLED');
2020-01-12 02:05:28 +07:00
--let $ibp=--innodb-log-group-home-dir=$bugdir
--let $ibp=$ibp --innodb-data-home-dir=$bugdir --innodb-undo-directory=$bugdir
--let $ibp=$ibp --innodb-undo-logs=20 --innodb-undo-tablespaces=3
--let $ibp=$ibp --innodb-data-file-path=ibdata1:16M;ibdata2:10M:autoextend
--echo # Start mysqld without the possibility to create innodb_undo_tablespaces
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
2017-03-30 13:11:34 +03:00
--let $restart_parameters= $ibp
--mkdir $bugdir/undo002
--source include/restart_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
let SEARCH_PATTERN=\[ERROR\] InnoDB: Could not create undo tablespace '.*undo002';
--source include/search_pattern_in_file.inc
2020-01-12 02:05:28 +07:00
--echo # Remove undo001,undo002,ibdata1,ibdata2,ib_logfile101
--remove_file $bugdir/undo001
--rmdir $bugdir/undo002
--remove_file $bugdir/ibdata1
--remove_file $bugdir/ibdata2
--remove_file $bugdir/ib_logfile101
--list_files $bugdir
--echo # Start mysqld with non existent innodb_log_group_home_dir
--let $restart_parameters= $ibp --innodb_log_group_home_dir=/path/to/non-existent/
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.
2022-01-21 16:03:47 +02:00
let SEARCH_PATTERN=Cannot create /path/to/non-existent/ib_logfile101;
--source include/search_pattern_in_file.inc
--list_files $bugdir
--echo # Successfully let InnoDB create tablespaces
--let $restart_parameters= $ibp
--source include/start_mysqld.inc
eval $check_yes_innodb;
--source include/shutdown_mysqld.inc
--echo # Backup tmp/logfile/*
--copy_file $bugdir/ibdata1 $bugdir/bak_ibdata1
--copy_file $bugdir/ibdata2 $bugdir/bak_ibdata2
--copy_file $bugdir/ib_logfile0 $bugdir/bak_ib_logfile0
--copy_file $bugdir/undo001 $bugdir/bak_undo001
--copy_file $bugdir/undo002 $bugdir/bak_undo002
--copy_file $bugdir/undo003 $bugdir/bak_undo003
--echo # 1. With ibdata2, Without ibdata1
--remove_file $bugdir/ibdata1
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
let SEARCH_PATTERN=The data file '.*ibdata1' was not found but one of the other data files '.*ibdata2' exists;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 2. With ibdata1, without ibdata2
--remove_file $bugdir/ibdata2
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
let SEARCH_PATTERN=InnoDB: Tablespace size stored in header is \d+ pages, but the sum of data file sizes is \d+ pages;
--source include/search_pattern_in_file.inc
let SEARCH_PATTERN=InnoDB: Cannot start InnoDB. The tail of the system tablespace is missing;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 3. Without ibdata1 & ibdata2
--remove_file $bugdir/ibdata1
--remove_file $bugdir/ibdata2
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
let SEARCH_PATTERN=InnoDB: undo tablespace .*undo001.* exists\. Creating system tablespace with existing undo tablespaces is not supported\. Please delete all undo tablespaces before creating new system tablespace\.;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 4. Without ibdata*, ib_logfile* and with undo00*
--remove_files_wildcard $bugdir ibdata*
--remove_files_wildcard $bugdir ib_logfile*
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 5. Without ibdata*,ib_logfile* files & Without undo002
--remove_files_wildcard $bugdir ibdata*
--remove_files_wildcard $bugdir ib_logfile*
--remove_file $bugdir/undo002
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 6. Without ibdata*,ib_logfile* files & Without undo001, undo002
# and with undo003
--remove_files_wildcard $bugdir ibdata*
--remove_files_wildcard $bugdir ib_logfile*
--remove_file $bugdir/undo001
--remove_file $bugdir/undo002
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
let SEARCH_PATTERN=undo tablespace .*undo003.* exists\. Creating system tablespace with existing undo tablespaces is not supported\. Please delete all undo tablespaces before creating new system tablespace\.;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 7. With ibdata files & Without undo002
--remove_file $bugdir/undo002
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
MDEV-19229 Allow innodb_undo_tablespaces to be changed after database creation trx_sys_t::undo_log_nonempty: Set to true if there are undo logs to rollback and purge. The algorithm for re-creating the undo tablespace when trx_sys_t::undo_log_nonempty is disabled: 1) trx_sys_t::reset_page(): Reset the TRX_SYS page and assign all rollback segment slots from 1..127 to FIL_NULL 2) Free the rollback segment header page of system tablespace for the slots 1..127 3) Update the binlog and WSREP information in system tablespace rollback segment header Step (1), (2) and Step (3) should happen atomically within a single mini-transaction. 4) srv_undo_delete_old_tablespaces(): Delete the old undo tablespaces present in the undo log directory 5) Make checkpoint to get rid of old undo log tablespaces redo logs 6) Assign new start space id for the undo log tablespaces 7) Re-create the specified undo log tablespaces. InnoDB uses same mtr for this one and step (6) 8) Make checkpoint again, so that server or mariabackup can read the undo log tablespace page0 before applying the redo logs srv_undo_tablespaces_reinit(): Recreate the undo log tablespaces. It does reset trx_sys page, delete the old undo tablespaces, update the binlog offset, write set replication checkpoint in system rollback segment page trx_rseg_update_binlog_offset(): Added 2 new parameters to pass binlog file name and binlog offset trx_rseg_array_init(): Return error if the rollback segment slot points to non-existent tablespace srv_undo_tablespaces_init(): Added new parameter mtr to initialize all undo tablespaces trx_assign_rseg_low(): Allow the transaction to use the rollback segment slots(1..127) even if InnoDB failed to change to the requested innodb_undo_tablespaces=0 srv_start(): Override the user specified value of innodb_undo_tablespaces variable with already existing actual undo tablespaces wf_incremental_process(): Detects whether TRX_SYS page has been modified since last backup. If it is then incremental backup fails and throws the information about taking full backup again xb_assign_undo_space_start(): Removed the function. Because undo001 has first undo space id value in page0 Added test case to test the scenario during startup and mariabackup incremental process too. Reviewed-by : Marko Mäkelä Tested-by : Matthias Leich
2022-10-24 20:46:43 +05:30
let SEARCH_PATTERN=InnoDB: Failed to open the undo tablespace;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 8. With ibdata files & Without undo001, undo002
--remove_file $bugdir/undo001
--remove_file $bugdir/undo002
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
MDEV-19229 Allow innodb_undo_tablespaces to be changed after database creation trx_sys_t::undo_log_nonempty: Set to true if there are undo logs to rollback and purge. The algorithm for re-creating the undo tablespace when trx_sys_t::undo_log_nonempty is disabled: 1) trx_sys_t::reset_page(): Reset the TRX_SYS page and assign all rollback segment slots from 1..127 to FIL_NULL 2) Free the rollback segment header page of system tablespace for the slots 1..127 3) Update the binlog and WSREP information in system tablespace rollback segment header Step (1), (2) and Step (3) should happen atomically within a single mini-transaction. 4) srv_undo_delete_old_tablespaces(): Delete the old undo tablespaces present in the undo log directory 5) Make checkpoint to get rid of old undo log tablespaces redo logs 6) Assign new start space id for the undo log tablespaces 7) Re-create the specified undo log tablespaces. InnoDB uses same mtr for this one and step (6) 8) Make checkpoint again, so that server or mariabackup can read the undo log tablespace page0 before applying the redo logs srv_undo_tablespaces_reinit(): Recreate the undo log tablespaces. It does reset trx_sys page, delete the old undo tablespaces, update the binlog offset, write set replication checkpoint in system rollback segment page trx_rseg_update_binlog_offset(): Added 2 new parameters to pass binlog file name and binlog offset trx_rseg_array_init(): Return error if the rollback segment slot points to non-existent tablespace srv_undo_tablespaces_init(): Added new parameter mtr to initialize all undo tablespaces trx_assign_rseg_low(): Allow the transaction to use the rollback segment slots(1..127) even if InnoDB failed to change to the requested innodb_undo_tablespaces=0 srv_start(): Override the user specified value of innodb_undo_tablespaces variable with already existing actual undo tablespaces wf_incremental_process(): Detects whether TRX_SYS page has been modified since last backup. If it is then incremental backup fails and throws the information about taking full backup again xb_assign_undo_space_start(): Removed the function. Because undo001 has first undo space id value in page0 Added test case to test the scenario during startup and mariabackup incremental process too. Reviewed-by : Marko Mäkelä Tested-by : Matthias Leich
2022-10-24 20:46:43 +05:30
let SEARCH_PATTERN=InnoDB: Failed to open the undo tablespace;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
2020-01-12 02:05:28 +07:00
--echo # 9. Without ibdata*, without undo*
--remove_files_wildcard $bugdir ibdata*
--remove_files_wildcard $bugdir undo00*
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
2020-01-12 02:05:28 +07:00
let SEARCH_PATTERN=redo log file .*ib_logfile0.* exists\. Creating system tablespace with existing redo log file is not recommended\. Please delete redo log file before creating new system tablespace\.;
--source include/search_pattern_in_file.inc
# clean up & Restore
--source ../include/log_file_cleanup.inc
--echo # 10. With ibdata*, without ib_logfile0
--remove_file $bugdir/ib_logfile0
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
--source ../include/log_file_cleanup.inc
2020-01-12 02:05:28 +07:00
--echo # 11. With ibdata*
--list_files $bugdir
--source include/start_mysqld.inc
eval $check_yes_innodb;
--source include/shutdown_mysqld.inc
MDEV-33894: Resurrect innodb_log_write_ahead_size As part of commit 685d958e38b825ad9829be311f26729cccf37c46 (MDEV-14425) the parameter innodb_log_write_ahead_size was removed, because it was thought that determining the physical block size would be a sufficient replacement. However, we can only determine the physical block size on Linux or Microsoft Windows. On some file systems, the physical block size is not relevant. For example, XFS uses a block size of 4096 bytes even if the underlying block size may be smaller. On Linux, we failed to determine the physical block size if innodb_log_file_buffered=OFF was not requested or possible. This will be fixed. log_sys.write_size: The value of the reintroduced parameter innodb_log_write_ahead_size. To keep it simple, this is read-only and a power of two between 512 and 4096 bytes, so that the previous alignment guarantees are fulfilled. This will replace the previous log_sys.get_block_size(). log_sys.block_size, log_t::get_block_size(): Remove. log_t::set_block_size(): Ensure that write_size will not be less than the physical block size. There is no point to invoke this function with 512 or less, because that is the minimum value of write_size. innodb_params_adjust(): Add some disabled code for adjusting the minimum value and default value of innodb_log_write_ahead_size to reflect the log_sys.write_size. log_t::set_recovered(): Mark the recovery completed. This is the place to adjust some things if we want to allow write_size>4096. log_t::resize_write_buf(): Refer to write_size. log_t::resize_start(): Refer to write_size instead of get_block_size(). log_write_buf(): Simplify some arithmetics and remove a goto. log_t::write_buf(): Refer to write_size. If we are writing less than that, do not switch buffers, but keep writing to the same buffer. Move some code to improve the locality of reference. recv_scan_log(): Refer to write_size instead of get_block_size(). os_file_create_func(): For type==OS_LOG_FILE on Linux, always invoke os_file_log_maybe_unbuffered(), so that log_sys.set_block_size() will be invoked even if we are not attempting to use O_DIRECT. recv_sys_t::find_checkpoint(): Read the entire log header in a single 12 KiB request into log_sys.buf. Tested with: ./mtr --loose-innodb-log-write-ahead-size=4096 ./mtr --loose-innodb-log-write-ahead-size=2048
2024-06-27 16:38:08 +03:00
--let $restart_parameters=--innodb-log-write-ahead-size=513
--source include/start_mysqld.inc
MDEV-33894: Resurrect innodb_log_write_ahead_size As part of commit 685d958e38b825ad9829be311f26729cccf37c46 (MDEV-14425) the parameter innodb_log_write_ahead_size was removed, because it was thought that determining the physical block size would be a sufficient replacement. However, we can only determine the physical block size on Linux or Microsoft Windows. On some file systems, the physical block size is not relevant. For example, XFS uses a block size of 4096 bytes even if the underlying block size may be smaller. On Linux, we failed to determine the physical block size if innodb_log_file_buffered=OFF was not requested or possible. This will be fixed. log_sys.write_size: The value of the reintroduced parameter innodb_log_write_ahead_size. To keep it simple, this is read-only and a power of two between 512 and 4096 bytes, so that the previous alignment guarantees are fulfilled. This will replace the previous log_sys.get_block_size(). log_sys.block_size, log_t::get_block_size(): Remove. log_t::set_block_size(): Ensure that write_size will not be less than the physical block size. There is no point to invoke this function with 512 or less, because that is the minimum value of write_size. innodb_params_adjust(): Add some disabled code for adjusting the minimum value and default value of innodb_log_write_ahead_size to reflect the log_sys.write_size. log_t::set_recovered(): Mark the recovery completed. This is the place to adjust some things if we want to allow write_size>4096. log_t::resize_write_buf(): Refer to write_size. log_t::resize_start(): Refer to write_size instead of get_block_size(). log_write_buf(): Simplify some arithmetics and remove a goto. log_t::write_buf(): Refer to write_size. If we are writing less than that, do not switch buffers, but keep writing to the same buffer. Move some code to improve the locality of reference. recv_scan_log(): Refer to write_size instead of get_block_size(). os_file_create_func(): For type==OS_LOG_FILE on Linux, always invoke os_file_log_maybe_unbuffered(), so that log_sys.set_block_size() will be invoked even if we are not attempting to use O_DIRECT. recv_sys_t::find_checkpoint(): Read the entire log header in a single 12 KiB request into log_sys.buf. Tested with: ./mtr --loose-innodb-log-write-ahead-size=4096 ./mtr --loose-innodb-log-write-ahead-size=2048
2024-06-27 16:38:08 +03:00
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
--let $restart_parameters=--innodb-log-write-ahead-size=4095
--source include/start_mysqld.inc
eval $check_no_innodb;
--source include/shutdown_mysqld.inc
# this will be silently truncated to the maximum
--let $restart_parameters=--innodb-log-write-ahead-size=10000
--source include/start_mysqld.inc
SELECT @@innodb_log_write_ahead_size;
--echo # Cleanup
--list_files $bugdir
--remove_files_wildcard $bugdir
--rmdir $bugdir