mirror of
				https://github.com/MariaDB/server.git
				synced 2025-10-26 01:18:31 +02:00 
			
		
		
		
	 86dc7b4d4c
			
		
	
	
	86dc7b4d4c
	
	
	
		
			
			During data file creation, InnoDB holds dict_sys mutex, tries to write page 0 of the file and flushes the file. This not only causing unnecessary contention but also a deviation from the write-ahead logging protocol. The clean sequence of operations is that we first start a dictionary transaction and write SYS_TABLES and SYS_INDEXES records that identify the tablespace. Then, we durably write a FILE_CREATE record to the write-ahead log and create the file. Recovery should not unnecessarily insist that the first page of each data file that is referred to by the redo log is valid. It must be enough that page 0 of the tablespace can be initialized based on the redo log contents. We introduce a new data structure deferred_spaces that keeps track of corrupted-looking files during recovery. The data structure holds the last LSN of a FILE_ record referring to the data file, the tablespace identifier, and the last known file name. There are two scenarios can happen during recovery: i) Sufficient memory: InnoDB can reconstruct the tablespace after parsing all redo log records. ii) Insufficient memory(multiple apply phase): InnoDB should store the deferred tablespace redo logs even though tablespace is not present. InnoDB should start constructing the tablespace when it first encounters deferred tablespace id. Mariabackup copies the zero filled ibd file in backup_fix_ddl() as the extension of .new file. Mariabackup test case does page flushing when it deals with DDL operation during backup operation. fil_ibd_create(): Remove the write of page0 and flushing of file fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has zero filled page0 Datafile: Clean up the error handling, and do not report errors if we are in the middle of recovery. The caller will check Datafile::m_defer. fil_node_t::deferred: Indicates whether the tablespace loading was deferred during recovery FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that tablespace file was cannot be loaded. recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to initialize fil_space_t based on buffered metadata and records to initialize page 0. Ignore the flags in fil_name_t, because they are intentionally invalid. fil_name_process(): Update deferred_spaces. recv_sys_t::parse(): Store the redo log if the tablespace id is present in deferred spaces recv_sys_t::recover_low(): Should recover the first page of the tablespace even though the tablespace instance is not present recv_sys_t::apply(): Initialize the deferred tablespace before applying the deferred tablespace records recv_validate_tablespace(): Skip the validation for deferred_spaces. recv_rename_files(): Moved and revised from recv_sys_t::apply(). For deferred-recovery tablespaces, do not attempt to rename the file if a deferred-recovery tablespace is associated with the name. recv_recovery_from_checkpoint_start(): Invoke recv_rename_files() and initialize all deferred tablespaces before applying redo log. fil_node_t::read_page0(): Skip page0 validation if the tablespace is deferred buf_page_create_deferred(): A variant of buf_page_create() when the fil_space_t is not available yet This is joint work with Thirunarayanan Balathandayuthapani, who implemented an initial prototype.
		
			
				
	
	
		
			24 lines
		
	
	
	
		
			504 B
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			24 lines
		
	
	
	
		
			504 B
		
	
	
	
		
			Text
		
	
	
	
	
	
| CREATE TABLE t1(i int) ENGINE=INNODB;
 | |
| CREATE TABLE t2(i int) ENGINE=INNODB;
 | |
| CREATE TABLE t3(a CHAR(36)) ENGINE INNODB;
 | |
| INSERT INTO t3 SELECT UUID() FROM seq_1_to_1000;
 | |
| set global innodb_log_checkpoint_now=1;
 | |
| # xtrabackup backup
 | |
| # xtrabackup prepare
 | |
| # shutdown server
 | |
| # remove datadir
 | |
| # xtrabackup move back
 | |
| # restart
 | |
| SELECT COUNT(*) from t1;
 | |
| COUNT(*)
 | |
| 100
 | |
| SELECT COUNT(*) from t2;
 | |
| COUNT(*)
 | |
| 1000
 | |
| SELECT COUNT(*) from t3;
 | |
| COUNT(*)
 | |
| 1000
 | |
| DROP INDEX index_a ON t3;
 | |
| DROP TABLE t1;
 | |
| DROP TABLE t2;
 | |
| DROP TABLE t3;
 |