mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-27 01:04:19 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	420d9eb27f	Merge 10.6 into 10.11	2025-01-08 12:51:26 +02:00
Marko Mäkelä	f20ee931d8	Merge 10.5 into 10.6 Note: Changes to the test innodb.stats_persistent in commit `e5c4c0842d` (MDEV-35443) are not merged, because the test scenario is impossible due to commit `e66928ab28` (MDEV-33462).	2025-01-03 09:10:25 +02:00
Marko Mäkelä	f2ffcd949b	MDEV-35657: Add work-arounds for clang 11	2024-12-19 14:18:55 +02:00
Marko Mäkelä	eca552a1a4	MDEV-34830: LSN in the future is not being treated as serious corruption The invariant of write-ahead logging is that before any change to a page is written to the data file, the corresponding log record must must first have been durably written. In crash recovery, there were some sloppy checks for this. Let us implement accurate checks and flag an inconsistency as a hard error, so that we can avoid further corruption of a corrupted database. For data extraction from the corrupted database, innodb_force_recovery can be used. Before recovery is reading any data pages or invoking buf_dblwr_t::recover() to recover torn pages from the doublewrite buffer, InnoDB will have parsed the log until the final LSN and updated log_sys.lsn to that. So, we can rely on log_sys.lsn at all times. The doublewrite buffer recovery has been refactored in such a way that the recv_sys.dblwr.pages may be consulted while discovering files and their page sizes, but nothing will be written back to data files before buf_dblwr_t::recover() is invoked. recv_max_page_lsn, recv_lsn_checks_on: Remove. recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging condition at the end of the recovery. recv_dblwr_t::validate_page(): Keep track of the maximum LSN (if we are checking a non-doublewrite copy of a page) but do not complain LSN being in the future. The doublewrite buffer is a special case, because it will be read early during recovery. Besides, starting with commit `762bcb81b5` the dblwr=true copies of pages may legitimately be "too new". recv_dblwr_t::find_page(): Find a valid page with the smallest FIL_PAGE_LSN that is in the valid range for recovery. recv_dblwr_t::restore_first_page(): Replaced by find_page(). Only buf_dblwr_t::recover() will write to data files. buf_dblwr_t::recover(): Simplify the message output. Do attempt doublewrite recovery on user page read error. Ignore doublewrite pages whose FIL_PAGE_LSN is outside the usable bounds. Previously, we could wrongly recover a too new page from the doublewrite buffer. It is unlikely that this could have lead to an actual error. Write back all recovered pages from the doublewrite buffer here, including for the first page of any tablespace. buf_page_is_corrupted(): Distinguish the return values CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER. buf_page_check_corrupt(): Return the error code DB_CORRUPTION in case the LSN is in the future. Datafile::read_first_page_flags(): Split from read_first_page(). Take a copy of the first page as a parameter. recv_sys_t::free_corrupted_page(): Take the file as a parameter and return whether a message was displayed. This avoids some duplicated and incomplete error messages. buf_page_t::read_complete(): Remove some redundant output and always display the name of the corrupted file. Never return DB_FAIL; use it only in internal error handling. IORequest::read_complete(): Assume that buf_page_t::read_complete() will have reported any error. fil_space_t::set_corrupted(): Return whether this is the first time the tablespace had been flagged as corrupted. Datafile::validate_first_page(), fil_node_open_file_low(), fil_node_open_file(), fil_space_t::read_page0(), fil_node_t::read_page0(): Add a parameter for a copy of the first page, and a parameter to indicate whether the FIL_PAGE_LSN check should be suppressed. Before buf_dblwr_t::recover() is invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the FSP_SPACE_FLAGS and the tablespace ID that may be present in a potentially too new copy of a page. Reviewed by: Debarun Banerjee	2024-10-18 10:12:47 +03:00
Marko Mäkelä	bb47e575de	MDEV-34830: LSN in the future is not being treated as serious corruption The invariant of write-ahead logging is that before any change to a page is written to the data file, the corresponding log record must must first have been durably written. On crash recovery, there were some sloppy checks for this. Let us implement accurate checks and flag an inconsistency as a hard error, so that we can avoid further corruption of a corrupted database. For data extraction from the corrupted database, innodb_force_recovery can be used. Before recovery is reading any data pages or invoking buf_dblwr_t::recover() to recover torn pages from the doublewrite buffer, InnoDB will have parsed the log until the final LSN and updated log_sys.lsn to that. So, we can rely on log_sys.lsn at all times. The doublewrite buffer recovery has been refactored in such a way that the recv_sys.dblwr.pages may be consulted while discovering files and their page sizes, but nothing will be written back to data files before buf_dblwr_t::recover() is invoked. A section of the test mariabackup.innodb_redo_overwrite that is parsing some mariadb-backup --backup output has been removed, because that output "redo log block is overwritten" would often be missing in a Microsoft Windows environment as a result of these changes. recv_max_page_lsn, recv_lsn_checks_on: Remove. recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging condition at the end of the recovery. recv_dblwr_t::validate_page(): Keep track of the maximum LSN (if we are checking a non-doublewrite copy of a page) but do not complain LSN being in the future. The doublewrite buffer is a special case, because it will be read early during recovery. Besides, starting with commit `762bcb81b5` the dblwr=true copies of pages may legitimately be "too new". recv_dblwr_t::find_page(): Find a valid page with the smallest FIL_PAGE_LSN that is in the valid range for recovery. recv_dblwr_t::restore_first_page(): Replaced by find_page(). Only buf_dblwr_t::recover() will write to data files. buf_dblwr_t::recover(): Simplify the message output. Do attempt doublewrite recovery on user page read error. Ignore doublewrite pages whose FIL_PAGE_LSN is outside the usable bounds. Previously, we could wrongly recover a too new page from the doublewrite buffer. It is unlikely that this could have lead to an actual error. Write back all recovered pages from the doublewrite buffer here, including for the first page of any tablespace. buf_page_is_corrupted(): Distinguish the return values CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER. buf_page_check_corrupt(): Return the error code DB_CORRUPTION in case the LSN is in the future. Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff in the same way on both 32-bit and 64-bit architectures. Datafile::read_first_page_flags(): Split from read_first_page(). Take a copy of the first page as a parameter. recv_sys_t::free_corrupted_page(): Take the file as a parameter and return whether a message was displayed. This avoids some duplicated and incomplete error messages. buf_page_t::read_complete(): Remove some redundant output and always display the name of the corrupted file. Never return DB_FAIL; use it only in internal error handling. IORequest::read_complete(): Assume that buf_page_t::read_complete() will have reported any error. fil_space_t::set_corrupted(): Return whether this is the first time the tablespace had been flagged as corrupted. Datafile::validate_first_page(), fil_node_open_file_low(), fil_node_open_file(), fil_space_t::read_page0(), fil_node_t::read_page0(): Add a parameter for a copy of the first page, and a parameter to indicate whether the FIL_PAGE_LSN check should be suppressed. Before buf_dblwr_t::recover() is invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the FSP_SPACE_FLAGS and the tablespace ID that may be present in a potentially too new copy of a page. Reviewed by: Debarun Banerjee	2024-10-17 17:24:20 +03:00
Oleksandr Byelkin	c9b1ebee2f	Merge branch '10.6' into 10.11	2024-04-26 08:02:49 +02:00
Monty	0ccdf54b64	Check and remove high stack usage I checked all stack overflow potential problems found with gcc -Wstack-usage=16384 and clang -Wframe-larger-than=16384 -no-inline Fixes: Added '#pragma clang diagnostic ignored "-Wframe-larger-than="' to a lot of function to where stack usage large but resonable. - Added stack check warnings to BUILD scrips when using clang and debug. Function changed to use malloc instead allocating things on stack: - read_bootstrap_query() now allocates line_buffer (20000 bytes) with malloc() instead of using stack. This has a small performance impact but this is not releant for bootstrap. - mroonga grn_select() used 65856 bytes on stack. Changed it to use malloc(). - Wsrep_schema::replay_transaction() and Wsrep_schema::recover_sr_transactions(). - Connect zipOpen3() Not fixed: - mroonga/vendor/groonga/lib/expr.c grn_proc_call() uses 43712 byte on stack. However this is not easy to fix as the stack used is caused by a lot of code generated by defines. - Most changes in mroonga/groonga where only adding of pragmas to disable stack warnings. - rocksdb/options/options_helper.cc uses 20288 of stack space. (no reason to fix except to get rid of the compiler warning) - Causes using alloca() where the allocation size is resonable. - An issue in libmariadb (reported to connectors).	2024-04-23 14:12:31 +03:00
Marko Mäkelä	7f7329f092	MDEV-33379 innodb_log_file_buffering=OFF causes corruption on bcachefs Apparently, invoking fcntl(fd, F_SETFL, O_DIRECT) will lead to unexpected behaviour on Linux bcachefs and possibly other file systems, depending on the operating system version. So, let us avoid doing that, and instead just attempt to pass the O_DIRECT flag to open(). This should make us compatible with NetBSD, IBM AIX, as well as Solaris and its derivatives. This fix does not change the fact that we had only implemented innodb_log_file_buffering=OFF on systems where we can determine the physical block size (typically 512 or 4096 bytes). Currently, those operating systems are Linux and Microsoft Windows. HAVE_FCNTL_DIRECT, os_file_set_nocache(): Remove. OS_FILE_OVERWRITE, OS_FILE_CREATE_PATH: Remove (never used parameters). os_file_log_buffered(), os_file_log_maybe_unbuffered(): Helper functions. os_file_create_simple_func(): When applicable, initially attempt to open files in O_DIRECT mode. os_file_create_func(): When applicable, initially attempt to open files in O_DIRECT mode. For type==OS_LOG_FILE && create_mode != OS_FILE_CREATE we will first invoke stat(2) on the file name to find out if the size is compatible with O_DIRECT. If create_mode == OS_FILE_CREATE, we will invoke fstat(2) on the created log file afterwards, and may close and reopen the file in O_DIRECT mode if applicable. create_temp_file(): Support O_DIRECT. This is only used if O_TMPFILE is available and innodb_disable_sort_file_cache=ON (non-default value). Notably, that setting never worked on Microsoft Windows. row_merge_file_create_mode(): Split from row_merge_file_create_low(). Create a temporary file in the specified mode. Reviewed by: Vladislav Vaintroub	2024-02-20 11:22:45 +02:00
Marko Mäkelä	64cce8d5bf	Merge 10.6 into 10.11	2024-02-14 16:12:53 +02:00
Marko Mäkelä	691f923906	Merge 10.5 into 10.6	2024-02-13 20:42:59 +02:00
Marko Mäkelä	92f87f2cf0	Cleanup: Remove changed_pages_bitmap The innodb_changed_pages plugin only was part of XtraDB, never InnoDB. It would be useful for incremental backups. We will remove the code from mariadb-backup for now, because it cannot serve any useful purpose until the server part has been implemented.	2024-02-12 17:01:35 +02:00
Marko Mäkelä	b3ca7fa089	Merge 10.6 into 10.11	2024-01-22 08:49:04 +02:00
Marko Mäkelä	a6290a5bc5	MDEV-33095 innodb_flush_method=O_DIRECT creates excessive errors on Solaris The directio(3C) function on Solaris is supported on NFS and UFS while the majority of users should be on ZFS, which is a copy-on-write file system that implements transparent compression and therefore cannot support unbuffered I/O. Let us remove the call to directio() and simply treat innodb_flush_method=O_DIRECT in the same way as the previous default value innodb_flush_method=fsync on Solaris. Also, let us remove some dead code around calls to os_file_set_nocache() on platforms where fcntl(2) is not usable with O_DIRECT. On IBM AIX, O_DIRECT is not documented for fcntl(2), only for open(2).	2024-01-19 15:34:33 +11:00
Marko Mäkelä	ab36eac584	Merge 10.6 into 10.7	2023-01-10 13:58:03 +02:00
Marko Mäkelä	56c9b0bca0	Merge 10.5 into 10.6	2023-01-10 13:54:17 +02:00
Thirunarayanan Balathandayuthapani	17858e03a7	MDEV-30179 mariabackup --backup fails with FATAL ERROR: ... failed to copy datafile - Mariabackup fails to copy the undo log tablespace when it undergoes truncation. So Mariabackup should detect the redo log which does undo tablespace truncation and also backup should read the minimum file size of the tablespace and ignore the error while reading. - Throw error when innodb undo tablespace read failed, but backup doesn't find the redo log for undo tablespace truncation	2023-01-10 15:47:13 +05:30
Marko Mäkelä	b7ae4d442a	Merge 10.6 into 10.7	2022-11-30 12:09:01 +02:00
Marko Mäkelä	15ab2e122d	MDEV-30132 Crash after recovery, with InnoDB: Tried to read ... os_file_read(): Merged with os_file_read_no_error_handling(). Crashing on a partial page read is as unhelpful as crashing on a corrupted page read (commit `0b47c126e3`). Report the file name if it is available via IORequest.	2022-11-30 10:54:03 +02:00
Marko Mäkelä	fc1403d3a9	Cleanup: Remove fil_space_t::is_deferred() The public data member can be checked directly by the only caller.	2022-11-30 10:41:11 +02:00
Marko Mäkelä	ca501ffb04	MDEV-26195: Use a 32-bit data type for some tablespace fields In the InnoDB data files, we allocate 32 bits for tablespace identifiers and page numbers as well as tablespace flags. But, in main memory data structures we allocate 32 or 64 bits, depending on the register width of the processor. Let us always use 32-bit fields to eliminate a mismatch and reduce the memory footprint on 64-bit systems.	2021-07-22 11:22:47 +03:00
Marko Mäkelä	86dc7b4d4c	MDEV-24626 Remove synchronous write of page0 file during file creation During data file creation, InnoDB holds dict_sys mutex, tries to write page 0 of the file and flushes the file. This not only causing unnecessary contention but also a deviation from the write-ahead logging protocol. The clean sequence of operations is that we first start a dictionary transaction and write SYS_TABLES and SYS_INDEXES records that identify the tablespace. Then, we durably write a FILE_CREATE record to the write-ahead log and create the file. Recovery should not unnecessarily insist that the first page of each data file that is referred to by the redo log is valid. It must be enough that page 0 of the tablespace can be initialized based on the redo log contents. We introduce a new data structure deferred_spaces that keeps track of corrupted-looking files during recovery. The data structure holds the last LSN of a FILE_ record referring to the data file, the tablespace identifier, and the last known file name. There are two scenarios can happen during recovery: i) Sufficient memory: InnoDB can reconstruct the tablespace after parsing all redo log records. ii) Insufficient memory(multiple apply phase): InnoDB should store the deferred tablespace redo logs even though tablespace is not present. InnoDB should start constructing the tablespace when it first encounters deferred tablespace id. Mariabackup copies the zero filled ibd file in backup_fix_ddl() as the extension of .new file. Mariabackup test case does page flushing when it deals with DDL operation during backup operation. fil_ibd_create(): Remove the write of page0 and flushing of file fil_ibd_load(): Return FIL_LOAD_DEFER if the tablespace has zero filled page0 Datafile: Clean up the error handling, and do not report errors if we are in the middle of recovery. The caller will check Datafile::m_defer. fil_node_t::deferred: Indicates whether the tablespace loading was deferred during recovery FIL_LOAD_DEFER: Returned by fil_ibd_load() to indicate that tablespace file was cannot be loaded. recv_sys_t::recover_deferred(): Invoke deferred_spaces.create() to initialize fil_space_t based on buffered metadata and records to initialize page 0. Ignore the flags in fil_name_t, because they are intentionally invalid. fil_name_process(): Update deferred_spaces. recv_sys_t::parse(): Store the redo log if the tablespace id is present in deferred spaces recv_sys_t::recover_low(): Should recover the first page of the tablespace even though the tablespace instance is not present recv_sys_t::apply(): Initialize the deferred tablespace before applying the deferred tablespace records recv_validate_tablespace(): Skip the validation for deferred_spaces. recv_rename_files(): Moved and revised from recv_sys_t::apply(). For deferred-recovery tablespaces, do not attempt to rename the file if a deferred-recovery tablespace is associated with the name. recv_recovery_from_checkpoint_start(): Invoke recv_rename_files() and initialize all deferred tablespaces before applying redo log. fil_node_t::read_page0(): Skip page0 validation if the tablespace is deferred buf_page_create_deferred(): A variant of buf_page_create() when the fil_space_t is not available yet This is joint work with Thirunarayanan Balathandayuthapani, who implemented an initial prototype.	2021-05-17 18:12:33 +03:00
Marko Mäkelä	916b237b3f	Merge 10.5 into 10.6	2021-05-07 15:00:27 +03:00
Vladislav Vaintroub	a5b3982585	MDEV-25613 assertion (file_system.n_open > 0) failed Remove operations on fil_system.n_open from mariabackup, as they are not protected by the mutex, and serve no higher purpose anyway.	2021-05-07 08:13:28 +02:00
Marko Mäkelä	cf552f5886	MDEV-25312 Replace fil_space_t::name with fil_space_t::name() A consistency check for fil_space_t::name is causing recovery failures in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field altogether. fil_space_t::name was more or less a copy of dict_table_t::name (except for some special cases), and it was not being used for anything useful. There used to be a name_hash, but it had been removed already in commit `a75dbfd718` (MDEV-12266). We will also remove os_normalize_path(), OS_PATH_SEPARATOR, OS_PATH_SEPATOR_ALT. On Microsoft Windows, we will treat \ and / roughly in the same way. The intention is that for per-table tablespaces, the filenames will always follow the pattern prefix/databasename/tablename.ibd. (Any \ in the prefix must not be converted.) ut_basename_noext(): Remove (unused function). read_link_file(): Replaces RemoteDatafile::read_link_file(). We will ensure that the last two path component separators are forward slashes (converting up to 2 trailing backslashes on Microsoft Windows), so that everywhere else we can assume that data file names end in "/databasename/tablename.ibd". Note: On Microsoft Windows, path names that start with \\?\ must not contain / as path component separators. Previously, such paths did work in the DATA DIRECTORY argument of InnoDB tables. Reviewed by: Vladislav Vaintroub	2021-04-07 18:01:13 +03:00
Eugene Kosov	62e4aaa240	cleanup: os_thread_sleep() -> std::this_thread::sleep_for() std version has an advantage of a more convenient units implementation from std::chrono. Now it's no need to multipy/divide to bring anything to micro seconds.	2021-03-19 11:44:03 +03:00
Marko Mäkelä	ff5d306e29	MDEV-21452: Replace ib_mutex_t with mysql_mutex_t SHOW ENGINE INNODB MUTEX functionality is completely removed, as are the InnoDB latching order checks. We will enforce innodb_fatal_semaphore_wait_threshold only for dict_sys.mutex and lock_sys.mutex. dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex. lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex. FIXME: srv_sys should be removed altogether; it is duplicating tpool functionality. fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must not hold fil_system.mutex. fil_close_all_files(): To prevent SAFE_MUTEX warnings for fil_space_destroy_crypt_data(), we must not hold fil_system.mutex while invoking fil_space_free_low() on a detached tablespace.	2020-12-15 17:56:18 +02:00
Marko Mäkelä	6a1e655cb0	Merge 10.4 into 10.5	2020-12-02 18:29:49 +02:00
Marko Mäkelä	589cf8dbf3	Merge 10.3 into 10.4	2020-12-01 19:51:14 +02:00
Marko Mäkelä	81ab9ea63f	Merge 10.2 into 10.3	2020-12-01 14:55:46 +02:00
Vlad Lesin	e6b3e38d62	MDEV-22929 MariaBackup option to report and/or continue when corruption is encountered The new option --log-innodb-page-corruption is introduced. When this option is set, backup is not interrupted if innodb corrupted page is detected. Instead it logs all found corrupted pages in innodb_corrupted_pages file in backup directory and finishes with error. For incremental backup corrupted pages are also copied to .delta file, because we can't do LSN check for such pages during backup, innodb_corrupted_pages will also be created in incremental backup directory. During --prepare, corrupted pages list is read from the file just after redo log is applied, and each page from the list is checked if it is allocated in it's tablespace or not. If it is not allocated, then it is zeroed out, flushed to the tablespace and removed from the list. If all pages are removed from the list, then --prepare is finished successfully and innodb_corrupted_pages file is removed from backup directory. Otherwise --prepare is finished with error message and innodb_corrupted_pages contains the list of the pages, which are detected as corrupted during backup, and are allocated in their tablespaces, what means backup directory contains corrupted innodb pages, and backup can not be considered as consistent. For incremental --prepare corrupted pages from .delta files are applied to the base backup, innodb_corrupted_pages is read from both base in incremental directories, and the same action is proceded for corrupted pages list as for full --prepare. innodb_corrupted_pages file is modified or removed only in base directory. If DDL happens during backup, it is also processed at the end of backup to have correct tablespace names in innodb_corrupted_pages.	2020-12-01 08:08:57 +03:00
Marko Mäkelä	118e258aaa	MDEV-23855: Shrink fil_space_t Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending. Change some more fil_space_t members to uint32_t to reduce the memory footprint. fil_space_t::add(), fil_ibd_create(): Attach the already opened handle to the tablespace, and enforce the fil_system.n_open limit. dict_boot(): Initialize fil_system.max_assigned_id. srv_boot(): Call srv_thread_pool_init() before anything else, so that files should be opened in the correct mode on Windows. fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like fil_node_open_file_low() does it. dict_table_t::is_accessible(): Replaces fil_table_accessible(). Reviewed by: Vladislav Vaintroub	2020-10-26 17:53:54 +02:00
Marko Mäkelä	45ed9dd957	MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of file is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	18a62eb76d	MDEV-21133 follow-up: Use fil_page_get_type() Let us use the common accessor function fil_page_get_type() instead of accessing the page header field FIL_PAGE_TYPE directly.	2020-05-07 17:15:34 +03:00
Marko Mäkelä	42a4ae54c2	MDEV-21225 Remove ut_align() and use aligned_malloc() Before commit `90c52e5291` introduced aligned_malloc(), InnoDB always used a pattern of over-allocating memory and invoking ut_align() to guarantee the desired alignment. It is cleaner to invoke aligned_malloc() and aligned_free() directly. ut_align(): Remove. In assertions, ut_align_down() can be used instead.	2019-12-05 06:42:31 +02:00
Marko Mäkelä	c11e5cdd12	Merge 10.3 into 10.4	2019-10-10 11:19:25 +03:00
Marko Mäkelä	892378fb9d	Merge 10.2 into 10.3	2019-10-09 13:25:11 +03:00
Thirunarayanan Balathandayuthapani	c65cb244b3	MDEV-19335 Remove buf_page_t::encrypted The field buf_page_t::encrypted was added in MDEV-8588. It was made mostly redundant in MDEV-12699. Remove the field.	2019-10-09 13:13:12 +03:00
Oleksandr Byelkin	c07325f932	Merge branch '10.3' into 10.4	2019-05-19 20:55:37 +02:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	cb248f8806	Merge branch '5.5' into 10.1	2019-05-11 22:19:05 +03:00
Marko Mäkelä	b132b8895e	Merge 10.3 into 10.4	2019-05-05 10:23:14 +03:00
Marko Mäkelä	4d59f45260	Merge 10.2 into 10.3	2019-04-27 20:41:31 +03:00
Eugene Kosov	6c5c1f0b2f	MDEV-19231 make DB_SUCCESS equal to 0 It's a micro optimization. On most platforms CPUs has instructions to compare with 0 fast. DB_SUCCESS is the most popular outcome of functions and this patch optimized code like (err == DB_SUCCESS) BtrBulk::finish(): bogus assertion fixed fil_node_t::read_page0(): corrected usage of os_file_read() que_eval_sql(): bugus assertion removed. Apparently it checked that the field was assigned after having been zero-initialized at object creation. It turns out that the return type of os_file_read_func() was changed in mysql/mysql-server@98909cefbc (MySQL 5.7) from ibool to dberr_t. The reviewer (if there was any) failed to point out that because of future merges, it could be a bad idea to change the return type of a function without changing the function name. This change was applied to MariaDB 10.2.2 in commit `2e814d4702` but the MariaDB-specific code was not fully adjusted accordingly, e.g. in fil_node_open_file(). Essentially, code like !os_file_read(...) became dead code in MariaDB and later in Mariabackup 10.2, and we could be dealing with an uninitialized buffer after a failed page read.	2019-04-25 16:29:55 +03:00
Marko Mäkelä	6b6fa3cdb1	MDEV-18644: Support full_crc32 for page_compressed This is a follow-up task to MDEV-12026, which introduced innodb_checksum_algorithm=full_crc32 and a simpler page format. MDEV-12026 did not enable full_crc32 for page_compressed tables, which we will be doing now. This is joint work with Thirunarayanan Balathandayuthapani. For innodb_checksum_algorithm=full_crc32 we change the page_compressed format as follows: FIL_PAGE_TYPE: The most significant bit will be set to indicate page_compressed format. The least significant bits will contain the compressed page size, rounded up to a multiple of 256 bytes. The checksum will be stored in the last 4 bytes of the page (whether it is the full page or a page_compressed page whose size is determined by FIL_PAGE_TYPE), covering all preceding bytes of the page. If encryption is used, then the page will be encrypted between compression and computing the checksum. For page_compressed, FIL_PAGE_LSN will not be repeated at the end of the page. FSP_SPACE_FLAGS (already implemented as part of MDEV-12026): We will store the innodb_compression_algorithm that may be used to compress pages. Previously, the choice of algorithm was written to each compressed data page separately, and one would be unable to know in advance which compression algorithm(s) are used. fil_space_t::full_crc32_page_compressed_len(): Determine if the page_compressed algorithm of the tablespace needs to know the exact length of the compressed data. If yes, we will reserve and write an extra byte for this right before the checksum. buf_page_is_compressed(): Determine if a page uses page_compressed (in any innodb_checksum_algorithm). fil_page_decompress(): Pass also fil_space_t::flags so that the format can be determined. buf_page_is_zeroes(): Check if a page is full of zero bytes. buf_page_full_crc32_is_corrupted(): Renamed from buf_encrypted_full_crc32_page_is_corrupted(). For full_crc32, we always simply validate the checksum to the page contents, while the physical page size is explicitly specified by an unencrypted part of the page header. buf_page_full_crc32_size(): Determine the size of a full_crc32 page. buf_dblwr_check_page_lsn(): Make this a debug-only function, because it involves potentially costly lookups of fil_space_t. create_table_info_t::check_table_options(), ha_innobase::check_if_supported_inplace_alter(): Do allow the creation of SPATIAL INDEX with full_crc32 also when page_compressed is used. commit_cache_norebuild(): Preserve the compression algorithm when updating the page_compression_level. dict_tf_to_fsp_flags(): Set the flags for page compression algorithm. FIXME: Maybe there should be a table option page_compression_algorithm and a session variable to back it?	2019-03-18 14:08:43 +02:00
Marko Mäkelä	aa4b2c1509	Merge 10.3 into 10.4	2019-03-07 08:02:33 +02:00
Marko Mäkelä	77103e9832	Merge 10.2 into 10.3	2019-03-06 16:20:13 +02:00
Marko Mäkelä	c155946c90	Merge 10.1 into 10.2	2019-03-06 15:15:59 +02:00
Marko Mäkelä	b761211685	MDEV-18659: Fix string truncation/overflow in InnoDB and XtraDB Fix the warnings issued by GCC 8 -Wstringop-truncation and -Wstringop-overflow in InnoDB and XtraDB. This work is motivated by Jan Lindström. The patch mainly differs from his original one as follows: (1) We remove explicit initialization of stack-allocated string buffers. The minimum amount of initialization that is needed is a terminating NUL character. (2) GCC issues a warning for invoking strncpy(dest, src, sizeof dest) because if strlen(src) >= sizeof dest, there would be no terminating NUL byte in dest. We avoid this problem by invoking strncpy() with a limit that is 1 less than the buffer size, and by always writing NUL to the last byte of the buffer. (3) We replace strncpy() with memcpy() or strcpy() in those cases when the result is functionally equivalent. Note: fts_fetch_index_words() never deals with len==UNIV_SQL_NULL. This was enforced by an assertion that limits the maximum length to FTS_MAX_WORD_LEN. Also, the encoding that InnoDB uses for the compressed fulltext index is not byte-order agnostic, that is, InnoDB data files that use FULLTEXT INDEX are not portable between big-endian and little-endian systems.	2019-03-06 11:22:27 +02:00
Thirunarayanan Balathandayuthapani	c0f47a4a58	MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.	2019-02-19 18:50:19 +02:00

1 2

100 commits