mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-16 11:57:38 +02:00

Author	SHA1	Message	Date
Oleksandr Byelkin	f66d1850ac	Merge branch '10.3' into 10.4	2019-06-14 22:10:50 +02:00
Oleksandr Byelkin	4a3d51c76c	Merge branch '10.2' into 10.3	2019-06-14 07:36:47 +02:00
Marko Mäkelä	984d7100cd	Merge 10.4 into 10.5	2019-06-13 18:36:09 +03:00
Thirunarayanan Balathandayuthapani	e9145aab44	MDEV-19435 buf_fix_count > 0 for corrupted page when it exits the LRU list Problem: One of the purge threads accesses the corrupted page and tries to remove it from the LRU list. In the meantime, other purge threads are waiting for the same page in buf_wait_for_read(). The assertion (buf_fix_count == 0) fails for the purge thread that tries to remove the page from the LRU list. Solution: - Set the page id to FIL_NULL to indicate that the page is corrupted before removing the block from the LRU list. Acquire the hash lock for the particular page id and wait for the other threads to release buf_fix_count for the block. - Added an error check for btr_cur_open() in row_search_on_row_ref().	2019-06-13 16:13:51 +03:00
Marko Mäkelä	8bb4ea2e6f	MDEV-19738: Doublewrite buffer is unnecessarily used for newly (re)initialized pages Thanks to MDEV-12699, the doublewrite buffer will only be needed in those cases when a page is being updated in the data file. If the page had never been written to the data file since it was initialized, then recovery will be able to reconstruct the page based solely on the contents of the redo log files. The doublewrite buffer is only really needed when recovery needs to read the page in order to apply redo log. Note: As noted in MDEV-19739, we cannot safely disable the doublewrite buffer if any MLOG_INDEX_LOAD records were written in the past or will be written in the future. These records denote that redo logging was disabled for some pages in a tablespace. Ideally, we would have the setting innodb_log_optimize_ddl=OFF by default, and would not allow it to be set while the server is running. If we wanted to make this safe, assignments with SET GLOBAL innodb_log_optimize_ddl=... should not only issue a redo log checkpoint (including a write of all dirty pages from the entire buffer pool), but it should also wait for all pending ALTER TABLE activity to complete. We elect not to do this. Avoiding unnecessary use of the doublewrite buffer should improve the write performance of InnoDB. buf_page_t::init_on_flush: A new flag to indicate whether it is safe to skip doublewrite buffering when writing the page. fsp_init_file_page(): When writing a MLOG_INIT_FILE_PAGE2 record, set the init_on_flush flag if innodb_log_optimize_ddl=OFF. This is the only function that writes that log record. buf_flush_write_block_low(): Skip doublewrite if init_on_flush is set. fil_aio_wait(): Clear init_on_flush.	2019-06-12 20:18:01 +03:00
Marko Mäkelä	2fd82471ab	Merge 10.3 into 10.4	2019-06-12 08:37:27 +03:00
Marko Mäkelä	b42dbdbccd	Merge 10.2 into 10.3	2019-06-11 13:00:18 +03:00
Thirunarayanan Balathandayuthapani	b4287ec386	MDEV-19541 InnoDB crashes when trying to recover a corrupted page - Use corrupt page id instead of whole block after releasing it from LRU list.	2019-06-05 16:36:51 +05:30
Marko Mäkelä	f98bb23168	Merge 10.3 into 10.4	2019-05-29 22:17:00 +03:00
Marko Mäkelä	90a9193685	Merge 10.2 into 10.3	2019-05-29 11:32:46 +03:00
Thirunarayanan Balathandayuthapani	79b46ab2a6	MDEV-19541 InnoDB crashes when trying to recover a corrupted page - Don't apply redo log for the corrupted page when innodb_force_recovery > 0. - Allow the table to be dropped when index root page is corrupted when innodb_force_recovery > 0.	2019-05-28 11:55:02 +03:00
Marko Mäkelä	5d2619b693	MDEV-19584 Allocate recv_sys statically There is only one InnoDB crash recovery subsystem. Allocating recv_sys statically removes one level of pointer indirection and makes code more readable, and removes the awkward initialization of recv_sys->dblwr. recv_sys_t::create(): Replaces recv_sys_init(). recv_sys_t::debug_free(): Replaces recv_sys_debug_free(). recv_sys_t::close(): Replaces recv_sys_close(). recv_sys_t::add(): Replaces recv_add_to_hash_table(). recv_sys_t::empty(): Replaces recv_sys_empty_hash().	2019-05-24 16:19:38 +03:00
Oleksandr Byelkin	c07325f932	Merge branch '10.3' into 10.4	2019-05-19 20:55:37 +02:00
Marko Mäkelä	5fd7502e77	MDEV-19513: Allocate dict_sys statically dict_sys_t::create(): Renamed from dict_init(). dict_sys_t::close(): Renamed from dict_close(). dict_sys_t::add(): Sliced from dict_table_t::add_to_cache(). dict_sys_t::remove(): Renamed from dict_table_remove_from_cache(). dict_sys_t::prevent_eviction(): Renamed from dict_table_move_from_lru_to_non_lru(). dict_sys_t::acquire(): Replaces dict_move_to_mru() and some more logic. dict_sys_t::resize(): Renamed from dict_resize(). dict_sys_t::find(): Replaces dict_lru_find_table() and dict_non_lru_find_table().	2019-05-17 14:32:53 +03:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	c0ac0b8860	Update FSF address	2019-05-11 19:25:02 +03:00
Marko Mäkelä	e6bdf77e4b	Merge 10.3 into 10.4 In is_eits_usable(), we disable an assertion that fails due to MDEV-19334.	2019-04-25 16:05:20 +03:00
Marko Mäkelä	acf6f92aa9	Merge 10.2 into 10.3	2019-04-25 09:05:52 +03:00
Marko Mäkelä	d315b4ff39	Remove IBUF_COUNT_DEBUG The compile-time option IBUF_COUNT_DEBUG has not been used for years. It would only work with up to 3 created .ibd files, with no buffered changes existing while InnoDB is started up.	2019-04-19 12:44:46 +03:00
Marko Mäkelä	e7029e864f	Merge 10.3 into 10.4	2019-04-17 15:59:30 +03:00
Marko Mäkelä	250799f961	Merge 10.2 into 10.3	2019-04-17 15:26:17 +03:00
Marko Mäkelä	169c00994b	MDEV-12699 Improve crash recovery of corrupted data pages InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records. To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will fix also that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1. Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes). The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record. For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit `aa3f7a107c`. Hence, we will skip these optimizations for tables whose name starts with FTS_. This is joint work with Thirunarayanan Balathandayuthapani. fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace. mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176. recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents. recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. 
This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables. recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the fil_name_t::enable_lsn. recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn to fil_space_t::enable_lsn. recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations. buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery. recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch. ibuf_merge_or_delete_for_page(): Relax a debug assertion. innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).	2019-04-17 13:58:41 +03:00
Marko Mäkelä	edd1a53a55	Merge 10.3 into 10.4	2019-04-08 22:00:07 +03:00
Marko Mäkelä	937ec3c48d	MDEV-19212: After-merge fix for sizeof(ulong)!=sizeof(ulint)	2019-04-08 21:54:38 +03:00
Marko Mäkelä	9ba0865b87	Merge 10.2 into 10.3	2019-04-08 21:38:13 +03:00
Marko Mäkelä	f120a15b93	MDEV-19212 4GB Limit on large_pages - integer overflow os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the correct argument type. innobase_large_page_size, innobase_use_large_pages, os_use_large_pages, os_large_page_size: Remove. Simply refer to opt_large_page_size, my_use_large_pages.	2019-04-08 21:33:49 +03:00
Marko Mäkelä	d8303c3ee7	Merge 10.3 into 10.4	2019-04-08 08:22:34 +03:00
Marko Mäkelä	cc492bfd4f	Merge 10.2 into 10.3	2019-04-07 11:49:50 +03:00
Marko Mäkelä	1d30b7b1d2	MDEV-12699 preparation: Clean up recv_sys The recv_sys data structures are accessed not only from the thread that executes InnoDB plugin initialization, but also from the InnoDB I/O threads, which can invoke recv_recover_page(). Assert that sufficient concurrency control is in place. Some code was accessing recv_sys data structures without holding recv_sys->mutex. recv_recover_page(bpage): Refactor the call from buf_page_io_complete() into a separate function that performs necessary steps. The main thread was unnecessarily releasing and reacquiring recv_sys->mutex. recv_recover_page(block,mtr,recv_addr): Pass more parameters from the caller. Avoid redundant lookups and computations. Eliminate some redundant variables. recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held. That was not always the case! recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration of the function. (While we are scanning and buffering redo log records, no pages can be read in.) recv_read_in_area(): Properly protect access with recv_sys->mutex. recv_apply_hashed_log_recs(): Check recv_addr->state only once, and continuously hold recv_sys->mutex. The mutex will be released and reacquired inside recv_recover_page() and recv_read_in_area(), allowing concurrent processing by buf_page_io_complete() in I/O threads.	2019-04-06 21:25:43 +03:00
Marko Mäkelä	1b95118c5f	buf_page_get_gen(): Allow BUF_GET_IF_IN_POOL with a dummy page_size The page_size argument to buf_page_get_gen() only matters when the page is going to be loaded into the buffer pool. Allow callers to pass a dummy parameter when using BUF_GET_IF_IN_POOL (which would return NULL if the block is not in the buffer pool).	2019-04-06 21:25:43 +03:00
Marko Mäkelä	02d9b048a2	Merge 10.3 into 10.4	2019-04-05 11:41:03 +03:00
Marko Mäkelä	d5a2bc6a0f	Merge 10.2 into 10.3	2019-04-04 19:41:12 +03:00
Marko Mäkelä	cad56fbaba	MDEV-18733 MariaDB slow start after crash recovery If InnoDB crash recovery was needed, the InnoDB function srv_start() would invoke extra validation, reading something from every InnoDB data file. This should be unnecessary now that MDEV-14717 made RENAME operations crash-safe inside InnoDB (which can be disabled in MariaDB 10.2 by setting innodb_safe_truncate=OFF). dict_check_sys_tables(): Skip tables that would be dropped by row_mysql_drop_garbage_tables(). Perform extra validation only if innodb_safe_truncate=OFF, innodb_force_recovery=0 and crash recovery was needed. dict_load_table_one(): Validate the root page of the table. In this way, we can deny access to corrupted or mismatching tables not only after crash recovery, but also after a clean shutdown.	2019-04-03 19:56:03 +03:00
Marko Mäkelä	0bc4260226	Merge 10.3 into 10.4	2019-03-26 17:43:59 +02:00
Marko Mäkelä	ffc69dbd05	Merge 10.2 into 10.3	2019-03-26 15:03:37 +02:00
Marko Mäkelä	226ca250ed	Merge 10.1 into 10.2	2019-03-26 14:17:19 +02:00
Marko Mäkelä	065ba53ccb	MDEV-12711 mariabackup --backup is refused for multi-file system tablespace Before MDEV-12113 (MariaDB Server 10.1.25), on shutdown InnoDB would write the current LSN to the first page of each file of the system tablespace. This is incompatible with MariaDB's InnoDB table encryption, because encryption repurposed the field for an encryption key ID and checksum. buf_page_is_corrupted(): For the InnoDB system tablespace, skip FIL_PAGE_FILE_FLUSH_LSN when checking if a page is all zero, because the first page of each file in the system tablespace can contain nonzero bytes in the field.	2019-03-26 13:51:15 +02:00
Marko Mäkelä	1dffa9d9c1	MDEV-17441: Rename buf_pool_t::io_buf from buf_pool->tmp_arr Make buf_pool_t::io_buf_t() a more proper class. buf_pool_t::io_buf_t::io_buf_t(): Silence the GCC 8 -Wclass-memaccess warning for initializing slots[], which no longer is a plain old datatype (POD) due to the std::atomic member.	2019-03-21 12:14:36 +02:00
Marko Mäkelä	514b305dfb	Merge 10.3 into 10.4 The MDEV-17262 commit `26432e49d3` was skipped. In Galera 4, the implementation would seem to require changes to the streaming replication. In the tests archive.rnd_pos main.profiling, disable_ps_protocol for SHOW STATUS and SHOW PROFILE commands until MDEV-18974 has been fixed.	2019-03-20 10:41:32 +02:00
Daniel Black	de51acd037	MDEV-18726: innodb buffer pool size not consistent with large pages Rather than add a small extra amount to the size of chunks, keep them at the specified size. The rest of the chunk initialization code adapts to this small size reduction. This has been done in the general case, not just for large pages, to keep it simple. The chunk size is controlled by innodb-buffer-pool-chunk-size. In the code, increasing this by the length of a descriptor table makes it difficult with large pages. With innodb-buffer-pool-chunk-size set to 2M, the code before this commit would have added a small extra amount to this value when it tried to allocate it. While not normally a problem, with large pages this requires additional space: a whole extra large page. With a number of pools, or with 1G or 16G large pages, this is quite significant. By removing this additional amount, DBAs can set innodb-buffer-pool-chunk-size to the large page size, or a multiple of it, and actually get that amount allocated. Previously they had to fudge a smaller value. The innodb.test results show how this is fudged over a number of tests. With this change the values are just between 488 and 500 depending on architecture and build options. Tested with --large-pages --innodb-buffer-pool-size=256M --innodb-buffer-pool-chunk-size=2M on x86_64 with 2M default large page size. Breaking before buf_pool init, one large page was allocated in MyISAM; by the end of the function 128 huge pages were allocated as expected. A further 16 pages were allocated for a 32M log buffer, and during startup 1 page was allocated briefly to the redo log.	2019-03-18 21:49:53 +02:00
Marko Mäkelä	6b6fa3cdb1	MDEV-18644: Support full_crc32 for page_compressed This is a follow-up task to MDEV-12026, which introduced innodb_checksum_algorithm=full_crc32 and a simpler page format. MDEV-12026 did not enable full_crc32 for page_compressed tables, which we will be doing now. This is joint work with Thirunarayanan Balathandayuthapani. For innodb_checksum_algorithm=full_crc32 we change the page_compressed format as follows: FIL_PAGE_TYPE: The most significant bit will be set to indicate page_compressed format. The least significant bits will contain the compressed page size, rounded up to a multiple of 256 bytes. The checksum will be stored in the last 4 bytes of the page (whether it is the full page or a page_compressed page whose size is determined by FIL_PAGE_TYPE), covering all preceding bytes of the page. If encryption is used, then the page will be encrypted between compression and computing the checksum. For page_compressed, FIL_PAGE_LSN will not be repeated at the end of the page. FSP_SPACE_FLAGS (already implemented as part of MDEV-12026): We will store the innodb_compression_algorithm that may be used to compress pages. Previously, the choice of algorithm was written to each compressed data page separately, and one would be unable to know in advance which compression algorithm(s) are used. fil_space_t::full_crc32_page_compressed_len(): Determine if the page_compressed algorithm of the tablespace needs to know the exact length of the compressed data. If yes, we will reserve and write an extra byte for this right before the checksum. buf_page_is_compressed(): Determine if a page uses page_compressed (in any innodb_checksum_algorithm). fil_page_decompress(): Pass also fil_space_t::flags so that the format can be determined. buf_page_is_zeroes(): Check if a page is full of zero bytes. buf_page_full_crc32_is_corrupted(): Renamed from buf_encrypted_full_crc32_page_is_corrupted(). 
For full_crc32, we always simply validate the checksum to the page contents, while the physical page size is explicitly specified by an unencrypted part of the page header. buf_page_full_crc32_size(): Determine the size of a full_crc32 page. buf_dblwr_check_page_lsn(): Make this a debug-only function, because it involves potentially costly lookups of fil_space_t. create_table_info_t::check_table_options(), ha_innobase::check_if_supported_inplace_alter(): Do allow the creation of SPATIAL INDEX with full_crc32 also when page_compressed is used. commit_cache_norebuild(): Preserve the compression algorithm when updating the page_compression_level. dict_tf_to_fsp_flags(): Set the flags for page compression algorithm. FIXME: Maybe there should be a table option page_compression_algorithm and a session variable to back it?	2019-03-18 14:08:43 +02:00
Daniel Black	a9056a2b89	MDEV-18946: innodb: {de|}allocate_large_{dodump|dontdump} added In 1dc78d35a0beb9620bae1f4841cc07389b425707 a deallocate_large(dontdump=true) call was passed a wrong argument value. To avoid accidentally calling large memory functions that have DODUMP/DONTDUMP options with missing arguments, the functions have been given distinct names.	2019-03-16 11:04:19 +11:00
Daniel Black	8678a1052d	MDEV-18946: innodb: buffer_pool - unallocate large pages requires size MDEV-10814 introduced a bug where the size argument to deallocate_large was passed true, evaluating to 1, as the size. When this was passed to munmap, the result was EINVAL and the page was not released. This only occurred in buf_pool_free_instance when called on shutdown, so there was no impact, as process termination correctly frees the memory.	2019-03-16 11:03:32 +11:00
Marko Mäkelä	2a791c53ad	Merge 10.3 into 10.4	2019-03-06 09:00:52 +02:00
Marko Mäkelä	a2fc36989e	Merge 10.2 into 10.3	2019-03-04 17:01:00 +02:00
Marko Mäkelä	15a20367fb	buf_page_get_zip(): Deduplicate some code	2019-02-23 10:31:07 +02:00
Marko Mäkelä	2c8d9a4e59	MDEV-10813: Update buf_page_t::buf_fix_count outside mutex Since MySQL 5.6.16 (and MariaDB Server 10.0.11), changes of buf_page_t::buf_fix_count are atomic memory operations if PAGE_ATOMIC_REF_COUNT is defined. Since MySQL 5.7 (and MariaDB Server 10.2.2), the field is always updated by atomic memory operations. In a few occurrences, updates of the counter were unnecessarily surrounded by an acquisition and release of the block mutex (buf_block_t::mutex or buf_pool_t::zip_mutex). Remove these unnecessary mutex operations.	2019-02-22 22:56:22 +02:00
Thirunarayanan Balathandayuthapani	c0f47a4a58	MDEV-12026: Implement innodb_checksum_algorithm=full_crc32 MariaDB data-at-rest encryption (innodb_encrypt_tables) had repurposed the same unused data field that was repurposed in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN) field of SPATIAL INDEX. Because of this, MariaDB was unable to support encryption on SPATIAL INDEX pages. Furthermore, InnoDB page checksums skipped some bytes, and there are multiple variations and checksum algorithms. By default, InnoDB accepts all variations of all algorithms that ever existed. This unnecessarily weakens the page checksums. We hereby introduce two more innodb_checksum_algorithm variants (full_crc32, strict_full_crc32) that are special in a way: When either setting is active, newly created data files will carry a flag (fil_space_t::full_crc32()) that indicates that all pages of the file will use a full CRC-32C checksum over the entire page contents (excluding the bytes where the checksum is stored, at the very end of the page). Such files will always use that checksum, no matter what the parameter innodb_checksum_algorithm is assigned to. For old files, the old checksum algorithms will continue to be used. The value strict_full_crc32 will be equivalent to strict_crc32 and the value full_crc32 will be equivalent to crc32. ROW_FORMAT=COMPRESSED tables will only use the old format. These tables do not support new features, such as larger innodb_page_size or instant ADD/DROP COLUMN. They may be deprecated in the future. We do not want an unnecessary file format change for them. The new full_crc32() format also cleans up the MariaDB tablespace flags. We will reserve flags to store the page_compressed compression algorithm, and to store the compressed payload length, so that checksum can be computed over the compressed (and possibly encrypted) stream and can be validated without decrypting or decompressing the page. 
In the full_crc32 format, there no longer are separate before-encryption and after-encryption checksums for pages. The single checksum is computed on the page contents that is written to the file. We do not make the new algorithm the default for two reasons. First, MariaDB 10.4.2 was a beta release, and the default values of parameters should not change after beta. Second, we did not yet implement the full_crc32 format for page_compressed pages. This will be fixed in MDEV-18644. This is joint work with Marko Mäkelä.	2019-02-19 18:50:19 +02:00
Marko Mäkelä	f4f8dd69aa	MDEV-18493: Correct a bogus assertion	2019-02-07 16:25:18 +02:00
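
Note on MDEV-19212 (f120a15b93): the entry says only that ut_2pow_round() was invoked with the wrong argument type. As a minimal sketch of how that kind of mistake can produce a 4 GB limit, the C++ below rounds a 64-bit size down to a power-of-two boundary using a 32-bit alignment value. The function names, the 5 GiB request and the 2 MiB page size are all illustrative assumptions, and the helpers are stand-ins written for this note, not InnoDB's macro.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not InnoDB functions.

// Mask computed in 32 bits: ~(m - 1) is a 32-bit value, so when it is
// widened for the & it has zeros in bits 32..63 and clears them in n.
constexpr std::uint64_t round_down_narrow(std::uint64_t n, std::uint32_t m) {
  return n & ~(m - 1);
}

// Mask computed in 64 bits: widen m first, then build the mask.
constexpr std::uint64_t round_down_wide(std::uint64_t n, std::uint32_t m) {
  return n & ~(std::uint64_t{m} - 1);
}

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;
constexpr std::uint32_t kLargePage = std::uint32_t{1} << 21;  // assumed 2 MiB page

// Compile-time checks of the two behaviours for a 5 GiB request.
static_assert(round_down_narrow(5 * kGiB, kLargePage) == 1 * kGiB,
              "narrow mask drops everything above 4 GiB");
static_assert(round_down_wide(5 * kGiB, kLargePage) == 5 * kGiB,
              "wide mask preserves the full size");
```

With these assumed values, the narrow variant turns a 5 GiB request into 1 GiB, while the wide variant leaves it unchanged, which is consistent with the symptom named in the commit title.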

509 commits