Commit graph

4202 commits

Author SHA1 Message Date
Marko Mäkelä
18dbeae1b8 MDEV-35849: index records in a wrong order
page_cur_dtuple_cmp(): Take DESC index fields into account also when
comparing a NULL value in a ROW_FORMAT≠REDUNDANT record.

Thanks to Elena Stepanova for finding this bug.
2025-01-15 07:31:33 +02:00
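A minimal SQL sketch of the kind of schema the fix above concerns; the table, column, and index names are illustrative and not taken from the original test case:

  CREATE TABLE t (a INT PRIMARY KEY, b INT, KEY k (b DESC)) ENGINE=InnoDB;
  INSERT INTO t VALUES (1, NULL), (2, 10), (3, NULL), (4, 20);
  -- The bug concerned how NULL values were ordered among the DESC key
  -- values inside non-REDUNDANT index pages; an index scan should still
  -- return rows in a consistent order:
  SELECT a, b FROM t FORCE INDEX (k) ORDER BY b DESC;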
Marko Mäkelä
15700f54c2 Merge 11.4 into 11.7 2025-01-09 09:41:38 +02:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Marko Mäkelä
420d9eb27f Merge 10.6 into 10.11 2025-01-08 12:51:26 +02:00
Monty
88d9348dfc Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
Thirunarayanan Balathandayuthapani
48b724047e MDEV-34119 Assertion `page_dir_get_n_heap(new_page) == 2U' failed in dberr_t PageBulk::init()
Problem:
=======
- An INSERT..SELECT statement on a partitioned table fails to use
bulk insert for the transaction.

Solution:
========
- Enable the bulk insert operation for INSERT..SELECT
statements on partitioned tables.
2025-01-02 17:34:24 +05:30
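An illustrative SQL sketch of the statement class affected by the change above; all names are hypothetical:

  CREATE TABLE src (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
  INSERT INTO src VALUES (1, 1), (2, 2), (3, 3), (4, 4);
  CREATE TABLE dst (a INT PRIMARY KEY, b INT) ENGINE=InnoDB
    PARTITION BY HASH (a) PARTITIONS 4;
  -- With the fix, an INSERT..SELECT into the empty partitioned table can
  -- take the bulk insert path within the transaction:
  INSERT INTO dst SELECT * FROM src;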
Monty
7fcaab7aaa MDEV-20912 Add support for utf8mb4_0900_* collations in MariaDB Server
This is done by mapping most of the existing MySQL unicode 0900 collations
to MariaDB 1400 unicode collations. The assumption is that 1400 is a
superset of 0900 for all practical purposes.

I also added a new function 'compare_collations()' and changed most code
to use this instead of comparing character sets directly.
This enables one to seamlessly mix-and-match the corresponding 0900 and
1400 sets. Field comparison and ALTER TABLE treat the character sets
as identical.

All MySQL 8.0 0900 collations are supported except:
- utf8mb4_ja_0900_as_cs
- utf8mb4_ja_0900_as_cs_ks
- utf8mb4_ru_0900_as_cs
- utf8mb4_zh_0900_as_cs

These do not have corresponding entries in the MariaDB 1400 collations.

Other things:
- Added a COMMENT column to information_schema.collations. For utf8mb4_0900
  collations it contains the corresponding alias collation.
2024-12-28 10:23:49 +02:00
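A small SQL sketch of how the mapping can be observed, assuming the COMMENT column described above; the table name is illustrative:

  CREATE TABLE t (s VARCHAR(32) COLLATE utf8mb4_0900_ai_ci) ENGINE=InnoDB;
  -- The COMMENT column of information_schema.COLLATIONS is expected to
  -- show the corresponding alias collation for the mapped 0900 names:
  SELECT COLLATION_NAME, `COMMENT`
    FROM information_schema.COLLATIONS
   WHERE COLLATION_NAME LIKE 'utf8mb4%0900%';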
Monty
ed5bba8a32 Fixed failing test case innodb.log_file_size_online 2024-12-27 16:14:51 +02:00
Marko Mäkelä
a54d151fc1 Merge 10.6 into 10.11 2024-12-19 15:38:53 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
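A hedged SQL sketch of the intended behaviour after this change; the table name and the exact error reporting are illustrative:

  START TRANSACTION;
  -- Suppose a statement here fails with a deadlock (or, with
  -- innodb_snapshot_isolation=ON, a record-changed error) and InnoDB
  -- rolls back the whole transaction.
  SELECT * FROM t;   -- any further statement should now be rejected,
                     -- signalling that only ROLLBACK is allowed
  ROLLBACK;          -- the one permitted way to end the transaction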
Thirunarayanan Balathandayuthapani
b9e592a786 MDEV-35475 Assertion `!rec_offs_nth_extern(offsets1, n)' failed in cmp_rec_rec_simple_field
Problem:
=======
InnoDB wrongly stores the primary key field externally
(off-page) during a bulk insert operation. This leads
to an assertion failure.

Solution:
========
row_merge_buf_blob(): Should store the primary key fields
inline. Store the variable-length field data externally
based on the row format of the table.

row_merge_buf_write(): Check whether the record size exceeds
the maximum record size.

row_merge_copy_blob_from_file(): Construct the tuple based on
the variable-length fields.
2024-12-09 20:27:12 +05:30
Sergei Golubchik
56e4b01b7b MDEV-35419 Server crashes when adding a column to a table which has a primary key using hash
Correct the assert. Follow-up for 062f8eb37d.
2024-12-06 20:28:45 +01:00
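An approximate SQL sketch of the scenario named in the bug title; the exact original key definition is not shown in the commit message, so this uses an illustrative hash-based unique key:

  CREATE TABLE t (a BLOB, UNIQUE KEY uk (a) USING HASH) ENGINE=InnoDB;
  -- Adding a column to such a table exercised the corrected assertion:
  ALTER TABLE t ADD COLUMN b INT;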
Marko Mäkelä
33907f9ec6 Merge 11.4 into 11.7 2024-12-02 17:51:17 +02:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
19acb0257e MDEV-35508 Race condition between purge and secondary index INSERT or UPDATE
row_purge_remove_sec_if_poss_leaf(): If there is an active transaction
that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1
so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if
the record needs to be purged. It could be the case that an active
transaction would insert this record between the time this check
completed and row_purge_remove_sec_if_poss_tree() acquired a latch
on the secondary index leaf page again.

row_purge_del_mark_error(), row_purge_check(): Some unlikely code
refactored into separate non-inline functions.

trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky
part of trx_sys_t::find_same_or_older() to a non-inline function.

trx_sys_t::find_same_or_older_in_purge(): A variant of
trx_sys_t::find_same_or_older() for use in the purge subsystem,
with potential concurrent access of the same trx_t object from
multiple threads.

trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the
regular data field trx_t::max_inactive_id, which we
use on systems that have native 64-bit loads or stores.
On any 64-bit system that seems to be supported by GCC, Clang or MSVC,
relaxed atomic loads and stores use the regular load and store
instructions. On -march=i686 the 64-bit atomic loads and stores
would use an XMM register.

This fixes a regression that had been introduced in
commit b7b9f3ce82 (MDEV-34515).
There would be messages
[ERROR] InnoDB: tried to purge non-delete-marked record in index
in the server error log, and an assertion ut_ad(0) would cause a
crash of debug instrumented builds. This could also cause incorrect
results for MVCC reads and corrupted secondary indexes.

The debug instrumented test case was written by Debarun Banerjee.

Reviewed by: Debarun Banerjee
2024-11-29 10:44:38 +02:00
Thirunarayanan Balathandayuthapani
9ba18d1aa0 MDEV-35394 Innochecksum misinterprets freed pages
- Innochecksum misinterprets freed pages as active ones.
This leads the user to think that more valid pages exist
than actually do.

- To avoid this confusion, innochecksum introduces one
more option, --skip-freed-pages (-r), to skip the freed
pages while dumping or printing the summary of the tablespace.

- Innochecksum can safely assume that a page is freed if
the respective extent doesn't belong to a segment and is marked as
freed in the XDES_BITMAP of the extent descriptor page.

- Innochecksum shouldn't treat a zero-filled page as an extent
descriptor page.

Reviewed-by: Marko Mäkelä
2024-11-27 13:00:51 +05:30
Thirunarayanan Balathandayuthapani
866a8ea673 MDEV-35398 Improve shrinking of system tablespace
Problem:
========
Since version 10.6.13 (86767bcc0f),
the InnoDB purge thread does free TRX_UNDO_CACHED undo segments.
A data directory from a pre-10.6.13 version can still contain
TRX_UNDO_CACHED undo segments in the system tablespace even
though it has an external undo tablespace.

During slow shutdown, InnoDB collects the segments of tables
that reside in the system tablespace and the cached undo segments in
the system tablespace as used segments of the system tablespace.
While shrinking the system tablespace, the last used extent can be
occupied by a cached undo segment. Such an extent effectively blocks
the shrinking of the system tablespace.

Solution:
========
While freeing unused segments, InnoDB should free the cached
undo segment header pages that exist in the system tablespace and
reset TRX_RSEG_UNDO_SLOTS to 0xff in the rollback segment
header pages that exist in the system tablespace. This allows
the system tablespace to shrink further.
2024-11-19 20:13:24 +05:30
Thirunarayanan Balathandayuthapani
8e1cf078a0 MDEV-35363 Avoid cloning of table statistics while saving the InnoDB table stats
- Remove the test case added as a part of the
commit 98d57719e2 (MDEV-32667)
2024-11-14 15:32:55 +05:30
Marko Mäkelä
c72221e2f8 MDEV-35411 innodb.log_file_size_online occasionally fails
The test innodb.log_file_size_online was killing the server shortly
after the completion of SET GLOBAL innodb_log_file_size.
It is possible that log_t::write_checkpoint() would let mysqltest.cc
proceed to kill the server before the message had been written to
the server error log.

Let us remove the occasionally failing check of that message, and
instead ensure that SET GLOBAL innodb_log_file_size to a different
size will result in an error log message when the server is not being
killed.
2024-11-13 12:04:05 +02:00
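A sketch of the operation the test exercises; the size value is illustrative (innodb_log_file_size is dynamic in these versions):

  SET GLOBAL innodb_log_file_size = 134217728;
  -- The test now verifies that changing the size to a different value
  -- produces an error log message while the server stays up, instead of
  -- checking for the message right before killing the server.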
Oleksandr Byelkin
b12ff287ec Merge branch '11.6' into 11.7 2024-11-10 19:22:21 +01:00
Thirunarayanan Balathandayuthapani
074831ec61 Merge branch 10.5 into 10.6 2024-11-08 18:17:15 +05:30
Thirunarayanan Balathandayuthapani
7afee25b08 MDEV-35115 Inconsistent Replace behaviour when multiple unique index exist
- A REPLACE statement fails with a duplicate key error when conflicts
happen on multiple unique keys. The reason is that the server expects
the InnoDB engine to store the conflicting keys in ascending order,
but InnoDB doesn't store the conflicting keys
in ascending order.

Fix:
===
- Enable HA_DUPLICATE_KEY_NOT_IN_ORDER for the InnoDB storage engine
only when the unique index order differs between the .frm file and the InnoDB dictionary.
2024-11-08 16:46:41 +05:30
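An illustrative SQL sketch (hypothetical names) of a REPLACE that conflicts on more than one unique key at once:

  CREATE TABLE t (a INT PRIMARY KEY, b INT, c INT,
                  UNIQUE KEY ub (b), UNIQUE KEY uc (c)) ENGINE=InnoDB;
  INSERT INTO t VALUES (1, 1, 1), (2, 2, 2);
  -- Conflicts with row (1,1,1) on ub and with row (2,2,2) on uc; before
  -- the fix this could fail with a duplicate key error instead of
  -- replacing both rows:
  REPLACE INTO t VALUES (3, 1, 2);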
Marko Mäkelä
ba4541ba7f MDEV-29015/MDEV-29260/MDEV-34938 test fixup
The merge f00711bba2 included a change
of the test innodb.log_file_name, which would try to ensure that
in the presence of the code fix decdd4bf49
we would get an error on Linux when invoking lseek() on a directory.
It turns out that this is not the case in at least one Linux-based
cloud environment.
2024-11-08 09:55:47 +02:00
Oleksandr Byelkin
9e1fb104a3 MariaDB 11.4.4 release

Merge tag '11.4' into 11.6

MariaDB 11.4.4 release
2024-11-08 07:17:00 +01:00
Thirunarayanan Balathandayuthapani
98d57719e2 MDEV-32667 dict_stats_save_index_stat() reads uninitialized index->stats_error_printed
Problem:
========
- dict_stats_table_clone_create() does not initialize the
flag stats_error_printed in either dict_table_t or dict_index_t.
Because dict_stats_save_index_stat() is operating on a copy
of a dict_index_t object, it appears that
dict_index_t::stats_error_printed will always be false
for actual metadata objects, and uninitialized in
dict_stats_save_index_stat().

Solution:
=========
dict_stats_table_clone_create(): Assign stats_error_printed
for the table and index while copying the statistics.
2024-11-08 11:35:19 +05:30
Sergei Golubchik
7feec30939 relax the XA recovery error
it's just a suggestion anyway, not a bullet-proof check,
let's not act as if it is
2024-11-05 14:00:52 -08:00
Sergei Golubchik
126d6d787c cleanup: handlerton
remove unused methods, reorder methods, add comments
2024-11-05 14:00:50 -08:00
Sergei Golubchik
062f8eb37d cleanup: key algorithm vs key flags
The information about the index algorithm was stored in two
places, inconsistently split between them.

A BTREE index could have key->algorithm == HA_KEY_ALG_BTREE if the user
explicitly specified USING BTREE, or HA_KEY_ALG_UNDEF if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set; the HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage, to keep frms backward compatible).

As a side effect, ALTER TABLE now detects FULLTEXT index renames correctly.
2024-11-05 14:00:47 -08:00
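A purely illustrative CREATE TABLE showing the index kinds discussed above and the key->algorithm value each one maps to:

  CREATE TABLE t (
    a INT,
    b TEXT,
    c TEXT,
    g GEOMETRY NOT NULL,
    KEY k_btree (a) USING BTREE,      -- HA_KEY_ALG_BTREE (UNDEF if USING BTREE is omitted)
    FULLTEXT KEY k_ft (b),            -- HA_KEY_ALG_FULLTEXT
    SPATIAL KEY k_sp (g),             -- HA_KEY_ALG_RTREE
    UNIQUE KEY k_hash (c) USING HASH  -- HA_KEY_ALG_LONG_HASH (long unique)
  ) ENGINE=InnoDB;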
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Thirunarayanan Balathandayuthapani
db3be9b434 MDEV-35237 Bulk insert fails to apply buffered operation during CREATE..SELECT statement
Problem:
=======
- InnoDB fails to apply the buffered insert operation during
a CREATE..SELECT operation. This happens when bulk_insert
in the transaction is reset to false while unlocking the source table.

Fix:
===
- InnoDB should apply the previously buffered changes to
all tables if we encounter any statement other than a
pure INSERT or INSERT..SELECT statement in
ha_innobase::external_lock() and start_stmt().

- Remove the function bulk_insert_apply_for_table().

start_stmt(), external_lock(): Assert that trx->duplicates
is enabled during a bulk insert operation.
2024-10-29 15:03:23 +05:30
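An illustrative sketch (hypothetical names) of the affected statement class:

  CREATE TABLE src (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
  INSERT INTO src VALUES (1, 1), (2, 2);
  -- CREATE..SELECT buffers the inserts into the new table; the buffered
  -- changes must be applied even though the source table is unlocked
  -- during the statement:
  CREATE TABLE dst ENGINE=InnoDB SELECT * FROM src;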
Sergei Golubchik
3cd706b107 MDEV-35236 Assertion `(mem_root->flags & 4) == 0' failed in safe_lexcstrdup_root
Post-fix for MDEV-35144.

Cannot allocate option values on the statement arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially empty, it will be reset for every execution
and any values allocated on the statement arena will be lost.

Cannot allocate option values on the execution arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially NOT empty, any values appended to the
end will be preserved, and if they're on the execution arena their
content will be destroyed.

Let's use thd->change_item_tree() to save and restore necessary pointers
for every execution.

followup for 3da565c41d
2024-10-23 14:58:57 +02:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request a record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, which causes deadlocks. One possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during page merging, which causes an assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as the page
get mode to it. That's why a page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for the id of the trx which modified a secondary index record
is quite an expensive operation, restrict its usage to the master. A system
variable was added to remove the restriction, to simplify testing. The
variable exists only for debug builds or for builds with the
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option, to increase the
probability of catching bugs in release builds with RQG.

Note that the code which does a primary index lookup to find out what
transaction modified a secondary index record is necessary only when
there is no primary key and no unique secondary key on a replica with
row-based replication, because only in this case can extra X locks on
unmodified records be set during the scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare doesn't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aa (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks; let
lock_rec_unlock_unmodified() process them.

lock_sec_rec_some_has_impl(): add a template parameter for not acquiring
trx_t::mutex in case the caller already holds the mutex; don't
crash if the lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
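A hedged SQL sketch of the XA flow whose lock release is adjusted above; table and xid names are illustrative:

  XA START 'x1';
  UPDATE t SET b = b + 1 WHERE a = 1;        -- record the XA modifies
  SELECT * FROM t WHERE a = 2 FOR UPDATE;    -- record it locks but does not modify
  XA END 'x1';
  -- On XA PREPARE, locks on records the transaction did not modify
  -- (now including exclusive non-gap locks) can be released:
  XA PREPARE 'x1';
  XA COMMIT 'x1';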
Marko Mäkelä
1cad1dbde6 MDEV-35235 innodb_snapshot_isolation=ON fails to signal transaction rollback
convert_error_code_to_mysql(): Treat DB_DEADLOCK and DB_RECORD_CHANGED
in the same way, that is, signal to the SQL layer that the transaction
had been rolled back.
2024-10-23 07:55:22 +03:00
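A hedged SQL sketch of the setting involved; whether the conflict occurs depends on concurrent activity, so this only outlines the flow:

  SET SESSION innodb_snapshot_isolation = ON;
  START TRANSACTION WITH CONSISTENT SNAPSHOT;
  -- If another transaction commits a change to a row that this transaction
  -- then tries to lock, InnoDB rolls this transaction back; with the fix,
  -- the SQL layer is informed of the rollback just as for a deadlock:
  UPDATE t SET b = b + 1 WHERE a = 1;
  ROLLBACK;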
Kristian Nielsen
abc46259c6 MDEV-34753 memory pressure - erroneous termination condition
Fix race condition in test case by waiting for the expected state to occur.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-19 17:20:27 +11:00
Daniel Black
eb29190398 MDEV-34753 memory pressure - erroneous termination condition
The 'if (!m_abort) break' condition was inverted by accident.

Constrain the test case to environments where there is a cgroupv2
runtime environment, which is the same case that will pass memory
pressure initialization.

Remove the explicit garbage_collection trigger as it hides the abnormal
termination error on the event loop for memory pressure. This
also means there is no support in non-cgroupv2 environments
(possibly some container environments).

As memory pressure is triggered via a different thread, we
need to wait until a "[mM]emory pressure" log message appears to
know whether it has succeeded or failed.

Thanks Kristian Nielsen for noticing and review.
2024-10-19 17:20:27 +11:00
Marko Mäkelä
ebefef658e Merge 10.11 into 11.2 2024-10-18 11:32:22 +03:00
Marko Mäkelä
eca552a1a4 MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record
must first have been durably written.

In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain about an LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too-new page from the doublewrite buffer.
It is unlikely that this could have led to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-18 10:12:47 +03:00
Sergei Golubchik
3a1cf2c85b MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
Sergei Golubchik
3da565c41d MDEV-35144 CREATE TABLE ... LIKE uses current innodb_compression_default instead of the create value
When adding a column or index that uses plugin-defined
sysvar-based options with CREATE ... LIKE the server
was using the current value of the sysvar, not the default one.

Because parse_option_list() function was used both in create
and open and it tried to guess when it's create (need to use
current sysvar value and add a new name=value pair to the list)
or open (need to use default, without extending the list).

Let's move the list extending functionality into a separate
function and call it explicitly when needed. Operations that
add new objects (CREATE, ALTER ... ADD) will extend the list,
other operations (ALTER, CREATE ... LIKE, open) will not.
2024-10-17 16:28:39 +02:00
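A sketch of the reported scenario using innodb_compression_default as the sysvar-based option; table names are illustrative:

  SET SESSION innodb_compression_default = OFF;
  CREATE TABLE t1 (a INT) ENGINE=InnoDB;   -- created without page compression
  SET SESSION innodb_compression_default = ON;
  -- Before the fix, the copy could pick up the current sysvar value (ON)
  -- instead of the value in effect when t1 was created:
  CREATE TABLE t2 LIKE t1;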
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain about an LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too-new page from the doublewrite buffer.
It is unlikely that this could have led to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Thirunarayanan Balathandayuthapani
4a1ded61a4 MDEV-34529 Shrink the system tablespace when system tablespace contains MDEV-30671 leaked undo pages
- InnoDB fails to shrink the system tablespace when it contains
the leaked undo log pages caused by MDEV-30671.

- InnoDB frees the unused segments in the system tablespace
before shrinking the tablespace. InnoDB fails to free
an unused segment if an XA PREPARE transaction exists or
if the previous shutdown was not done with innodb_fast_shutdown=0.

inode_info: Structure to store the inode page and offsets.

fil_space_t::garbage_collect(): Frees the unused segments of the
system tablespace.

fsp_get_sys_used_segment(): Iterates through all default
file segments and index segments present in the system tablespace.

trx_sys_t::is_xa_exist(): Returns true if an XA transaction
exists in the undo logs.

fseg_inode_free(): Frees the extents and fragment pages for the
given index node and ignores any error, similar to
trx_purge_free_segment().

trx_sys_t::reset_page(): Retain the TRX_SYS_FSEG_HEADER value
in the trx_sys page while resetting the page.
2024-10-16 21:34:24 +05:30
Vladislav Vaintroub
c1fc59277a MDEV-34929 page-compressed tables do not work on Windows
Remove the workaround for MDEV-13941; it served for 5 years, and all affected
pre-release 10.2 installations should have been fixed in the meantime.

Apparently InnoDB is using the is_sparse parameter of os_file_set_size()
inconsistently, and it now passes is_sparse=false during the first file
extension. With the MDEV-13941 workaround in place, this would unsparse
the file, which makes compression not work at all anymore.
2024-10-16 16:02:13 +02:00
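A minimal sketch of a page-compressed table of the kind affected; the table definition is illustrative:

  CREATE TABLE t (a INT PRIMARY KEY, b TEXT) ENGINE=InnoDB PAGE_COMPRESSED=1;
  -- Page compression relies on the data file remaining sparse; the removed
  -- MDEV-13941 workaround was unsparsing the file and defeating it.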
Thirunarayanan Balathandayuthapani
6aaae4c03b MDEV-35122 Incorrect NULL value handling for instantly dropped BLOB columns
Problem:
=======
- Insert into a redundant-format table fails after an
instant drop of a BLOB column. Instant drop column only marks
the column as hidden, and a subsequent insert statement tries
to insert a NULL value for the dropped BLOB column and returns
the fixed length of the BLOB type as 65535. This leads to a
"row size too large" error.

Fix:
====
- For a redundant table, if the non-fixed dropped column can be NULL,
then set the length of the field type to 0.
2024-10-15 12:04:37 +05:30
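A sketch of the reported scenario; table and column names are illustrative:

  CREATE TABLE t (a INT PRIMARY KEY, b BLOB, c INT)
    ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
  INSERT INTO t VALUES (1, 'x', 1);
  ALTER TABLE t DROP COLUMN b, ALGORITHM=INSTANT;  -- column only marked hidden
  -- Before the fix, the next insert computed 65535 bytes for the hidden,
  -- nullable BLOB column and could fail with "row size too large":
  INSERT INTO t (a, c) VALUES (2, 2);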
Yuchen Pei
cd5577ba4a Merge branch '10.5' into 10.6 2024-10-15 16:00:44 +11:00
Thirunarayanan Balathandayuthapani
5777d9f282 MDEV-35116 InnoDB fails to set error index for HA_ERR_NULL_IN_SPATIAL
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.

row_build_spatial_index_key(): Initialize the tmp_mbr array completely.

check_if_supported_inplace_alter(): Fix the spelling mistake of alter
2024-10-14 14:28:24 +05:30