Commit graph

4202 commits

Author SHA1 Message Date
Marko Mäkelä
18dbeae1b8 MDEV-35849: index records in a wrong order
page_cur_dtuple_cmp(): Take DESC index fields into account also when
comparing a NULL value in a ROW_FORMAT≠REDUNDANT record.

Thanks to Elena Stepanova for finding this bug.
2025-01-15 07:31:33 +02:00
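A minimal SQL sketch of the kind of schema the fix above concerns; the table, column, and index names are illustrative and not taken from the original test case:

  CREATE TABLE t (a INT PRIMARY KEY, b INT, KEY k (b DESC)) ENGINE=InnoDB;
  INSERT INTO t VALUES (1, NULL), (2, 10), (3, NULL), (4, 20);
  -- The bug concerned how NULL values were ordered among the DESC key
  -- values inside non-REDUNDANT index pages; an index scan should still
  -- return rows in a consistent order:
  SELECT a, b FROM t FORCE INDEX (k) ORDER BY b DESC;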
Marko Mäkelä
15700f54c2 Merge 11.4 into 11.7 2025-01-09 09:41:38 +02:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Marko Mäkelä
420d9eb27f Merge 10.6 into 10.11 2025-01-08 12:51:26 +02:00
Monty
88d9348dfc Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
Thirunarayanan Balathandayuthapani
48b724047e MDEV-34119 Assertion `page_dir_get_n_heap(new_page) == 2U' failed in dberr_t PageBulk::init()
Problem:
=======
- An INSERT..SELECT statement on a partitioned table fails to use
bulk insert for the transaction.

Solution:
========
- Enable the bulk insert operation for INSERT..SELECT
statements on partitioned tables.
2025-01-02 17:34:24 +05:30
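An illustrative SQL sketch of the statement class affected by the change above; all names are hypothetical:

  CREATE TABLE src (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
  INSERT INTO src VALUES (1, 1), (2, 2), (3, 3), (4, 4);
  CREATE TABLE dst (a INT PRIMARY KEY, b INT) ENGINE=InnoDB
    PARTITION BY HASH (a) PARTITIONS 4;
  -- With the fix, an INSERT..SELECT into the empty partitioned table can
  -- take the bulk insert path within the transaction:
  INSERT INTO dst SELECT * FROM src;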
Monty
7fcaab7aaa MDEV-20912 Add support for utf8mb4_0900_* collations in MariaDB Server
This is done by mapping most of the existing MySQL unicode 0900 collations
to MariaDB 1400 unicode collations. The assumption is that 1400 is a
superset of 0900 for all practical purposes.

I also added a new function 'compare_collations()' and changed most code
to use this instead of comparing character sets directly.
This enables one to seamlessly mix-and-match the corresponding 0900 and
1400 sets. Field comparison and ALTER TABLE treat the character sets
as identical.

All MySQL 8.0 0900 collations are supported except:
- utf8mb4_ja_0900_as_cs
- utf8mb4_ja_0900_as_cs_ks
- utf8mb4_ru_0900_as_cs
- utf8mb4_zh_0900_as_cs

These do not have corresponding entries in the MariaDB 1400 collations.

Other things:
- Added a COMMENT column to information_schema.collations. For utf8mb4_0900
  collations it contains the corresponding alias collation.
2024-12-28 10:23:49 +02:00
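A small SQL sketch of how the mapping can be observed, assuming the COMMENT column described above; the table name is illustrative:

  CREATE TABLE t (s VARCHAR(32) COLLATE utf8mb4_0900_ai_ci) ENGINE=InnoDB;
  -- The COMMENT column of information_schema.COLLATIONS is expected to
  -- show the corresponding alias collation for the mapped 0900 names:
  SELECT COLLATION_NAME, `COMMENT`
    FROM information_schema.COLLATIONS
   WHERE COLLATION_NAME LIKE 'utf8mb4%0900%';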
Monty
ed5bba8a32 Fixed failing test case innodb.log_file_size_online 2024-12-27 16:14:51 +02:00
Marko Mäkelä
a54d151fc1 Merge 10.6 into 10.11 2024-12-19 15:38:53 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
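A hedged SQL sketch of the intended behaviour after this change; the table name and the exact error reporting are illustrative:

  START TRANSACTION;
  -- Suppose a statement here fails with a deadlock (or, with
  -- innodb_snapshot_isolation=ON, a record-changed error) and InnoDB
  -- rolls back the whole transaction.
  SELECT * FROM t;   -- any further statement should now be rejected,
                     -- signalling that only ROLLBACK is allowed
  ROLLBACK;          -- the one permitted way to end the transaction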
Thirunarayanan Balathandayuthapani
b9e592a786 MDEV-35475 Assertion `!rec_offs_nth_extern(offsets1, n)' failed in cmp_rec_rec_simple_field
Problem:
=======
InnoDB wrongly stores the primary key field externally
(off-page) during a bulk insert operation. This leads
to an assertion failure.

Solution:
========
row_merge_buf_blob(): Should store the primary key fields
inline. Store the variable-length field data externally
based on the row format of the table.

row_merge_buf_write(): Check whether the record size exceeds
the maximum record size.

row_merge_copy_blob_from_file(): Construct the tuple based on
the variable-length fields.
2024-12-09 20:27:12 +05:30
Sergei Golubchik
56e4b01b7b MDEV-35419 Server crashes when adding a column to a table which has a primary key using hash
Correct the assert. Follow-up for 062f8eb37d.
2024-12-06 20:28:45 +01:00
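An approximate SQL sketch of the scenario named in the bug title; the exact original key definition is not shown in the commit message, so this uses an illustrative hash-based unique key:

  CREATE TABLE t (a BLOB, UNIQUE KEY uk (a) USING HASH) ENGINE=InnoDB;
  -- Adding a column to such a table exercised the corrected assertion:
  ALTER TABLE t ADD COLUMN b INT;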
Marko Mäkelä
33907f9ec6 Merge 11.4 into 11.7 2024-12-02 17:51:17 +02:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
19acb0257e MDEV-35508 Race condition between purge and secondary index INSERT or UPDATE
row_purge_remove_sec_if_poss_leaf(): If there is an active transaction
that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1
so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if
the record needs to be purged. It could be the case that an active
transaction would insert this record between the time this check
completed and row_purge_remove_sec_if_poss_tree() acquired a latch
on the secondary index leaf page again.

row_purge_del_mark_error(), row_purge_check(): Some unlikely code
refactored into separate non-inline functions.

trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky
part of trx_sys_t::find_same_or_older() to a non-inline function.

trx_sys_t::find_same_or_older_in_purge(): A variant of
trx_sys_t::find_same_or_older() for use in the purge subsystem,
with potential concurrent access of the same trx_t object from
multiple threads.

trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the
regular data field trx_t::max_inactive_id, which we
use on systems that have native 64-bit loads or stores.
On any 64-bit system that seems to be supported by GCC, Clang or MSVC,
relaxed atomic loads and stores use the regular load and store
instructions. On -march=i686 the 64-bit atomic loads and stores
would use an XMM register.

This fixes a regression that had been introduced in
commit b7b9f3ce82 (MDEV-34515).
There would be messages
[ERROR] InnoDB: tried to purge non-delete-marked record in index
in the server error log, and an assertion ut_ad(0) would cause a
crash of debug instrumented builds. This could also cause incorrect
results for MVCC reads and corrupted secondary indexes.

The debug instrumented test case was written by Debarun Banerjee.

Reviewed by: Debarun Banerjee
2024-11-29 10:44:38 +02:00
Thirunarayanan Balathandayuthapani
9ba18d1aa0 MDEV-35394 Innochecksum misinterprets freed pages
- Innochecksum misinterprets freed pages as active ones.
This leads the user to think that more valid pages exist
than actually do.

- To avoid this confusion, innochecksum introduces one
more option, --skip-freed-pages (-r), to skip the freed
pages while dumping or printing the summary of the tablespace.

- Innochecksum can safely assume that a page is freed if
the respective extent doesn't belong to a segment and is marked as
freed in the XDES_BITMAP of the extent descriptor page.

- Innochecksum shouldn't treat a zero-filled page as an extent
descriptor page.

Reviewed-by: Marko Mäkelä
2024-11-27 13:00:51 +05:30
Thirunarayanan Balathandayuthapani
866a8ea673 MDEV-35398 Improve shrinking of system tablespace
Problem:
========
Since version 10.6.13 (86767bcc0f),
the InnoDB purge thread does free TRX_UNDO_CACHED undo segments.
A data directory from a pre-10.6.13 version can still contain
TRX_UNDO_CACHED undo segments in the system tablespace even
though it has an external undo tablespace.

During slow shutdown, InnoDB collects the segments of tables
that reside in the system tablespace and the cached undo segments in
the system tablespace as used segments of the system tablespace.
While shrinking the system tablespace, the last used extent can be
occupied by a cached undo segment. Such an extent effectively blocks
the shrinking of the system tablespace.

Solution:
========
While freeing unused segments, InnoDB should free the cached
undo segment header pages that exist in the system tablespace and
reset TRX_RSEG_UNDO_SLOTS to 0xff in the rollback segment
header pages that exist in the system tablespace. This allows
the system tablespace to shrink further.
2024-11-19 20:13:24 +05:30
Thirunarayanan Balathandayuthapani
8e1cf078a0 MDEV-35363 Avoid cloning of table statistics while saving the InnoDB table stats
- Remove the test case added as a part of the
commit 98d57719e2 (MDEV-32667)
2024-11-14 15:32:55 +05:30
Marko Mäkelä
c72221e2f8 MDEV-35411 innodb.log_file_size_online occasionally fails
The test innodb.log_file_size_online was killing the server shortly
after the completion of SET GLOBAL innodb_log_file_size.
It is possible that log_t::write_checkpoint() would let mysqltest.cc
proceed to kill the server before the message had been written to
the server error log.

Let us remove the occasionally failing check of that message, and
instead ensure that SET GLOBAL innodb_log_file_size to a different
size will result in an error log message when the server is not being
killed.
2024-11-13 12:04:05 +02:00
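A sketch of the operation the test exercises; the size value is illustrative (innodb_log_file_size is dynamic in these versions):

  SET GLOBAL innodb_log_file_size = 134217728;
  -- The test now verifies that changing the size to a different value
  -- produces an error log message while the server stays up, instead of
  -- checking for the message right before killing the server.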
Oleksandr Byelkin
b12ff287ec Merge branch '11.6' into 11.7 2024-11-10 19:22:21 +01:00
Thirunarayanan Balathandayuthapani
074831ec61 Merge branch 10.5 into 10.6 2024-11-08 18:17:15 +05:30
Thirunarayanan Balathandayuthapani
7afee25b08 MDEV-35115 Inconsistent Replace behaviour when multiple unique index exist
- A REPLACE statement fails with a duplicate key error when conflicts
happen on multiple unique keys. The reason is that the server expects
the InnoDB engine to store the conflicting keys in ascending order,
but InnoDB doesn't store the conflicting keys
in ascending order.

Fix:
===
- Enable HA_DUPLICATE_KEY_NOT_IN_ORDER for the InnoDB storage engine
only when the unique index order differs between the .frm file and the InnoDB dictionary.
2024-11-08 16:46:41 +05:30
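An illustrative SQL sketch (hypothetical names) of a REPLACE that conflicts on more than one unique key at once:

  CREATE TABLE t (a INT PRIMARY KEY, b INT, c INT,
                  UNIQUE KEY ub (b), UNIQUE KEY uc (c)) ENGINE=InnoDB;
  INSERT INTO t VALUES (1, 1, 1), (2, 2, 2);
  -- Conflicts with row (1,1,1) on ub and with row (2,2,2) on uc; before
  -- the fix this could fail with a duplicate key error instead of
  -- replacing both rows:
  REPLACE INTO t VALUES (3, 1, 2);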
Marko Mäkelä
ba4541ba7f MDEV-29015/MDEV-29260/MDEV-34938 test fixup
The merge f00711bba2 included a change
of the test innodb.log_file_name, which would try to ensure that
in the presence of the code fix decdd4bf49
we would get an error on Linux when invoking lseek() on a directory.
It turns out that this is not the case in at least one Linux-based
cloud environment.
2024-11-08 09:55:47 +02:00
Oleksandr Byelkin
9e1fb104a3 MariaDB 11.4.4 release

Merge tag '11.4' into 11.6

MariaDB 11.4.4 release
2024-11-08 07:17:00 +01:00
Thirunarayanan Balathandayuthapani
98d57719e2 MDEV-32667 dict_stats_save_index_stat() reads uninitialized index->stats_error_printed
Problem:
========
- dict_stats_table_clone_create() does not initialize the
flag stats_error_printed in either dict_table_t or dict_index_t.
Because dict_stats_save_index_stat() is operating on a copy
of a dict_index_t object, it appears that
dict_index_t::stats_error_printed will always be false
for actual metadata objects, and uninitialized in
dict_stats_save_index_stat().

Solution:
=========
dict_stats_table_clone_create(): Assign stats_error_printed
for the table and index while copying the statistics.
2024-11-08 11:35:19 +05:30
Sergei Golubchik
7feec30939 relax the XA recovery error
it's just a suggestion anyway, not a bullet-proof check,
let's not act as if it is
2024-11-05 14:00:52 -08:00
Sergei Golubchik
126d6d787c cleanup: handlerton
remove unused methods, reorder methods, add comments
2024-11-05 14:00:50 -08:00
Sergei Golubchik
062f8eb37d cleanup: key algorithm vs key flags
The information about the index algorithm was stored in two
places, inconsistently split between them.

A BTREE index could have key->algorithm == HA_KEY_ALG_BTREE if the user
explicitly specified USING BTREE, or HA_KEY_ALG_UNDEF if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set; the HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage, to keep frms backward compatible).

As a side effect, ALTER TABLE now detects FULLTEXT index renames correctly.
2024-11-05 14:00:47 -08:00
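A purely illustrative CREATE TABLE showing the index kinds discussed above and the key->algorithm value each one maps to:

  CREATE TABLE t (
    a INT,
    b TEXT,
    c TEXT,
    g GEOMETRY NOT NULL,
    KEY k_btree (a) USING BTREE,      -- HA_KEY_ALG_BTREE (UNDEF if USING BTREE is omitted)
    FULLTEXT KEY k_ft (b),            -- HA_KEY_ALG_FULLTEXT
    SPATIAL KEY k_sp (g),             -- HA_KEY_ALG_RTREE
    UNIQUE KEY k_hash (c) USING HASH  -- HA_KEY_ALG_LONG_HASH (long unique)
  ) ENGINE=InnoDB;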
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Thirunarayanan Balathandayuthapani
db3be9b434 MDEV-35237 Bulk insert fails to apply buffered operation during CREATE..SELECT statement
Problem:
=======
- InnoDB fails to apply the buffered insert operation during
a CREATE..SELECT operation. This happens when bulk_insert
in the transaction is reset to false while unlocking the source table.

Fix:
===
- InnoDB should apply the previously buffered changes to
all tables if we encounter any statement other than a
pure INSERT or INSERT..SELECT statement in
ha_innobase::external_lock() and start_stmt().

- Remove the function bulk_insert_apply_for_table().

start_stmt(), external_lock(): Assert that trx->duplicates
is enabled during a bulk insert operation.
2024-10-29 15:03:23 +05:30
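An illustrative sketch (hypothetical names) of the affected statement class:

  CREATE TABLE src (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
  INSERT INTO src VALUES (1, 1), (2, 2);
  -- CREATE..SELECT buffers the inserts into the new table; the buffered
  -- changes must be applied even though the source table is unlocked
  -- during the statement:
  CREATE TABLE dst ENGINE=InnoDB SELECT * FROM src;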
Sergei Golubchik
3cd706b107 MDEV-35236 Assertion `(mem_root->flags & 4) == 0' failed in safe_lexcstrdup_root
Post-fix for MDEV-35144.

Cannot allocate option values on the statement arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially empty, it will be reset for every execution
and any values allocated on the statement arena will be lost.

Cannot allocate option values on the execution arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially NOT empty, any values appended to the
end will be preserved, and if they're on the execution arena their
content will be destroyed.

Let's use thd->change_item_tree() to save and restore necessary pointers
for every execution.

followup for 3da565c41d
2024-10-23 14:58:57 +02:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request a record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, which causes deadlocks. One possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during page merging, which causes an assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as the page
get mode to it. That's why a page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for the id of the trx which modified a secondary index record
is quite an expensive operation, restrict its usage to the master. A system
variable was added to remove the restriction, to simplify testing. The
variable exists only for debug builds or for builds with the
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option, to increase the
probability of catching bugs in release builds with RQG.

Note that the code which does a primary index lookup to find out what
transaction modified a secondary index record is necessary only when
there is no primary key and no unique secondary key on a replica with
row-based replication, because only in this case can extra X locks on
unmodified records be set during the scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare doesn't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aa (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks; let
lock_rec_unlock_unmodified() process them.

lock_sec_rec_some_has_impl(): add a template parameter for not acquiring
trx_t::mutex in case the caller already holds the mutex; don't
crash if the lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
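A hedged SQL sketch of the XA flow whose lock release is adjusted above; table and xid names are illustrative:

  XA START 'x1';
  UPDATE t SET b = b + 1 WHERE a = 1;        -- record the XA modifies
  SELECT * FROM t WHERE a = 2 FOR UPDATE;    -- record it locks but does not modify
  XA END 'x1';
  -- On XA PREPARE, locks on records the transaction did not modify
  -- (now including exclusive non-gap locks) can be released:
  XA PREPARE 'x1';
  XA COMMIT 'x1';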
Marko Mäkelä
1cad1dbde6 MDEV-35235 innodb_snapshot_isolation=ON fails to signal transaction rollback
convert_error_code_to_mysql(): Treat DB_DEADLOCK and DB_RECORD_CHANGED
in the same way, that is, signal to the SQL layer that the transaction
had been rolled back.
2024-10-23 07:55:22 +03:00
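A hedged SQL sketch of the setting involved; whether the conflict occurs depends on concurrent activity, so this only outlines the flow:

  SET SESSION innodb_snapshot_isolation = ON;
  START TRANSACTION WITH CONSISTENT SNAPSHOT;
  -- If another transaction commits a change to a row that this transaction
  -- then tries to lock, InnoDB rolls this transaction back; with the fix,
  -- the SQL layer is informed of the rollback just as for a deadlock:
  UPDATE t SET b = b + 1 WHERE a = 1;
  ROLLBACK;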
Kristian Nielsen
abc46259c6 MDEV-34753 memory pressure - erroneous termination condition
Fix race condition in test case by waiting for the expected state to occur.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-19 17:20:27 +11:00
Daniel Black
eb29190398 MDEV-34753 memory pressure - erroneous termination condition
The 'if (!m_abort) break' condition was inverted by accident.

Constrain the test case to environments where there is a cgroupv2
runtime environment, which is the same case that will pass memory
pressure initialization.

Remove the explicit garbage_collection trigger as it hides the abnormal
termination error on the event loop for memory pressure. This
also means there is no support in non-cgroupv2 environments
(possibly some container environments).

As memory pressure is triggered via a different thread, we
need to wait until a "[mM]emory pressure" log message appears to
know whether it has succeeded or failed.

Thanks Kristian Nielsen for noticing and review.
2024-10-19 17:20:27 +11:00
Marko Mäkelä
ebefef658e Merge 10.11 into 11.2 2024-10-18 11:32:22 +03:00
Marko Mäkelä
eca552a1a4 MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record
must first have been durably written.

In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain about an LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too-new page from the doublewrite buffer.
It is unlikely that this could have led to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-18 10:12:47 +03:00
Sergei Golubchik
3a1cf2c85b MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
Sergei Golubchik
3da565c41d MDEV-35144 CREATE TABLE ... LIKE uses current innodb_compression_default instead of the create value
When adding a column or index that uses plugin-defined
sysvar-based options with CREATE ... LIKE the server
was using the current value of the sysvar, not the default one.

Because parse_option_list() function was used both in create
and open and it tried to guess when it's create (need to use
current sysvar value and add a new name=value pair to the list)
or open (need to use default, without extending the list).

Let's move the list extending functionality into a separate
function and call it explicitly when needed. Operations that
add new objects (CREATE, ALTER ... ADD) will extend the list,
other operations (ALTER, CREATE ... LIKE, open) will not.
2024-10-17 16:28:39 +02:00
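A sketch of the reported scenario using innodb_compression_default as the sysvar-based option; table names are illustrative:

  SET SESSION innodb_compression_default = OFF;
  CREATE TABLE t1 (a INT) ENGINE=InnoDB;   -- created without page compression
  SET SESSION innodb_compression_default = ON;
  -- Before the fix, the copy could pick up the current sysvar value (ON)
  -- instead of the value in effect when t1 was created:
  CREATE TABLE t2 LIKE t1;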
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain about an LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too-new page from the doublewrite buffer.
It is unlikely that this could have led to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Thirunarayanan Balathandayuthapani
4a1ded61a4 MDEV-34529 Shrink the system tablespace when system tablespace contains MDEV-30671 leaked undo pages
- InnoDB fails to shrink the system tablespace when it contains
the leaked undo log pages caused by MDEV-30671.

- InnoDB frees the unused segments in the system tablespace
before shrinking the tablespace. InnoDB fails to free
an unused segment if an XA PREPARE transaction exists or
if the previous shutdown was not done with innodb_fast_shutdown=0.

inode_info: Structure to store the inode page and offsets.

fil_space_t::garbage_collect(): Frees the unused segments of the
system tablespace.

fsp_get_sys_used_segment(): Iterates through all default
file segments and index segments present in the system tablespace.

trx_sys_t::is_xa_exist(): Returns true if an XA transaction
exists in the undo logs.

fseg_inode_free(): Frees the extents and fragment pages for the
given index node and ignores any error, similar to
trx_purge_free_segment().

trx_sys_t::reset_page(): Retain the TRX_SYS_FSEG_HEADER value
in the trx_sys page while resetting the page.
2024-10-16 21:34:24 +05:30
Vladislav Vaintroub
c1fc59277a MDEV-34929 page-compressed tables do not work on Windows
Remove the workaround for MDEV-13941; it served for 5 years, and all affected
pre-release 10.2 installations should have been fixed in the meantime.

Apparently InnoDB is using the is_sparse parameter of os_file_set_size()
inconsistently, and it now passes is_sparse=false during the first file
extension. With the MDEV-13941 workaround in place, this would unsparse
the file, which makes compression not work at all anymore.
2024-10-16 16:02:13 +02:00
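A minimal sketch of a page-compressed table of the kind affected; the table definition is illustrative:

  CREATE TABLE t (a INT PRIMARY KEY, b TEXT) ENGINE=InnoDB PAGE_COMPRESSED=1;
  -- Page compression relies on the data file remaining sparse; the removed
  -- MDEV-13941 workaround was unsparsing the file and defeating it.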
Thirunarayanan Balathandayuthapani
6aaae4c03b MDEV-35122 Incorrect NULL value handling for instantly dropped BLOB columns
Problem:
=======
- Insert into a redundant-format table fails after an
instant drop of a BLOB column. Instant drop column only marks
the column as hidden, and a subsequent insert statement tries
to insert a NULL value for the dropped BLOB column and returns
the fixed length of the BLOB type as 65535. This leads to a
"row size too large" error.

Fix:
====
- For a redundant table, if the non-fixed dropped column can be NULL,
then set the length of the field type to 0.
2024-10-15 12:04:37 +05:30
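A sketch of the reported scenario; table and column names are illustrative:

  CREATE TABLE t (a INT PRIMARY KEY, b BLOB, c INT)
    ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
  INSERT INTO t VALUES (1, 'x', 1);
  ALTER TABLE t DROP COLUMN b, ALGORITHM=INSTANT;  -- column only marked hidden
  -- Before the fix, the next insert computed 65535 bytes for the hidden,
  -- nullable BLOB column and could fail with "row size too large":
  INSERT INTO t (a, c) VALUES (2, 2);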
Yuchen Pei
cd5577ba4a Merge branch '10.5' into 10.6 2024-10-15 16:00:44 +11:00
Thirunarayanan Balathandayuthapani
5777d9f282 MDEV-35116 InnoDB fails to set error index for HA_ERR_NULL_IN_SPATIAL
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.

row_build_spatial_index_key(): Initialize the tmp_mbr array completely.

check_if_supported_inplace_alter(): Fix the spelling mistake of alter
2024-10-14 14:28:24 +05:30