Commit graph

27114 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
f8cf493290 MDEV-34898 Doublewrite recovery of innodb_checksum_algorithm=full_crc32 encrypted pages does not work
- InnoDB fails to recover the full_crc32 encrypted page from the
doublewrite buffer. The reason is that buf_dblwr_t::recover()
fails to identify the space id from the page because the page is
encrypted starting at FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, so the
space id field is encrypted as well.

Fix:
===
buf_dblwr_t::recover(): preserve any pages whose space_id
does not match a known tablespace. These could be encrypted pages
of tablespaces that had been created with
innodb_checksum_algorithm=full_crc32.

buf_page_t::read_complete(): If the page looks corrupted and the
tablespace is encrypted and in full_crc32 format, try to
restore the page from the doublewrite buffer.

recv_dblwr_t::recover_encrypted_page(): Find the page which
has the same page number and try to decrypt the page using
space->crypt_data. After decryption, compare the space id.
Write the recovered page back to the file.
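The recovery idea above can be sketched as follows. This is a minimal illustration, not the InnoDB code: a page is a byte vector, toy_decrypt() is a hypothetical XOR stand-in for decryption with space->crypt_data, and only the byte offsets (FIL_PAGE_OFFSET = 4, FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION = 26, FIL_PAGE_SPACE_ID = 34) follow the real page format. It shows why the page number can be matched before decryption, while the space id can only be checked after it.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Byte offsets from the InnoDB page header; everything else is simplified.
constexpr std::size_t PAGE_NO_OFFSET = 4;    // FIL_PAGE_OFFSET
constexpr std::size_t ENCRYPT_START = 26;    // FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
constexpr std::size_t SPACE_ID_OFFSET = 34;  // FIL_PAGE_SPACE_ID

using Page = std::vector<std::uint8_t>;

// InnoDB stores header fields big-endian.
std::uint32_t read_be32(const Page &p, std::size_t off) {
  return std::uint32_t(p[off]) << 24 | std::uint32_t(p[off + 1]) << 16 |
         std::uint32_t(p[off + 2]) << 8 | std::uint32_t(p[off + 3]);
}

// Hypothetical stand-in for decryption: XOR every byte from ENCRYPT_START
// onward with a one-byte key. XOR is its own inverse, so the same call
// also "encrypts". Real full_crc32 encryption uses a block cipher.
Page toy_decrypt(Page p, std::uint8_t key) {
  for (std::size_t i = ENCRYPT_START; i < p.size(); i++) p[i] ^= key;
  return p;
}

// Among the preserved doublewrite copies, match on the unencrypted page
// number, decrypt a trial copy, and accept the original (still encrypted)
// copy only if the decrypted space id matches. That copy is what would be
// written back to the data file.
std::optional<Page> recover_encrypted(const std::vector<Page> &dblwr,
                                      std::uint32_t space_id,
                                      std::uint32_t page_no,
                                      std::uint8_t key) {
  for (const Page &copy : dblwr) {
    if (copy.size() < SPACE_ID_OFFSET + 4 ||
        read_be32(copy, PAGE_NO_OFFSET) != page_no)
      continue;
    if (read_be32(toy_decrypt(copy, key), SPACE_ID_OFFSET) == space_id)
      return copy;
  }
  return std::nullopt;
}
```

Matching on the page number first avoids decrypting every preserved copy; the space id comparison after decryption then rejects copies that belong to another tablespace.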
2025-01-07 19:33:56 +05:30
Monty
a2d37705ca Only print "InnoDB: Transaction was aborted..." if log_warnings >= 4
This is a minor fixup for
MDEV-24035 Failing assertion UT_LIST_GET_LEN(lock.trx_locks) == 0
causing disruption and replication failure
2025-01-05 16:40:12 +02:00
Monty
88d9348dfc Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
Monty
52c29f3bdc MDEV-35469 Heap tables are calling mallocs too often
Heap tables allocate blocks to store rows according to
my_default_record_cache (mapped to the server global variable
read_buffer_size).
This causes performance issues when the record length is big
(> 1000 bytes) and the my_default_record_cache is small.

Changed the default heap allocation to instead use 1/16 of the
allowed space, and to no longer use my_default_record_cache when creating
the heap. The allocation is also aligned to be just under a power of 2.
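The sizing rule can be illustrated with the sketch below. The 1/16 ratio comes from the commit message; the rounding policy (round down so that the request plus an assumed 16-byte malloc overhead is exactly a power of two) and all names are hypothetical, not the code in the heap engine.

```cpp
#include <cstddef>

// Assumed malloc bookkeeping overhead per allocation (hypothetical value).
constexpr std::size_t kMallocOverhead = 16;

// Largest power of two that is <= n, for n >= 1.
std::size_t floor_pow2(std::size_t n) {
  std::size_t p = 1;
  while (p <= n / 2) p <<= 1;
  return p;
}

// Hypothetical default block size for a heap table: 1/16 of the allowed
// space, rounded down so that block + kMallocOverhead is a power of two,
// i.e. the block itself is "just under" a power of 2.
// Assumes max_table_bytes is at least a few KiB.
std::size_t default_heap_block(std::size_t max_table_bytes) {
  std::size_t want = max_table_bytes / 16;
  return floor_pow2(want + kMallocOverhead) - kMallocOverhead;
}
```

Rounding down keeps each block within the 1/16 budget while letting the allocator serve it from a power-of-two size class; a real implementation might round differently.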

In one test that I ran, which used a record length of 633,
the speed of the query doubled thanks to this change.

Other things:
- Fixed calculation of max_records passed to hp_create() to take
  into account padding between records.
- Updated calculation of memory needed by heap tables. Before we
  did not take into account internal structures needed to access rows.
- Changed block size for memory_table from 1 to 16384 to get less
  fragmentation. This also avoids a problem where we need 1K
  to manage index and row storage, which was not accounted for before.
- Moved heap memory usage to a separate test for 32-bit.
- Allocated all data blocks in heap in powers of 2. Changed reported
  memory usage for heap to reflect this.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2025-01-05 16:40:11 +02:00
Marko Mäkelä
f20ee931d8 Merge 10.5 into 10.6
Note: Changes to the test innodb.stats_persistent
in commit e5c4c0842d (MDEV-35443)
are not merged, because the test scenario is impossible
due to commit e66928ab28 (MDEV-33462).
2025-01-03 09:10:25 +02:00
Marko Mäkelä
e5c4c0842d MDEV-35443: opt_search_plan_for_table() may degrade to full table scan
opt_calc_index_goodness(): Correct an inaccurate condition.
We can very well use a clustered index of a table that is subject
to online rebuild. But we must not choose an index that has not been
committed (it is a secondary index that was not fully created)
or that is corrupted or not a normal B-tree index.

opt_search_plan_for_table(): Remove some redundant code, now that
opt_calc_index_goodness() checks against corrupted indexes.

The test case allows this code to be exercised. The main observation
to make when running the following:
	./mtr --rr innodb.stats_persistent
	rr replay var/log/mysqld.1.rr/latest-trace
should be that when opt_search_plan_for_table() is being invoked by
dict_stats_update_persistent() on the statistics table being altered
in the 2nd call after ha_innobase::inplace_alter_table(),
and the fix in opt_calc_index_goodness() is absent,
it would choose the code path if (n_fields == 0), that is, a full
table scan, instead of searching for the record. The GDB commands to
execute in "rr replay" would be as follows:
	break ha_innobase::inplace_alter_table
	continue
	break opt_search_plan_for_table
	continue
	continue
	next
	next
	…

Reviewed by: Vladislav Lesin
2024-12-19 14:05:16 +02:00
Daniele Sciascia
07b77e862c MDEV-35660 Assertion `trx->xid.is_null()' failed
The assertion fails during wsrep recovery step, in function
innobase_rollback_by_xid(). The transaction's xid is normally
cleared as part of lookup by xid, unless the transaction has
a wsrep specific xid.
This is a regression from MDEV-24035 (commit ddd7d5d8e3),
which removed the part that clears the xid before rollback for a
transaction with a wsrep-specific xid.
2024-12-19 08:55:59 +01:00
mariadb-DebarunBanerjee
3f22f5f2fe MDEV-35679 Potential issue in Secondary Index with ROW_FORMAT=COMPRESSED and Change buffering enabled
In function buf_page_create_low(), remove duplicate code that
incorrectly overwrites the ibuf_exist variable when only the compressed
page is loaded in the buffer pool. This helps remove any old change
buffer record immediately before the page is reused.
2024-12-18 20:46:26 +05:30
Yuchen Pei
671f80c738 Merge branch '10.5' into 10.6 2024-12-17 11:06:09 +11:00
Yuchen Pei
77c9917663 MDEV-34716 Fix mysql.servers socket max length too short
On Unix, libc limits the socket path length to 108 (see
sockaddr_un::sun_path), but in the table it is a string of max length
64, which results in truncation of the socket path and failure to connect
by plugins that use servers, such as spider.
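The mismatch can be checked by comparing the platform's sun_path capacity with the old 64-character column. A small sketch; the constant and helper names are illustrative, and sizeof(sockaddr_un::sun_path) is 108 on Linux but may be smaller elsewhere (104 on macOS, for example).

```cpp
#include <sys/un.h>
#include <cstddef>

// Size of the path buffer in struct sockaddr_un on this platform.
constexpr std::size_t kSunPathSize = sizeof(sockaddr_un::sun_path);

// Maximum socket length of the old mysql.servers column, per the commit.
constexpr std::size_t kOldColumnLength = 64;

// True if a path of this length fits in sun_path (leaving room for the
// terminating NUL) but would have been cut off by the old column.
constexpr bool truncated_by_old_column(std::size_t path_len) {
  return path_len > kOldColumnLength && path_len < kSunPathSize;
}
```

For example, an 80-character socket path is valid for connect() on Linux but would have been truncated when stored in the old column.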
2024-12-17 10:40:57 +11:00
Marko Mäkelä
c982a143fc MDEV-35494 fixup: Always initialize latch
It turns out that in debug builds, init() always checks that
some fields of the latch have been filled with zeros.
2024-12-16 13:23:13 +02:00
mariadb-DebarunBanerjee
c7698a0b70 MDEV-35626 Race condition between buf_page_create_low() and read completion
This regression was introduced in 10.6 by the following commit:
commit 35d477dd1d
MDEV-34453 Trying to read 16384 bytes at 70368744161280

The page state could change after being buffer-fixed and needs to be
read again after locking the page.
2024-12-13 18:36:47 +05:30
Marko Mäkelä
1097164d3f MDEV-35619 Assertion failure in row_purge_del_mark_error
trx_sys_t::find_same_or_older_in_purge(): Correct a mistake that
was made in commit 19acb0257e
(MDEV-35508) and make the caching logic correspond to the one in
trx_sys_t::find_same_or_older(). In the more common code path
for 64-bit systems, the condition !hot was inadvertently inverted,
making us wrongly skip calls to find_same_or_older_low() when the
transaction may still be active.

Furthermore, the call should have been to find_same_or_older_low()
and not the wrapper find_same_or_older().
2024-12-13 11:41:47 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
can this crash be avoided, but the original root cause can also be found
and fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we roll back to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.
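The simplified savepoint model from trx_savept_t and trx_named_savept_t above (a savepoint is just the number of undo log records that must survive, stored directly in the engine's per-savepoint area) can be sketched like this. ToyTrx and the helper functions are hypothetical; only the name undo_no_t and the idea of storing it in the space at innobase_hton->savepoint_offset come from the commit message, and here the undo step merely discards records instead of applying them.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

using undo_no_t = std::uint64_t;  // InnoDB's name for an undo record count

// Hypothetical transaction: each vector element stands for one undo record.
struct ToyTrx {
  std::vector<int> undo_log;
  undo_no_t undo_no() const { return undo_log.size(); }
};

// SAVEPOINT: remember only how many undo records exist right now, stored
// in the raw per-savepoint area that the server hands to the engine.
void set_savepoint(const ToyTrx &trx, unsigned char *sv_area) {
  const undo_no_t n = trx.undo_no();
  std::memcpy(sv_area, &n, sizeof n);
}

// ROLLBACK TO SAVEPOINT: every record beyond that count is rolled back,
// newest first. Here "rolling back" a record just discards it.
void rollback_to_savepoint(ToyTrx &trx, const unsigned char *sv_area) {
  undo_no_t n;
  std::memcpy(&n, sv_area, sizeof n);
  while (trx.undo_log.size() > n) trx.undo_log.pop_back();
}
```

Because the savepoint is a plain integer, no separately allocated savepoint list is needed.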

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
Dave Gosselin
9aa84cf57f MDEV-35587 unit.innodb_sync leaks memory on mac
unit.innodb_sync calls my_end() to clean up its memory
2024-12-12 10:27:36 +11:00
Marko Mäkelä
7bcd6c610a MDEV-35618 Bogus assertion failure 'recv_sys.scanned_lsn < max_lsn + 32 * 512U' during recovery
buf_dblwr_t::recover(): Correct a debug assertion that had
been added in commit bb47e575de (MDEV-34830).
The server may have been killed while a log write was in progress, and
therefore recv_sys.scanned_lsn may be up to RECV_PARSING_BUF_SIZE bytes
ahead of recv_sys.recovered_lsn.

Thanks to Matthias Leich for providing "rr replay" traces and
testing this.
2024-12-11 14:47:39 +02:00
Marko Mäkelä
69e20cab28 Merge 10.5 into 10.6 2024-12-11 14:46:43 +02:00
Marko Mäkelä
bfe7c8ff0a MDEV-35494 fil_space_t::fil_space_t() may be unsafe with GCC -flifetime-dse
fil_space_t::create(): Instead of invoking the default fil_space_t
constructor on a zero-filled buffer, allocate an uninitialized buffer
and invoke an explicitly defined constructor on it. Also, specify
initializer expressions for all constant data members, so that all of them
will be initialized in the constructor.
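The construction pattern described above can be sketched as follows; Space is a hypothetical stand-in for fil_space_t. As we understand GCC's -flifetime-dse, stores made to an object's storage before its constructor starts (such as zero-filling the buffer) may be eliminated as dead, so every member gets its own initializer and the constructor runs on raw storage.

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for fil_space_t. Every data member is initialized
// by the constructor, so nothing depends on the buffer being zero-filled.
struct Space {
  const std::uint32_t id;
  std::uint32_t n_pending{0};
  bool being_imported{false};
  Space *next{nullptr};
  explicit Space(std::uint32_t id) : id(id) {}
};

// Allocate uninitialized storage, then run the explicit constructor on it.
Space *create_space(std::uint32_t id) {
  void *buf = std::malloc(sizeof(Space));
  if (!buf) return nullptr;
  return new (buf) Space(id);
}

// Destroy the object explicitly before releasing the raw storage.
void destroy_space(Space *s) {
  s->~Space();
  std::free(s);
}
```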

fil_space_t::being_imported: Replaces part of fil_space_t::purpose.

fil_space_t::is_being_imported(), fil_space_t::is_temporary():
Replaces fil_space_t::purpose.

fil_space_t::id: Changed the type from ulint to uint32_t to reduce
incompatibility with later branches that include
commit ca501ffb04 (MDEV-26195).

fil_space_t::try_to_close(): Do not attempt to close files that are
in an I/O bound phase of ALTER TABLE…IMPORT TABLESPACE.

log_file_op, first_page_init: recv_spaces_t:
Use uint32_t for the tablespace id.

Reviewed by: Debarun Banerjee
2024-12-11 14:44:42 +02:00
Daniel Black
807e4f320f Change my_umask{,_dir} to mode_t and remove os_innodb_umask
os_innodb_umask was of the incorrect type resulting in warnings
in clang-19. The correct type is mode_t.

As os_innodb_umask was set during innodb_init from my_umask,
corrected the type there along with its companion my_umask_dir.
Because of this, the default mask values in InnoDB never
had an effect.

The resulting change exposed signedness differences in
my_create{,_nosymlink} and open_nosymlinks:

mysys/my_create.c:47:20: error: operand of ?: changes signedness from ‘int’ to ‘mode_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Werror=sign-compare]
   47 |      CreateFlags ? CreateFlags : my_umask);

Ref: clang-19 warnings:

[55/123] Building CXX object storage/innobase/CMakeFiles/innobase.dir/os/os0file.cc.o
storage/innobase/os/os0file.cc:1075:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
 1075 |                 file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
      |                        ~~~~                                ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1249:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
 1249 |                 file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
      |                        ~~~~                                ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1381:45: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
 1381 |         file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
      |                ~~~~                                ^~~~~~~~~~~~~~~
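The sign-compare error above comes from mixing an int and a mode_t (an unsigned type on the platforms shown) in one conditional expression. A minimal sketch of keeping both operands mode_t; create_mode() and default_umask are hypothetical names, and 0660 is only an example value, not necessarily the actual my_umask default.

```cpp
#include <sys/types.h>

// Hypothetical default creation mode, playing the role of my_umask.
static const mode_t default_umask = 0660;

// With an int parameter, `flags ? flags : default_umask` mixes int and
// mode_t and GCC reports -Wsign-compare. Using mode_t for both operands
// gives the conditional expression a single unsigned type.
mode_t create_mode(mode_t create_flags) {
  return create_flags ? create_flags : default_umask;
}
```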
2024-12-11 17:21:01 +11:00
Daniel Black
bf7cfa2535 MDEV-35574 remove obsolete pthread_exit calls
Threads can normally exit without an explicit pthread_exit call.

These calls seem to date back to old glibc bugs, many around version 2.2.5.

A semi-related bug was https://bugs.mysql.com/bug.php?id=82886.

To improve safety in the signal handlers, DBUG_* code was removed.

These changes were also needed to avoid some MSAN unresolved stack issues.

This is effectively a backport of 2719cc4925.
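For reference, POSIX specifies that returning from a thread's start routine acts as an implicit pthread_exit() whose exit status is the returned value, which is why the explicit calls can go. A minimal sketch (worker() and run_worker() are illustrative names):

```cpp
#include <pthread.h>
#include <cstdint>

// Returning from the start routine ends the thread exactly as
// pthread_exit(arg) would; the returned pointer becomes the exit status.
static void *worker(void *arg) {
  auto *n = static_cast<std::intptr_t *>(arg);
  *n *= 2;
  return arg;  // no explicit pthread_exit() call
}

// Runs worker() on n and joins it; returns the doubled value, or -1 on error.
std::intptr_t run_worker(std::intptr_t n) {
  pthread_t t;
  if (pthread_create(&t, nullptr, worker, &n) != 0) return -1;
  void *status = nullptr;
  if (pthread_join(t, &status) != 0 || status != &n) return -1;
  return n;
}
```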
2024-12-10 12:12:20 +11:00
Kristian Nielsen
b4fde50b1f MDEV-5798: Wrong errorcode for missing partition after TRUNCATE PARTITION
The partitioning error handling code was looking at
thd->lex->alter_info.partition_flags in non-ALTER TABLE cases, where
the value is stale and contains whatever was set by any earlier ALTER TABLE.
This could cause the wrong error code to be generated, which then in some cases
can cause replication to break with "different errorcode" error.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 08:17:35 +01:00
Julius Goryavsky
cefdc3e67d Merge branch '10.5' into '10.6' 2024-12-03 13:08:12 +01:00
Yuchen Pei
d0fcac4450 MDEV-35422 Fix spider group by handler trying to use fake group by fields
This is a fixup of MDEV-26345 commit
77ed235d50.

In MDEV-26345 the spider group by handler was updated so that it uses
the item_ptr fields of Query::group_by and Query::order_by, instead of
item. This was, and still is, because the call to
join->set_items_ref_array(join->items1) during the execution stage,
just before the execution, replaces the order-by / group-by item arrays
with Item_temptable_field.

Spider traverses the item tree during the group by handler (gbh)
creation at the end of the optimization stage, and decides whether a gbh
can handle the execution of the query. Basically, spider gbh can handle the
execution if it can construct a well-formed query, execute it on the
data node, and store the results in the correct places. If so, it will
create one; otherwise it will return NULL and the execution will use
the usual handler (ha_spider instead of spider_group_by_handler). To
that end, the general principle is that the items checked for creation
should be the same items later used for query construction. Since in
MDEV-26345 we changed to use the item_ptr field instead of the item field
of order-by and group-by in query construction, in this patch we do
the same for the gbh creation.

The item_ptr field could be the uninitialised NULL value during the
gbh creation. This is because the optimizer may replace a DISTINCT
with a GROUP BY, which only happens if the original GROUP BY is empty.
It creates the artificial GROUP BY by calling create_distinct_group(),
which creates the corresponding ORDER object with its item field pointing
somewhere into ref_pointer_array, but leaves item_ptr NULL.
When spider finds that item_ptr is NULL, it knows there is some
optimizer skullduggery and that it has been passed a query different from the
gbh, it is better to be safe than sorry and not create the gbh in this
case.

Also add a check and error reporting for the unlikely case of item_ptr
changing from non-NULL at gbh construction to NULL at execution to
prevent server crash.
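The two checks described above can be sketched as follows. Item and OrderElem are hypothetical stand-ins modeled on the item and item_ptr fields named in the commit; nothing else here mirrors spider's code, and the error code 1 is made up.

```cpp
#include <vector>

struct Item { int id; };  // hypothetical stand-in for the server's Item

// Hypothetical stand-in for a GROUP BY / ORDER BY element.
struct OrderElem {
  Item **item;     // may point into ref_pointer_array
  Item *item_ptr;  // original item; NULL for an artificial GROUP BY
};

// Creation-time check: build a gbh only if every element carries item_ptr,
// because item_ptr is what query construction will use later.
bool gbh_can_use(const std::vector<OrderElem> &group_by,
                 const std::vector<OrderElem> &order_by) {
  for (const OrderElem &o : group_by)
    if (!o.item_ptr) return false;
  for (const OrderElem &o : order_by)
    if (!o.item_ptr) return false;
  return true;
}

// Execution-time guard: report an error instead of dereferencing NULL if
// item_ptr changed after the handler was created.
int append_group_by(const std::vector<OrderElem> &group_by,
                    std::vector<int> &out) {
  for (const OrderElem &o : group_by) {
    if (!o.item_ptr) return 1;  // hypothetical error code
    out.push_back(o.item_ptr->id);
  }
  return 0;
}
```

Checking the same field at creation time that query construction reads later keeps the two stages consistent, which is the principle the commit states.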

Also, we remove a check added in MDEV-29480 of order by items being
aggregate functions. That check was added with the premise that spider
was including auxiliary SELECT items which are referenced by ORDER BY
items. This premise has no longer been true since MDEV-26345, and the
check caused problems such as MDEV-29546, which was fixed by MDEV-26345.
2024-12-03 10:32:42 +11:00
Alexander Barkov
01cc92e098 MDEV-34700 Connect SQLite3 MTR test fails due to various charset/collation related output changes
Re-recorded test results with the COLLATE clause, according to MDEV-29446.
2024-12-01 13:52:41 +04:00
Marko Mäkelä
1a9011d273 MDEV-35525: Index corruption in reverse scans
btr_cur_t::search_leaf(): In the BTR_SEARCH_PREV and BTR_MODIFY_PREV
modes, reset the previous search status before invoking
page_cur_search_with_match(). Otherwise, the search could end up
in a totally wrong subtree.

This fixes a regression that was introduced in
commit de4030e4d4 (MDEV-30400).
2024-11-29 15:12:20 +02:00
Marko Mäkelä
507323abe6 Cleanup: Remove duplicated code
buf_block_alloc(): Define as an alias in buf0lru.h, which defines
the underlying buf_LRU_get_free_block().

buf_block_free(): Define as an alias of the non-inline function
buf_pool.free_block(block).

Reviewed by: Vladislav Lesin
2024-11-29 14:16:34 +02:00
Marko Mäkelä
998a625d00 Clean up recv_sys.pages bookkeeping
Instead of repurposing buf_page_t::access_time for state()==MEMORY
blocks that are part of recv_sys.pages, let us define an anonymous
union around buf_page_t::hash. In this way, we will be able to
declare access_time private.

Reviewed by: Vladislav Lesin
2024-11-29 14:16:11 +02:00
Marko Mäkelä
7d4077cc11 Merge 10.5 into 10.6 2024-11-29 12:37:46 +02:00
Marko Mäkelä
19acb0257e MDEV-35508 Race condition between purge and secondary index INSERT or UPDATE
row_purge_remove_sec_if_poss_leaf(): If there is an active transaction
that is not newer than PAGE_MAX_TRX_ID, return the bogus value 1
so that row_purge_remove_sec_if_poss_tree() is guaranteed to recheck if
the record needs to be purged. It could be the case that an active
transaction would insert this record between the time this check
completed and row_purge_remove_sec_if_poss_tree() acquired a latch
on the secondary index leaf page again.

row_purge_del_mark_error(), row_purge_check(): Some unlikely code
paths were refactored into separate non-inline functions.

trx_sys_t::find_same_or_older_low(): Move the unlikely and bulky
part of trx_sys_t::find_same_or_older() to a non-inline function.

trx_sys_t::find_same_or_older_in_purge(): A variant of
trx_sys_t::find_same_or_older() for use in the purge subsystem,
with potential concurrent access of the same trx_t object from
multiple threads.

trx_t::max_inactive_id_atomic: An Atomic_relaxed alias of the
regular data field trx_t::max_inactive_id, which we
use on systems that have native 64-bit loads or stores.
On any 64-bit system that seems to be supported by GCC, Clang or MSVC,
relaxed atomic loads and stores use the regular load and store
instructions. On -march=i686 the 64-bit atomic loads and stores
would use an XMM register.
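A relaxed atomic 64-bit field of the kind described can be sketched with std::atomic. This is an approximation: the commit uses an Atomic_relaxed alias of a regular data member, whereas TrxSketch below simply declares the member atomic, and all names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical trx fragment: a 64-bit cache that other threads may read
// concurrently. Relaxed loads and stores give no ordering guarantees, but
// each access is indivisible, so a reader never sees a torn value. On
// common 64-bit targets they compile to plain load and store instructions.
struct TrxSketch {
  std::atomic<std::uint64_t> max_inactive_id{0};
};

std::uint64_t read_max_inactive(const TrxSketch &t) {
  return t.max_inactive_id.load(std::memory_order_relaxed);
}

void set_max_inactive(TrxSketch &t, std::uint64_t id) {
  t.max_inactive_id.store(id, std::memory_order_relaxed);
}
```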

This fixes a regression that had been introduced in
commit b7b9f3ce82 (MDEV-34515).
There would be messages
[ERROR] InnoDB: tried to purge non-delete-marked record in index
in the server error log, and an assertion ut_ad(0) would cause a
crash of debug instrumented builds. This could also cause incorrect
results for MVCC reads and corrupted secondary indexes.

The debug instrumented test case was written by Debarun Banerjee.

Reviewed by: Debarun Banerjee
2024-11-29 10:44:38 +02:00
Daniele Sciascia
e821c9fa7c MDEV-35281 SR transaction crashes with innodb_snapshot_isolation
Ignore snapshot isolation conflict during fragment removal, before
streaming transaction commits. This happens when a streaming
transaction creates a read view that precedes the INSERTion of
fragments into the streaming_log table. Fragments are INSERTed
using a different transaction. These fragments are then removed
as part of COMMIT of the streaming transaction. This fragment
removal operation could fail when the fragments were not part
of the transaction's read view, thus violating snapshot isolation.
2024-11-29 08:06:32 +01:00
Julius Goryavsky
8bc254dd62 MDEV-26516: WSREP: Record locking is disabled in this thread, but the table being modified
We periodically observe assertion failures in the mtr tests,
specifically in the /storage/innobase/row/row0ins.cc file,
following a WSREP error. The error message is: 'WSREP: record
locking is disabled in this thread, but the table being modified
is not mysql/wsrep_streaming_log: mysql/innodb_table_stats.'
This issue seems to occur because, upon opening the table,
innodb_stats_auto_recalc may trigger, which Galera does not
anticipate. This commit should fix this bug.
2024-11-28 01:02:35 +01:00
Yuchen Pei
5be859d52c MDEV-30649 Adding a spider testcase showing copying from a remote to a local table
Also deleted some trailing whitespace in mdev_30191.test.
2024-11-27 10:25:14 +11:00
Yuchen Pei
a8cc40d9a4 MDEV-35064 Reduce the default spider connect retry counts to 2
The existing default value 1000 is too big and could result in
"hanging" when failing to connect to a remote server. Three tries in
total is a more sensible default.
2024-11-27 10:25:14 +11:00
Marko Mäkelä
2255be0395 MDEV-35472 Server crash in ha_storage_put_memlim upon reading from INNODB_LOCKS
ha_storage_put_memlim(): Initialize node->next in order to avoid a
crash on a subsequent invocation, due to dereferencing an uninitialized
pointer.
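The bug class here is an allocated node whose next pointer is left uninitialized. A generic sketch; Node and put() are hypothetical and do not reproduce ha_storage's structures.

```cpp
#include <cstdlib>

// Hypothetical node in a singly linked chain (not ha_storage's layout).
struct Node {
  const char *data;
  Node *next;
};

// Append a node. std::malloc() returns uninitialized memory, so every
// field, including next, must be written before the node becomes
// reachable; otherwise a later traversal follows a garbage pointer.
Node *put(Node **head, const char *data) {
  Node *node = static_cast<Node *>(std::malloc(sizeof(Node)));
  if (!node) return nullptr;
  node->data = data;
  node->next = nullptr;  // the kind of initialization the fix adds
  Node **link = head;
  while (*link) link = &(*link)->next;
  *link = node;
  return node;
}
```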

This fixes a regression that had been introduced in
commit ccb6cd8053 (MDEV-35189).

Reviewed by: Debarun Banerjee
2024-11-25 10:31:57 +02:00
ParadoxV5
ec58fce3da MDEV-35478 Correction for table->space_id in dict_load_tablespace() was mistakenly applied on an earlier branch
It is `ulint` on 10.6 and `uint32_t` on 10.11+, but I included its
format specifier change in 10.6 (MDEV-35430, merged #3493) rather
than 10.11. This commit reverts that change so 10.11 can reapply it.
2024-11-25 18:25:02 +11:00
Daniel Black
971a0ba23c MDEV-34408: Facilitate the addition of warnings into the build system
Create a MY_WARNING_FLAGS_NON_FATAL for testing warnings

Add -Weffc++ as a non-fatal (no-error) warning, disabling it for rocksdb
due to excessive errors that will be corrected later.
2024-11-23 08:14:23 -07:00
Brandon Nesterenko
78d7bb1d27 MDEV-34348: Miscellaneous fixes
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

Various additional fixes, each too small to put into
their own commit.

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:23 -07:00
Brandon Nesterenko
3c785499da MDEV-34348: Fix casts relating to tree_walk_action
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:23 -07:00
Brandon Nesterenko
840fe316d4 MDEV-34348: my_hash_get_key fixes
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

Change the type of my_hash_get_key to:
 1) Return const
 2) Change the context parameter to be const void*

Also fix casting in hash adjacent areas.

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:22 -07:00
Brandon Nesterenko
dbfee9fc2b MDEV-34348: Consolidate cmp function declarations
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangeably
and unsafely cast to one another.

This patch consolidates the functions all into the
qsort_cmp2 interface.
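A sketch of why one shared comparator type helps. The typedef follows the name qsort_cmp2 with a leading context parameter, but the exact MariaDB declaration may differ; sort_ints() and cmp_int() are hypothetical. Because cmp_int() is defined with exactly the pointer's type, no function-pointer cast is needed. Calling a function through a pointer of a different function type is undefined behaviour, which is what -Wcast-function-type-strict guards against.

```cpp
#include <cstddef>

// Comparator type with a leading context parameter, modeled on the name
// qsort_cmp2; the exact MariaDB declaration may differ.
typedef int (*qsort_cmp2)(void *param, const void *a, const void *b);

// Hypothetical insertion sort over ints that accepts only qsort_cmp2.
void sort_ints(int *v, std::size_t n, qsort_cmp2 cmp, void *param) {
  for (std::size_t i = 1; i < n; i++)
    for (std::size_t j = i; j > 0 && cmp(param, &v[j - 1], &v[j]) > 0; j--) {
      int tmp = v[j];
      v[j] = v[j - 1];
      v[j - 1] = tmp;
    }
}

// Defined with exactly the qsort_cmp2 signature, so it can be passed
// without a function-pointer cast. param points to a "descending" flag.
static int cmp_int(void *param, const void *a, const void *b) {
  const int x = *static_cast<const int *>(a);
  const int y = *static_cast<const int *>(b);
  const int r = (x > y) - (x < y);
  return *static_cast<const bool *>(param) ? -r : r;
}
```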

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:22 -07:00
Marko Mäkelä
26597b91b3 MDEV-35413 InnoDB: Cannot load compressed BLOB
A race condition was observed between two buf_page_get_zip() calls for a page.
One of them had proceeded to buf_read_page(), allocating and x-latching
a buf_block_t that initially comprises only an uncompressed page frame.
While that thread was waiting inside buf_block_alloc(), another thread
would try to access the same page. Without acquiring a page latch, it
would wrongly conclude that there is corruption because no compressed
page frame exists for the block.

buf_page_get_zip(): Simplify the logic and correct the documentation.
Always acquire a shared latch to prevent any race condition with a
concurrent read operation. No longer increment a buffer-fix; the latch
is sufficient for preventing page relocation or eviction.

buf_read_page(): Add the parameter bool unzip=true. In buf_page_get_zip()
there is no need to allocate an uncompressed page frame for reading a
compressed BLOB page. We only need that for other ROW_FORMAT=COMPRESSED
pages, or for writing compressed BLOB pages.

btr_copy_zblob_prefix(): Remove the message "Cannot load compressed BLOB"
because buf_page_get_zip() will already have reported a more specific
error whenever it returns nullptr.

row_merge_buf_add(): Do not crash on BLOB corruption, but return an
error instead. (In debug builds, an assertion will fail if this
corruption is noticed.)

Reviewed by: Debarun Banerjee
2024-11-22 08:33:03 +02:00
ParadoxV5
cf2d49ddcf Extract some of #3360 fixes to 10.5.x
That PR uncovered countless issues in uses of `my_snprintf`.
This commit backports a squashed subset of their fixes.
2024-11-21 22:43:56 +11:00
Marko Mäkelä
895cd553a3 MDEV-32175: Reduce page_align(), page_offset() calls
When srv_page_size and innodb_page_size were introduced,
the functions page_align() and page_offset() got more expensive.
Let us try to replace such calls with simpler pointer arithmetic
with respect to the buffer page frame.
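A sketch of the difference, assuming a power-of-two page size held in a run-time variable; srv_page_size_sketch and the helpers are hypothetical. The mask-based forms presumably got more expensive because the mask is no longer a compile-time constant, while a caller that already holds the frame pointer can simply subtract.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical run-time page size (a power of two), like innodb_page_size.
std::size_t srv_page_size_sketch = 16384;

// Mask-based helpers: the mask depends on a variable, so it must be loaded
// and computed on every call instead of being a compile-time constant.
const unsigned char *page_align_sketch(const unsigned char *ptr) {
  const std::uintptr_t mask = srv_page_size_sketch - 1;
  return reinterpret_cast<const unsigned char *>(
      reinterpret_cast<std::uintptr_t>(ptr) & ~mask);
}

std::size_t page_offset_sketch(const unsigned char *ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) & (srv_page_size_sketch - 1);
}

// When the caller already knows the frame, the offset is a subtraction.
std::size_t offset_in_frame(const unsigned char *frame,
                            const unsigned char *rec) {
  return static_cast<std::size_t>(rec - frame);
}

// Allocate one page frame aligned to the page size (release with std::free).
unsigned char *alloc_frame() {
  return static_cast<unsigned char *>(
      std::aligned_alloc(srv_page_size_sketch, srv_page_size_sketch));
}
```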

page_rec_get_next_non_del_marked(): Add a page frame as a parameter,
and template<bool comp>.

page_rec_next_get(): A more efficient variant of page_rec_get_next(),
with template<bool comp> and const page_t* parameters.

lock_get_heap_no(): Replaces page_rec_get_heap_no() outside debug checks.

fseg_free_step(), fseg_free_step_not_header(): Take the header block
as a parameter.

Reviewed by: Vladislav Lesin
2024-11-21 11:01:30 +02:00
Marko Mäkelä
df3855a471 MDEV-35247: ut_hash_ulint() is a waste
ut_hash_ulint(): Remove. The exclusive OR before a modulus operation
does not serve any useful purpose; it is only obfuscating code and
wasting some CPU cycles.

Reviewed by: Debarun Banerjee
2024-11-21 08:59:31 +02:00
Marko Mäkelä
a9b0a1c5d0 MDEV-35247: ut_fold_ull() is a waste
ut_fold_ull(): For SIZEOF_SIZE_T < 8, we simulate universal hashing
(Carter and Wegman, 1977) by pretending that SIZE_T_MAX + 1
is a prime. In other words, we implement a Rabin–Karp rolling
hash algorithm similar to java.lang.String.hashCode().
This is used for representing 64-bit dict_index_t::id or
dict_table_t::id in the native word size.

For SIZEOF_SIZE_T >= 8, we just use an identity mapping.
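A sketch of the 32-bit case under stated assumptions: the 64-bit id is treated as two 32-bit "digits" combined as high * 31 + low, like a two-character java.lang.String.hashCode(), with unsigned arithmetic wrapping modulo 2^32 (the modulus the commit pretends is prime). The multiplier 31 is borrowed from Java and is an assumption, as are the function names; this is not the ut_fold_ull() source.

```cpp
#include <cstdint>

// Hypothetical fold of a 64-bit id into 32 bits: high * 31 + low,
// wrapping modulo 2^32.
std::uint32_t fold_u64_to_u32(std::uint64_t id) {
  const std::uint32_t high = static_cast<std::uint32_t>(id >> 32);
  const std::uint32_t low = static_cast<std::uint32_t>(id);
  return high * 31u + low;
}

// With a 64-bit size_t, the commit uses an identity mapping instead.
std::uint64_t fold_u64_identity(std::uint64_t id) { return id; }
```

For example, an id of 0x100000002 folds to 1 * 31 + 2 = 33.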

Reviewed by: Debarun Banerjee
2024-11-21 08:59:17 +02:00
Marko Mäkelä
3c312d247c MDEV-35190 HASH_SEARCH duplicates effort before HASH_INSERT or HASH_DELETE
The HASH_ macros are unnecessarily obfuscating the logic,
so we had better replace them.

hash_cell_t::search(): Implement most of the HASH_DELETE logic,
for a subsequent insert or remove().

hash_cell_t::remove(): Remove an element.

hash_cell_t::find(): Implement the HASH_SEARCH logic.

xb_filter_hash_free(): Avoid any hash table lookup;
just traverse the hash bucket chains and free each element.

xb_register_filter_entry(): Search databases_hash only once.

rm_if_not_found(): Make use of find_filter_in_hashtable().

dict_sys_t::acquire_temporary_table(), dict_sys_t::find_table():
Define non-inline to avoid unnecessary code duplication.

dict_sys_t::add(dict_table_t *table), dict_table_rename_in_cache():
Look for duplicate while finding the insert position.

dict_table_change_id_in_cache(): Merged to the only caller
row_discard_tablespace().

hash_insert(): Helper function of dict_sys_t::resize().

fil_space_t::create(): Look for a duplicate (and crash if found)
when searching for the insert position.

lock_rec_discard(): Take the hash array cell as a parameter
to avoid a duplicated lookup.

lock_rec_free_all_from_discard_page(): Remove a parameter.
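The single-traversal idea behind hash_cell_t::search() and the duplicate checks done while finding the insert position can be sketched with a pointer-to-pointer walk over one bucket chain. Elem and the helper names are hypothetical; this is not the hash_cell_t interface.

```cpp
#include <cstddef>

struct Elem {  // hypothetical chained-hash element
  int key;
  Elem *next;
};

// One traversal of a bucket chain that yields the link that either points
// at the element with the key, or is the null link at the end of the chain
// where a new element would be appended. Callers can then insert or unlink
// without searching the chain a second time.
Elem **search_link(Elem **cell, int key) {
  Elem **link = cell;
  while (*link && (*link)->key != key) link = &(*link)->next;
  return link;
}

// Insert unless a duplicate exists; returns false on duplicate.
bool insert_unique(Elem **cell, Elem *e) {
  Elem **link = search_link(cell, e->key);
  if (*link) return false;
  e->next = nullptr;
  *link = e;
  return true;
}

// Unlink and return the element with the key, or nullptr if absent.
Elem *remove_key(Elem **cell, int key) {
  Elem **link = search_link(cell, key);
  Elem *e = *link;
  if (e) *link = e->next;
  return e;
}
```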

Reviewed by: Debarun Banerjee
2024-11-21 08:59:02 +02:00
Vlad Lesin
bcbeef6772 MDEV-35457 Remove btr_cur_t::path_arr
After MDEV-21136 fix, the btr_cur_t::path_arr field stayed declared, but
not used, wasting space in each btr_cur_t and btr_pcur_t. Remove it.
2024-11-20 17:43:04 +03:00
Monty
32962ea253 Do not read aria bitmap page for internal temporary tables
Instead create the bitmap page from scratch
2024-11-20 10:01:20 +02:00
Monty
69be363daa Fixed that internal temporary Aria tables are not flushed to disk
This bug was caused by
MDEV-17070 Table corruption or Assertion `table->file->stats.records > 0
2024-11-19 20:27:51 +02:00
Marko Mäkelä
ba69d811fa MDEV-35409 InnoDB can still hang while running out of buffer pool
buf_pool_t::LRU_warn(): Also clear the try_LRU_scan flag, to ensure
that need_LRU_eviction() will hold. This should ensure progress when
buf_LRU_get_free_block() is expecting buf_flush_page_cleaner() to
make some room, even when buf_pool.LRU.count is small.

This hang was observed in trx_lists_init_at_db_start() while the last
batch of crash recovery was in progress, but it could theoretically
be possible also when a large part of the buffer pool is occupied by
record locks or the adaptive hash index.

Reviewed by: Debarun Banerjee
2024-11-18 08:13:18 +02:00