Commit graph

27112 commits

Author SHA1 Message Date
Yuchen Pei
de3dd942c0
MDEV-28894 Spider: remove #ifdef HA_EXTRA_HAS_STARTING_ORDERED_INDEX_SCAN
HA_EXTRA_STARTING_ORDERED_INDEX_SCAN was added latest 2018:
921c5e9314
2024-09-10 11:15:13 +10:00
Yuchen Pei
64581c83e8
MDEV-28893 Spider: remove #ifdef SPIDER_NET_HAS_THD
net has thd since 2015 in 56aa19989f for MDEV-6152
2024-09-10 11:15:12 +10:00
Yuchen Pei
ba9bebd719
MDEV-28892 remove #ifdef SPIDER_Item_args_arg_count_IS_PROTECTED
arg_count was protected since 2015 in commit
afa1773439
2024-09-10 11:15:12 +10:00
Yuchen Pei
05fafaf82d
MDEV-27646 remove SPIDER_HAS_HASH_VALUE_TYPE
unifdef -DSPIDER_HAS_HASH_VALUE_TYPE -m storage/spider/spd_* storage/spider/ha_spider.* storage/spider/hs_client/*
2024-09-10 11:15:12 +10:00
Marko Mäkelä
b7b2d2bde4 Merge 10.5 into 10.6 2024-09-09 11:30:30 +03:00
Marko Mäkelä
f06060f5ed Cleanup: Remove the function dict_remove_db_name() 2024-09-06 14:31:55 +03:00
Marko Mäkelä
024a18dbcb MDEV-34823 Invalid arguments in ib_push_warning()
In the bug report MDEV-32817 it occurred that the function
row_mysql_get_table_status() is outputting a fil_space_t*
as if it were a numeric tablespace identifier.

ib_push_warning(): Remove. Let us invoke push_warning_printf() directly.

innodb_decryption_failed(): Report a decryption failure and set the
dict_table_t::file_unreadable flag. This code was being duplicated in
very many places. We return the constant value DB_DECRYPTION_FAILED
in order to avoid code duplication in the callers and to allow tail calls.

innodb_fk_error(): Report a FOREIGN KEY error.

dict_foreign_def_get(), dict_foreign_def_get_fields(): Remove.
This code was being used in dict_create_add_foreign_to_dictionary()
in an apparently uncovered code path. That ib_push_warning() call
would pass the integer i+1 instead of a pointer to NUL terminated
string ("%s"), and therefore the call should have resulted in a crash.

dict_print_info_on_foreign_key_in_create_format(),
innobase_quote_identifier(): Add const qualifiers.

row_mysql_get_table_error(): Replaces row_mysql_get_table_status().
Display no message on DB_CORRUPTION; it should be properly reported at
the SQL layer anyway.
2024-09-06 14:29:09 +03:00
Yuchen Pei
60b93cdd30
Merge branch '10.5' into 10.6 2024-09-06 13:52:57 +10:00
Daniel Black
8024b8e4c1 MDEV-33091 pcre2 headers - handle columnstore
From e735cf2ed7cefb2af36f10f3cb47dfc060789df3, the PCRE_INCLUDES
changed to PCRE_INCLUDE_DIRS for consistency.

The columnstore module depends on the old name.

Create a mapping for the columnstore submodule.

10.6+ fix for submodule is:
* https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3304
2024-09-05 12:14:06 +10:00
Sergei Golubchik
b2ebe1cb7b MDEV-33091 pcre2 headers aren't found on Solaris
use pkg-config to find pcre2, if possible

rename PCRE_INCLUDES to use PKG_CHECK_MODULES naming, PCRE_INCLUDE_DIRS
2024-09-05 12:14:06 +10:00
Marko Mäkelä
9f0b106631 MDEV-34845 buf_flush_buffer_pool(): Assertion `!os_aio_pending_reads()' failed
buf_flush_buffer_pool(): Wait for any pending asynchronous reads
to complete. This assertion failed in a run where buf_read_ahead_linear()
had been triggered in an SQL statement that was executed right
before shutdown.

Reviewed by: Debarun Banerjee
2024-09-03 18:22:10 +03:00
Marko Mäkelä
9878238f74 MDEV-34791: Redundant page lookups hurt performance
btr_cur_t::search_leaf(): When the index root page is also a leaf page,
we may need to upgrade our existing shared root page latch into an
exclusive latch. Even if we end up waiting, the root page won't be able
to go away while we hold an index()->lock. The index page may be split;
that is all.

btr_latch_prev(): Acquire the page latch while holding a buffer-fix
and an index tree latch. Merge the change buffer if needed. Use
buf_pool_t::page_fix() for this special case instead of complicating
buf_page_get_low() and buf_page_get_gen().

row_merge_read_clustered_index(): Remove some code that does not seem
to be useful. No difference was observed with regard to removing this
code when a CREATE INDEX or OPTIMIZE TABLE statement was run concurrently
with sysbench oltp_update_index --tables=1 --table_size=1000 --threads=16.

buf_pool_t::unzip(): Decompress a ROW_FORMAT=COMPRESSED page.

buf_pool_t::page_fix(): Handle also ROW_FORMAT=COMPRESSED pages
as well as change buffer merge. Optionally return an error.
Add a flag for suppressing a page latch wait and a special return
value -1 to indicate that the call would block.
This is the preferred way of buffer-fixing blocks.
The functions buf_page_get_gen() and buf_page_get_low() are only being
invoked with rw_latch=RW_NO_LATCH in operations on SPATIAL INDEX.

buf_page_t: Define some static functions for interpreting state().

buf_page_get_zip(), buf_read_page(),
buf_read_ahead_random(), buf_read_ahead_linear():
Remove the redundant parameter zip_size. We must look up the
tablespace and can invoke fil_space_t::zip_size() on it.

buf_page_get_low(): Require mtr!=nullptr.

buf_page_get_gen(): Implement some lock downgrading during recovery.

ibuf_page_low(): Use buf_pool_t::page_fix() in a debug check.
We do wait for a page read here, because otherwise a debug assertion in
buf_page_get_low() in the test innodb.ibuf_delete could occasionally fail.

PageConverter::operator(): Invoke buf_pool_t::page_fix() in order
to possibly evict a block. This allows us to remove some
special case code from buf_page_get_low().
2024-09-03 14:15:57 +03:00
Marko Mäkelä
0e76c1ba94 Merge 10.5 into 10.6 2024-08-28 15:51:36 +03:00
Marko Mäkelä
1ff6b6f0b4 MDEV-34802 Recovery fails to note some log corruption
recv_recovery_from_checkpoint_start(): Abort startup due to log
corruption if we were unable to parse the entire log between
the latest log checkpoint and the corresponding FILE_CHECKPOINT record.

Also, reduce some code bloat related to log output and log_sys.mutex.

Reviewed by: Debarun Banerjee
2024-08-28 15:44:42 +03:00
Marko Mäkelä
bda40ccb85 MDEV-34803 innodb_lru_flush_size is no longer used
In commit fa8a46eb68 (MDEV-33613)
the parameter innodb_lru_flush_size ceased to have any effect.

Let us declare the parameter as deprecated and additionally as
MARIADB_REMOVED_OPTION, so that there will be a warning written
to the error log in case the option is specified in the command line.

Let us also do the same for the parameter
innodb_purge_rseg_truncate_frequency
that was deprecated&ignored earlier in MDEV-32050.

Reviewed by: Debarun Banerjee
2024-08-28 07:18:03 +03:00
Marko Mäkelä
e7bb9b7c55 MDEV-24923 fixup: Correct a function comment 2024-08-27 18:06:24 +03:00
Marko Mäkelä
48becffd07 Merge 10.5 into 10.6 2024-08-27 08:52:10 +03:00
Yuchen Pei
58bc83e1a7
[fixup] Spider: Restored lines accidentally deleted in MDEV-32157
Also restored a change that resulted in off-by-one, as well as
appending the correctly indexed key_hint.
2024-08-27 15:36:39 +10:00
Marko Mäkelä
36ab75a498 MDEV-34515: Fix a bogus debug assertion
purge_sys_t::stop_FTS(): Fix an incorrect debug assertion that
commit d58734d781 added.
The assertion would fail if there had been prior invocations of
purge_sys.stop_SYS() without purge_sys.resume_SYS().
The intention of the assertion is to check that number of pending
stop_FTS() stays below 65536.
2024-08-27 07:27:24 +03:00
Marko Mäkelä
76f6b6d818 MDEV-34515: Reduce context switching in purge
Before this patch, the InnoDB purge coordinator task submitted
innodb_purge_threads-1 tasks even if there was not sufficient amount
of work for all of them. For example, if there are undo log records
only for 1 table, only 1 task can be employed, and that task had better
be the purge coordinator.

srv_purge_worker_task_low(): Split from purge_worker_callback().

trx_purge_attach_undo_recs(): Remove the parameter n_purge_threads,
and add the parameter n_work_items, to keep track of the amount of
work.

trx_purge(): Launch purge worker tasks only if necessary. The work of
one thread will be executed by this purge coordinator thread.

que_fork_scheduler_round_robin(): Merged to trx_purge().

Thanks to Vladislav Vaintroub for supplying a prototype of this.

Reviewed by: Debarun Banerjee
2024-08-26 12:23:17 +03:00
Marko Mäkelä
b7b9f3ce82 MDEV-34515: Contention between purge and workload
In a Sysbench oltp_update_index workload that involves 1 table,
a serious contention between the workload and the purge of history
was observed. This was the worst when the table contained only 1 record.

This turned out to be fixed by setting innodb_purge_batch_size=128,
which corresponds to the number of usable persistent rollback segments.
When we go above that, there would be contention between row_purge_poss_sec()
and the workload, typically on the clustered index page latch, sometimes
also on a secondary index page latch. It might be that with smaller
batches, trx_sys.history_size() will end up pausing all concurrent
transaction start/commit frequently enough so that purge will be able
to make some progress, so that there would be less contention on the
index page latches between purge and SQL execution.

In commit aa719b5010 (part of MDEV-32050)
the interpretation of the parameter innodb_purge_batch_size was slightly
changed. It would correspond to the maximum desired size of the
purge_sys.pages cache. Before that change, the parameter was referring to
a number of undo log pages, but the accounting might have been inaccurate.

To avoid a regression, we will reduce the default value to
innodb_purge_batch_size=127, which will also be compatible with
innodb_undo_tablespaces>1 (which will disable rollback segment 0).

Additionally, some logic in the purge and MVCC checks is simplified.
The purge tasks will make use of purge_sys.pages when accessing undo
log pages to find out if a secondary index record can be removed.
If an undo page needs to be looked up in buf_pool.page_hash, we will
merely buffer-fix it. This is correct, because the undo pages are
append-only in nature. Holding purge_sys.latch or purge_sys.end_latch
or the fact that the current thread is executing as a part of an
in-progress purge batch will prevent the contents of the undo page from
being freed and subsequently reused. The buffer-fix will prevent the
page from being evicted form the buffer pool. Thanks to this logic,
we can refer to the undo log record directly in the buffer pool page
and avoid copying the record.

buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful
for accessing undo log pages, which are append-only by nature.
There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED
in that case.

purge_sys_t::view_guard::view_guard(): Allow the type of guard to be
acquired: end_latch, latch, or no latch (in case we are a purge thread).

purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages.

purge_sys_t::get_page(): Invoke buf_pool_t::page_fix().

row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe()
and row_undo_mod_sec_unsafe().

trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build().

row_purge_poss_sec(): Add the parameter mtr and remove redundant
or unused parameters sec_pcur, sec_mtr, is_tree. We will use the
caller's mtr object but release any acquired page latches before
returning.

btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align().

row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID
to be checked against the page in row_purge_remove_sec_if_poss_tree().
If the secondary index page was not changed meanwhile, it will be
unnecessary to invoke row_purge_poss_sec() again.

trx_undo_prev_version_build(): Access any undo log pages using
the caller's mini-transaction object.

row_purge_vc_matches_cluster(): Moved to the only compilation unit that
needs it.

Reviewed by: Debarun Banerjee
2024-08-26 12:23:06 +03:00
Marko Mäkelä
d58734d781 MDEV-34520 purge_sys_t::wait_FTS sleeps 10ms, even if it does not have to
There were two separate Atomic_counter<uint32_t>, purge_sys.m_SYS_paused
and purge_sys.m_FTS_paused. In purge_sys.wait_FTS() we have to read both
atomically. We used to use an overkill solution for this, acquiring
purge_sys.latch and waiting 10 milliseconds between samples. To make
matters worse, the 10-millisecond wait was unconditional, which would
unnecessarily suspend the purge_coordinator_task every now and then.

It turns out that we can fold both "reference counts" into a single
Atomic_relaxed<uint32_t> and avoid the purge_sys.latch.
To assess whether std::memory_order_relaxed is acceptable, we should
consider the operations that read these "reference counts", that is,
purge_sys_t::wait_FTS(bool) and purge_sys_t::must_wait_FTS().

Outside debug assertions, purge_sys.must_wait_FTS() is only invoked in
trx_purge_table_acquire(), which is covered by a shared dict_sys.latch.
We would increment the counter as part of a DDL operation, but before
acquiring an exclusive dict_sys.latch. So, a
purge_sys_t::close_and_reopen() loop could be triggered slightly
prematurely, before a problematic DDL operation is actually executed.
Decrementing the counter is less of an issue; purge_sys.resume_FTS()
or purge_sys.resume_SYS() would mostly be invoked while holding an
exclusive dict_sys.latch; ha_innobase::delete_table() does it outside
that critical section. Still, this would only cause some extra wait in
the purge_coordinator_task, just like at the start of a DDL operation.

There are two calls to purge_sys_t::wait_FTS(bool): in the above mentioned
purge_sys_t::close_and_reopen() and in purge_sys_t::clone_oldest_view(),
both invoked by the purge_coordinator_task. There is also a
purge_sys.clone_oldest_view<true>() call at startup when no DDL operation
can be in progress.

purge_sys_t::m_SYS_paused: Merged into m_FTS_paused, using a new
multiplier PAUSED_SYS = 65536.

purge_sys_t::wait_FTS(): Remove an unnecessary sleep as well as the
access to purge_sys.latch. It suffices to poll purge_sys.m_FTS_paused.

purge_sys_t::stop_FTS(): Do not acquire purge_sys.latch.

Reviewed by: Debarun Banerjee
2024-08-26 12:22:44 +03:00
Marko Mäkelä
9db2b327d4 MDEV-34759: buf_page_get_low() is unnecessarily acquiring exclusive latch
buf_page_ibuf_merge_try(): A new, separate function for invoking
ibuf_merge_or_delete_for_page() when needed. Use the already requested
page latch for determining if the call is necessary. If it is and
if we are currently holding rw_latch==RW_S_LATCH, upgrading to an exclusive
latch may involve waiting that another thread acquires and releases
a U or X latch on the page. If we have to wait, we must recheck if the
call to ibuf_merge_or_delete_for_page() is still needed. If the page
turns out to be corrupted, we will release and fail the operation.
Finally, the exclusive page latch will be downgraded to the originally
requested latch.

ssux_lock_impl::rd_u_upgrade_try(): Attempt to upgrade a shared lock to
an update lock.

sux_lock::s_x_upgrade_try(): Attempt to upgrade a shared lock to
exclusive.

sux_lock::s_x_upgrade(): Upgrade a shared lock to exclusive.
Return whether a wait was elided.

ssux_lock_impl::u_rd_downgrade(), sux_lock::u_s_downgrade():
Downgrade an update lock to shared.
2024-08-23 13:27:50 +03:00
Monty
1f040ae048 MDEV-34043 Drastically slower query performance between CentOS (2sec) and Rocky (48sec)
One cause of the slowdown is because the ftruncate call can be much
slower on some systems.  ftruncate() is called by Aria for internal
temporary tables, tables created by the optimizer, when the upper level
asks Aria to delete the previous result set. This is needed when some
content from previous tables changes.

I have now changed Aria so that for internal temporary tables we don't
call ftruncate() anymore for maria_delete_all_rows().

I also had to update the Aria repair code to use the logical datafile
size and not the on-disk datafile size, which may contain data from a
previous result set.  The repair code is called to create indexes for
the internal temporary table after it is filled.
I also replaced a call to mysql_file_size() with a pwrite() in
_ma_bitmap_create_first().

Reviewer: Sergei Petrunia <sergey@mariadb.com>
Tester: Dave Gosselin <dave.gosselin@mariadb.com>
2024-08-21 22:47:29 +03:00
Marko Mäkelä
267c0fce56 Merge 10.5 into 10.6 2024-08-15 10:16:46 +03:00
Marko Mäkelä
e40dfcdd89 Fix clang++-19 -Wunused-but-set-variable 2024-08-15 10:13:49 +03:00
Marko Mäkelä
757c368139 Merge 10.5 into 10.6 2024-08-14 10:56:11 +03:00
Marko Mäkelä
4f8803c036 MDEV-34678 pthread_mutex_init() without pthread_mutex_destroy()
When SUX_LOCK_GENERIC is defined, the srw_mutex, srw_lock, sux_lock are
implemented based on pthread_mutex_t and pthread_cond_t.  This is the
only option for systems that lack a futex-like system call.

In the SUX_LOCK_GENERIC mode, if pthread_mutex_init() is allocating
some resources that need to be freed by pthread_mutex_destroy(),
a memory leak could occur when we are repeatedly invoking
pthread_mutex_init() without a pthread_mutex_destroy() in between.

pthread_mutex_wrapper::initialized: A debug field to track whether
pthread_mutex_init() has been invoked.  This also helps find bugs
like the one that was fixed by
commit 1c8af2ae53 (MDEV-34422);
one simply needs to add -DSUX_LOCK_GENERIC to the CMAKE_CXX_FLAGS
to catch that particular bug on the initial server bootstrap.

buf_block_init(), buf_page_init_for_read(): Invoke block_lock::init()
because buf_page_t::init() will no longer do that.

buf_page_t::init(): Instead of invoking lock.init(), assert that it
has already been invoked (the lock is vacant).

add_fts_index(), build_fts_hidden_table(): Explicitly invoke
index_lock::init() in order to avoid a pthread_mutex_destroy()
invocation on an uninitialized object.

srw_lock_debug::destroy(): Invoke readers_lock.destroy().

trx_sys_t::create(): Invoke trx_rseg_t::init() on all rollback segments
in order to guarantee a deterministic state for shutdown, even if
InnoDB fails to start up.

trx_rseg_array_init(), trx_temp_rseg_create(), trx_rseg_create():
Invoke trx_rseg_t::destroy() before trx_rseg_t::init() in order to
balance pthread_mutex_init() and pthread_mutex_destroy() calls.
2024-08-14 07:54:15 +03:00
Jan Lindström
cd8b8bb964 MDEV-34594 : Assertion `client_state.transaction().active()' failed in
int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)

CREATE TABLE [SELECT|REPLACE SELECT] is CTAS and idea was that
we force ROW format. However, it was not correctly enforced
and keys were appended before wsrep transaction was started.

At THD::decide_logging_format we should force used stmt binlog
format to ROW in CTAS case and produce a warning if used
binlog format was not ROW.

At ha_innodb::update_row we should not append keys similarly
as in ha_innodb::write_row if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging on ::write_row, ::update_row and ::delete_row
if wsrep key append fails.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-08-12 23:54:30 +02:00
Oleksandr Byelkin
2e580dc2a8 Merge branch '10.6' into mariadb-10.6.19 2024-08-09 08:51:21 +02:00
Nikita Malyavin
25e2d0a6bb MDEV-34632 Assertion failed in handler::assert_icp_limitations
Assertion `table->field[0]->ptr >= table->record[0] &&
table->field[0]->ptr <= table->record[0] + table->s->reclength' failed in
handler::assert_icp_limitations.

table->move_fields has some limitations:
1. It cannot be used in cascade
2. It should always have a restoring pair.

Rule 1 is covered by assertions in handler::assert_icp_limitations
and handler::ptr_in_record (commit 30894fe9a9).

Rule 2 should be manually maintained with care. Hopefully, the rule 1 assertions
may sometimes help as well.

In ha_myisam::repair, both rules are broken. table->move_fields is used
asymmetrically there: it is set on every param->fix_record call
(i.e. in compute_vcols) but is restored only once, in the end of repair.

The reason to updating field ptr's for every call is that compute_vcols can
(supposedly) be called in parallel, that is, with the same table, but different
records.

The condition to "unmove" the pointers in ha_myisam::restore_vcos_after_repair
is incorrect, when stored vcols are available, and myisam stores a VIRTUAL field
if it's the only field in the table (the record cannot be of zero length).

This patch solves the problem by "unmoving" the pointers symmetrically, in
compute_vcols. That is, both rules will be preserved maintained.
2024-08-07 14:50:19 +02:00
Yuchen Pei
fa8ce92cc0
MDEV-34682 Return the return value of ddl recovery done in ha_initialize_handlerton
Otherwise it could cause false negative when ddl recovery done is part
of the plugin initialization
2024-08-07 15:13:08 +10:00
Yuchen Pei
bce3f3f628
MDEV-34682 Reset spider_hton_ptr in error mode of spider_db_init() 2024-08-07 15:10:56 +10:00
Oleksandr Byelkin
8f020508c8 Merge branch '10.5' into 10.6 2024-08-03 09:04:24 +02:00
Thirunarayanan Balathandayuthapani
37119cd256 MDEV-29010 Table cannot be loaded after instant ALTER
Reason:
======
- InnoDB fails to load the instant alter table metadata from
clustered index while loading the table definition.
The reason is that InnoDB metadata blob has the column length
exceeds maximum fixed length column size.

Fix:
===
- InnoDB should treat the long fixed length column as variable
length fields that needs external storage while initializing
the field map for instant alter operation
2024-08-01 18:58:43 +05:30
Thirunarayanan Balathandayuthapani
533e6d5d13 MDEV-34670 IMPORT TABLESPACE unnecessary traverses tablespace list
Problem:
========
- After the commit ada1074bb1 (MDEV-14398)
fil_crypt_set_encrypt_tables() iterates through all tablespaces to
fill the default_encrypt tables list. This was a trigger to
encrypt or decrypt when key rotation age is set to 0. But import
tablespace does call fil_crypt_set_encrypt_tables() unnecessarily.
The motivation for the call is to signal the encryption threads.

Fix:
====
ha_innobase::discard_or_import_tablespace: Remove the
fil_crypt_set_encrypt_tables() and add the import tablespace
to the default encrypt list if necessary
2024-07-31 14:13:38 +05:30
Thirunarayanan Balathandayuthapani
ee5f7692d7 MDEV-34357 InnoDB: Assertion failure in file ./storage/innobase/page/page0zip.cc line 4211
During InnoDB root page split, InnoDB does the following
1) First move the root records to the new page(p1)
2) Empty the root, insert the node pointer to the root page
3) Split the new page and make it as child nodes.
4) Finds the split record, allocate another new page(p2)
to the index
5) InnoDB stores the record(ret) predecessor to the supremum
record of the page (p2).
6) In page_copy_rec_list_start(), move the records from p1 to p2
upto the split record
6) Given table is a compressed row format page, InnoDB attempts to
compress the page p2 and failed (due to innodb_compression_level = 0)
7) Since the compression fails, InnoDB gets the number of preceding
records(ret_pos) of a record (ret) on the page (p2)
8) Page (p2) is a new page, ret points to infimum record.
ret_pos can be 0. InnoDB have wrong condition that ret_pos shouldn't
be 0 and returns corruption. InnoDB has similar wrong check in
page_copy_rec_list_end()
2024-07-30 14:36:34 +05:30
Thirunarayanan Balathandayuthapani
c038b3c05e MDEV-34181 Instant table aborts after discard tablespace
- commit 85db534731 (MDEV-33400)
retains the instantness in the table definition after discard
tablespace. So there is no need to assign n_core_null_bytes
during instant table preparation unless they are not
initialized.
2024-07-30 13:31:43 +05:30
Monty
4bf7c966b3 MDEV-34664: Add an option to fix InnoDB's doubling of secondary index cardinalities
(With trivial fixes by sergey@mariadb.com)
Added option fix_innodb_cardinality to optimizer_adjust_secondary_key_costs

Using fix_innodb_cardinality disables the 'divide by 2' of rec_per_key_int
in InnoDB that in effect doubles the Cardinality for secondary keys.
This has the biggest effect for indexes where a few rows has the same key
value. Using this may also cause table scans for very small tables (which
in some cases may be better than an index scan).

The user visible effect is that 'SHOW INDEX FROM table_name' will for
InnoDB show the true Cardinality (and not 2x the real value). It will
also allow the optimizer to chose a better index in some cases as the
division by 2 could have a bad effect for tables with 2-5 identical values
per key.

A few notes about using fix_innodb_cardinality:
- It has direct affect for SHOW INDEX FROM table_name. SHOW INDEX
  will also update the statistics in table share.
- The effect of fix_innodb_cardinality for query plans or EXPLAIN
  is only visible after first open of the table. This is why one must
  do a flush tables or use SHOW INDEX for the option to take effect.
- Using fix_innodb_cardinality can thus affect all user in their query
  plans if they are using the same tables.

Because of this, it is strongly recommended that one uses
optimizer_adjust_secondary_key_costs=fix_innodb_cardinality mainly
in configuration files to not cause issues for other users.
2024-07-29 16:40:53 +03:00
Marko Mäkelä
7e5c9ccda5 MDEV-34502 fixup: Do not cripple MSAN
We need to work around deficiencies of Valgrind, and apparently
the previous work-around attempts
(such as d247d64988) do not work
anymore, definitely not on recent clang-based compilers.

MemorySanitizer should be fine; unfortunately we set HAVE_valgrind for it
as well.
2024-07-29 15:04:16 +03:00
Thirunarayanan Balathandayuthapani
3359ac09a4 MDEV-34066 Output of SHOW ENGINE INNODB STATUS uses the nanoseconds suffix for microseconds
- This issue is caused by commit e71e613353
(MDEV-24671). Change the output of transaction lock wait
time in microseconds suffix.
2024-07-23 21:36:13 +05:30
Oleksandr Byelkin
9af2caca33 Merge branch '10.5' into 10.6 2024-07-18 16:25:33 +02:00
Sergei Golubchik
d20518168a also protect the /*!999999 sandbox comment 2024-07-17 21:25:40 +02:00
Sutou Kouhei
383d53edbc MDEV-21166 Add Mroonga initialized check to Mroonga UDFs
Mroonga UDFs can't be used without loading Mroonga.
2024-07-17 15:52:58 +10:00
Yuchen Pei
03a350378a
MDEV-30408 Reset explicit_limit in exists2in
Item_exists_subselect::fix_length_and_dec() sets explicit_limit to 1.
In the exists2in transformation it resets select_limit to NULL. For
consistency we should reset explicity_limit too.

This fixes a bug where spider table returns wrong results for queries
that gets through exists2in transformation when semijoin is off.
2024-07-17 08:54:40 +08:00
Yuchen Pei
8416fd323c
MDEV-32627 [fixup] Spider: Fix conn key length
To avoid off-by-one in spider_get_share.

And document conn key and conn key length.
2024-07-17 08:52:35 +08:00
Yuchen Pei
20d99f3f86
MDEV-32627 Distinguish between absence of a keyword and empty value for the keyword
Distinguish them in two place:

when constructing connection key in create_conn_key and
spider_create_conn for both ordinary queries and spider_direct_sql

For spider_create_conn and spider_udf_direct_sql_create_conn, the
created conn gets assigned a field from the source object if and only
if source->field is non-null.

For spider_create_conn_keys and spider_udf_direct_sql_create_conn_key,
we update the encoding so that absence of keyword and keyword with an
empty value result in different keys. More specifically, if the i-th
keyword has a value, the corresponding part in the conn key begins
with the char \i, followed by the possibly empty value and ends with a
\0. If the i-th keyword is not specified, then it does not get a
mention in the conn key.

As part of this change, we also update table param / option parsing to
recognise a singleton empty string when creating an string list,
instead of writing it off as NULL.
2024-07-17 08:51:44 +08:00
Yuchen Pei
6f3baec4f5
MDEV-32627 Spider: add tests for connection param overriding
When connection parameters are specified in COMMENT, CONNECTION and
SERVER, they are overriden in the following order:

COMMENT > CONNECTION > SERVER
2024-07-17 08:51:44 +08:00
Yuchen Pei
384ec03e48
MDEV-34421 Check the SQL command when resolving storage engine
ENGINE_SUBSTITUTION only applies to CREATE TABLE and ALTER TABLE, and
Storage_engine_name::resolve_storage_engine_with_error() could be
called when executing any sql command.
2024-07-16 16:33:05 +08:00
Yuchen Pei
132270d3de
MDEV-34541 Clean up spider self reference check
SPIDER_CONN::loop_check_meraged_first is useless, because all
SPIDER_CONN_LOOP_CHECKs are in SPIDER_CONN::loop_check_queue, which in
spider_db_conn::fin_loop_check() is iterated over.

This fixes the use-after-free issue when there are three spider tables
sharing the same remote, and their corresponding
SPIDER_CONN_LOOP_CHECKs getting merged in
spider_conn_queue_and_merge_loop_check()

This also fixes MDEV-34555
2024-07-16 16:33:05 +08:00
Yuchen Pei
0bb9862888
MDEV-29962 Spider: creates connections if needed before lock_tables()
Same cause as MDEV-31996. This prevents reuse of conns freed
previously, e.g. during the commit of previous statement.

It does not occur in 10.4 because of the commit for MDEV-19002 removed
the call to spider_check_trx_and_get_conn().
2024-07-16 16:33:05 +08:00
Yuchen Pei
696c2497fc
MDEV-27902 Spider check trx and get conn in rnd_next()
This allows creation of SPIDER_CONN on demand, if the previous one was
freed.

Also, the first_link_idx's are reset during
spider_check_trx_and_get_conn(), which in the case of remote HANDLER
commands, might not match the link to use correct first_link_idx for
remote HANDLER statement that was later set in
ha_spider::rnd_handler_init() (causing testing regressions in the
spider/handler suite). Therefore we fix the first_link_idx there.
2024-07-16 16:33:05 +08:00
Yuchen Pei
85a36958e3
MDEV-27902 Some Spider code documentation regarding first_link_idx and remote HANDLER commands 2024-07-16 16:33:04 +08:00
Yuchen Pei
14c4050992
MDEV-32492 Delete and remove trx_ha on spider share mismatch
A SPIDER_TRX_HA associated with a SPIDER_TRX could have longer
lifetime than its associated SPIDER_SHARE. And it is identified with
the associated table name. When the SPIDER_SHARE no longer valid, e.g.
when the associated spider table has been dropped and recreated, the
SPIDER_TRX_HA should be reset too.

Since spider could create a new SPIDER_SHARE with the exact same
address of a freed SPIDER_SHARE, we try to mark all SPIDER_TRX_HAs
associated with a SPIDER_SHARE invalid when the SPIDER_SHARE is about
to be freed.
2024-07-16 16:33:04 +08:00
Yuchen Pei
653050ac84
MDEV-32492 MDEV-29676 Spider: some code documentation and cleanup
Some documentations come from commit for MDEV-29676, and the rest are
mainly related to SPIDER_TRX_HA.

Removed SPIDER_TRX_HA::trx as it is unused.
2024-07-16 16:33:04 +08:00
Yuchen Pei
f071b7620b
Merge branch '10.5' into 10.6 2024-07-16 15:54:22 +08:00
Thirunarayanan Balathandayuthapani
46d95ae265 MDEV-34474 InnoDB: Failing assertion: stat_n_leaf_pages > 0 in ha_innobase::estimate_rows_upper_bound
- Fixing the compilation issue.
2024-07-15 14:44:59 +05:30
Yuchen Pei
3b956f8329
MDEV-34449 Default ha_spider::bulk_size to 0.
This fixes a valgrind failure where the bulk_size is used before
initialised in ha_spider::end_bulk_insert().
2024-07-15 09:30:01 +08:00
Thirunarayanan Balathandayuthapani
00d2c7f7f4 MDEV-34542 Assertion `lock_trx_has_sys_table_locks(trx) == __null' failed in void row_mysql_unfreeze_data_dictionary(trx_t*)
- During XA PREPARE, InnoDB releases the non-exclusive locks.
But it fails to remove the non-exclusive table lock from the
transaction table locks. In the mean time, main thread evicts
the table from the LRU cache. While rollbacking the XA transaction,
InnoDB iterates through the table locks to check whether it
holds lock on any system tables and wrongly assumes the
evicted table as system table since the table id is 0

Fix:
===
During XA PREPARE, remove the table locks of the transaction while
releasing the non-exclusive locks.
2024-07-12 17:42:14 +05:30
Thirunarayanan Balathandayuthapani
3662f8ca89 MDEV-34543 Shutdown hangs while freeing the asynchronous i/o slots
Problem:
========
- During shutdown, InnoDB tries to free the asynchronous
I/O slots and hangs. The reason is that InnoDB disables
asynchronous I/O before waiting for pending
asynchronous I/O to finish.

buf_load(): InnoDB aborts the buffer pool load due to
user requested shutdown and doesn't wait for the asynchronous
read to get completed. This could lead to debug assertion
in buf_flush_buffer_pool() during shutdown

Fix:
===
os_aio_free(): Should wait all read_slots and write_slots
to finish before disabling the aio.

buf_load(): Should wait for pending read request to complete
even though it was aborted.
2024-07-12 17:42:14 +05:30
Thirunarayanan Balathandayuthapani
d9ffefdaac Merge 10.5 into 10.6 2024-07-10 18:27:23 +02:00
Thirunarayanan Balathandayuthapani
60125a77b7 MDEV-34474 InnoDB: Failing assertion: stat_n_leaf_pages > 0 in ha_innobase::estimate_rows_upper_bound
- Column stat_value and sample_size in mysql.innodb_index_stats
table is declared as BIGINT UNSIGNED without any check constraint.
user manually updates the value of stat_value and sample_size
to zero. InnoDB aborts the server while reading the statistics
information because InnoDB expects at least one leaf
page to exist for the index.

- To fix this issue, InnoDB should interpret the value of
stat_n_leaf_pages, stat_index_size in innodb_index_stats
stat_clustered_index_size, stat_sum_of_other_index_sizes
in innodb_table_stats as valid one even though user
mentioned it as 0.
2024-07-10 11:43:03 +05:30
Julius Goryavsky
4026f04425 Merge branch 10.5 into 10.6 2024-07-09 11:56:47 +02:00
Denis Protivensky
b7718a1c1c MDEV-32738: Don't roll back high-prio txn waiting on a lock in InnoDB
DML transactions on FK-child tables also get table locks
on FK-parent tables. If there is a DML transaction holding
such a lock, and a TOI transaction starts, the latter
BF-aborts the former and puts itself into a waiting state.
If at this moment another DML transaction on FK-child table
starts, it doesn't check that the transaction waiting on
a parent table lock is TOI, and it erroneously BF-aborts
the waiting TOI transaction.

The fix: don't roll back high-priority transaction waiting
on a lock in InnoDB, instead roll back an incoming DML
transaction.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-07-08 23:36:21 +02:00
Alexander Barkov
e56040fee8 Merge remote-tracking branch 'origin/10.5' into 10.6 2024-07-08 18:59:04 +04:00
Anson Chung
215fab68db Perform simple fixes for cppcheck findings
Rectify cases of mismatched brackets and address
possible cases of division by zero by checking if
the denominator is zero before dividing.

No functional changes were made.

All new code of the whole pull request, including one or several
files that are either new files or modified ones, are contributed
under the BSD-new license. I am contributing on behalf of my
employer Amazon Web Services, Inc.
2024-07-08 10:51:48 +01:00
Monty
33964984ad MDEV-34522 Index for (specific) Aria table is created as corrupted
The issue was that when repairing an Aria table of row format PAGE and
the data file was bigger the 4G, the data file length was cut short
because of wrong parameters to MY_ALIGN().

The effect was that ALTER TABLE, OPTIMIZE TABLE or REPAIR TABLE would fail
on these tables, possibly corrupting them.
The MDEV also exposed a bug where error state was not propagated properly
to the upper level if the number of rows in the table changed.
2024-07-07 12:43:44 +03:00
Thirunarayanan Balathandayuthapani
834c013b64 MDEV-34519 innodb_log_checkpoint_now crashes when innodb_read_only is enabled
During read only mode, InnoDB doesn't allow checkpoint to happen.
So InnoDB should throw the warning when InnoDB tries to
force the checkpoint when innodb_read_only = 1 or
innodb_force_recovery = 6.
2024-07-05 15:26:05 +05:30
Sergei Petrunia
513c827041 MDEV-34190: r_engine_stats.pages_read_count is unrealistically low
The symptoms were: take a server with no activity and a table that's
not in the buffer pool. Run a query that reads the whole table and
observe that r_engine_stats.pages_read_count shows about 2% of the table
was read. Who reads the rest?

The cause was that page prefetching done inside InnoDB was not counted.

This counts page prefetch requests made in buf_read_ahead_random() and
buf_read_ahead_linear() and makes them visible in:

- ANALYZE: r_engine_stats.pages_prefetch_read_count
- Slow Query Log: Pages_prefetched:

This patch intentionally doesn't attempt to count the time to read the
prefetched pages:
* there's no obvious place where one can do it
* prefetch reads may be done in parallel (right?), it is not clear how
  to count the time in this case.
2024-07-04 15:24:49 +03:00
mariadb-DebarunBanerjee
73ad436e16 MDEV-34458 wait_for_read in buf_page_get_low hurts performance
The performance regression seen while loading BP is caused by the
deadlock fix given in MDEV-33543. The area of impact is wider but is
more visible when BP is being loaded initially via DMLs.  Specifically
the response time could be impacted in DML doing pessimistic operation
on index(split/merge) and the leaf pages are not found in buffer pool.
It is more likely to occur with small BP size.

The origin of the issue dates back to MDEV-30400 that introduced
btr_cur_t::search_leaf() replacing btr_cur_search_to_nth_level() for
leaf page searches. In btr_latch_prev, we use RW_NO_LATCH to get the
previous page fixed in BP without latching. When the page is not in BP,
we try to acquire and wait for S latch violating the latching order.

This deadlock was analyzed in MDEV-33543 and fixed by using the already
present wait logic in buf_page_get_gen() instead of waiting for latch.
The wait logic is inferior to usual S latch wait and is simply a
repeated sleep 100 of micro-sec (The actual sleep time could be more
depending on platforms). The bug was seen with "change-buffering" code
path and the idea was that this path should be less exercised. The
judgement was not correct and the path is actually quite frequent and
does impact performance when pages are not in BP and being loaded by
DML expanding/shrinking large data.

Fix: While trying to get a page with RW_NO_LATCH and we are attempting
"out of order" latch, return from buf_page_get_gen immediately instead
of waiting and follow the ordered latching path.
2024-07-03 18:08:43 +05:30
Oleksandr Byelkin
dcd8a64892 Merge branch '10.5' into 10.6 2024-07-03 13:27:23 +02:00
Daniel Black
25c6e3e4b7 MDEV-34502 InnoDB debug mode build - asserts with Valgrind
Valgrind looks as the assertions as examining uninitalized values.

As the assertions are tested in other Debug builds we know
it isn't all invalid.

Account for Valgrind by removing the assertion under
the WITH_VALGRIND=1 compulation.
2024-07-03 20:07:04 +10:00
Denis Protivensky
f6fcfc1a6a MDEV-33064 amendment: replace trx->wsrep=0 with an assert in trx_rollback_for_mysql()
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-07-01 13:08:11 +02:00
Denis Protivensky
cfbd57dfb7 MDEV-33064: Sync trx->wsrep state from THD on trx start
InnoDB transactions may be reused after committed:
- when taken from the transaction pool
- during a DDL operation execution

In this case wsrep flag on trx object is cleared, which may cause wrong
execution logic afterwards (wsrep-related hooks are not run).

Make trx->wsrep flag initialize from THD object only once on InnoDB transaction
start and don't change it throughout the transaction's lifetime.
The flag is reset at commit time as before.

Unconditionally set wsrep=OFF for THD objects that represent InnoDB background
threads.

Make Wsrep_schema::store_view() operate in its own transaction.

Fix streaming replication transactions' fragments rollback to not switch
THD->wsrep value during transaction's execution
(use THD->wsrep_ignore_table as a workaround).

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-07-01 13:07:39 +02:00
Marko Mäkelä
d1ecf5cc5f MDEV-32176 Contention in ha_innobase::info_low()
During a Sysbench oltp_point_select workload with 1 table and 400
concurrent connections, a bottleneck on dict_table_t::lock_mutex was
observed in ha_innobase::info_low().

dict_table_t::lock_latch: Replaces lock_mutex.

In ha_innobase::info_low() and several other places, we will acquire
a shared dict_table_t::lock_latch or we may elide the latch if
hardware memory transactions are available.

innobase_build_v_templ(): Remove the parameter "bool locked", and
require the caller to hold exclusive dict_table_t::lock_latch
(instead of holding an exclusive dict_sys.latch).

Tested by: Vladislav Vaintroub
Reviewed by: Vladislav Vaintroub
2024-06-28 15:57:07 +03:00
Meng-Hsiu Chiang
55db59f16d [MDEV-28162] Replace PFS_atomic with std::atomic<T>
PFS_atomic class contains wrappers around my_atomic_* operations, which
are macros to GNU atomic operations (__atomic_*). Due to different
implementations of compilers, clang may encounter errors when compiling
on x86_32 architecture.

The following functions are replaced with C++ std::atomic type in
performance schema code base:
  - PFS_atomic::store_*()
      -> my_atomic_store*
        -> __atomic_store_n()
    => std::atomic<T>::store()

  - PFS_atomic::load_*()
      -> my_atomic_load*
        -> __atomic_load_n()
    => std::atomic<T>::load()

  - PFS_atomic::add_*()
      -> my_atomic_add*
        -> __atomic_fetch_add()
    => std::atomic<T>::fetch_add()

  - PFS_atomic::cas_*()
    -> my_atomic_cas*
      -> __atomic_compare_exchange_n()
    => std::atomic<T>::compare_exchange_strong()

and PFS_atomic class could be dropped completely.

Note that in the wrapper memory order passed to original GNU atomic
extensions are hard-coded as `__ATOMIC_SEQ_CST`, which is equivalent to
`std::memory_order_seq_cst` in C++, and is the default parameter for
std::atomic_* functions.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services.
2024-06-27 09:27:12 +03:00
Yuchen Pei
c9212b7a43
MDEV-33746 [fixup] Add suggested overrides in mroonga 2024-06-26 11:08:56 +08:00
Yuchen Pei
d7042ec4da
Merge branch '10.5' into 10.6 2024-06-26 09:16:54 +08:00
Yuchen Pei
ad0ee8cdb9
MDEV-33746 [fixup] Add suggested overrides in oqgraph 2024-06-25 14:22:25 +08:00
Yuchen Pei
01289dac41
MDEV-34361 Split my.cnf in the spider suite.
Just like the spider/bugfix suite.

One caveat is that my_2_3.cnf needs something under mysqld.2.3 group,
otherwise mtr will fail with something like:

There is no group named 'mysqld.2.3' that can be used to resolve
'port' for ...

This will allow new tests under the spider suite to use what is
needed. It also somehow fixes issues of running a test followed by
spider.slave_trx_isolation.
2024-06-25 13:45:04 +08:00
Yuchen Pei
aebd2397cc
MDEV-34404 Use safe_str in spider udfs to avoid passing NULL str 2024-06-25 13:45:04 +08:00
Marko Mäkelä
0076eb3d4e Merge 10.5 into 10.6 2024-06-24 13:09:47 +03:00
Marko Mäkelä
acc077ffa1 MDEV-34443 ha_innobase::info_low() does not distinguish HA_STATUS_VARIABLE_EXTRA
ha_innobase::info_low(): For HA_STATUS_VARIABLE without
HA_STATUS_VARIABLE_EXTRA, let us avoid unnecessary and costly updates
of the data_free statistics, which are only needed for SHOW TABLE STATUS.
This optimization had been enabled in
commit 247ecb7597 but not utilized until now.
2024-06-24 10:39:13 +03:00
Dave Gosselin
db0c28eff8 MDEV-33746 Supply missing override markings
Find and fix missing virtual override markings.  Updates cmake
maintainer flags to include -Wsuggest-override and
-Winconsistent-missing-override.
2024-06-20 11:32:13 -04:00
Vlad Lesin
0a199cb810 MDEV-34108 Inappropriate semi-consistent read in RC if innodb_snapshot_isolation=ON
The fixes in b8a6719889 have not disabled
semi-consistent read for innodb_snapshot_isolation=ON mode, they just allowed
read uncommitted version of a record, that's why the test for MDEV-26643 worked
well.

The semi-consistent read should be disabled on upper level in
row_search_mvcc() for READ COMMITTED isolation level.

Reviewed by Marko Mäkelä.
2024-06-20 16:11:54 +03:00
Thirunarayanan Balathandayuthapani
ab448d4b34 MDEV-34389 Avoid log overwrite in early recovery
- InnoDB tries to write FILE_CHECKPOINT marker during
early recovery when log file size is insufficient.
While updating the log checkpoint at the end of the recovery,
InnoDB must already have written out all pending changes
to the persistent files. To complete the checkpoint, InnoDB
has to write some log records for the checkpoint and to
update the checkpoint header. If the server gets killed
before updating the checkpoint header then it would lead
the logfile to be unrecoverable.

- This patch avoids FILE_CHECKPOINT marker during early
recovery and narrows down the window of opportunity to
make the log file unrecoverable.
2024-06-20 17:54:57 +05:30
Iaroslav Babanin
5d49a2add7 MDEV-33935 fix deadlock counter
- The deadlock counter was moved from
Deadlock::find_cycle into Deadlock::report, because
the find_cycle method is called multiple times during deadlock
detection flow, which means it shouldn't have such side effects.
But report() can, which called only once for
a victim transaction.
- Also the deadlock_detect.test and *.result test case
has been extended to handle the fix.
2024-06-19 20:43:33 +03:00
Jan Lindström
ee974ca5e0 MDEV-31658 : Deadlock found when trying to get lock during applying
Problem was that there was two non-conflicting local idle
transactions in node_1 that both inserted a key to primary key.
Then two transactions from other nodes inserted also
a key to primary key so that insert from node_2 conflicted
one of the local transactions in node_1 so that there would
be duplicate key if both are committed. For this insert
from other node tries to acquire S-lock for this record
and because this insert is high priority brute force (BF)
transaction it will kill idle local transaction.

Concurrently, second insert from node_3 conflicts the second
idle insert transaction in node_1. Again, it tries to acquire
S-lock for this record and kills idle local transaction.

At this point we have two non-conflicting high priority
transactions holding S-lock on different records in node_1.
For example like this: rec s-lock-node2-rec s-lock-node3-rec rec.

Because these high priority BF-transactions do not wait
each other insert from node3 that has later seqno compared
to insert from node2 can continue. It will try to acquire
insert intention for record it tries to insert (to avoid
duplicate key to be inserted by local transaction). Hower,
it will note that there is conflicting S-lock in same gap
between records. This will lead deadlock error as we have
defined that BF-transactions may not wait for record lock
but we can't kill conflicting BF-transaction because
it has lower seqno and it should commit first.

BF-transactions are executed concurrently because their
values to primary key are different i.e. they do not
conflict.

Galera certification will make sure that inserts from
other nodes i.e these high priority BF-transactions
can't insert duplicate keys. Local transactions naturally
can but they will be killed when BF-transaction
acquires required record locks.

Therefore, we can allow situation where there is conflicting
S-lock and insert intention lock regardless of their seqno
order and let both continue with no wait. This will lead
to situation where we need to allow BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called
because this function is also called from
lock_rec_queue_validate and because lock is waiting
there would be assertion in ut_a(lock->is_gap()
|| lock_rec_has_to_wait_in_queue(cell, lock));

lock_wait_wsrep_kill
  Add debug sync points for BF-transactions killing
  local transaction.

wsrep_assert_no_bf_bf_wait
  Print also requested lock information

lock_rec_has_to_wait
  Add function to handle wsrep transaction lock wait
  cases.

lock_rec_has_to_wait_wsrep
  New function to handle wsrep transaction lock wait
  exceptions.

lock_rec_has_to_wait_in_queue
  Remove wsrep exception, in this function all
  conflicting locks need to wait in queue.
  Conflicts between BF and local transactions
  are handled in lock_wait.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-06-19 14:09:11 +02:00
Marko Mäkelä
5b26a07698 MDEV-34178: Enable spinloop for index_lock
In an I/O bound concurrent INSERT test conducted by Mark Callaghan,
spin loops on dict_index_t::lock turn out to be beneficial.

This is a mixed bag; enabling the spin loops will improve throughput
and latency on some workloads and degrade in others.

Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
Performance tested by: Axel Schwenke
2024-06-19 13:41:11 +03:00
Marko Mäkelä
f8d213bd0e MDEV-34178: Improve the spin loops
srw_mutex_impl<spinloop>::wait_and_lock(): Invoke srw_pause() and
reload the lock word on each loop. Thanks to Mark Callaghan for
suggesting this.

ssux_lock_impl<spinloop>::rd_wait(): Actually implement a spin loop
on the rw-lock component without blocking on the mutex component.
If there is a conflict with wr_lock(), wait for writer.lock to be
released without actually acquiring it.

Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
2024-06-19 13:40:57 +03:00
Marko Mäkelä
6cde03aedc MDEV-34178: Improve PERFORMANCE_SCHEMA instrumentation
When MariaDB is built with PERFORMANCE_SCHEMA support enabled
and with futex-based rw-locks (not srw_lock_), we were unnecessarily
releasing and reacquiring lock.writer in srw_lock_impl::psi_wr_lock()
and ssux_lock::psi_wr_lock().

If there is a conflict with rd_lock(), let us hold the lock.writer
and execute u_wr_upgrade() to wait for rd_unlock().

Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
2024-06-19 13:30:23 +03:00
Marko Mäkelä
2bd661ca10 MDEV-34178: Simplify the U lock
The U lock mode of the sux_lock that was introduced in
commit 03ca6495df (MDEV-24142)
is unnecessarily complex.

Internally, sux_lock comprises two parts, each with their own wait queue
inside the operating system kernel: a mutex and a rw-lock.

We can map the operations as follows:

x_lock(): (X,X)
u_lock(): (X,_)
s_lock(): (_,S)

The Update lock mode, which is mutually exclusive with itself and with
X (exclusive) locks but not with shared (S) locks, was unnecessarily
acquiring a shared lock on the second component. The mutual exclusion
is guaranteed by the first component.

We might simplify the #ifdef SUX_LOCK_GENERIC case further by omitting
srw_mutex_impl::lock, because it is kind-of duplicating the mutex
that we will use for having a wait queue. However, the predicate
buf_page_t::can_relocate() would depend on the predicate
is_locked_or_waiting(), which is not available for pthread_mutex_t.

Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
2024-06-18 18:22:28 +03:00
Marko Mäkelä
e60acae655 Merge 10.5 into 10.6 2024-06-17 08:40:07 +03:00
Marko Mäkelä
4b4c371fe7 MDEV-34297 fixup: -Wconversion on 32-bit 2024-06-14 13:21:19 +03:00
Thirunarayanan Balathandayuthapani
3271588bb7 MDEV-34381 During innodb_undo_truncate=ON recovery, InnoDB may fail to shrink undo* files
- During recovery, InnoDB may fail to shrink the undo tablespaces
when there are no pages to recover while applying the redo log.
This issue exists only when innodb_undo_truncate is enabled.
trx_lists_init_at_db_start() could've applied the redo logs
for undo tablespace page0.
2024-06-14 12:46:02 +05:30
Brandon Nesterenko
d3a7e46bb4 MDEV-34365: UBSAN runtime error: call to function io_callback(tpool::aiocb*)
On an UBSAN clang-15 build, if running with UBSAN option
halt_on_error=1 (the issue doesn't show up without it),
MTR fails during mysqld --bootstrap with UBSAN error:

call to function io_callback(tpool::aiocb*) through pointer to incorrect function type 'void (*)(void *)'

This patch corrects the parameter type of io_callback
to match its expected type defined by callback_func,
i.e. (void*).

Reviewed By:
============
<TODO>
2024-06-12 08:39:41 +03:00
Marko Mäkelä
fc9005adc4 Merge 10.5 into 10.6 2024-06-12 07:51:28 +03:00
Yuchen Pei
d524cb5b3d
MDEV-34002 Initialise fields in spider_db_handler
Otherwise it may result in nonsensical values like 190 for a boolean.
2024-06-11 09:18:42 +10:00
Marko Mäkelä
27834ebc91 Merge 10.5 into 10.6 2024-06-10 15:22:15 +03:00
Marko Mäkelä
a2bd936c52 MDEV-33161 Function pointer signature mismatch in LF_HASH
In cmake -DWITH_UBSAN=ON builds with clang but not with GCC,
-fsanitize=undefined will flag several runtime errors on
function pointer mismatch related to the lock-free hash table LF_HASH.

Let us use matching function signatures and remove function pointer
casts in order to avoid potential bugs due to undefined behaviour.

These errors could be caught at compilation time by
-Wcast-function-type-strict, which is available starting with clang-16,
but not available in any version of GCC as of now. The old GCC flag
-Wcast-function-type is enabled as part of -Wextra, but it specifically
does not catch these errors.

Reviewed by: Vladislav Vaintroub
2024-06-10 12:35:33 +03:00
Brandon Nesterenko
bf0aa99aeb MDEV-34237: On Startup: UBSAN: runtime error: call to function MDL_lock::lf_hash_initializer lf_hash_insert through pointer to incorrect function type 'void (*)(st_lf_hash *, void *, const void *)'
A few different incorrect function type UBSAN issues have been
grouped into this patch.

The only real potentially undefined behavior is an error about
show_func_mutex_instances_lost, which when invoked in
sql_show.cc::show_status_array(), puts 5 arguments onto the stack;
however, the implementing function only actually has 3 parameters (so
only 3 would be popped). This was fixed by adding in the remaining
parameters to satisfy the type mysql_show_var_func.

The rest of the findings are pointer type mismatches that wouldn't
lead to actual undefined behavior. The lf_hash_initializer function
type definition is

typedef void (*lf_hash_initializer)(LF_HASH *hash, void *dst, const void *src);

but the MDL_lock and table cache's implementations of this function
do not have that signature. The MDL_lock has specific MDL object
parameters:

static void lf_hash_initializer(LF_HASH *hash __attribute__((unused)),
                                MDL_lock *lock, MDL_key *key_arg)

and the table cache has specific TDC parameters:

static void tdc_hash_initializer(LF_HASH *,
                                 TDC_element *element, LEX_STRING *key)

leading to UBSAN runtime errors when invoking these functions.

This patch fixes these type mis-matches by changing the
implementing functions to use void * and const void * for their
respective parameters, and later casting them to their expected
type in the function body.

Note too the functions tdc_hash_key and tc_purge_callback had
a similar problem to tdc_hash_initializer and was fixed
similarly.

Reviewed By:
============
Sergei Golubchik <serg@mariadb.com>
2024-06-08 19:59:59 -06:00
Thirunarayanan Balathandayuthapani
4b4dbb23ea MDEV-34169 Don't allow innodb_open_files to be lesser than
number of non-user tablespace.

fil_space_t::try_to_close(): Don't try to close
the tablespace which is acquired by the caller of
the function

Added the suppression message in open_files_limit test case
2024-06-07 20:50:39 +05:30
Thirunarayanan Balathandayuthapani
b7a75fbb8a MDEV-34169 Don't allow innodb_open_files to be lesser than
number of non-user tablespace.

- InnoDB only closes the user tablespace when the number of open
files exceeds innodb_open_files limit. In that case, InnoDB should
make sure that innodb_open_files value should be greater
than number of undo tablespace, system and temporary tablespace files.
2024-06-07 15:37:11 +05:30
Marko Mäkelä
a687cf8661 Merge 10.5 into 10.6 2024-06-07 10:03:51 +03:00
Thirunarayanan Balathandayuthapani
a02773f7c0 MDEV-34057 Inconsistent FTS state in concurrent scenarios
Problem:
=======
- This commit is a merge of mysql commit 129ee47ef994652081a11ee9040c0488e5275b14.
InnoDB FTS can be in inconsistent state when sync operation
terminates the server before committing the operation. This
could lead to incorrect synced doc id and incorrect query results.

Solution:
========
- During sync commit operation, InnoDB should pass
the sync transaction to update the max doc id
in the config table.

fts_read_synced_doc_id() : This function is used
to read only synced doc id from the config table.
2024-06-06 19:09:13 +05:30
Marko Mäkelä
699d38d951 MDEV-34296 extern thread_local is a CPU waste
In commit 99bd22605938c42d876194f2ec75b32e658f00f5 (MDEV-31558)
we wrongly thought that there would be minimal overhead for accessing
a thread-local variable mariadb_stats.

It turns out that in C++11, each access to an extern thread_local
variable requires conditionally invoking an initialization function.
In fact, the initializer expression of mariadb_stats is dynamic, and
those calls were actually unavoidable.

In C++20, one could declare constinit thread_local variables, but
the address of a thread_local variable (&mariadb_dummy_stats) is not
a compile-time constant. We did not want to declare mariadb_dummy_stats
without thread_local, because then the dummy accesses could lead to
cache line contention between threads.

mariadb_stats: Declare as __thread or __declspec(thread) so that
there will be no dynamic initialization, but zero-initialization.

mariadb_dummy_stats: Remove. It is a lesser evil to let
the environment perform zero-initialization and check if
!mariadb_stats.

Reviewed by: Sergei Petrunia
2024-06-06 14:38:42 +03:00
Marko Mäkelä
9fac857f26 MDEV-34283 A misplaced btr_cur_need_opposite_intention() check may fail to prevent hangs
btr_cur_t::search_leaf(): Invoke btr_cur_need_opposite_intention() after
positioning page_cur.rec so that the record will be in the intended page.
This is something that was broken in
commit f2096478d5 or
commit de4030e4d4 or related changes.

btr_cur_need_opposite_intention(): Add a debug assertion that would
catch the misuse.

The "next line of defence" that should have caught this bug in debug builds
are assertions that mtr_t::m_memo contains MTR_MEMO_X_LOCK for the
dict_index_t::lock. When btr_cur_need_opposite_intention() holds,
we should escalate to acquiring an exclusive index->lock in
btr_cur_t::pessimistic_search_leaf().

Reviewed by: Debarun Banerjee
2024-06-06 13:03:34 +03:00
Marko Mäkelä
bc3660925d MDEV-34307 On startup, [FATAL] InnoDB: Page ... still fixed or dirty
buf_pool_invalidate(): Properly wait for
os_aio_wait_until_no_pending_writes() to ensure so that there
are no pending buf_page_t::write_complete() or buf_page_write_complete()
operations. This will avoid a failure of buf_pool.assert_all_freed().

This bug should affect debug builds only. At this point, the
buf_pool.flush_list should be clear and all changes should have
been written out. The loop around buf_LRU_scan_and_free_block() should
have eventually completed and freed all pages as soon as
buf_page_t::write_complete() had a chance to release the page latches.

It is worth noting that buf_flush_wait() is working as intended.
As soon as buf_flush_page_cleaner() invokes
buf_pool.get_oldest_modification() it will observe that
buf_page_t::write_complete() had assigned oldest_modification_ to 1,
and remove such blocks from buf_pool.flush_list. Upon reaching
buf_pool.flush_list.count=0 the buf_flush_page_cleaner() will mark
itself idle and wake buf_flush_wait() by broadcasting
buf_pool.done_flush_list.

This regression was introduced in
commit a55b951e60 (MDEV-26827).

Reviewed by: Debarun Banerjee
2024-06-06 10:18:42 +03:00
Vladislav Vaintroub
ce9efb4e02 MDEV-34296 tpool - declare thread_local_waiter "static thread_local" 2024-06-05 16:55:11 +02:00
mariadb-DebarunBanerjee
b12c14e3b4 MDEV-34265 Possible hang during IO burst with innodb_flush_sync enabled
When checkpoint age goes beyond the sync flush threshold and
buf_flush_sync_lsn is set, page cleaner enters into "furious flush"
stage to aggressively flush dirty pages from flush list and pull
checkpoint LSN above safe margin. In this stage, page cleaner skips
doing LRU flush and eviction.

In 10.6, all other threads entirely rely on page cleaner to generate
free pages. If free pages get over while page cleaner is busy in
"furious flush" stage, a session thread could wait for free page in the
middle of a min-transaction(mtr) while holding latches on other pages.

It, in turn, can prevent page cleaner to flush such pages preventing
checkpoint LSN to move forward creating a deadlock situation. Even
otherwise, it could create a stall and hang like situation for large BP
with plenty of dirty pages to flush before the stage could finish.

Fix: During furious flush, check and evict LRU pages after each flush
iteration.
2024-06-05 18:11:29 +05:30
Vladislav Vaintroub
40abd973ab MDEV-34236 Mroonga build with ASAN/UBSAN with GCC 12+ extremely slow.
Workaround by disabling sanitizer for single source file.
2024-06-05 11:58:53 +02:00
Monty
38cbef8b3f MDEV-22935 Erroneous Aria Index / Optimizer behaviour
The problem was in the Aria part of the range optimizer,
maria_records_in_range(), which wrong concluded that there was no rows
in the range.

This error would happen in the unlikely case when searching for a range
on a partial key and there was a match for the first key part in the
upper part of the b-tree (node) and also a match in the underlying
node page.

In other words, for this bug to happen one have to use Aria, have a multi
part key with a lot of identical values for the first key part and do a
range search on the second part of the key.

Fixed by ensuring that we do not stop searching for partial keys found
on node.

Other things:
- Added some comments
- Changed a variable name to more clearly explain it's purpose.
- Fixed wrong cast in _ma_record_pos() that could cause problems on 32 bit
  systems.
2024-06-05 10:29:49 +03:00
Marko Mäkelä
c6d36c3e7c MDEV-34297 get_rnd_value() of ib_counter_t is unnecessarily complex
The shared counter template ib_counter_t uses the function
my_timer_cycles() as a source of pseudo-random numbers to pick a shard.
On some platforms, my_timer_cycles() could return the constant value 0.

get_rnd_value(): Remove.

my_pseudo_random(): Implement as an alias of my_timer_cycles() or
a wrapper for pthread_self().

Reviewed by: Vladislav Vaintroub
2024-06-05 09:54:14 +03:00
Yuchen Pei
042a0d85ad
MDEV-27186 spider/partition: Report error on info() failure
Like MDEV-28105, spider may attempt to connect to remote server in
info(), and it may emit an error upon failure to connect. In this
case, the downstream caller ha_partition::open() should return the
error to avoid inconsistency.

This fixes MDEV-27186, MDEV-27237, MDEV-27334, MDEV-28241, MDEV-34101.
2024-06-05 10:13:30 +10:00
Yuchen Pei
581712b989
MDEV-33490 MENT-1504 Fix some english strings in spider. 2024-06-04 12:25:08 +10:00
Thirunarayanan Balathandayuthapani
58a0e1e3dd MDEV-34223 Innodb - add status variable for number of bulk inserts
- Added a counter innodb_num_bulk_insert_operation in
INFORMATION_SCHEMA.GLOBAL_STATUS. This counter is incremented
whenever a InnoDB undergoes bulk insert operation.

- Change the innodb_instant_alter_column to atomic variable.
2024-06-03 16:27:22 +05:30
Yuchen Pei
f2302a62e3
Merge branch '10.5' into 10.6 2024-05-31 09:10:17 +10:00
Yuchen Pei
25476ba1ae
MDEV-29027 ASAN errors in spider_db_free_result after partition DDL
Spider calls ha_spider::close() at least twice on ALTER TABLE ... ADD
PARTITION. The first call frees wide_handler and the second call
accesses wide_handler->trx->thd (heap-use-after-free).

In general, there seems to be no problem with using THD obtained by
the macro current_thd() except in background threads. Thus, we simply
replace wide_handler->trx->thd with current_thd().

Original author: Nayuta Yanagasawa
2024-05-31 09:06:55 +10:00
Nayuta Yanagisawa
6d0c9872d9
MDEV-28522 Delete constant SPIDER_SQL_TYPE_*_HS
The HandlerSocket support of Spider has been deleted by MDEV-26858.
Thus, the constants, SPIDER_SQL_TYPE_*_HS, are no longer necessary.
2024-05-31 09:06:55 +10:00
Yuchen Pei
6c30220780
MDEV-26858 Spider: Remove dead code related to HandlerSocket
Remove the dead-code, in Spider, which is related to the Spider's
HandlerSocket support. The code has been disabled for a long time
and it is unlikely that the code will be enabled.

- rm all files under storage/spider/hs_client/ except hs_compat.h
- rm storage/spider/spd_db_handlersocket.*
- unifdef -UHS_HAS_SQLCOM -UHAVE_HANDLERSOCKET \
  -m storage/spider/spd_* storage/spider/ha_spider.* storage/spider/hs_client/*
- remove relevant files from storage/spider/CMakeLists.txt
2024-05-31 09:06:55 +10:00
Marko Mäkelä
5ba542e9ee Merge 10.5 into 10.6 2024-05-30 14:27:07 +03:00
Dave Gosselin
b0b463a894 MDEV-33616 Fix memleak in pfs_noop
Invoke cleanup routine at the end of pfs_noop.
2024-05-29 16:49:51 -04:00
Thirunarayanan Balathandayuthapani
44b23bb184 MDEV-34222 Alter operation on redundant table aborts the server
- InnoDB page compression works only on COMPACT or DYNAMIC row
format tables. So InnoDB should throw error when alter table
tries to enable PAGE_COMPRESSED for redundant table.
2024-05-24 15:48:19 +05:30
Thirunarayanan Balathandayuthapani
0ffa340a49 MDEV-34221 Errors about checksum mismatch on crash recovery are confusing
- InnoDB should avoid printing the error message before
restoring the first page from doublewrite buffer.
2024-05-24 12:57:42 +05:30
Vladislav Vaintroub
736449d30f MDEV-34205: ASAN stack buffer overflow in strxnmov() in frm_file_exists
Correct the second parameter for strxnmov to prevent potential buffer
overflows. The second parameter must be one less than the size of the
input buffer to avoid writing past the end of the buffer.

While the second parameter is usually correct, there are exceptions
that need fixing.

This commit addresses the issue within frm_file_exists() and other
affected places.
2024-05-23 22:08:27 +02:00
Yuchen Pei
c4020b541c
MDEV-24610 MEMORY SE: check overflow in info calls with HA_STATUS_AUTO 2024-05-22 09:18:09 +10:00
Alexander Barkov
310fd6ff69 Backporting bugs fixes fixed by MDEV-31340 from 11.5
The patch for MDEV-31340 fixed the following bugs:

MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33108 TABLE_STATISTICS and INDEX_STATISTICS are case insensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

Backporting the fixes from 11.5 to 10.5
2024-05-21 14:58:01 +04:00
mariadb-DebarunBanerjee
b2944adb76 MDEV-34166 Server could hang with BP < 80M under stress
BUF_LRU_MIN_LEN (256) is too high value for low buffer pool(BP) size.
For example, for BP size lower than 80M and 16 K page size, the limit is
more than 5% of total BP and for lowest BP 5M, it is 80% of the BP.
Non-data objects like explicit locks could occupy part of the BP pool
reducing the pages available for LRU. If LRU reaches minimum limit and
if no free pages are available, server would hang with page cleaner not
able to free any more pages.

Fix: To avoid such hang, we adjust the LRU limit lower than the limit
for data objects as checked in buf_LRU_check_size_of_non_data_objects()
i.e. one page less than 5% of BP.
2024-05-21 14:13:29 +05:30
Marko Mäkelä
0907df3d89 MDEV-34204 Assertion `!*detailed_error' failed on shutdown after XA PREPARE
trx_free_at_shutdown(): Similar to trx_t::commit_in_memory(),
clear the detailed_error (FOREIGN KEY constraint error) before
invoking trx_t::free(). We only do this on debug instrumented
builds in order to avoid a debug assertion failure on shutdown.
2024-05-21 09:52:35 +03:00
mariadb-DebarunBanerjee
2d5cba22a9 MDEV-34167 We fail to terminate transaction early with ER_LOCK_TABLE_FULL when lock memory is growing
This regression is introduced in 10.6 by following commit.
commit b6a2472489
MDEV-27891: SIGSEGV in InnoDB buffer pool resize

During DML, we check if buffer pool is running out of data pages in
buf_pool_t::running_out. Here is 75% of the buffer pool is occupied by
non-data pages we rollback the current transaction and exit with
ER_LOCK_TABLE_FULL.

The integer division (n_chunks_new / 4) becomes zero whenever the total
number of chunks are < 4 making the check completely ineffective for
such cases. Also the check is inaccurate for larger chunks.

Fix-1: Correct the check in buf_pool_t::running_out.

Fix-2: While waiting for free page, check for
buf_LRU_check_size_of_non_data_objects.
2024-05-21 08:33:44 +05:30
Yuchen Pei
698dae54ef
MDEV-31475 Spider: only reset wide_handler when owning it
A wide_handler is shared among ha_spider of partitions of the same
spider table, where the last partition is designated the owner of the
wide_handler, and is responsible for its deallocation. Therefore in
case of failure, we only reset wide_handler in error handling if the
current ha_spider is the owner of the wide_handler, otherwise it will
result in segv in the destructor of ha_spider, or during
ha_spider::close().
2024-05-21 09:17:12 +10:00
Yuchen Pei
86adee3806
MDEV-31475 remove unnecessary assignment to spider share init_error
The init, init_error, and init_error_time fields of a SPIDER_SHARE
should only be assigned when actually doing the initialisation of a
SPIDER_SHARE, otherwise they could result in spurious failures from
spider_get_share() in a subsequent statement.
2024-05-21 09:17:12 +10:00
mariadb-DebarunBanerjee
8047c8bc71 MDEV-28800 SIGABRT due to running out of memory for InnoDB locks
This regression is introduced in 10.6 by following commit.
commit 898dcf93a8
(Cleanup the lock creation)

It removed one important optimization for lock bitmap pre-allocation.
We pre-allocate about 8 byte extra space along with every lock object to
adjust for similar locks on newly created records on the same page by
same transaction. When it is exhausted, a new lock object is created
with similar 8 byte pre-allocation. With this optimization removed we
are left with only 1 byte pre-allocation. When large number of records
are inserted and locked in a single page, we end up creating too many
new locks almost in n^2 order.

Fix-1: Bring back LOCK_PAGE_BITMAP_MARGIN for pre-allocation.

Fix-2: Use the extra space (40 bytes) for bitmap in trx->lock.rec_pool.
2024-05-20 21:19:13 +05:30
Thirunarayanan Balathandayuthapani
ac2e02e961 MDEV-34175 mtr_t::log_close() warning should change the shutdown condition
- InnoDB should print the warning message saying
"Shutdown is in progress" only when shutdown state
is greater than SRV_SHUTDOWN_INITIATED.
2024-05-20 18:18:41 +05:30
Yuchen Pei
fd76746234
MDEV-28105 Return error in ha_spider::write_row() if info(HA_STATUS_AUTO) fails
Spider calls info with HA_STATUS_AUTO to update auto increment info,
which may attempt to connect the data node. If the connection fails,
it may emit an error and return the same error. This error should not
be of lower priority than any possible error from the later call to
handler::update_auto_increment().

Without this change, certain errors from update_auto_increment() such
as HA_ERR_AUTOINC_ERANGE may get ignored, causing my_insert() to call
my_ok(), which fails the assertion because the error was emitted in
the info() call (Diagnostics_area::is_set() returns true).
2024-05-14 15:50:29 +10:00
Marko Mäkelä
6bf2b64a97 MDEV-34118 fsp_alloc_free_extent() fails to flag DB_OUT_OF_FILE_SPACE
fsp_alloc_free_extent(): When applicable, set *err = DB_OUT_OF_FILE_SPACE.
2024-05-10 12:49:16 +03:00
Sergei Golubchik
887bb3f735 columnstore 6.4.8-2 2024-05-09 00:04:20 +02:00
Sergei Golubchik
7b53672c63 Merge branch '10.5' into 10.6 2024-05-08 20:06:00 +02:00
Monty
ec6aa9ac42 MDEV-34055 Assertion '...' failure or corruption errors upon REPAIR on Aria tables
The problem was two fold:
- REPAIR TABLE t1 USE_FRM did not work for transactional
  Aria tables (Table was thought to be repaired, which it was not) which
  caused issues in later usage of the table.
- When swapping tmp_data file to data file, sort_info files where not
  updated. This caused problems if there was several unique keys and
  there was a duplicate for the second key.
2024-05-07 19:24:02 +03:00
Yuchen Pei
10a7599286 MDEV-34036 Reset spider_hton_ptr in spider_db_done()
Otherwise spider_direct_sql may still think the spider plugin is
available even after spider_db_done() was called.
2024-05-07 10:56:02 +02:00
Yuchen Pei
bca366e4a1
MDEV-34098 source start_slave.inc in spider suites
The spider suite should --source include/start_slave.inc to make sure
the slave is up before proceeding.
2024-05-07 10:16:38 +10:00
Sergei Golubchik
13663cb5c4 MDEV-33727 mariadb-dump trusts the server and does not validate the data
safety first - tell mariadb client not to execute dangerous
cli commands, they cannot be present in the dump anyway.

wrapping the command in /*!999999 ..... */ guarantees that
if a non-mariadb-cli client loads the dump and sends it to the
server - the server will ignore the command it doesn't understand
2024-05-06 17:16:10 +02:00
Sergei Golubchik
22b3ba9312 MDEV-25102 UNIQUE USING HASH error after ALTER ... DISABLE KEYS
on disable_indexes(HA_KEY_SWITCH_NONUNIQ_SAVE) the engine does
not know that the long unique is logically unique, because on the
engine level it is not. And the engine disables it,

Change the disable_indexes/enable_indexes API. Instead of the enum
mode, send a key_map of indexes that should be enabled. This way the
server will decide what is unique, not the engine.
2024-05-06 17:16:10 +02:00
Thirunarayanan Balathandayuthapani
9b2bf09b95 MDEV-33980 mariadb-backup --backup is missing retry logic for undo tablespaces
- This is a merge of commit f378e76434
from 10.4 to 10.5.
2024-05-06 19:50:20 +05:30
Julius Goryavsky
b88c20ce1b Merge branch 10.4 into 10.5 2024-05-06 13:55:42 +02:00
Sergei Petrunia
55754be20c MDEV-33781: rocksdb.locking_issues_case5_rc fails windows ... : Disable it 2024-05-06 13:34:39 +03:00
Sergei Petrunia
f90fcefdb2 MDEV-33866: rocksdb.write_sync fails on windows ... : Disable it 2024-05-06 13:31:31 +03:00
Sergei Petrunia
be60782103 MDEV-33789: rocksdb.bloomfilter2 failed on ... : Disable it. 2024-05-06 13:06:14 +03:00
Sergei Golubchik
4f5dea43df cleanup
* remove dead code
* simplify the check for table->s->next_number_index
* misc
2024-05-05 21:37:08 +02:00
Sergei Golubchik
03295f0c20 MDEV-30727 Check spider_hton_ptr in spider udfs
UDF isn't supposed to use my_error(), it should return
the error message in the provided error message buffer

Fixes valgrind:

==93993== Conditional jump or move depends on uninitialised value(s)
==93993==    at 0x484ECCD: strnlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==93993==    by 0x1AD2B2C: process_str_arg (my_vsnprintf.c:259)
==93993==    by 0x1AD47E0: my_vsnprintf_ex (my_vsnprintf.c:696)
==93993==    by 0x1A3B91E: my_error (my_error.c:120)
==93993==    by 0xF87BE8: udf_handler::fix_fields(THD*, Item_func_or_sum*, unsigned int, Item**) (item_func.cc:3638)

followup for 267dd5a993
2024-05-05 21:37:07 +02:00
Vladislav Vaintroub
b18259ecf5 Compiling - Fix MSVC compile warnings, on x86 2024-05-03 22:00:56 +02:00
Vladislav Vaintroub
029e2a5fd9 MDEV-33876 CMake, zlib - use names compatible with official FindZLIB.cmake
- ZLIB_LIBRARIES, not ZLIB_LIBRARY
- ZLIB_INCLUDE_DIRS, not ZLIB_INCLUDE_DIR

For building libmariadb, ZLIB_LIBRARY/ZLIB_INCLUDE_DIR are still defined
This workaround will be removed later.
2024-05-03 21:48:47 +02:00
Yuchen Pei
b84d335d9d
MDEV-33538 make auxiliary spider plugins init depend on actual spider
The two I_S plugins SPIDER_ALLOC_MEM and SPIDER_WRAPPER_PROTOCOL
only makes sense if the main SPIDER plugin is installed. Further,
SPIDER_ALLOC_MEM requires a mutex that requires SPIDER init to fill
the table.

We also update the spider init query to override
--transaction_read_only=on so that it does not affect the spider init.

Also fixed error handling in spider_db_init() so that failure in
spider table init does not result in memory leak
2024-05-03 14:47:54 +10:00
mariadb-DebarunBanerjee
90b95c6149 MDEV-33543 Server hang caused by InnoDB change buffer
Issue: When getting a page (buf_page_get_gen) with no latch option
(RW_NO_LATCH), the caller is not expected to follow the B-tree latching
order. However in buf_page_get_low we try to acquire shared page latch
unconditionally to wait for a page that is being loaded by another
thread concurrently. In general it could lead to latch order violation
and deadlock.

Currently it affects the change buffer insert path btr_latch_prev()
which tries to load the previous page out of order with RW_NO_LATCH and
two concurrent inserts into IBUF tree cause deadlock. This problem is
introduced in 10.6 by following commit.
commit 9436c778c3 (MDEV-27058)

Fix: While trying to latch a page with RW_NO_LATCH, always use the
"*lock_try" interface and retry operation on failure after unfixing the
page.
2024-05-02 17:07:01 +05:30
Thirunarayanan Balathandayuthapani
156761db3b MDEV-31161 Assertion failures upon adding a too long key to table with COMPRESSED row
Problem:
=======
During InnoDB non-rebuild online alter operation, InnoDB set the
dummy log to clustered index online log. This can be used by
concurrent DML to identify whether the table undergoes online DDL.
InnoDB fails to reset the dummy log of clustered index in case
of error happened during prepare phase.

Solution:
========
Reset the InnoDB clustered index online log in case of error during
prepare phase.
2024-04-30 20:40:29 +05:30
Thirunarayanan Balathandayuthapani
f378e76434 MDEV-33980 mariadb-backup --backup is missing retry logic for undo tablespaces
Problem:
========
- Currently mariabackup have to reread the pages in case they are
modified by server concurrently. But while reading the undo
tablespace, mariabackup failed to do reread the page in case of
error.

Fix:
===
Mariabackup --backup functionality should have retry logic
while reading the undo tablespaces.
2024-04-30 16:15:26 +05:30
Thirunarayanan Balathandayuthapani
a586b6dbc8 MDEV-22855 Assertion `!field->prefix_len || field->fixed_len == field->prefix_len' failed in btr_node_ptr_max_size
Problem:
========
- InnoDB wrongly calulates the record size in
btr_node_ptr_max_size() when prefix index of
the column has to be stored externally.

Fix:
====
- InnoDB should add the maximum field size to
record size when the field is a fixed length one.
2024-04-29 16:42:26 +05:30
Sergei Golubchik
c1f3eff53f Merge branch '10.5' into 10.6 2024-04-29 10:08:58 +02:00
Yuchen Pei
3f2a5b28c6
MDEV-34003 Add testcase spider/bugfix.mdev_34003
MDEV-34003 appears to be a duplicate of MDEV-33742, and no code change
is needed. Nevertheless we add the testcase reported in the former.
2024-04-29 16:42:46 +10:00
Yuchen Pei
267dd5a993
MDEV-30727 Check spider_hton_ptr in spider udfs
We have to #undef my_error and find it from udfs when spider is not
installed.
2024-04-29 16:17:22 +10:00
Oleksandr Byelkin
bda8d4fdf7 require boost 1.53 for columnstore 2024-04-28 18:09:13 +02:00
Oleksandr Byelkin
ee59ca7ff1 Merge branch 'merge-zlib' (1.3.1) into 10.4 2024-04-26 13:50:03 +02:00
Marko Mäkelä
10d251e05a MDEV-26450 fixup: Remove a bogus assertion
mtr_t::commit_shrink(): Do not assert that some previously clean pages
will be flagged as modified by this mini-transaction. It could be the
case that there had been no recent write-back of any of the undo
tablespace pages that we are modifying when truncating the tablespace.
It suffices to assert that some pages were modified again:
ut_ad(m_modifications).

This fixes up commit f5fddae3cb
2024-04-25 15:52:38 +03:00
Marko Mäkelä
0936c13809 MDEV-33993 Possible server hang on DROP INDEX or RENAME INDEX
commit_try_norebuild(): Add the parameter statistics_exist,
similar to commit_try_rebuild(). If the InnoDB statistics tables
did not exist, we will not attempt to update statistics later on
during the transaction.

Thanks to Matthias Leich for originally reproducing this scenario.
2024-04-25 13:44:10 +03:00
Thirunarayanan Balathandayuthapani
8c8b7da017 MDEV-33979 Disallow bulk insert operation during partition update statement
Problem:
========
- Partition update operation enables the bulk insert for the
transaction while moving the row between partitions. This leads
to debug assert failure while removing the row from one
of the partition.

Solution:
========
- Disallow the bulk insert operation for non-insert operation
of partition table.
2024-04-25 10:50:34 +05:30
Meng-Hsiu Chiang
55cb2c2916 MDEV-29955: Set path for zlib library with pkg-config
`FindZLIB` module uses variable `ZLIB_ROOT`[1] to look for libraries. By
setting the variable, `FindZLIB` is able to search the libraries that
installed in a non-system path (/workspace/mylib for example).

And when using `z` in `LINK_LIBRARIES()` CMake tries to lookup the
library in system path by default. It doesn't work if the library isn't
installed in the path, and use ${ZLIB_LIBRARY} which set by FindZLIB
solve the issue.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services.

[1]: https://cmake.org/cmake/help/latest/module/FindZLIB.html#hints
2024-04-23 15:40:57 +01:00
Monty
0ccdf54b64 Check and remove high stack usage
I checked all stack overflow potential problems found with
gcc -Wstack-usage=16384
and
clang -Wframe-larger-than=16384 -no-inline

Fixes:
Added '#pragma clang diagnostic ignored "-Wframe-larger-than="'
  to a lot of function to where stack usage large but resonable.
- Added stack check warnings to BUILD scrips when using clang and debug.

Function changed to use malloc instead allocating things on stack:
- read_bootstrap_query() now allocates line_buffer (20000 bytes) with
  malloc() instead of using stack. This has a small performance impact
  but this is not releant for bootstrap.
- mroonga grn_select() used 65856 bytes on stack. Changed it to use
  malloc().
- Wsrep_schema::replay_transaction() and
  Wsrep_schema::recover_sr_transactions().
- Connect zipOpen3()

Not fixed:
- mroonga/vendor/groonga/lib/expr.c grn_proc_call() uses
  43712 byte on stack.  However this is not easy to fix as the stack
  used is caused by a lot of code generated by defines.
- Most changes in mroonga/groonga where only adding of pragmas to disable
  stack warnings.
- rocksdb/options/options_helper.cc uses 20288 of stack space.
  (no reason to fix except to get rid of the compiler warning)
- Causes using alloca() where the allocation size is resonable.
- An issue in libmariadb (reported to connectors).
2024-04-23 14:12:31 +03:00
Marko Mäkelä
07faba08b9 MDEV-27924 fixup: cmake -DWITH_INNODB_EXTRA_DEBUG=ON 2024-04-23 12:57:39 +03:00
Marko Mäkelä
15b607b552 Merge 10.5 into 10.6 2024-04-19 16:01:26 +03:00
Marko Mäkelä
4c34339426 MDEV-33946: OPT_PAGE_CHECKSUM mismatch due to mtr_t::memmove()
mtr_t::memmove(): Revert to the parent of
commit a032f14b34
where there was supposed to be an equivalent change
that would avoid hitting a warning in some old version of GCC
when this change was part of another 10.6 based developmet branch.

For some reason, this change is not equivalent but will cause
massive amounts of backup failures in the stress tests
run by Matthias Leich, caught by
commit 4179f93d28 in 10.6.
2024-04-19 15:46:21 +03:00
Marko Mäkelä
ec7db2bdf8 MDEV-33325 fixup
ibuf_remove_free_page(): Correct the calculation of root_savepoint().
The first entry acquired by ibuf_tree_root_get() will be ibuf.index.lock
and not the change buffer root page.

Thanks to Matthias Leich for finding this bug in RQG.
Unfortunately, this code is very difficult to cover
in our regression test suite.
2024-04-19 12:39:48 +03:00
Marko Mäkelä
8e663f5e90 MDEV-32791 MariaDB cannot be installed on Red Hat ubi9
The libpmem dependency that had been added in
commit 3daef523af (MDEV-17084)
did not achieve any measurable performance improvement when
comparing the same PMEM device with and without "mount -o dax"
using the Linux ext4 file system.

Because Red Hat has deprecated libpmem, let us remove the code
altogether.

Note: This is a 10.6 version of
commit 3f9f5ca48e
which will retain PMEM support in MariaDB Server 10.11.
2024-04-19 11:04:51 +03:00
mariadb-DebarunBanerjee
5928e04d5f MDEV-32489 Change buffer index fails to delete the records
When the change buffer records for a page span across multiple change
buffer leaf pages or the starting record is at the beginning of a page
with a left sibling, ibuf_delete_recs deletes only the records in first
page and fails to move to subsequent pages.

Subsequently a slow shutdown hangs trying to delete those left over
records.

Fix-A: Position the cursor to an user record in B-tree and exit only
when all records are exhausted.

Fix-B: Make sure we call ibuf_delete_recs during slow shutdown for
pages with IBUF entries to cleanup any previously left over records.
2024-04-18 08:30:21 +05:30
Marko Mäkelä
e459ce8336 MDEV-33779 InnoDB row operations could be faster
We have quite a few assertions
	ut_a(m_prebuilt->trx == thd_to_trx(ha_thd()));
in low-level functions.
These had better be debug assertions for performance reasons.
It should suffice to check that condition in the less frequently invoked
ha_innobase::change_active_index().

convert_search_mode_to_innobase(): Return whether the mode is
unsupported, and optionally update ha_innobase::m_last_match_mode.

ha_innobase::index_read(): Only branch on find_flag once, and
simplify the error handling after invoking row_search_mvcc().

ha_innobase::rnd_pos(): Remove an assertion that is duplicating one
in ha_innobase::index_read(), which we are calling unconditionally.

ha_innobase::records_in_range(): Check only once whether
min_key, max_key are null pointers.

row_sel_convert_mysql_key_to_innobase(): Declare all parameters
except the conversion buffer pointer (buf) to be nonnull.

Reviewed by: Debarun Banerjee
2024-04-17 16:47:41 +03:00
Marko Mäkelä
829cb1a49c Merge 10.5 into 10.6 2024-04-17 14:14:58 +03:00
Marko Mäkelä
46e9e92e22 MDEV-33855 MSAN use-of-uninitialized-value in rtr_pcur_getnext_from_path()
rtr_pcur_getnext_from_path(): Remove a bogus assertion
that may cause a data races with buf_LRU_block_free_non_file_page().

If my_latch_mode == BTR_MODIFY_LEAF, we would have released all page
latches and buffer-fixes by invoking mtr->rollback_to_savepoint(1).
After this point, the btr_cur->page_cur.block is no longer valid and
must not be accessed.

Before 03ca6495df this assertion had
been disabled, because the preprocessor symbol UNIV_RTR_DEBUG
had never been enabled (except when explicitly specified in
CMAKE_CXX_FLAGS).

Reviewed by: Debarun Banerjee
2024-04-17 13:25:12 +03:00
mariadb-DebarunBanerjee
040069f4ba MDEV-33431 Latching order violation reported fil_system.sys_space.latch and ibuf_pessimistic_insert_mutex
Issue:
------
The actual order of acquisition of the IBUF pessimistic insert mutex
(SYNC_IBUF_PESS_INSERT_MUTEX) and IBUF header page latch
(SYNC_IBUF_HEADER) w.r.t space latch (SYNC_FSP) differs from the order
defined in sync0types.h. It was not discovered earlier as the path to
ibuf_remove_free_page was not covered by the mtr test. Ideal order and
one defined in sync0types.h is as follows.
SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX -> SYNC_FSP

In ibuf_remove_free_page, we acquire space latch earlier and we have
the order as follows resulting in the assert with innodb_sync_debug=on.
SYNC_FSP -> SYNC_IBUF_HEADER -> SYNC_IBUF_PESS_INSERT_MUTEX

Fix:
---
We do maintain this order in other places and there doesn't seem to be
any real issue here. To reduce impact in GA versions, we avoid doing
extensive changes in mutex ordering to match the current
SYNC_IBUF_PESS_INSERT_MUTEX order. Instead we relax the ordering check
for IBUF pessimistic insert mutex using SYNC_NO_ORDER_CHECK.
2024-04-17 15:16:50 +05:30
Oleksandr Byelkin
9b18275623 Merge branch '10.4' into 10.5 2024-04-16 11:04:14 +02:00
Kristian Nielsen
16aa4b5f59 Merge from 10.4 to 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-15 17:46:49 +02:00
Marko Mäkelä
a032f14b34 MDEV-33559 matched_rec::block should be allocated from the buffer pool
matched_rec::rec_buf[], matched_rec::bufp: Remove.

matched_rec::block: Make this a pointer to something that
is allocated by buf_block_alloc(). In this way, the only
case where buf_block_t is constructed outside buf_pool
is ALTER TABLE...IMPORT TABLESPACE.

rtr_info::heap: Remove. This was only used for allocating matched_rec,
which now is smaller.

mtr_t::memmove(): Simplify some code to avoid GCC 9.4.0 -Wconversion
in the 10.6 branch as a result of these changes.

Reviewed by: Debarun Banerjee
2024-04-15 09:04:11 +03:00
Yuchen Pei
051a1fa0e9
MDEV-33777 Spider: Correct checks for show index column numbers
It was updated for 10.6+ in MDEV-7317. Because a lower version spider
node may connect to a higher version data node, we need to change this
for 10.4 and 10.5 as well.
2024-04-15 09:59:24 +10:00
Yuchen Pei
18b93d6eb0
MDEV-28993 Spider: Push down CASE statement 2024-04-15 09:56:24 +10:00
Yuchen Pei
99dc0f030f
MDEV-28993 spider: revert removal of ITEM_FUNC_CASE_PARAMS_ARE_PUBLIC
It was done in MDEV-29447.
2024-04-15 09:56:23 +10:00
Vlad Lesin
d7fc975cfe MDEV-33802 Weird read view after ROLLBACK of other transactions.
In the case if some unique key fields are nullable, there can be
several records with the same key fields in unique index with at least
one key field equal to NULL, as NULL != NULL.

When transaction is resumed after waiting on the record with at least one
key field equal to NULL, and stored in persistent cursor record is
deleted, persistent cursor can be restored to the record with all key
fields equal to the stored ones, but with at least one field equal to
NULL. And such record is wrongly treated as a record with the same unique
key as stored in persistent cursor record one, what is wrong as
NULL != NULL.

The fix is to check if at least one unique field is NULL in restored
persistent cursor position, and, if so, then don't treat the record as
one with the same unique key as in the stored record key.

dict_index_t::nulls_equal was removed, as it was initially developed for
never existed in MariaDB "intrinsic tables", and there is no code, which
would set it to "true".

Reviewed by Marko Mäkelä.
2024-04-12 18:13:51 +03:00
Sergei Golubchik
8dda602701 spider should suppress errors in close_connection
because stmt_da no longer expects errors at this stage,
but mysql_close() call can still hit my_error() in net_serv.cc
2024-04-11 21:00:39 +02:00
Marko Mäkelä
04be12a8f5 Fix g++-14 -Wtemplate-id-cdtor 2024-04-11 15:51:30 +03:00
Sergei Golubchik
41296a07c8 Merge branch '10.5' into 10.6 2024-04-11 13:58:22 +02:00
Marko Mäkelä
263932d505 MDEV-33325 Crash in flst_read_addr on corrupted data
flst_read_addr(): Remove assertions. Instead, we will check these
conditions in the callers and avoid a crash in case of corruption.
We will check the conditions more carefully, because the callers
know more exact bounds for the page numbers and the byte offsets
withing pages.

flst_remove(), flst_add_first(), flst_add_last(): Add a parameter
for passing fil_space_t::free_limit. None of the lists may point to
pages that are beyond the current initialized length of the
tablespace.

trx_rseg_mem_restore(): Access the first page of the tablespace,
so that we will correctly recover rseg->space->free_limit
in case some log based recovery is pending.

ibuf_remove_free_page(): Only look up the root page once, and
validate the last page number.

Reviewed by: Debarun Banerjee
2024-04-11 09:58:53 +03:00
Monty
3655cefc42 MDEV-33813 ERROR 1021 (HY000): Disk full (./org/test1.MAI); waiting for someone to free some space
Fixed that internal temporary tables are not waiting for freed disk space.

Other things:
- 'kill id' will now kill a query waiting for free disk space instantly.
  Before it could take up to 60 seconds for the kill would be noticed.
- Fixed that sorting one index is not using MY_WAIT_IF_FULL for temp files.
- Fixed bug where share->write_flag set MY_WAIT_IF_FULL for temp files.

It is quite hard to do a test case for this. Instead I tested all
combinations interactively.
2024-04-10 17:01:24 +03:00
Marko Mäkelä
d824977598 MDEV-33512 Corrupted table after IMPORT TABLESPACE and restart
In commit d74d95961a (MDEV-18543)
there was an error that would cause the hidden metadata record
to be deleted, and therefore cause the table to appear corrupted
when it is reloaded into the data dictionary cache.

PageConverter::update_records(): Do not delete the metadata record,
but do validate it.

RecIterator::open(): Make the API more similar to 10.6, to simplify
merges.
2024-04-10 09:47:44 +03:00
Yuchen Pei
662bb176b4
MDEV-33661 MENT-1591 Keep spider in memory until exit in ASAN builds
Same as MDEV-29579. For some reason, libodbc does not clean up
properly if unloaded too early with the dlclose() of spider. So we add
UNIQUE symbols to spider so the spider does not reload in dlclose().

This change, however, uncovers some hidden problems in the spider
codebase, for which we move the initialisation of some spider global
variables into the initialisation of spider itself.

Spider has some global variables. Their initialisation should be done
in the initialisation of spider itself, otherwise, if spider were
re-initialised without these symbol being unloaded, the values could
be inconsistent and causing issues.

One such issue is caused by the variables
spider_mon_table_cache_version and spider_mon_table_cache_version_req.
They are used for resetting the spider monitoring table cache and have
initial values of 0 and 1 respectively. We have that always
spider_mon_table_cache_version_req >= spider_mon_table_cache_version,
and when the relation is strict, the cache is reset,
spider_mon_table_cache_version is brought to be equal to
spider_mon_table_cache_version_req, and the cache is searched for
matching table_name, db_name and link_idx. If the relation is equal,
no reset would happen and the cache would be searched directly.

When spider is re-inited without resetting the values of
spider_mon_table_cache_version and spider_mon_table_cache_version_req
that were set to be equal in the previous cache reset action, the
cache was emptied in the previous spider deinit, which would result in
HA_ERR_KEY_NOT_FOUND unexpectedly.

An alternative way to fix this issue would be to call the spider udf
spider_flush_mon_cache_table(), which increments
spider_mon_table_cache_version_req thus making sure the inequality is
strict. However, there's no reason for spider to initialise these
global variables on dlopen(), rather than on spider init, which is
cleaner and "purer".

To reproduce this issue, simply revert the changes involving the two
variables and then run:

mtr --no-reorder spider.ha{,_part}
2024-04-10 10:10:30 +10:00
Marko Mäkelä
4aa92911c7 MDEV-33802 Weird read view after ROLLBACK of another transaction
Even after commit b8a6719889 there
is an anomaly where a locking read could return inconsistent results.
If a locking read would have to wait for a record lock, then by the
definition of a read view, the modifications made by the current lock
holder cannot be visible in the read view. This is because the read
view must exclude any transactions that had not been committed at the
time when the read view was created.

lock_rec_convert_impl_to_expl_for_trx(), lock_rec_convert_impl_to_expl():
Return an unsafe-to-dereference pointer to a transaction that holds or
held the lock, or nullptr if the lock was available.

lock_clust_rec_modify_check_and_lock(),
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock():
Return DB_RECORD_CHANGED if innodb_strict_isolation=ON and the
lock was being held by another transaction.

The test case, which is based on a bug report by Zhuang Liu,
covers the function lock_sec_rec_read_check_and_lock().

Reviewed by: Vladislav Lesin
2024-04-09 12:50:24 +03:00
Marko Mäkelä
a4cda66e2d MDEV-33588 buf::Block_hint is a performance hog
In so-called optimistic buffer pool lookups, we must not
dereference a block descriptor before we have made sure that
it is accessible. While buf_pool_t::resize() is running,
block descriptors could become invalid.

The buf::Block_hint class was essentially duplicating
a buf_pool.page_hash lookup that was executed in
buf_page_optimistic_get() anyway. For better locality of
reference, we had better execute that lookup only once.

buf_page_optimistic_fix(): Prepare for buf_page_optimistic_get().
This basically is a simpler version of Buf::Block_hint.

buf_page_optimistic_get(): Assume that buf_page_optimistic_fix()
has been called and the page identifier verified. Should the block
be evicted, the block->modify_clock will be invalidated; we do not
need to check the block->page.id() again. It suffices to check
the block->modify_clock after acquiring the page latch.

btr_pcur_t::old_page_id: Store the expected page identifier
for buf_page_optimistic_fix().

btr_pcur_t::block_when_stored: Remove. This was duplicating
page_cur_t::block.

btr_pcur_optimistic_latch_leaves(): Remove redundant parameters.
First, invoke buf_page_optimistic_fix() on the requested page.
If needed, acquire a latch on the left page. Finally, acquire
a latch on the target page and recheck the block->modify_clock.
If the page had been freed while we were not holding a page latch,
fall back to the slow path. Validate the FIL_PAGE_PREV after
acquiring a latch on the current page. The block->modify_clock
is only being incremented when records are deleted or pages
reorganized or evicted; it does not guard against concurrent
page splits.

Reviewed by: Debarun Banerjee
2024-04-09 12:48:01 +03:00
Yuchen Pei
b7b58a2310
MDEV-33731 Only iterate over m_locked_partitions in update_next_auto_inc_val()
Only locked will participate in the query in this case. Chances are
that not-locked partitions were not opened, which is the cause of the
crash in the added test case spider/bugfix.mdev_33731 without this
patch.
2024-04-09 09:24:48 +10:00
Marko Mäkelä
73291de74e MDEV-33819 The purge of committed history is mis-parsing some log
In commit aa719b5010 (part of MDEV-32050)
a bug was introduced in the function purge_sys_t::choose_next_log(),
which reimplements some logic that previously was part of
trx_purge_read_undo_rec(). We must invoke trx_undo_get_first_rec()
with the page number and offset of the undo log header, but we were
incorrectly invoking it on the current undo page number, which caused
us to parse undo records starting at an incorrect offset.

purge_sys_t::choose_next_log(): Pass the correct parameter to
trx_undo_page_get_first_rec().

trx_undo_page_get_next_rec(), trx_undo_page_get_first_rec(),
trx_undo_page_get_last_rec(): Add debug assertions and make the
code more robust by returning nullptr on corruption. Should we
detect any corrupted undo logs during the purge of committed
transaction history, the sanest thing to do is to pretend that
the end of an undo log was reached. If any garbage is left in
the tables, it will be ignored by anything else than
CHECK TABLE ... EXTENDED, and it can be removed by OPTIMIZE TABLE.

Thanks to Matthias Leich for providing an "rr replay" trace where
this bug could be found.

Reviewed by: Vladislav Lesin
2024-04-08 11:48:46 +03:00
Yuchen Pei
f9e0ebeca4
MDEV-33742 Do not create group by handler when all tables are constant 2024-04-08 14:35:36 +10:00
Yuchen Pei
e865ef6a04
MDEV-33742 Remove macro PARTITION_HAS_GET_CHILD_HANDLERS
Similar to MDEV-27658.

Also fixing the positioning of #ifdef WITH_PARTITION_STORAGE_ENGINE
blocks and add missing ones.
2024-04-08 14:35:36 +10:00
Yuchen Pei
860c1ca9ad
MDEV-33679 Spider group by handler: skip on multiple equalities
The spider group by handler is created in
JOIN::make_aggr_tables_info(), by which time calls to
substitute_for_best_equal_field() should have already removed all the
multiple equalities (i.e. Item_equal, with MULT_EQUAL_FUNC func_type).
Therefore, if there is still such items, it is deemed as an optimizer
bug and should be skipped.
2024-04-08 14:35:35 +10:00
Yuchen Pei
9c93d41ad7
MDEV-33728 spider: remove use of MYSQL_VERSION_ID and MARIADB_BASE_VERSION
change created by:

unifdef -DMYSQL_VERSION_ID=100400 -DMARIADB_BASE_VERSION -m storage/spider/spd_* storage/spider/ha_spider.* storage/spider/hs_client/*

basically MDEV-27637, MDEV-27641, MDEV-27655
2024-04-08 14:35:35 +10:00
Yuchen Pei
44c88faeca
MDEV-28992 Spider group by handler: Push down TIMESTAMPDIFF function
Also removed ITEM_FUNC_TIMESTAMPDIFF_ARE_PUBLIC.

Similar to pr#2225, with the testcase adapted from that patch:

--8<---------------cut here---------------start------------->8---
From 884f7c6df1 Mon Sep 17 00:00:00 2001
From: "Norio Akagi (norakagi)" <norakagi@amazon.com>
Date: Wed, 3 Aug 2022 23:30:34 -0700
Subject: [PATCH] [MDEV-28992] Push down TIMESTAMP_DIFF in spider

This changes so that TIMESTAMP_DIFF function in a query is pushed down and works natively in Spider.
Instead of directly accessing item's member, now we can rely on a public accessor method to make it work.
Unit tests are added under spider.pushdown_timestamp_diff.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
--8<---------------cut here---------------end--------------->8---
2024-04-08 14:35:35 +10:00