during row insert
DB_FOREIGN_DUPLICATE_KEY in row_ins_duplicate_error_in_clust was
motivated by handling the cascade changes during versioned operations.
It was found, though, that certain row_update_for_mysql calls could
return DB_FOREIGN_DUPLICATE_KEY even when there are no foreign key
relations.
Change DB_FOREIGN_DUPLICATE_KEY to DB_DUPLICATE_KEY in
row_ins_duplicate_error_in_clust.
It will later be converted to DB_FOREIGN_DUPLICATE_KEY in
row_ins_check_foreign_constraint if needed.
Additionally, ha_delete_row should return neither error.
Ensure this with an assertion.
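A minimal sketch of the added guard, assuming the handler-level error
codes (the body of ha_delete_row is simplified):

  /* ha_delete_row() must not return either duplicate-key error:
     a DELETE cannot create duplicates. */
  int handler::ha_delete_row(const uchar *buf)
  {
    int error= delete_row(buf);
    DBUG_ASSERT(error != HA_ERR_FOUND_DUPP_KEY &&
                error != HA_ERR_FOREIGN_DUPLICATE_KEY);
    return error;
  }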
We had a protection against it by allowing a versioned delete only if:
trx->id != table->vers_start_id()
For REPLACE this check fails: REPLACE calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant.
The same problem hits Field::is_max, which may have checked the wrong record.
Fix:
* Refactor Field::is_max to optionally accept a pointer as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
record. The difference from is_max is that is_max accepts a pointer to the
field data rather than to the record.
Refactoring the method val_int() to accept the argument would be too
effortful, so instead the value in the record is fetched directly, as is
done in Field_longlong.
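A hedged sketch of the refactored signatures (defaults and helper
details are illustrative, not the exact tree contents):

  /* is_max() optionally takes a pointer to the field data; by default
     it checks the field's own data in the current record. */
  bool Field::is_max(const uchar *ptr_arg) const;

  /* vers_start_id()/vers_end_id() always take a pointer to the record,
     so REPLACE can pass record[2] instead of record[0]. The value is
     fetched directly from the record, as Field_longlong does: */
  ulonglong TABLE::vers_start_id(const uchar *rec) const
  {
    Field *f= vers_start_field();
    return uint8korr(rec + f->offset(record[0]));
  }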
storage/connect/tabxml.cpp:1616:46: warning: ‘*this.XMLCOL::Long’ may be used uninitialized [-Wmaybe-uninitialized]
1616 | Valbuf = (char*)PlugSubAlloc(g, NULL, n * (Long + 1));
In this case the object is overwritten three lines earlier: the old
class is replaced in place by a new subclass instance. Add constructs
to preserve the value of Long across this replacement.
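A hedged sketch of the shape of the fix (the replacement mechanism and
names are illustrative):

  /* Re-creating the column object as a subclass loses the Long value
     computed for the original XMLCOL. Save it first and restore it on
     the new instance, so that the later
     PlugSubAlloc(g, NULL, n * (Long + 1)) sees the intended length. */
  int longsave= colp->Long;          /* value from the original object */
  colp= new(g) XMULCOL(colp, tdbp);  /* hypothetical subclass replacement */
  colp->Long= longsave;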
storage/connect/filter.cpp:1594:13: warning: ‘*this.FILTERCMP::FILTERX.FILTERX::FILTER.FILTER::Opc’ is used uninitialized [-Wuninitialized]
1594 | Bt = OpBmp(g, Opc);
The constructor of FILTERCMP takes an Opc(ode), and this should be
passed in rather than relying on the uninitialized value inherited from
the parent class. Also save its value in the class.
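A hedged sketch of the constructor change (the signature is
illustrative):

  /* Pass the opcode to FILTERCMP instead of reading the uninitialized
     FILTER::Opc inherited from the parent object. */
  FILTERCMP::FILTERCMP(PGLOBAL g, OPVAL opc)
  {
    Opc= opc;           /* save its value in the class */
    Bt= OpBmp(g, Opc);  /* no longer reads an uninitialized Opc */
  }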
Clang processes the "int x=x" code from UNINIT_VAR literally, resulting
in an uninitialized read and write, which is something we want to avoid.
Gcc does the same without emitting warnings.
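A self-contained illustration of the idiom (a sketch, not the exact
macro in the tree):

  /* Self-initialization silences gcc's maybe-uninitialized warning,
     but it is itself an uninitialized read and write, and clang,
     UBSan and clang-based analysers process it literally. */
  #define UNINIT_VAR(x) x= x

  extern void use(int);

  void f(bool flag)
  {
    int UNINIT_VAR(n);  /* expands to: int n= n; formally undefined */
    if (flag)
      n= 1;
    if (flag)
      use(n);           /* without the idiom, old gcc warned here */
  }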
As UNINIT_VAR existed to work around false positives in the compiler's
uninitialized-variable detection, and clang does not produce such false
positives, its default action is now a no-op.
Static analysers (Infer and SonarQube were examined) are clang-based
and detect the same issue.
Using __clang__ instead of WITH_UBSAN would have achieved a better
result; however, the reviewer wanted to keep WITH_UBSAN only.
LINT_INIT_STRUCT is no longer required: even gcc-4.8.5 doesn't warn
with this construct removed, which matches the comment that the false
positive was fixed in gcc ~4.7.
mysql.cc - all paths in com_go populate buff before use.
json: in Item_func_json_merge::val_str, LINT_INIT(js2) is unneeded,
as in the preceding statements it is explicitly initialized to NULL.
In Item_func_json_contains_path::val_bool, the read of n_found is
guarded by mode_one and may be uninitialized; gcc-13.3.0 on Ubuntu 24.04
detects this. As the only remaining use of LINT_INIT, this usage has
been replaced with the expanded macro, with the unused _lint define
removed.
The LINT_INIT macro is removed.
_ma_ck_delete - org_key is only valid under share->now_transactional;
likewise in _ma_ck_write_btree_with_log.
connect engine never used anything that FORCE_INIT_OF_VARS
would change.
Reviewer: Monty
When compiling with debug and valgrind on Ubuntu 22.04 with gcc 11.4.0,
the frame size used was 16400 bytes because of the code:
completion_callback callbacks[1000];
Reducing the array to 950 entries fixes the issue.
buf_pool_t::shrink(): If we run out of pages to evict from buf_pool.LRU,
abort the operation. Also, do not leak the spare block that we may have
allocated.
buf_pool_t::shrink(): When relocating a dirty page of the temporary
tablespace, reset the oldest_modification() on the discarded block,
like we do for persistent pages in buf_flush_relocate_on_flush_list().
buf_pool_t::resize(): Add debug assertions to catch this error earlier.
This bug does not seem to affect non-debug builds.
Reviewed by: Thirunarayanan Balathandayuthapani
Server-level UNIQUE constraints (namely, WITHOUT OVERLAPS and USING HASH)
only worked with InnoDB in REPEATABLE READ isolation mode, when the
constraint was checked first and then the row was inserted or updated.
Gap locks prevented race conditions where a concurrent connection
could have also checked the constraint and inserted/updated a row
at the same time.
In READ COMMITTED there are no gap locks. To avoid race conditions,
we now check the constraint *after* the row operation. This is
enabled by the HA_CHECK_UNIQUE_AFTER_WRITE table flag that InnoDB
sets for READ COMMITTED transactions.
Checking the constraint after the row operation is more complex.
First, the constraint will see the current (inserted/updated) row,
and needs to skip it. Second, IGNORE operations become tricky,
as we need to revert the insert/update and continue statement execution.
write_row() (INSERT IGNORE) is reverted with delete_row(). Conveniently
it deletes the current row, that is, the last inserted row.
update_row(a,b) (UPDATE IGNORE) is reverted with a reversed update,
update_row(b,a). Conveniently, it updates the current row too.
Except in InnoDB when the PK is updated: in this case InnoDB internally
performs delete+insert, but does not move the cursor, so the "current"
row is the deleted one and the reverse update doesn't work.
This combination now throws an "unsupported" error and will
be fixed in MDEV-37233.
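A hedged pseudo-C++ sketch of the revert logic (the post-write check
helper is hypothetical; the other calls follow the text above):

  /* INSERT IGNORE: the last inserted row is the "current" row,
     so the insert can be reverted with delete_row(). */
  if (!(error= h->ha_write_row(record)) &&
      (error= check_unique_after_write(h, record)))
    h->ha_delete_row(record);            /* revert, continue statement */

  /* UPDATE IGNORE: revert with the reversed update. This fails when
     InnoDB updates the PK via an internal delete+insert, in which case
     an "unsupported" error is returned instead. */
  if (!(error= h->ha_update_row(old_rec, new_rec)) &&
      (error= check_unique_after_write(h, new_rec)))
    h->ha_update_row(new_rec, old_rec);  /* reverse update */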
The test innodb.undo_truncate occasionally demonstrates a race condition
where scrubbing is writing zeroes to a freed undo page while
innodb_undo_log_truncate=ON is truncating the same tablespace. The
truncation is an exception to the rule that InnoDB tablespace file sizes
can only grow, never shrink.
The fields fil_space_t::size and fil_node_t::size are protected by
fil_system.mutex, which used to be a highly contended resource. We
do not want to revert back to acquiring the mutex in fil_space_t::io()
because that would introduce an obvious scalability bottleneck.
fil_space_t::flush_freed(): Do not try to scrub pages of the undo
tablespace in order to prevent a race condition between io()
and undo tablespace truncation.
fil_space_t::io(): Prevent a null pointer dereference when reporting
an out-of-bounds access to the non-first file of the system or
temporary tablespace. Do not invoke set_corrupted() after an
out-of-bounds asynchronous read.
Note: fil_space_t::flush_freed() may only invoke PUNCH_RANGE on
page_compressed tablespaces, never on an undo tablespace.
ha_innobase::store_lock(): Do not create a read view or start the
transaction if innodb_snapshot_isolation=OFF. This should save some
resources with the default settings.
ha_innobase::store_lock(): Set also trx->will_lock when starting
a transaction at SERIALIZABLE isolation level. This fixes up
commit 7fbbbc983f (MDEV-36330).
buf_page_get_low(): Do not expect a valid state of
buf_page_t::in_zip_hash for blocks that are not file pages.
This debug assertion had been misplaced in
commit aaef2e1d8c (MDEV-27058)
that removed the condition
block->page.state() == BUF_BLOCK_FILE_PAGE.
When spider encounters an Item_aggregate_ref that is valid at gbh
creation, it could become invalid at gbh execution due to item list
substitution*. Therefore we ban Item_aggregate_ref for spider gbh
creation. To that end, we also make sure that str is NULL if and only
if we are in the creation stage, not the execution stage, including
removing a redundant check for when str is not NULL.
*: Note that it is likely the same scenario as in MDEV-25116.
We spare spider_table_remove_share_from_crd/sts_thread and
spider_table_add_share_to_crd/sts_thread because the function bodies
have too many fields with crd/sts in the name, and a macro would hurt
debuggability.
At TRANSACTION ISOLATION LEVEL SERIALIZABLE, InnoDB would fail to flag
a write/read conflict, which would be a violation already at the more
relaxed REPEATABLE READ level when innodb_snapshot_isolation=ON.
Fix: Create a read view and start the transaction at the same time.
Thus, lock checks will be able to consult the correct read view
to flag ER_CHECKREAD if we are about to lock a record that was committed
after the start of our transaction.
innobase_start_trx_and_assign_read_view(): At any other isolation level
than READ UNCOMMITTED, do create a read view. This is needed for the
correct operation of START TRANSACTION WITH CONSISTENT SNAPSHOT.
ha_innobase::store_lock(): At SERIALIZABLE isolation level, if the
transaction was not started yet, start it and open a read view.
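A hedged sketch of the addition (simplified; the real conditions in
ha_innobase::store_lock() are more involved):

  /* At SERIALIZABLE, start the transaction and open a read view right
     away, so that later lock checks can consult it and flag
     ER_CHECKREAD for records committed after our transaction began. */
  if (trx->isolation_level == TRX_ISO_SERIALIZABLE && !trx_is_started(trx))
  {
    trx_start_if_not_started(trx, false);
    trx->read_view.open(trx);
  }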
An alternative way to achieve this would be to make trans_begin()
treat START TRANSACTION (or BEGIN) in the same way as
START TRANSACTION WITH CONSISTENT SNAPSHOT when the isolation level
is SERIALIZABLE.
innodb_isolation_level(const THD*): A simpler version of
innobase_map_isolation_level(). Compared to earlier, we will return
READ UNCOMMITTED also if the :newraw option is set for the
InnoDB system tablespace.
Reviewed by: Vladislav Lesin
page_delete_rec_list_end(): Do not attempt to scrub the data of
an empty record.
The test case would reproduce a debug assertion failure in branches
where commit 358921ce32 (MDEV-26938)
is present. MariaDB Server 10.6 only supports ascending indexes,
and in those, the empty string would always be sorted first, never
last in a page.
Nevertheless, we fix the bug also in 10.6, in case it would be
reproducible in a slightly different scenario.
Reviewed by: Thirunarayanan Balathandayuthapani
Problem:
=======
- During slow shutdown, the expectation is that the change buffer
merges all the change buffer pages and frees every change
buffer page except the root. But InnoDB fails to free the
freed change buffer pages from the segment and return them to the
system tablespace.
Solution:
=========
- During slow shutdown, remove all pages from the free list of the
change buffer and free them to the system tablespace.
While updating the persistent defragmentation statistics
for the table, InnoDB opens the table only if it is in the cache.
If dict_table_open_on_id() fails to find the table in the cache,
then it fails to unfreeze dict_sys.latch. This led to a crash.
Problem:
=======
- Two failures occur for this test case:
(1) set global innodb_buf_flush_list_now=1 doesn't make sure that pages
are being flushed.
(2) InnoDB page cleaner thread aborts while writing the checkpoint information.
The problem is that when InnoDB startup aborts, InnoDB changes the
shutdown state to SRV_SHUTDOWN_EXIT_THREADS. By changing the shutdown
state, InnoDB doesn't advance log_sys.lsn (it avoids fil_names_clear()).
After InnoDB shutdown (innodb_shutdown()) is initiated, the shutdown
state is changed again to SRV_SHUTDOWN_INITIATED. This leads the page
cleaner thread to fail with the assertion
ut_ad(srv_shutdown_state > SRV_SHUTDOWN_INITIATED)
in log_write_checkpoint_info().
Solution:
=========
(1) In order to avoid failure (1), InnoDB can set the variables
innodb_max_dirty_pages_pct_lwm and innodb_max_dirty_pages_pct to 0.
Also make sure that InnoDB doesn't have any dirty pages in the buffer
pool by adding a wait_condition.
(2) Avoid changing srv_shutdown_state to SRV_SHUTDOWN_EXIT_THREADS
when InnoDB startup aborts.
When REPAIRing a CSV table, the CSV engine marks the table as having
no rows in the case that there are rows only in memory but not yet
written to disk. In this case, there's actually nothing to repair,
since nothing exists on disk; make REPAIR idempotent for CSV tables.
This controls which Linux implementation to use for
innodb_use_native_aio=ON.
innodb_linux_aio=auto is equivalent to innodb_linux_aio=io_uring when
io_uring is available, falling back to innodb_linux_aio=aio when not.
Debian packaging is no longer exclusively aio or uring: for older Debian
or Ubuntu releases there is a remove_uring directive, and for more
recent releases liburing is added as a mandatory dependency for
consistent packaging.
WITH_LIBAIO is now an independent option from WITH_URING.
The LINUX_NATIVE_AIO preprocessor constant is renamed to HAVE_LIBAIO,
analogous to the existing HAVE_URING.
tpool::is_aio_supported(): A common feature check.
is_linux_native_aio_supported(): Remove. This had originally been added in
mysql/mysql-server@0da310b69d in 2012
to fix an issue where io_submit() on CentOS 5.5 would return EINVAL
for a /tmp/#sql*.ibd file associated with CREATE TEMPORARY TABLE.
But starting with commit 2e814d4702, InnoDB
temporary tables are written to innodb_temp_data_file_path.
The 2012 commit said that the error could occur on "old kernels".
Any GNU/Linux distribution that we currently support should be based
on a newer Linux kernel; for example, Red Hat Enterprise Linux 7
was released in 2014.
tpool::create_linux_aio(): Wraps the Linux implementations:
create_libaio() and create_liburing(), each defined in separate
compilation units (aio_linux.cc, aio_libaio.cc, aio_liburing.cc).
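A hedged sketch of the dispatch (signatures are illustrative):

  /* tpool::create_linux_aio(): prefer io_uring when it is available,
     and fall back to libaio, mirroring innodb_linux_aio=auto. */
  aio *create_linux_aio(thread_pool *pool, int max_io)
  {
  #ifdef HAVE_URING
    if (aio *a= create_liburing(pool, max_io))
      return a;                 /* io_uring is available */
  #endif
  #ifdef HAVE_LIBAIO
    return create_libaio(pool, max_io);
  #else
    return nullptr;
  #endif
  }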
The CMake definitions are simplified using target_sources() and
target_compile_definitions(), all available since CMake 2.8.12.
With this change, there is no need to include ${CMAKE_SOURCE_DIR}/tpool
or add TPOOL_DEFINES flags anymore; target_link_libraries(lib tpool)
does all that.
This is joint work with Daniel Black and Vladislav Vaintroub.
In a UBSAN debug build, the comparisons with next_mrec_end are made
against index->online_log's head/tail members' block pointer plus a
sort buffer size offset (1048576).
The logic that flows through to this point means that even
srv_sort_buf_size bytes above a null pointer could not contain the
value of next_mrec_end.
As such, this is a UBSAN-type fix where we first check whether
head.block / tail.block is null before doing the asserts around
this debug condition. This would be required anyway for the assertion
conditions not to segfault.
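A hedged sketch of the guard (names follow the text above; the exact
assertion bodies are illustrative):

  /* Pointer arithmetic on a null pointer is undefined behaviour, so
     check head.block / tail.block for null before forming
     block + srv_sort_buf_size in the debug assertions. */
  ut_ad(!log->head.block
        || next_mrec_end <= log->head.block + srv_sort_buf_size);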
Problem:
=========
- During truncation of a fulltext table, InnoDB creates the
table and inserts the default FTS config values into the fulltext
common config table using the create table transaction.
- Before committing the create table transaction, InnoDB updates
the dictionary by loading the stopwords into the FTS cache
and writing the stopword configuration into the fulltext common
config table using a separate transaction. This leads to a
lock wait timeout error and rolls back that transaction.
- But truncate table holds dict_sys.lock, and the rollback also
tries to acquire dict_sys.lock. This leads to an assertion failure
during rollback.
Solution:
=========
ha_innobase::truncate(): Commit the create table transaction
before updating the dictionary after create table.
MSAN has been updated since 2022, when this macro was added,
and as such the workaround for MSAN's then-deficient understanding
of the fstat/stat syscall behaviour is no longer required.
As the macro is effectively a no-op, a straight removal is sufficient.
Problem:
=======
- InnoDB poisons the freed page memory to make sure that
no other thread uses the freed page. In buf_pool_t::close(),
InnoDB unmaps the buffer pool memory during shutdown, or when an
error is encountered during startup. Later, at some point, the server
re-uses the same virtual address range using mmap() and writes into the
memory region. This leads to a use_after_poison error.
This issue doesn't happen with the latest clang and gcc versions;
older versions of clang and gcc can still fail with this error.
ASAN should unpoison the memory when the same virtual
address is reused. This issue was already raised in
https://github.com/google/sanitizers/issues/1705
Fix:
===
In order to avoid this failure, let's unpoison the buffer
pool memory explicitly during buf_pool_t::close() for
gcc versions older than 14 and clang versions older than 18.
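A hedged sketch of the workaround (the version guard and the region
arguments are illustrative; MEM_MAKE_ADDRESSABLE is MariaDB's wrapper
that unpoisons under ASAN):

  /* In buf_pool_t::close(): unpoison the buffer pool region before it
     is unmapped, so that a later mmap() reusing the same virtual
     addresses doesn't trigger use_after_poison in older sanitizers. */
  #if (defined __clang_major__ && __clang_major__ < 18) || \
      (defined __GNUC__ && !defined __clang__ && __GNUC__ < 14)
    MEM_MAKE_ADDRESSABLE(base, size);  /* base/size: the pool mapping */
  #endif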
Now that RocksDB has been synced up to 6.29, which includes the changes
mentioned in the CMake comment, support for building on non-Linux
aarch64 OSes can be enabled.
As of CMake 3.24, CMAKE_COMPILER_IS_GNU(CC|CXX) are deprecated and
should be replaced with CMAKE_(C|CXX)_COMPILER_ID, which was introduced
in CMake 2.6.
In the main.plugin test this function is called assuming the function
prototype 'int (*)(THD *, st_mysql_show_var *, void *, system_status_var *, enum_var_type)',
as changed in b4ff64568c.
We update ha_example::show_func_example to match the prototype
with which it is called.
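A hedged sketch of the updated function (the body is simplified):

  /* ha_example's status callback, matching the prototype above. */
  static int show_func_example(MYSQL_THD thd, struct st_mysql_show_var *var,
                               void *buff, struct system_status_var *status_var,
                               enum enum_var_type scope)
  {
    var->type= SHOW_CHAR;
    var->value= static_cast<char*>(buff); /* SHOW_VAR_FUNC_BUFF_SIZE bytes */
    return 0;
  }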
If the execution of the two reads in log_t::get_lsn_approx() is
interleaved with concurrent writes of those fields in
log_t::write_buf() or log_t::persist(), the returned approximation
will be an upper bound. If log_t::append_prepare_wait() is pending,
the approximation could be a lower bound.
We must adjust each caller of log_t::get_lsn_approx() for the
possibility that the return value is larger than
MAX(oldest_modification) in buf_pool.flush_list.
af_needed_for_redo(): Add a comment that explains why the glitch
is not a problem.
page_cleaner_flush_pages_recommendation(): Revise the logic for
the unlikely case cur_lsn < oldest_lsn. The original logic would have
invoked af_get_pct_for_lsn() with a very large age value, which
would likely cause an overflow of the local variable lsn_age_factor,
and make pct_for_lsn a "random number". Based on that value,
total_ratio would be normalized to something between 0.0 and 1.0.
Nothing extremely bad should have happened in this case;
the innodb_io_capacity_max should not be exceeded.
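A hedged sketch of the revised handling (names follow the text above):

  /* If the approximate cur_lsn lags behind oldest_lsn, treat the age
     as 0 instead of computing a huge unsigned difference that could
     overflow lsn_age_factor inside af_get_pct_for_lsn(). */
  const lsn_t age= cur_lsn < oldest_lsn ? 0 : cur_lsn - oldest_lsn;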
buf_pool_t::resize(): After successfully shrinking the buffer pool,
announce the success. The size had already been updated in shrunk().
After failing to shrink the buffer pool, re-enable the adaptive
hash index if it had been enabled.
Reviewed by: Debarun Banerjee