In commit 1c55b845e0 (MDEV-32932) the
test mariabackup.innodb_ddl_on_intermediate_table was introduced but
disabled.
xb_load_single_table_tablespace(): Properly handle missing FTS_ tables.
backup_file_op_fail(): Properly handle FILE_DELETE records.
The 'if (!m_abort) break' condition was inverted by accident.
Constrain the test case to environments that have a cgroupv2
runtime environment, which is the same condition under which
memory pressure initialization will succeed.
Remove the explicit garbage_collection trigger as it hides the abnormal
termination error on the event loop for memory pressure. This
also means there is no support in non-cgroupv2 environments
(possibly some container environments).
Because memory pressure is triggered from a different thread, we
need to wait until a "[mM]emory pressure" log message appears to
know whether it has succeeded or failed.
Thanks Kristian Nielsen for noticing and review.
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
first have been durably written.
In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.
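The invariant above can be illustrated with a minimal sketch. This is not InnoDB's actual code: the names `page_lsn_status` and `check_page_lsn` are hypothetical, and the sketch assumes that the log has already been parsed to its end, so that the end-of-log LSN is known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration; these names do not exist in InnoDB.
using lsn_t = std::uint64_t;

enum class page_lsn_status { OK, FUTURE_LSN };

// A page may only carry changes up to the last durably written log record.
// A page LSN beyond the end of the parsed log violates write-ahead logging.
inline page_lsn_status check_page_lsn(lsn_t page_lsn,
                                      lsn_t end_of_log_lsn) noexcept
{
  return page_lsn <= end_of_log_lsn
    ? page_lsn_status::OK
    : page_lsn_status::FUTURE_LSN;
}
```

A caller that treats `FUTURE_LSN` as a hard error would refuse to continue, which corresponds to the behaviour described above.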
Before recovery reads any data pages or invokes
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.
recv_max_page_lsn, recv_lsn_checks_on: Remove.
recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.
recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain about the LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".
recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.
recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.
buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have led to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.
buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.
buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.
Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.
recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.
buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.
IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.
fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.
Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.
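The selection rule described for recv_dblwr_t::find_page() above can be sketched roughly as follows. This is an illustration only: `page_copy`, `find_oldest_usable` and the explicit `[min_lsn, max_lsn]` bounds are assumptions made for the example, not the real data structures or signature.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for one doublewrite copy of a page.
struct page_copy { lsn_t lsn; bool checksum_ok; };

// Return the valid copy with the smallest LSN within [min_lsn, max_lsn],
// or nullptr if no copy qualifies. The bounds are assumptions.
inline const page_copy *find_oldest_usable(const std::vector<page_copy> &copies,
                                           lsn_t min_lsn,
                                           lsn_t max_lsn) noexcept
{
  const page_copy *best = nullptr;
  for (const page_copy &c : copies)
  {
    // Skip corrupted copies and copies outside the usable LSN range.
    if (!c.checksum_ok || c.lsn < min_lsn || c.lsn > max_lsn)
      continue;
    if (!best || c.lsn < best->lsn)
      best = &c;
  }
  return best;
}
```

Copies that are too old or "too new" are simply ignored here, which mirrors the statement that doublewrite pages outside the usable bounds are not recovered.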
Reviewed by: Debarun Banerjee
In some places, there were redundant comparisons against TRX_SYS_SPACE
or SRV_TMP_SPACE_ID. The temporary tablespace is never the subject of
log-based recovery.
Also, consistently check for SRV_SPACE_ID_UPPER_BOUND.
Reviewed by: Debarun Banerjee
In mariadb-backup --backup, we only have to invoke the undo_space_trunc
and log_file_op callbacks as well as validate the mini-transaction
checksums. There is absolutely no need to access recv_sys.pages or
recv_spaces, or to allocate a decrypt_buf in case of innodb_encrypt_log=ON.
This is what the new mode recv_sys_t::store::BACKUP will do.
In the skip_the_rest: loop, the main thing is to process all FILE_ records
until the end of the log is reached. Additionally, we must process
INIT_PAGE and FREE_PAGE records in the same way as they would be
during storing == YES.
This was measured to reduce the CPU time between the messages
"InnoDB: Multi-batch recovery needed at LSN" and
"InnoDB: End of log at LSN"
by some 20%.
recv_sys_t::store: A ternary enumeration that specifies how records
should be stored: NO, BACKUP, or YES.
recv_sys_t::parse(), recv_sys_t::parse_mtr(), recv_sys_t::parse_pmem():
Replace template<bool store> with template<store storing>.
store_freed_or_init_rec(): Simplify some logic. We can look up also
the system tablespace.
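A ternary enumeration used as a template parameter could look roughly like the sketch below. Only the enumerator names NO, BACKUP and YES come from the description above; `parse_record`, `parse_stats` and `rec_type` are hypothetical, and the per-mode behaviour is deliberately simplified.

```cpp
#include <cassert>

// Three-valued mode, replacing a former bool template parameter.
enum class store { NO, BACKUP, YES };

// Hypothetical counters and record kinds, for illustration only.
struct parse_stats { unsigned file_ops = 0; unsigned page_records = 0; };
enum class rec_type { FILE_OP, PAGE };

template<store storing>
void parse_record(rec_type type, parse_stats &stats) noexcept
{
  if (type == rec_type::FILE_OP)
  {
    // In this sketch, file operation records are counted in every mode.
    stats.file_ops++;
    return;
  }
  // The condition is a compile-time constant for each instantiation,
  // so the compiler can drop the dead branch for NO and BACKUP.
  if (storing == store::YES)
    stats.page_records++;
}
```

Because the mode is a template argument, each instantiation contains only the work needed for that mode, which is one plausible way to avoid touching page-related structures in a backup.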
Reviewed by: Debarun Banerjee
There were unused variables. They were not conditional
on defines, so they were removed.
Added error handling in proc_object for the case that there is no db,
as subsequent operations would have failed.
CMake rewriting the tests causes Mroonga to be un-buildable
in build environments where the source directory is
read-only.
In the test results, the version wasn't particularly important.
Remove the version dependence of tests.
When calculate_cond_selectivity_for_table() takes into account
multi-column selectivities from range access, it tries to account for the
fact that the selectivity for some columns may have already been counted.
For example, for range access on IDX1 using {kp1, kp2}, the selectivity
of restrictions on "kp2" might have already been taken into account
to some extent.
So, the code tries to "discount" that using rec_per_key[] estimates.
This seems to be wrong and unreliable: the "discounting" may produce a
rselectivity_multiplier number that hints that the overall selectivity
of range access on IDX1 was greater than 1.
Do a conservative fix: if we arrive at the conclusion that the selectivity
of range access on the condition in IDX1 is greater than 1.0, clip it down
to 1.
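The clipping itself amounts to a one-line guard. The helper name `clip_selectivity` below is hypothetical, and the server code is structured differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: a selectivity is a fraction of rows, so a
// computed value above 1.0 is an estimation artifact and is clipped.
inline double clip_selectivity(double selectivity) noexcept
{
  return std::min(selectivity, 1.0);
}
```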
storage/connect/tabfmt.cpp:419:24: error: '%.3d' directive writing between 3 and 10 bytes into a region of size 5 [-Werror=format-overflow=]
419 | sprintf(buf, "COL%.3d", i+1);
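One common way to address this class of warning is a bounded write into a buffer that is large enough for any int. The sketch below is an assumption about the general technique, not necessarily the change made in tabfmt.cpp; the helper name `default_column_name` and the buffer size are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: format a default column name such as "COL001".
// 16 bytes hold "COL", up to 11 characters of a formatted int, and NUL.
inline void default_column_name(char (&buf)[16], int i) noexcept
{
  // snprintf() never writes more than the given size, unlike sprintf().
  std::snprintf(buf, sizeof buf, "COL%.3d", i + 1);
}
```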
row_purge_reset_trx_id(): Reserve large enough offsets for accommodating
the maximum width PRIMARY KEY followed by DB_TRX_ID,DB_ROLL_PTR.
Reviewed by: Thirunarayanan Balathandayuthapani
purge_sys_t::get_page(): Avoid accessing a freed reference to pages[id]
after pages.erase(id). This heap-use-after-free would sometimes be
caught by AddressSanitizer.
purge_sys_t::iterator::free_history_rseg(): Do not crash if undo=nullptr
(the database is corrupted).
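The heap-use-after-free pattern mentioned for purge_sys_t::get_page() can be illustrated with a generic container. This is a sketch only: `page_cache` and `take` are hypothetical and do not reflect the actual type of purge_sys_t::pages.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache; not the actual purge_sys_t::pages container.
using page_cache = std::unordered_map<int, std::string>;

// Remove an entry and return its former value.
inline std::string take(page_cache &pages, int id)
{
  auto it = pages.find(id);
  if (it == pages.end())
    return {};
  // Move the value out BEFORE erase(); a reference obtained from
  // pages[id] would dangle once the element has been erased.
  std::string value = std::move(it->second);
  pages.erase(it);
  return value;
}
```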
Reviewed by: Debarun Banerjee
Analysis:
The value gets appended as a string instead of an unescaped JSON value.
Fix:
Append the JSON value to a temporary string and then store that in the
field, instead of storing it directly as a string.
Another chance for cutting back overhead due to C++ exceptions being
enabled; the `dict_sys_t` class is a good candidate because its
locking methods are called frequently.
Binary size reduction this time:
    text    data     bss      dec     hex filename
24448622 2436488 9473537 36358647 22ac9f7 build/release/sql/mariadbd
24448474 2436488 9473601 36358563 22ac9a3 build/release/sql/mariadbd
MariaDB is compiled with C++ exceptions enabled, and that disallows
some optimizations (e.g. the stack must always be unwinding-safe). By
adding `noexcept` to functions that are guaranteed to never throw,
some of these optimizations can be regained. Low-level locking
functions that are called often are a good candidate for this.
This shrinks the executable a bit (tested with GCC 14 on aarch64):
    text    data     bss      dec     hex filename
24448910 2436488 9473185 36358583 22ac9b7 build/release/sql/mariadbd
24448622 2436488 9473537 36358647 22ac9f7 build/release/sql/mariadbd
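As a rough illustration of the technique, the sketch below marks the methods of a small latch as `noexcept`. The class `spin_latch` is hypothetical and much simpler than the real InnoDB latches; it only shows where the specifier goes and how the guarantee can be checked at compile time.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical latch; the real locking primitives are more elaborate.
class spin_latch
{
  std::atomic<bool> locked{false};
public:
  // Nothing in these bodies can throw, so they are declared noexcept.
  // The compiler may then omit exception-handling paths around calls.
  void lock() noexcept
  { while (locked.exchange(true, std::memory_order_acquire)) {} }
  void unlock() noexcept
  { locked.store(false, std::memory_order_release); }
  bool is_locked() const noexcept
  { return locked.load(std::memory_order_relaxed); }
};

// Compile-time check that the guarantee is really part of the interface.
static_assert(noexcept(std::declval<spin_latch&>().lock()),
              "lock() must not throw");
```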
Don't allow changing the referencing key column from NULL to NOT NULL
when
1) the foreign key constraint type is ON UPDATE SET NULL
2) the foreign key constraint type is ON DELETE SET NULL
3) the foreign key constraint type is UPDATE CASCADE and the referenced
column is declared as NULL
Don't allow changing the referenced key column from NOT NULL to NULL
when the foreign key constraint type is UPDATE CASCADE
and the referencing key columns don't allow NULL values.
get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.
fk_check_column_changes(): Enforce the above rules for COPY
algorithm
innobase_check_foreign_drop_col(): Check whether the dropped
column exists in an existing foreign key relation.
innobase_check_foreign_low(): Enforce the above rules for the
INPLACE algorithm.
dict_foreign_t::check_fk_constraint_valid(): This is used
by the CREATE TABLE statement to check nullability for a foreign
key relation.
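The nullability rules listed above can be written down as two predicates. This is a sketch under assumptions: `fk_info` and both function names are hypothetical and only encode the rules as stated, not the way the server or InnoDB represents foreign keys.

```cpp
#include <cassert>

// Hypothetical description of one foreign key relation.
struct fk_info
{
  bool update_set_null;       // ON UPDATE SET NULL
  bool delete_set_null;       // ON DELETE SET NULL
  bool update_cascade;        // ON UPDATE CASCADE
  bool referenced_nullable;   // referenced (parent) column allows NULL
  bool referencing_nullable;  // referencing (child) column allows NULL
};

// May the referencing column be changed from NULL to NOT NULL?
inline bool can_make_referencing_not_null(const fk_info &fk) noexcept
{
  return !fk.update_set_null && !fk.delete_set_null &&
    !(fk.update_cascade && fk.referenced_nullable);
}

// May the referenced column be changed from NOT NULL to NULL?
inline bool can_make_referenced_nullable(const fk_info &fk) noexcept
{
  return !(fk.update_cascade && !fk.referencing_nullable);
}
```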
The commit cd5808eb introduced a union as a storage for the format
argument passed to the internal API fmt::detail::make_arg. This was done
to solve the issue that the internal API no longer accepted temporary
variables.
However, it's generally better to avoid using internal APIs, as they are
more likely to have breaking changes in the future. Instead, we can use
the public API fmt::dynamic_format_arg_store to dynamically build the
argument list. This API accepts temporary variables, and its behavior is
more stable than the internal API. `libfmt.cmake` is updated to reflect
the change as well.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
The method was declared to return an unsigned integer, but it is
really a boolean (and used as such by all callers).
A secondary change is the addition of "const" and "noexcept" to this
method.
In ha_mroonga.cpp, I also added "inline" to the two helper methods of
referenced_by_foreign_key(). This allows the compiler to flatten the
method.
We have found that my_errno can be "passed" to the next command in some cases.
It is practically impossible to check/fix all cases of my_errno in the server,
plugins and engines, so we will reset it as we reset other errors.
The test case will be fixed by the CSV engine fix, so it will be added with it
(see part2).
log_file_t::read(), log_file_t::write(): Invoke pread() or pwrite()
directly, so that we can give more accurate diagnostics in case of
a failure, and so that we will avoid the overhead of setting up 5(!)
stack frames and related objects.
tpool::pwrite(): Add a missing const qualifier.
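Calling pread() directly makes it possible to report the exact failure cause. The sketch below is a generic POSIX example, not the actual log_file_t::read(); the helper name `read_at` and the message format are assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: read exactly size bytes at offset.
// Returns false after printing a diagnostic on error or short file.
inline bool read_at(int fd, void *buf, size_t size, off_t offset) noexcept
{
  char *dst = static_cast<char*>(buf);
  while (size)
  {
    ssize_t n = pread(fd, dst, size, offset);
    if (n > 0)
    {
      // Partial reads are allowed; continue with the remainder.
      dst += n;
      size -= static_cast<size_t>(n);
      offset += n;
      continue;
    }
    if (n < 0 && errno == EINTR)
      continue;
    // n == 0 means end of file; n < 0 means an error described by errno.
    std::fprintf(stderr, "pread(%d) at offset %lld failed: %s\n", fd,
                 static_cast<long long>(offset),
                 n ? std::strerror(errno) : "unexpected end of file");
    return false;
  }
  return true;
}
```

Distinguishing a short file from an I/O error in the message is the kind of more accurate diagnostic that a direct system call allows.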
Added a new test scenario in the galera.galera_bf_kill
test to make the issue surface. The test scenario has
a multi-statement transaction containing a KILL command.
When the KILL is submitted, another transaction is
replicated, which causes a BF abort for the KILL command
processing. Handling the BF abort rollback while executing
the KILL command causes the node to hang in this scenario.
The sql_kill() and sql_kill_user() functions now have a fix
to perform an implicit commit before starting the KILL command
execution. Because of the implicit commit, the KILL execution
will no longer happen inside a transaction context.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>