In an addition to test rpl.rpl_parallel_sbm added by MDEV-32265, the
test uses sleep statements alone to test Seconds_Behind_Master with
delayed replication. On slow running machines, the test can pass the
intended MASTER_DELAY duration and Seconds_Behind_Master can become
0, when the test expects the transaction to still be actively in a
delaying state.
This can be consistently reproduced by adding a sleep statement
before the call to
--let = query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1)
to sleep past the delay end point.
This patch fixes this by locking the table which the delayed
transaction targets so Second_Behind_Master cannot be updated before
the test reads it for validation.
Problem:
========
Currently import operation fails with schema mismatch when
cfg file has hidden fts document id and hidden fts document index.
Fix:
====
To fix this issue, simply add the fts doc id column,
indexes in table definition and try to import the table.
In case of success:
1) update the fts document id in sys columns.
2) update the number of columns in sys tables.
3) insert the new fts index entry in sys indexes table
and sys fields.
4) Reload the table with new table definition
This removes the error:
"Failed to load slave replication state from table mysql.gtid_slave_pos:
1017: Can't find file: './mysql/' (errno: 2 "No such file or directory")
- Use "new" math library WOLFSSL_SP_MATH_ALL, which is now promoted by
WolfSSL for faster performance. "fastmath" we used previously is going
to be deprecated, it was not really always fast.
- Optimize common RSA math operations with WOLFSSL_HAVE_SP_RSA
- Incorporate assembly optimizations, currently for Intel x64 only
This patch significantly reduces execution time for SSL tests like
main.ssl-big and main.ssl_connect, which now run 2 to 3 times faster.
Notably, when this patch is applied to 11.4, server startup in with
ephemeral certificates becomes approximately 10x faster due to optimized
wolfSSL_EVP_PKEY_keygen().
Additionally, refactored WolfSSL by removing old workarounds and
consolidating wolfssl and wolfcrypt into a single library wolfssl, just
like it was done in WolfSSL's own CMake.
When testing MariaDB on Arm64, a stall issue will occur, jira link:
https://jira.mariadb.org/browse/MDEV-28430.
The stall occurs because of an unexpected circular reference in the
LF_PINS->purgatory list which is traversed in lf_pinbox_real_free().
We found that on Arm64, ABA problem in LF_ALLOCATOR->top list was not
solved, and various undefined problems will occur, including circular
reference in LF_PINS->purgatory list.
The following codes are used to solve ABA problem, code copied
from below link.
cb4c271355/mysys/lf_alloc-pin.c (L501-)#L505
do
{
503 node= allocator->top;
504 lf_pin(pins, 0, node);
505 } while (node != allocator->top && LF_BACKOFF());
1. ABA problem on Arm64
Combine the below steps to analyze how ABA problem occur on Arm64, the
relevant codes in steps are simplified, code line numbers below are in
MariaDB v10.4.
------------------------------------------------------------------------
Abnormal case.
Initial state: pin = 0, top = A, top list: A->B
T1 T2
step1. write top=B //seq-cst, #L517
step2. write A->next= "any"
step3. read pin==0 //relaxed, #L295
step1. write pin=A //seq-cst, #L504
step2. read old value of top==A //relaxed, #L505
step3. next=A->next="any" //#L517
step4. write A->next=B,top=A //#L420-435
step4. CAS(top,A,next) //#L517
step5. write pin=0 //#L521
------------------------------------------------------------------------
Above case is due to T1.step2 reading the old value of top, causing
"T1.step3, T1.step4" and "T2.step4" to occur at the same time, in other
words, they are not mutually exclusive.
It may happen that T2.step4 is sandwiched between T1.step3 and T1.step4,
which cause top to be updated to "any", which may be in-use or invalid
address.
2. Analyze above issue with Dekker's algorithm
Above problem can be mapped to Dekker's algorithm, link is as below
https://en.wikipedia.org/wiki/Dekker%27s_algorithm.
The following extracts the read and write operations on 'top' and 'pin',
and maps them to Dekker's algorithm to analyze the root cause.
------------------------------------------------------------------------
Initial state: top = A, pin = 0
T1 T2
store_seq_cst(pin, A) // write pin store_seq_cst(top, B) //write top
rt= load_relaxed(top) // read top rp= load_relaxed(pin) //read pin
if (rt == A && rp == 0) printf("oops\n"); // will "oops" be printed?
------------------------------------------------------------------------
How T1 and T2 enter their critical section:
(1) T1, write pin, if T1 reads that top has not been updated, T1 enter
its critical section(T1.step3 and T1.step4, try to obtain 'A', #L517),
otherwise just give up (T1 without priority).
(2) T2, write top, if T2 reads that pin has not been updated, T2 enter
critical section(T2.step4, try to add 'A' to top list again, #L420-435),
otherwise wait until pin!=A (T2 with priority).
In the previous code, due to load 'top' and 'pin' with relaxed semantic,
on arm and ppc, there is no guarantee that the above critical sections
are mutually exclusive, in other words, "oops" will be printed.
This bug only happens on arm and ppc, not x86. On current x86
implementation, load is always seq-cst (relaxed and seq-cst load
generates same machine code), as shown in https://godbolt.org/z/sEzMvnjd9
3. Fix method
Add sequential-consistency semantic to read 'top' in #L505(T1.step2),
Add sequential-consistency semantic to read "el->pin[i]" in #L295
and #L320.
4. Issue reproduce
Add "delay" after #L503 in lf_alloc-pin.c, When run unit.lf, can quickly
get segment fault because "top" point to an invalid address. For detail,
see comment area of below link.
https://jira.mariadb.org/browse/MDEV-28430.
5. Futher improvement
To make this code more robust and safe on all platforms, we recommend
replacing volatile with C11 atomics and to fix all data races. This will
also make the code easier to reason.
Signed-off-by: Xiaotong Niu <xiaotong.niu@arm.com>
Thanks to Yury Chaikou for finding this problem (and the fix).
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
my_malloc_size_cb_func() can be called from contexts where it is not safe to
wait for LOCK_thd_kill, for example while holding LOCK_plugin. This could
lead to (probably very unlikely) deadlock of the server.
Fix by skipping the enforcement of --max-session-mem-used in the rare cases
when LOCK_thd_kill cannot be obtained. The limit will instead be enforced on
the following memory allocation. This does not significantly degrade the
behaviour of --max-session-mem-used; that limit is in any case only enforced
"softly", not taking effect until the next point at which the thread does a
check_killed().
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Aria temporary tables account allocated memory as specific to the current
THD. But this fails for slave threads, where the temporary tables need to be
detached from any specific THD.
Introduce a new flag to mark temporary tables in replication as "global",
and use that inside Aria to not account memory allocations as thread
specific for such tables.
Based on original suggestion by Monty.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
purge_sys_t::truncating_tablespace(): Clamp the
innodb_max_undo_log_size to the maximum number of pages
before converting the result into a 32-bit unsigned integer.
This fixes up commit f8c88d905b (MDEV-33213).
In later major versions, we would use 32-bit unsigned integer here
due to commit ca501ffb04
and the code would crash also on 64-bit processors.
Reviewed by: Debarun Banerjee
CapabilityBoundingSet included CAP_IPC_LOCK in MDEV-9095, however
it requires that the executable has the capability marked in extended
attributes also.
The alternate to this is raising the RLIMIT_MEMLOCK for the service/
process to be able to complete the mlockall system call. This needs to
be adjusted to whatever the MariaDB server was going to allocate.
Rather than leave the non-obvious mapping of settings and tuning,
add the capability so its easier for the user.
We set the capability, if possible, but may never be used depending
on user settings. As such in the Debian postinst script, don't
complain if this fails.
The CAP_IPC_LOCK also facilitates the mmaping of huge memory pages.
(see man mmap), like mariadb uses with --large-pages.
For queries with derived tables populated having some side-effect, we
will fill such a derived table more than once, but without clearing
its rows. Consequently it will have duplicate rows.
An example query exhibiting the problem is
SELECT STRAIGHT_JOIN c1 FROM t1 JOIN (SELECT @a := 0) x;
Since mysql_derived_fill will, for UNCACHEABLE_DEPENDENT tables, drop
all rows and repopulate, we relax the condition at line 1204: rather
than assume all uncacheable values prevent early return, we now
allow an early return for uncacheable values other than
UNCACHEABLE_DEPENDENT. In general, we only populate derived tables
once unless they're dependent tables.
buf_read_ahead_linear(): If buf_pool.watch_is_sentinel(*bpage),
do not attempt to read the page frame because the pointer would be null
for the elements of buf_pool.watch[].
Hitting this bug requires the use of a non-default value of
innodb_change_buffering.
The script wsrep_sst_backup was introduced on MariaDB 10.3 in commit
9b2fa2a. The new script was automatically included in RPM packages but not
in Debian packages (which started to fail on warning about stray file).
Include wsrep_sst_backup in the mariadb-server-10.5+ package, and also
include a stub man page so that packaging of a new script is complete.
Related:
https://galeracluster.com/documentation/html_docs_20210213-1355-master/documentation/backup-cluster.html
This commit was originally submitted in May 2022 in
https://github.com/MariaDB/server/pull/2129 but upstream indicated only
in May 2023 that it might get merged, thus this is for a later release.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
libxml2 2.12.0 made `xmlGetLastError()` return `const` pointer:
61034116d0
Clang 16 does not like this:
error: assigning to 'xmlErrorPtr' (aka '_xmlError *') from 'const xmlError *' (aka 'const _xmlError *') discards qualifiers
error: cannot initialize a variable of type 'xmlErrorPtr' (aka '_xmlError *') with an rvalue of type 'const xmlError *' (aka 'const _xmlError *')
Let’s update the variables to `const`.
For older versions, it will be automatically converted.
But then `xmlResetError(xmlError*)` will not like the `const` pointer:
error: no matching function for call to 'xmlResetError'
note: candidate function not viable: 1st argument ('const xmlError *' (aka 'const _xmlError *')) would lose const qualifier
Let’s replace it with `xmlResetLastError()`.
ALso remove `LIBXMLDOC::Xerr` protected member property.
It was introduced in 65b0e5455b
along with the `xmlResetError` calls.
It does not appear to be used for anything.
These changes are submitted under the BSD 3-clause License.
The original ticket describes a server crash when using a UDF in the WHERE clause of a view. The crash also happens when using a UDF in the WHERE clause of a SELECT that uses a sub-query in the FROM clause.
When the UDF does not have a _deinit function the server crashes in udf_handler::cleanup (sql/item_func.cc:3467).
When the UDF has both an _init and a _deinit function but _init does not allocate memory for initid->ptr the server crashes in udf_handler::cleanup (sql/item_func.cc:3467).
When the UDF has both an _init and a _deinit function and allocates/deallocates memory for initid->ptr the server crashes in the memory deallocation of the _deinit function.
The sequence of events seen are:
1. A UDF, U, is created for the query.
2. The UDF _init function is called using U->initid.
3. U is cloned for the sub-query using the [default|implicit] copy constructor, resulting in V.
4. The UDF _init function is called using V->initid. U->initid and V->initid are the same value.
5. The UDF function is called.
6. The UDF _deinit function is called using U->initid. If any memory was allocated for initid->ptr it is deallocated here.
7. udf_handler::cleanup deletes the U->buffers String array.
8. The UDF _deinit function is called using V->initid. If any memory was allocated for initid->ptr it was previously deallocated and _deinit crashes the server.
9. udf_handler::cleanup deletes the V->buffers String array. V->buffers was the same values as U->buffers which was already deallocated. The server crashes.
The solution is to create a[n explicit] copy constructor for udf_handler which sets not_original to true. Later, not_original is set back to false (0) after udf_handler::fix_fields has set up a new value for initid->ptr.
While a -f debian/mariadb-plugin-columnstore.install idempotent check
existed, the tying of the install file to the control file has some
weaknesses.
Used sed as an alternative to replace the debian/control
mariadb-plugin-columnstore package defination and replace it with the
one from the columnstore submodule.
The `unused-but-set-variable` warning is raised on MacOS from the
`posix_fadvise` standin macro, since offset is often otherwise unused. Add a
cast to absorb this warning.
Signed-off-by: Trevor Gross <tmgross@umich.edu>
ha_innobase::check_if_supported_inplace_alter(): Require ALGORITHM=COPY
when creating a FULLTEXT INDEX on a versioned table.
row_merge_buf_add(), row_merge_read_clustered_index(): Remove the parameter
or local variable history_fts that had been added in the attempt to fix
MDEV-25004.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
After MDEV-31400, plugins are allowed to ask for retries when failing
initialisation. However, such failures also cause plugin system
variables to be deleted (plugin_variables_deinit()) before retrying
and are not re-added during retry.
We fix this by checking that if the plugin has requested a retry the
variables are not deleted. Because plugin_deinitialize() also calls
plugin_variables_deinit(), if the retry fails, the variables will
still be deleted.
Alternatives considered:
- remove the plugin_variables_deinit() from plugin_initialize() error
handling altogether. We decide to take a more conservative approach
here.
- re-add the system variables during retry. It is more complicated
than simply iterating over plugin->system_vars and call
my_hash_insert(). For example we will need to assign values to
the test_load field and extract more code from test_plugin_options(),
if that is possible.
fts_doc_id_cmp(): Replaces several duplicated functions for
comparing two doc_id_t*. On IA-32, AMD64, ARMv7, ARMv8, RISC-V
this should make use of some conditional ALU instructions.
On POWER there will be conditional jumps. Unlike the original
functions, these will return the correct result even if the
difference of the two doc_id does not fit in the int data type.
We use static_assert() and offsetof() to check at compilation time
that this function is compatible with the rbt_create() calls.
fts_query_compare_rank(): As documented, return -1 and not 1
when the rank are equal and r1->doc_id < r2->doc_id. This will
affect the result of ha_innobase::ft_read().
fts_ptr2_cmp(), fts_ptr1_ptr2_cmp(): These replace
fts_trx_table_cmp(), fts_trx_table_id_cmp().
The fts_savepoint_t::tables will be sorted by dict_table_t*
rather than dict_table_t::id. There was no correctness bug in
the previous comparison predicates. We can avoid one level of
unnecessary pointer dereferencing in this way.
Actually, fts_savepoint_t is duplicating trx_t::mod_tables.
MDEV-33401 was filed about removing it.
The added unit test innodb_rbt-t covers both the previous buggy comparison
predicate and the revised fts_doc_id_cmp(), using keys which led to
finding the bug. Thanks to Shaohua Wang from Alibaba for providing the
example and the revised comparison predicate.
Reviewed by: Thirunarayanan Balathandayuthapani
The innodb_changed_pages plugin only was part of XtraDB, never InnoDB.
It would be useful for incremental backups.
We will remove the code from mariadb-backup for now, because it cannot
serve any useful purpose until the server part has been implemented.
fts_doc_ids_sort(): Sort an array of doc_id_t by C++11 std::sort().
fts_doc_id_cmp(), ib_vector_sort(): Remove. The comparison was
returning an incorrect result when the difference exceeded the int range.
Reviewed by: Thirunarayanan Balathandayuthapani
In MariaDB up to 10.11, the test_if_cheaper_ordering() code (that tries
to optimizer how GROUP BY is executed) assumes that if a table scan is used
then if there is any index usable by GROUP BY it will be used.
The reason MySQL 10.4 provides a better plan is because of two differences:
- Plans using 'ref' has a cost of 1/10 of what it should be (as a
protection against table scans). This is why 'ref' is used in 10.4
and not in 10.5.
- When 'ref' is used, then GROUP BY will not use an index for GROUP BY.
In MariaDB 10.5 the chosen plan is a table scan (as it calculated to be
faster) but as 'ref' is not used, the test_if_cheaper_ordering()
optimizer phase decides (as ref is not usd) to use an index for GROUP BY,
which has bad performance.
Description of fix:
- All new code is protected by the "optimizer_adjust_secondary_key_costs"
variable, which is now a bit map, and is only executed if the option
"disable_forced_index_in_group_by" set.
- Corrects GROUP BY handling in test_if_cheaper_ordering() by making
the choise of using and index with GROUP BY cost based instead of rule
based.
- Adds TIME_FOR_COMPARE to all costs, when using group by, to make
read_time, index_scan_time and range_cost comparable.
Other things:
- Made optimizer_adjust_secondary_key_costs a bit map (compatible with old
code).
Notes:
Current code ignores costs for the algorithm used when doing GROUP
BY on the first table:
- Create an in-memory temporary table for handling group by and doing a
filesort of the result file
We can probably in 10.6 continue to ignore this cost.
This patch should NOT be merged to 11.0 series (not needed in 11.0).
This test was prone to failures for a few reasons, summarized below:
1) MDEV-32168 introduced “only_running_threads=1” to
slave_stop.inc, which allowed the stop logic to bypass an
attempting-to-reconnect IO thread. That is, the IO thread could
realize the master shutdown in `read_event()`, and thereby call into
`try_to_reconnect()`. This would leave the IO thread up when the
test expected it to be stopped. Fixed by explicitly stopping the
IO thread and allowing an error state, as the above case would
lead to errno 2003.
2) On slow systems (or those running profiling tools, e.g. MSAN),
the waiting-for-ack transaction can complete before the system
processes the `SHUTDOWN WAIT FOR ALL SLAVES`. There was shutdown
preparation logic in-between the transaction and shutdown itself,
which contributes to this problem. This patch also moves this
preparation logic before the transaction, so there is less to do
in-between the calls.
3) Changed work-around for MDEV-28141 to use debug_sync instead
of sleep delay, as it was still possible to hit the bug on very
slow systems.
4) Masked MTR variable reset with disable/enable query log
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
A race condition with the SQL thread, where depending on if it was
killed before or after it had executed the fake/generated IGN_GTIDS
Gtid_list_log_event, may or may not update gtid_slave_pos with the
position of the ignored events. Then, the slave would be restarted
while resetting IGNORE_DOMAIN_IDS to be empty, which would result in
the slave requesting different starting locations, depending on
whether or not gtid_slave_pos was updated. And, because previously
ignored events could now be requested and executed (no longer
ignored), their presence would fail the test.
This patch fixes this in two ways. First, to use GTID positions for
synchronization rather than binlog file positions. Then second, to
synchronize the SQL thread’s gtid_slave_pos with the ignored events
before killing the SQL thread.
To consistently reproduce the test failure, the following patch can
be applied:
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
index f51f5b7deec..de62233acff 100644
--- a/sql/log_event_server.cc
+++ b/sql/log_event_server.cc
@@ -3686,6 +3686,12 @@ Gtid_list_log_event::do_apply_event(rpl_group_info *rgi)
void *hton= NULL;
uint32 i;
+ sleep(1);
+ if (rli->sql_driver_thd->killed || rli->abort_slave)
+ {
+ return 0;
+ }
+
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
When executing a statement of the form
SELECT AGGR_FN(DISTINCT c1, c2,..,cn) FROM t1,
where AGGR_FN is an aggregate function such as COUNT(), AVG() or SUM(),
and a unique index exists on table t1 covering some or all of the
columns (c1, c2,..,cn), the retrieved values are inherently unique.
Consequently, the need for de-duplication imposed by the DISTINCT
clause can be eliminated, leading to optimization of aggregation
operations.
This optimization applies under the following conditions:
- only one table involved in the join (not counting const tables)
- some arguments of the aggregate function are fields
(not functions/subqueries)
This optimization extends to queries of the form
SELECT AGGR_FN(c1, c2,..,cn) GROUP BY cx,..cy
when a unique index covers some or all of the columns
(c1, c2,..cn, cx,..cy)
MDEV-33308 CHECK TABLE is modifying .frm file even if --read-only
As noted in commit d0ef1aaf61,
MySQL as well as older versions of MariaDB server would during
ALTER TABLE ... IMPORT TABLESPACE write bogus values to the
PAGE_MAX_TRX_ID field to pages of the clustered index, instead of
letting that field remain 0.
In commit 8777458a6e this field
was repurposed for PAGE_ROOT_AUTO_INC in the clustered index root page.
To avoid trouble when upgrading from MySQL or older versions of MariaDB,
we will try to detect and correct bogus values of PAGE_ROOT_AUTO_INC
when opening a table for the first time from the SQL layer.
btr_read_autoinc_with_fallback(): Add the parameters to mysql_version,max
to indicate the TABLE_SHARE::mysql_version of the .frm file and the
maximum value allowed for the type of the AUTO_INCREMENT column.
In case the table was originally created in MySQL or an older version of
MariaDB, read also the maximum value of the AUTO_INCREMENT column from
the table and reset the PAGE_ROOT_AUTO_INC if it is above the limit.
dict_table_t::get_index(const dict_col_t &) const: Find an index that
starts with the specified column.
ha_innobase::check_for_upgrade(): Return HA_ADMIN_FAILED if InnoDB
needs upgrading but is in read-only mode. In this way, the call to
update_frm_version() will be skipped.
row_import_autoinc(): Adjust the AUTO_INCREMENT column at the end of
ALTER TABLE...IMPORT TABLESPACE. This refinement was suggested by
Debarun Banerjee.
The changes outside InnoDB were developed by Michael 'Monty' Widenius:
Added print_check_msg() service for easy reporting of check/repair messages
in ENGINE=Aria and ENGINE=InnoDB.
Fixed that CHECK TABLE do not update the .frm file under --read-only.
Added 'handler_flags' to HA_CHECK_OPT as a way for storage engines to
store state from handler::check_for_upgrade().
Reviewed by: Debarun Banerjee
This patch fixes the issue with passing the DEFAULT or IGNORE values to
positional parameters for some kind of SQL statements to be executed
as prepared statements.
The main idea of the patch is to associate an actual value being passed
by the USING clause with the positional parameter represented by
the Item_param class. Such association must be performed on execution of
UPDATE statement in PS/SP mode. Other corner cases that results in
server crash is on handling CREATE TABLE when positional parameter
placed after the DEFAULT clause or CALL statement and passing either
the value DEFAULT or IGNORE as an actual value for the positional parameter.
This case is fixed by checking whether an error is set in diagnostics
area at the function pack_vcols() on return from the function pack_expression()
This is the prerequisite patch to refactor the method
Item_default_value::fix_fields.
The former implementation of this method was extracted and placed
into the standalone function make_default_field() and the method
Item_default_value::tie_field(). The motivation for this modification
is upcoming changes for core implementation of the task MDEV-15703
since these functions will be used from several places within
the source code.
row_discard_tablespace(): Do not invoke dict_index_t::clear_instant_alter()
because that would corrupt any adaptive hash index entries in the table.
row_import_for_mysql(): Invoke dict_index_t::clear_instant_alter()
after detaching any adaptive hash index entries.
Recording both is useful on a replication relay when the backup
can be used to replace the server, or ack as a new replica to the
server.
If an option=2, commented is selected, allow the alternate option
to exist.
This still disables --dump-slave=1 --master-data=1 as having the
a CHANGE MASTER TO and START SLAVE on different positions would be
confusing and dangerious to the try to execute the output. The
previous behaviour of silently disabling --master-data occurs in
this case.
The commented code related to --dump-slave/--master-data is greatly
expanded for human consumption.
A redundant opt_slave_data= 0 was removed from get_opts. If
--dump-slave=1 or 2, then the only possible value of --master-data
is a valid one.
Re-order to preference gtid based replication.
Based of code from Elena Stepanova.
Review by: Brandon Nesterenko and Anel Husakovic