Problem:
========
- InnoDB reads the length of the variable length field wrongly
while applying the modification log of instant table.
Solution:
========
rec_init_offsets_comp_ordinary(): For the temporary instant
file record, InnoDB should read the length of the variable length
field from the record itself.
When sending the server default collation ID to the client
in the handshake packet, translate a 2-byte collation ID
to the ID of the default collation for the character set.
When testing MariaDB on Arm64, a stall issue will occur, jira link:
https://jira.mariadb.org/browse/MDEV-28430.
The stall occurs because of an unexpected circular reference in the
LF_PINS->purgatory list which is traversed in lf_pinbox_real_free().
We found that on Arm64, ABA problem in LF_ALLOCATOR->top list was not
solved, and various undefined problems will occur, including circular
reference in LF_PINS->purgatory list.
The following codes are used to solve ABA problem, code copied
from below link.
cb4c271355/mysys/lf_alloc-pin.c (L501-)#L505
do
{
503 node= allocator->top;
504 lf_pin(pins, 0, node);
505 } while (node != allocator->top && LF_BACKOFF());
1. ABA problem on Arm64
Combine the below steps to analyze how ABA problem occur on Arm64, the
relevant codes in steps are simplified, code line numbers below are in
MariaDB v10.4.
------------------------------------------------------------------------
Abnormal case.
Initial state: pin = 0, top = A, top list: A->B
T1 T2
step1. write top=B //seq-cst, #L517
step2. write A->next= "any"
step3. read pin==0 //relaxed, #L295
step1. write pin=A //seq-cst, #L504
step2. read old value of top==A //relaxed, #L505
step3. next=A->next="any" //#L517
step4. write A->next=B,top=A //#L420-435
step4. CAS(top,A,next) //#L517
step5. write pin=0 //#L521
------------------------------------------------------------------------
Above case is due to T1.step2 reading the old value of top, causing
"T1.step3, T1.step4" and "T2.step4" to occur at the same time, in other
words, they are not mutually exclusive.
It may happen that T2.step4 is sandwiched between T1.step3 and T1.step4,
which cause top to be updated to "any", which may be in-use or invalid
address.
2. Analyze above issue with Dekker's algorithm
Above problem can be mapped to Dekker's algorithm, link is as below
https://en.wikipedia.org/wiki/Dekker%27s_algorithm.
The following extracts the read and write operations on 'top' and 'pin',
and maps them to Dekker's algorithm to analyze the root cause.
------------------------------------------------------------------------
Initial state: top = A, pin = 0
T1 T2
store_seq_cst(pin, A) // write pin store_seq_cst(top, B) //write top
rt= load_relaxed(top) // read top rp= load_relaxed(pin) //read pin
if (rt == A && rp == 0) printf("oops\n"); // will "oops" be printed?
------------------------------------------------------------------------
How T1 and T2 enter their critical section:
(1) T1, write pin, if T1 reads that top has not been updated, T1 enter
its critical section(T1.step3 and T1.step4, try to obtain 'A', #L517),
otherwise just give up (T1 without priority).
(2) T2, write top, if T2 reads that pin has not been updated, T2 enter
critical section(T2.step4, try to add 'A' to top list again, #L420-435),
otherwise wait until pin!=A (T2 with priority).
In the previous code, due to load 'top' and 'pin' with relaxed semantic,
on arm and ppc, there is no guarantee that the above critical sections
are mutually exclusive, in other words, "oops" will be printed.
This bug only happens on arm and ppc, not x86. On current x86
implementation, load is always seq-cst (relaxed and seq-cst load
generates same machine code), as shown in https://godbolt.org/z/sEzMvnjd9
3. Fix method
Add sequential-consistency semantic to read 'top' in #L505(T1.step2),
Add sequential-consistency semantic to read "el->pin[i]" in #L295
and #L320.
4. Issue reproduce
Add "delay" after #L503 in lf_alloc-pin.c, When run unit.lf, can quickly
get segment fault because "top" point to an invalid address. For detail,
see comment area of below link.
https://jira.mariadb.org/browse/MDEV-28430.
5. Futher improvement
To make this code more robust and safe on all platforms, we recommend
replacing volatile with C11 atomics and to fix all data races. This will
also make the code easier to reason.
Signed-off-by: Xiaotong Niu <xiaotong.niu@arm.com>
Thanks to Yury Chaikou for finding this problem (and the fix).
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
my_malloc_size_cb_func() can be called from contexts where it is not safe to
wait for LOCK_thd_kill, for example while holding LOCK_plugin. This could
lead to (probably very unlikely) deadlock of the server.
Fix by skipping the enforcement of --max-session-mem-used in the rare cases
when LOCK_thd_kill cannot be obtained. The limit will instead be enforced on
the following memory allocation. This does not significantly degrade the
behaviour of --max-session-mem-used; that limit is in any case only enforced
"softly", not taking effect until the next point at which the thread does a
check_killed().
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Aria temporary tables account allocated memory as specific to the current
THD. But this fails for slave threads, where the temporary tables need to be
detached from any specific THD.
Introduce a new flag to mark temporary tables in replication as "global",
and use that inside Aria to not account memory allocations as thread
specific for such tables.
Based on original suggestion by Monty.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
ha_innobase::check_if_supported_inplace_alter(): Require ALGORITHM=COPY
when creating a FULLTEXT INDEX on a versioned table.
row_merge_buf_add(), row_merge_read_clustered_index(): Remove the parameter
or local variable history_fts that had been added in the attempt to fix
MDEV-25004.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
After MDEV-31400, plugins are allowed to ask for retries when failing
initialisation. However, such failures also cause plugin system
variables to be deleted (plugin_variables_deinit()) before retrying
and are not re-added during retry.
We fix this by checking that if the plugin has requested a retry the
variables are not deleted. Because plugin_deinitialize() also calls
plugin_variables_deinit(), if the retry fails, the variables will
still be deleted.
Alternatives considered:
- remove the plugin_variables_deinit() from plugin_initialize() error
handling altogether. We decide to take a more conservative approach
here.
- re-add the system variables during retry. It is more complicated
than simply iterating over plugin->system_vars and call
my_hash_insert(). For example we will need to assign values to
the test_load field and extract more code from test_plugin_options(),
if that is possible.
This patch fixes the issue with passing the DEFAULT or IGNORE values to
positional parameters for some kind of SQL statements to be executed
as prepared statements.
The main idea of the patch is to associate an actual value being passed
by the USING clause with the positional parameter represented by
the Item_param class. Such association must be performed on execution of
UPDATE statement in PS/SP mode. Other corner cases that results in
server crash is on handling CREATE TABLE when positional parameter
placed after the DEFAULT clause or CALL statement and passing either
the value DEFAULT or IGNORE as an actual value for the positional parameter.
This case is fixed by checking whether an error is set in diagnostics
area at the function pack_vcols() on return from the function pack_expression()
This is the prerequisite patch to refactor the method
Item_default_value::fix_fields.
The former implementation of this method was extracted and placed
into the standalone function make_default_field() and the method
Item_default_value::tie_field(). The motivation for this modification
is upcoming changes for core implementation of the task MDEV-15703
since these functions will be used from several places within
the source code.
row_discard_tablespace(): Do not invoke dict_index_t::clear_instant_alter()
because that would corrupt any adaptive hash index entries in the table.
row_import_for_mysql(): Invoke dict_index_t::clear_instant_alter()
after detaching any adaptive hash index entries.
In commit 179424db the file lowercase_table2.result was made executable
for no known reason, most likely just a mistake. Test result files
definitely should not be executable.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
row_search_mvcc(): Revise an overflow check, disabling it on 64-bit
systems. The maximum number of consecutive record reads in a key range
scan should be limited by the maximum number of records per page
(less than 2^13) and the maximum number of pages per tablespace (2^32)
to less than 2^45. On 32-bit systems we can simplify the overflow check.
Reviewed by: Vladislav Lesin
rpl.rpl_seconds_behind_master_spike uses the DEBUG_SYNC mechanism to
count how many format descriptor events (FDEs) have been executed,
to attempt to pause on a specific relay log FDE after executing
transactions. However, depending on when the IO thread is stopped,
it can send an extra FDE before sending the transactions, forcing
the test to pause before executing any transactions, resulting in a
table not existing, that is attempted to be read for COUNT.
This patch fixes this by no longer counting FDEs, but rather by
programmatically waiting until the SQL thread has executed the
transaction and then automatically activating the DEBUG_SYNC point
to trigger at the next relay log FDE.
Add wait_condition between debug sync SIGNAL points and other
expected state conditions and refactor actual sync point for
easier to use in test case.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Correct used configuration and force server restarts before test
case. Add wait condition instead of sleep to verify that
all expected nodes are back to cluster.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Add more inserts before wsrep_slave_threads is set to 1 and
add wait_condition to wait all of them are replicated before
wait_condition about number of wsrep_slave_threads.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
rpl.rpl_seconds_behind_master_spike uses the DEBUG_SYNC mechanism to
count how many format descriptor events (FDEs) have been executed,
to attempt to pause on a specific relay log FDE after executing
transactions. However, depending on when the IO thread is stopped,
it can send an extra FDE before sending the transactions, forcing
the test to pause before executing any transactions, resulting in a
table not existing, that is attempted to be read for COUNT.
This patch fixes this by no longer counting FDEs, but rather by
programmatically waiting until the SQL thread has executed the
transaction and then automatically activating the DEBUG_SYNC point
to trigger at the next relay log FDE.
Problem was that if wsrep_load_data_splitting was used
streaming replication (SR) parameters were set
for MyISAM table. Galera does not currently support SR for
MyISAM.
Fix is to ignore wsrep_load_data_splitting setting (with
warning) if table is not InnoDB table.
This is 10.4-10.5 case of fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem is that Galera starts TOI (total order isolation) i.e.
it sends query to all nodes. Later it is discovered that
used engine or other feature is not supported by Galera.
Because TOI is executed parallelly in all nodes appliers
could execute given TOI and ignore the error and
start inconsistency voting causing node to leave from
cluster or we might have a crash as reported.
For example SEQUENCE engine does not support GEOMETRY data
type causing either inconsistency between nodes (because
some errors are ignored on applier) or crash.
Fixed my adding new function wsrep_check_support to check
can Galera support provided CREATE TABLE/SEQUENCE before TOI is
started and if not clear error message is provided to
the user.
Currently, not supported cases:
* CREATE TABLE ... AS SELECT when streaming replication is used
* CREATE TABLE ... WITH SYSTEM VERSIONING AS SELECT
* CREATE TABLE ... ENGINE=SEQUENCE
* CREATE SEQUENCE ... ENGINE!=InnoDB
* ALTER TABLE t ... ENGINE!=InnoDB where table t is SEQUENCE
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Problem was that REPLACE was using consistency check that started
TOI and we tried to rollback it.
Do not use wsrep_before_rollback and wsrep_after_rollback if
we are runing consistency check because no writeset keys are
in that case added. Do not allow consistency check usage
if table storage for target table is not InnoDB, instead
give warning. REPLACE|SELECT INTO ... SELECT will use
now TOI if table storage for target table is not InnoDB
to maintain consistency between galera nodes.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
A debug_sync signal could remain for the SQL thread that should have begun
a wait_for upon seeing a GTID event, but would instead see the old signal
and continue on without waiting. This broke an "idle" condition in
SHOW SLAVE STATUS
which should have automatically negated Seconds_Behind_Master. Instead,
because the SQL thread had already processed the GTID event, it set
sql_thread_caught_up to false, and thereby calculated the value of
Seconds_behind_master, when the test expected 0.
This patch fixes this by resetting the debug_sync state before creating a
new transaction which sends a GTID event to the replica
With this patch, 4-component MSI version can be used, e.g by setting
TINY_VERSION variable in CMake, or by adding a string, e.g
MYSQL_VERSION_EXTRA=-2
which sets TINY_VERSION to 2, and also changes the package name.
The 4-component MSI versions do not support MSI major upgrades, only minor
ones, i.e do not reinstall components, just update existing ones based
on versioning rules.
To support these rules, add DefaultVersion for the files that won't
otherwise be versioned - headers, static and import libraries,
pdbs, text - xml, python and perl scripts Also silence WiX warning
that MSI won't store hashes for those files anymore.
There are two array fields in spider_share with similar names:
share->use_sql_dbton_ids that maps from i to the i-th dbton used by
share. Thus it should be used only when i iterates over all distinct
dbtons used by share.
share->sql_dbton_ids that maps from i to the dbton used by the i-th
link of the share. Thus it should be used only when i iterates over
all links of a share.
We correct instances where share->sql_dbton_ids should be used instead
of share->use_sql_dbton_ids.
Turning REGEXP_REPLACE into two schema-qualified functions:
- mariadb_schema.regexp_replace()
- oracle_schema.regexp_replace()
Fixing oracle_schema.regexp_replace(subj,pattern,replacement) to treat
NULL in "replacement" as an empty string.
Adding new classes implementing oracle_schema.regexp_replace():
- Item_func_regexp_replace_oracle
- Create_func_regexp_replace_oracle
Adding helper methods:
- String *Item::val_str_null_to_empty(String *to)
- String *Item::val_str_null_to_empty(String *to, bool null_to_empty)
and reusing these methods in both Item_func_replace and
Item_func_regexp_replace.
When CMAKE_OSX_ARCHITECTURES isn't set we end up with
"Packaging as: mariadb-10.4.33-osx10.19-x86_64" on arm64 builders.
Instead of implying 64bit is x86, use CMAKE_SYSTEM_PROCESSOR to form
the filename.