Based on mysql/mysql-server@bc9c46bf28
but without sleeps.
The test was verified to hit the debug assertion if the change to
fts_add_doc_by_id() in commit 2d98b967e3
was reverted.
InnoDB commit fails when consecutive FTS_DOC_ID value
is greater than 4294967295.
Fix is that InnoDB should remove the delta FTS_DOC_ID
value limitations and fts should encode 8 byte value,
remove FTS_DOC_ID_MAX_STEP variable. Replaced the
fts0vlc.ic file with fts0vlc.h
fts_encode_int(): Should be able to encode 10 bytes value
fts_get_encoded_len(): Should get the length of the value
which has 10 bytes
fts_decode_vlc(): Add debug assertion to verify the maximum
length allowed is 10.
mach_read_uint64_little_endian(): Reads 64 bit stored in
little endian format
Added a unit test case which check for minimum and maximum
value to do the fts encoding
create_table_info_t::innobase_table_flags(): Refuse to create
a PAGE_COMPRESSED table with PAGE_COMPRESSION_LEVEL=0 if also
innodb_compression_level=0.
The parameter value innodb_compression_level=0 was only somewhat
meaningful for testing or debugging ROW_FORMAT=COMPRESSED tables.
For the page_compressed format, it never made any sense, and the
check in dict_tf_is_valid_not_redundant() that was added in
72378a2583 (MDEV-12873) would cause
the server to crash.
This is a duplicate of MDEV-18278 89936f11e9, but I will add an
additional assertion
Description:
The frm corruption should not be reported during CREATE TABLE. Normally
it doesn't, and the data to fill TABLE is taken by open_table_from_share
call. However, the vcol data is stored as SQL string in
table->s->vcol_defs.str and is anyway parsed on each table open.
It is impossible [or hard] to avoid, because it's hard to clone the
expression tree in general (it's easier to parse).
Normally parse_vcol_defs should only fail on semantic errors. If so,
error_reported is set to true. Any other failure is not expected during
table creation. There is either unhandled/unacknowledged error, or
something went really wrong, like memory reject. This all should be
asserted anyway.
Solution:
* Set *error_reported=true for the forward references check;
* Assert for every unacknowledged error during table creation.
Problem:
========
This patch addresses two issues.
First, if a CHANGE MASTER command is issued and an error happens
while locating the replica’s relay logs, the logs can be put into an
invalid state where future updates fail and future CHANGE MASTER
calls crash the server. More specifically, right before a replica
purges the relay logs (part of the `CHANGE MASTER TO` logic), the
relay log is temporarily closed with state LOG_TO_BE_OPENED. If the
server errors in-between the temporary log closure and purge, i.e.
during the function find_log_pos, the log should be closed.
MDEV-25284 reveals the log is not properly closed.
Second, upon issuing a RESET SLAVE ALL command, a slave’s GTID
filters are not cleared (DO_DOMAIN_IDS, IGNORE_DOMIAN_IDS,
IGNORE_SERVER_IDS). MySQL had a similar bug report, Bug #18816897,
which fixed this issue to clear IGNORE_SERVER_IDS after issuing
RESET SLAVE ALL in version 5.7.
Solution:
=========
To fix the first problem, the CHANGE MASTER error handling logic was
extended to transition the relay log state to LOG_CLOSED from
LOG_TO_BE_OPENED.
To fix the second problem, the RESET SLAVE ALL logic is extended to
clear the domain_id filter and ignore_server_ids.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
This happens upon CREATE USER and DROP ROLE.
The underlying problem is that our HASH implementation shuffles elements
around when performing an update or delete. This means that when doing a
scan through the HASH table by index, in search of elements to delete or
update one must restart the scan to make sure nothing is missed if at least
one delete / update happened.
More specifically, what happened in this case:
The hash has 131 element, DROP ROLE removes the element
[119]. Its [119]->next was element [129], so [129] is moved to [119].
Now we need to compact the hash, removing the last element [130]. It
gets one bit off its hash value and becomes element [2]. The existing
element [2] is moved to [129], and old [130] is moved to [2].
We cannot simply move [130] to [129] and make [2]->next=130, it won't
work if [2] is itself in the collision list and doesn't belong in [2].
The handle_grant_struct code assumed that it is safe to continue by only
reexamining the currently modified / deleted element index, but that is
not true.
Missing to delete an element in the hash triggered the assertion in
the test case. DROP ROLE would not clear all necessary role->role or
role->user mappings.
To fix the problem we ensure that the scan is restarted, only if an
element was deleted / updated, similar to how bubble-sort keeps sorting
until it finds no more elements to swap.
On deadlock transaction is rolled back (and trx->state is cleared) but
SELECT continued the loop because evaluate_join_record() ignored the
error status returned from lower join evaluation. val_int() does not
return error status so it is checked by thd->is_error().
Test case was created by Thirunarayanan Balathandayuthapani
<thiru@mariadb.com>
If a table has no unique indexes, write set key information will be collected on all columns in the table.
The write set key information has space only for max 3500 bytes for individual column, and if a varchar colummn of such non-primary key table is longer than
this limit, currently a crash follows.
The fix in this commit, is to truncate key values extracted from such long varhar columns to max 3500 bytes.
This may potentially lead to false positive certification failures for transactions, which operate on separate cluster nodes, and update/insert/delete table rows, which differ only in the part of such long columns after 3500 bytes border.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Occasionally, after restart, additional transactions will have been
executed, possibly related to innodb_stats_auto_recalc.
We should only care that the transaction ID sequence does
not go backwards.
Let us mask the actual values of the defragmentation-related fields,
because they may vary. Also, remove the dependency on purge,
and instead delete records by a ROLLBACK of INSERT.
Use in_sum_func (and so nest_level) only in LEX to which SELECT lex belong to
Reduce usage of current_select (because it does not always point on the correct
SELECT_LEX, for example with prepare.
Change context for all classes inherited from Item_ident (was only for Item_field) in case of pushing down it to HAVING.
Now name resolution context have to have SELECT_LEX reference if the context is present.
Fixed feedback plugin stack usage.
trx_rseg_header_create(): Add a parameter for the value that is
to be written to TRX_RSEG_MAX_TRX_ID. If we omit this write, then
the updated test innodb.undo_truncate will fail for the 4k, 8k, 16k
page sizes. This was broken ever since
commit 947efe17ed (MDEV-15158)
removed the writes of transaction identifiers to the TRX_SYS page.
srv_do_purge(): Truncate undo tablespaces also during slow shutdown
(innodb_fast_shutdown=0).
Thanks to Krunal Bauskar for noticing this problem.
This patch is the plan D variant for fixing potetial mutex locking
order exercised by BF aborting and KILL command execution.
In this approach, KILL command is replicated as TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting to happen in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting cannot
therefore happen.
TOI replication is used, in this approach, purely as means
to provide isolated KILL command execution in the first node.
KILL command should not (and must not) be applied in secondary
nodes. In this patch, we make this sure by skipping KILL
execution in secondary nodes, in applying phase, where we
bail out if applier thread is trying to execute KILL command.
This is effective, but skipping the applying of KILL command
could happen much earlier as well.
This patch also fixes mutex locking order and unprotected
THD member accesses on bf aborting case. We try to hold
THD::LOCK_thd_data during bf aborting. Only case where it
is not possible is at wsrep_abort_transaction before
call wsrep_innobase_kill_one_trx where we take InnoDB
mutexes first and then THD::LOCK_thd_data.
This will also fix possible race condition during
close_connection and while wsrep is disconnecting
connections.
Added wsrep_bf_kill_debug test case
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
At least since commit 055a3334ad
(MDEV-13564) the undo log truncation in InnoDB did not work correctly.
The main issue is that during the execution of
trx_purge_truncate_history() some pages of the newly truncated
undo tablespace could be discarded.
fsp_try_extend_data_file(): Apply the peculiar rounding of
fil_space_t::size_in_header only to the system tablespace,
whose size can be expressed in megabytes in a configuration parameter.
Other files may freely grow by a number of pages.
fseg_alloc_free_page_low(): Do allow the extension of undo tablespaces,
and mention the file name in the error message.
mtr_t::commit_shrink(): Implement crash-safe shrinking of a tablespace
file. First, durably write the log, then shrink the file, and finally
release the page latches of the rebuilt tablespace. Refactored from
trx_purge_truncate_history().
log_write_and_flush_prepare(), log_write_and_flush(): New functions
to durably write log during mtr_t::commit_shrink().
btr_defragment_save_defrag_stats_if_needed(): Do not save
defragmentation statistics for temporary tables.
They are exempt of defragmentation anyway
(ha_innobase::optimize() never invokes defragmentation for them),
and the user-visible names are not available inside InnoDB.
Furthermore, InnoDB assumes that temporary tables are never accessed
by other threads than the one that handles the session with which
the temporary table is associated with.
Furthermore, we simplify the test innodb.innodb_defrag_stats
and include a test case that demonstrates that defragmentation
statistics are no longer being saved for temporary tables.
The st_blksize returned by fstat(2) is not documented to be
a power of 2, like we assumed in
commit 58252fff15 (MDEV-26040).
While on Linux, the st_blksize appears to report the file system
block size (which hopefully is not smaller than the sector size
of the underlying block device), on FreeBSD we observed
st_blksize values that might have been something similar to st_size.
Also IBM AIX was affected by this. A simple test case would
lead to a crash when using the minimum innodb_buffer_pool_size=5m
on both FreeBSD and AIX:
seq -f 'create table t%g engine=innodb select * from seq_1_to_200000;' \
1 100|mysql test&
seq -f 'create table u%g engine=innodb select * from seq_1_to_200000;' \
1 100|mysql test&
We will fix this by not trusting st_blksize at all, and assuming that
the smallest allowed write size (for O_DIRECT) is 4096 bytes. We hope
that no storage systems with larger block size exist. Anything larger
than 4096 bytes should be unlikely, given that it is the minimum
virtual memory page size of many contemporary processors.
MariaDB Server on Microsoft Windows was not affected by this.
While the 512-byte sector size of the venerable Seagate ST-225 is still
in widespread use, the minimum innodb_page_size is 4096 bytes, and
innodb_log_file_size can be set in integer multiples of 65536 bytes.
The only occasion where InnoDB uses smaller data file block sizes than
4096 bytes is with ROW_FORMAT=COMPRESSED tables with KEY_BLOCK_SIZE=1
or KEY_BLOCK_SIZE=2 (or innodb_page_size=4096). For such tables,
we will from now on preallocate space in integer multiples of 4096 bytes
and let regular writes extend the file by 1024, 2048, or 3072 bytes.
The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES.FS_BLOCK_SIZE
should report the raw st_blksize.
For page_compressed tables, the function fil_space_get_block_size()
will map to 512 any st_blksize value that is larger than 4096.
os_file_set_size(): Assume that the file system block size is 4096 bytes,
and only support extending files to integer multiples of 4096 bytes.
fil_space_extend_must_retry(): Round down the preallocation size to
an integer multiple of 4096 bytes.
Improve documentation of performance_schema tables by appending COLUMN
comments to tables. Additionally improve test coverage and update corresponding
tests.
The problem was that a PREARE followed by a non prepared statement
using DEFAULT NEXT_VALUE() could change table->next_local to point to
a not persitent memory aria. The next EXECUTE would then try to use
the wrong pointer, which could cause a crash.
Fixed by reseting the pointer to it's old value when doing EXECUTE.
Backported patch from MariaDB 10.6
The issue was that the using session_mem_used to break a test does not
guarantee where the test breaks, which gives different results
depending on the environment or how MariaDB is compield.
ha_innobase::check_if_supported_inplace_alter(): Do not invoke
innobase_table_is_empty() if the tablespace has been discarded.
That is, native ALTER TABLE in InnoDB will treat an empty table
in the same way as a tablespace whose tablespace has been discarded.
(Note: ALTER TABLE...ALGORITHM=COPY will fail if the tablespace
has been discarded.)
This fixes a crash that was introduced
in commit c755974775 (MDEV-19611).
Delete-marked record is on the secondary index and the clustered index
already purged the corresponding record. We cannot detect if such
record is historical and we should not: the algorithm of
row_ins_check_foreign_constraint() skips such record anyway.
If lock type is LOCK_GAP or LOCK_ORDINARY, and the transaction holds
implicit lock for the record, then explicit gap-lock will not be set for
the record, as lock_rec_convert_impl_to_expl() returns true and
lock_rec_convert_impl_to_expl() bypasses lock_rec_lock() call.
The fix converts explicit lock to implicit one if requested lock type is
not LOCK_REC_NOT_GAP.
innodb_information_schema test result is also changed as after the fix
the following statements execution:
SET autocommit=0;
INSERT INTO t1 VALUES (5,10);
SELECT * FROM t1 FOR UPDATE;
leads to additional gap lock requests.
Import operation without .cfg file fails when there is mismatch of index
between metadata table and .ibd file. Moreover, MDEV-19022 shows
that InnoDB can end up with index tree where non-leaf page has only
one child page. So it is unsafe to find the secondary index root page.
This patch does the following when importing the table without .cfg file:
1) If the metadata contains more than one index then InnoDB stops
the import operation and report the user to drop all secondary
indexes before doing import operation.
2) When the metadata contain only clustered index then InnoDB finds the
index id by reading page 0 & page 3 instead of traversing the
whole tablespace.
ha_partition stores records in array of m_ordered_rec_buffer and uses
it for prio queue in ordered index scan. When the records are restored
from the array the blob buffers may be already freed or rewritten.
The solution is to take temporary ownership of cached blob buffers via
String::swap(). When the record is restored from m_ordered_rec_buffer
the ownership is returned to table fields.
Cleanups:
init_record_priority_queue(): removed needless !m_ordered_rec_buffer
check as there is same assertion few lines before.
dbug_print_row() for arbitrary row pointer
Server crashes in Field::register_field_in_read_map upon select from
partitioned table with indexed by prefix virtual column.
After several read-mark fixes a problem has surfaced:
Since KEY (c(10),a) uses only a prefix of c, a new field is created,
duplicated from table->field[3], with a new length. However,
vcol_inco->expr is not copied.
Therefore, (*key_info)->key_part[i].field->vcol_info->expr was left NULL
in ha_partition::index_init().
Solution: copy vcol_info from table field when it's set up.
len was containing garbage, since vctempl->mysql_col_offset was
containing old value while calling row_mysql_store_col_in_innobase_format
from innobase_get_computed_value().
It was not updated after the first ALTER TABLE call, because it's INPLACE
logic considered there's nothing to update, and exited immediately from
ha_innobase::inplace_alter_table().
However, vcol metadata needs an update, since vcols structure is changed
in mysql record.
The regression was introduced by 12614af1fe. There, refcount==1 condition
was removed, which turned out to be crucial, though racy. The idea was to
update vc_templ after each (sequencing) ALTER TABLE.
We should do the same another way, and there may be a plenty of solutions,
but the simplest one is to add a following condition:
if vcol structure is changed, drop vc_templ; it will be recreated on next
ha_innobase::open() call.
in prepare_inplace_alter_table. It is safe, since innodb inplace changes
require at least HA_ALTER_INPLACE_SHARED_LOCK_AFTER_PREPARE, which
guarantee MDL_EXCLUSIVE on this stage.
alter_templ_needs_rebuild() also has to track the columns not indexed, to
keep vc_templ correct.
Note that vc_templ is always kept constructed and available after
ha_innobase::open() call, even on INSERT, though no virtual columns are
evaluated during that statement
inside innodb.
In the test case suplied, it will be recreated on the second ALTER TABLE.
Server crashes in Field::register_field_in_read_map upon select from
partitioned table with indexed by prefix virtual column.
After several read-mark fixes a problem has surfaced:
Since KEY (c(10),a) uses only a prefix of c, a new field is created,
duplicated from table->field[3], with a new length. However,
vcol_inco->expr is not copied.
Therefore, (*key_info)->key_part[i].field->vcol_info->expr was left NULL
in ha_partition::index_init().
Solution: initialize vcols before key initialization
Also key initialization is moved to a function.
MDEV-16026: Forbid global system_versioning_asof in non-default time zone
* store `system_versioning_asof` in unix time;
* both session and global vars are processed in session timezone;
* setting `default` does not copy global variable anymore. Instead, it sets
system_time to SYSTEM_TIME_UNSPECIFIED, which means that no 'AS OF' time
is applied and `now()` can be assumed
As a regression, we cannot assign values below 1970 (UTC) anymore
MDEV-16481: set global system_versioning_asof=sf() crashes in specific case
* sys_vars.h: add `MYSQL_TIME` field to `set_var::save_result`
* sys_vars.ic: get rid of calling `var->value->get_date()` from
`Sys_var_vers_asof::update()`
* versioning.sysvars: add test; remove double warning
refactor Sys_var_vers_asof
* inherit from sys_var rather than Sys_var_enum
* remove junk "DEFAULT" keyword. There is DEFAULT in SQL grammar for it.
* make all conversions in check() to avoid possible errors
* avoid double var->value evaluation, which could
consequence in undefined behavior
Problem was that not all normal error codes where not handled
after wsrep_row_upd_check_foreign_constraints() call. Furhermore,
debug assertion did not contain all normal error cases. Changed
ib:: calls to WSREP_ calls to use wsrep instrumentation.
Problem:
=========
As a part of MDEV-14398 patch, InnoDB added and removed
the tablespace from default encrypt list. But InnoDB removes
the tablespace from the default encrypt list too early due to
i) other encryption thread working on the tablespace
ii) When tablespace is being flushed at the end of
key rotation
InnoDB fails to decrypt/encrypt the tablespace since
the tablespace removed too early and it leads to
test case failure.
Solution:
=========
Avoid the removal of tablespace from default_encrypt_list
only when
1) Another active encryption thread working on tablespace
2) Eligible for tablespace key rotation
3) Tablespace is in flushing phase
Removed the workaround in encryption.innodb_encryption_filekeys test case.
Set tests to non-valgrind:
oqgraph.social
encryption.innodb-page_encryption
binlog_encryption.encrypted_master
innodb.innodb-page_compression_lz4
main.lock_multi_bug38499
main.lock_multi_bug38691
In commit 83d2e0841e (MDEV-24041)
we failed to notice that in addition to the bug with
DELETE and ON DELETE CASCADE, there is another bug with
UPDATE and ON UPDATE CASCADE.
row_ins_foreign_fill_virtual(): Use the correct memory heap
for everything that will be reachable from the cascade->update
that we return to the caller.
Note: It is correct to use the shorter-lived cascade->heap for
rec_get_offsets(), because that memory will be abandoned when
row_ins_foreign_fill_virtual() returns.
ha_innobase::prepare_inplace_alter_table(): Unless the table is
being rebuilt, determine the maximum column length based on the
current ROW_FORMAT of the table. When TABLE_SHARE (and the .frm file)
contains no explicit ROW_FORMAT, InnoDB table creation or rebuild
will use innodb_default_row_format.
Based on mysql/mysql-server@3287d33acd
InnoDB tablespace identifiers and page numbers are 32-bit numbers.
Let us use a 32-bit type for them in innochecksum.
The changes in commit 1918bdf32c
broke the build on 32-bit Windows.
Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
The columns that are part of DEFAULT expression were not read-marked
in statements like UPDATE...SET b=DEFAULT.
The problem is `F(DEFAULT)` expression depends of the left-hand side of an
assignment. However, setup_fields accepts only right-hand side value.
Neither Item::fix_fields does.
Suchwise, b=DEFAULT(b) works fine, because Item_default_field has
information on what field it is default of:
if (thd->mark_used_columns != MARK_COLUMNS_NONE)
def_field->default_value->expr->update_used_tables();
in Item_default_value::fix_fields().
It is not reasonable to pass a left-hand side to Item:fix_fields, because
the case is rare, so the rewrite
b= F(DEFAULT) -> b= F(DEFAULT(b))
is made instead.
Both UPDATE and multi-UPDATE are affected, however any form of INSERT
is not: it marks all the fields in DEFAULT expressions for read in
TABLE::mark_default_fields_for_write().
The problem is the same as in MDEV-18166: columns in virtual field
expression are not marked for read, while the field itself does.
field->register_field_in_read_map() should be called for read-marking all
fields.
The test is reproduced only in 10.4+, however the fix is applicable to
10.2+.
Several different test cases were failing under the same reason: the
fields in a vcol expression were not marked during marking columns of a key
contatining virtual column for read.
Fix: make marking columns of a key for read a special case where
register_field_in_read_map() is done instead of plain bitmap_set_bit().
Some test cases are only reproducible in 10.4+, but the fix is applicable
to 10.2+
This is a 10.2+ part of a jira task
The two bugs regarding virtual column marking have been fixed:
1. UPDATE of a partitioned table, where the optimizer has chosen a
secondary index to make a filesort;
2. INSERT into a table with a nonblob field generated from a blob, with
binlog enabled and binlog_row_image=noblob.
3. DELETE from a view on a table with virtual column.
Generally the assertion happens from update_virtual_fields() call
These bugs are root-caused by missing field marking for dependant fields
of a virtual column.
Therefore a fix is: mark all the fields involved in the vcol expression by
calling field->register_field_in_read_map() instead just setting a single
bit.
3 was reproducible only on 10.4+, however the problem might has just been
invisible in the earlier versions. The fix is applicable to 10.2-10.3 as
well.
The failing reason was inconsistent truncation rules: the value of virtual
column could have been evaluated to '2000' sometimes instead of '0000' for
value 'a'.
The reason why `c YEAR AS ('aaaa')` was not evaluated same is that len=4 is
a special case insidew Field_year::store.
The correct fix is: always evaluate a bad value to 0000 instead 2000.
The truncated values should be evaluated as usual.
$support_virtual_index is finally changed to 1 in gcol.gcol_ins_upd_innodb,
which is also enough for testing.
The test from original bug report is also added.
- Set the innodb_encrypt_tables variable before
timeout happens. It will add the pending tablespace
to default encrypt list and does the encryption/decryption
based on innodb_encrypt_tables.
Historical query with AS OF point after the last history partition
must include last history partition because it can be overflown
(contain history rows out of right endpoint).
Move left point back to hist_part->id in that case.
This is a backport of 161e4bfafd.
trans_rollback_to_savepoint(): Only release metadata locks (MDL)
if the storage engines agree, after the changes were already rolled back.
Ever since commit 3792693f31
and mysql/mysql-server@55ceedbc3f
we used to cheat here and always release MDL if the binlog is disabled.
MDL are supposed to prevent race conditions between DML and DDL also
when no replication is in use. MDL are supposed to be a superset of
InnoDB table locks: InnoDB table lock may only exist if the thread
also holds MDL on the table name.
In the included test case, ROLLBACK TO SAVEPOINT would wrongly release
the MDL on both tables and let ALTER TABLE proceed, even though the DML
transaction is actually holding locks on the table.
Until commit 1bd681c8b3 (MDEV-25506)
in MariaDB 10.6, InnoDB would often work around the locking violation
in a blatantly non-ACID way: If locks exist on a table that is being
dropped (in this case, actually a partition of a table that is being
rebuilt by ALTER TABLE), InnoDB could move the table (or partition)
into a queue, to be dropped after the locks and references had been
released. If the lock is not released and the original copy of the
table not dropped quickly enough, a name conflict could occur on
a subsequent ALTER TABLE.
The scenario of commit 3792693f31
is unaffected by this fix, because mysqldump
would use non-locking reads, and the transaction would not be holding
any InnoDB locks during the execution of ROLLBACK TO SAVEPOINT.
MVCC reads inside InnoDB are only covered by MDL and page latches,
not by any table or record locks.
FIXME: It would be nice if storage engines were specifically asked
which MDL can be released, instead of only offering a choice
between all or nothing. InnoDB should be able to release any
locks for tables that are no longer in trx_t::mod_tables, except
if another transaction had converted some implicit record locks
to explicit ones, before the ROLLBACK TO SAVEPOINT had been completed.
Reviewed by: Sergei Golubchik
A table rebuild that would truncate the default value of a
DATE column is expected to issue data truncation warnings.
But, these warnings are not being issued if the ADD COLUMN
is being executed with ALGORITHM=INSTANT. InnoDB sets the
warning of the field while assigning the default value
of the field during check_if_supported_inplace_alter().
Add KEYWORDS table and SQL_FUNCTIONS table to INFORMATION_SCHEMA.
This commits needs some minor changes when propagated upwards
(e.g. func_array in item_create.cc has a termination element that
doesn't exist in later versions of MariaDB)
wsrep_sst_common did not correctly set name for binlog index
file if custom binlog name was used and this name was
not added to script command line.
Added test case for both log_basename and log_binlog.
wsrep_sst_common did not correctly set name for binlog index
file if custom binlog name was used and this name was
not added to script command line.
Added test case for both log_basename and log_binlog.
Let us simply refuse an upgrade from earlier versions if the
upgrade procedure was not followed. This simplifies the purge,
commit, and rollback of transactions.
Before upgrading to MariaDB 10.3 or later, a clean shutdown
of the server (with innodb_fast_shutdown=1 or 0) is necessary,
to ensure that any incomplete transactions are rolled back.
The undo log format was changed in MDEV-12288. There is only
one persistent undo log for each transaction.
- During online alter conversion from compact to redundant,
virtual column field length already set during
innobase_get_computed_value(). Skip the char(n) check for
virtual column in row_merge_buf_add()
- InnoDB fails to check DB_COMPUTE_VALUE_FAILED error in
row_merge_read_clustered_index() and wrongly asserts that
the buffer shouldn't be ran out of memory. Alter table
should give warning when the column value is being
truncated.
Problem:
=======
- InnoDB iterates the fil_system space list to encrypt the
tablespace in case of key rotation. But it is not
necessary for any encryption plugin which doesn't do
key version rotation.
Solution:
=========
- Introduce a new variable called srv_encrypt_rotate to
indicate whether encryption plugin does key rotation
fil_space_crypt_t::key_get_latest_version(): Enable the
srv_encrypt_rotate only once if current key version is
higher than innodb_encyrption_rotate_key_age
fil_crypt_must_default_encrypt(): Default encryption tables
should be added to default_encryp_tables list if
innodb_encyrption_rotate_key_age is zero and encryption
plugin doesn't do key version rotation
fil_space_create(): Add the newly created space to
default_encrypt_tables list if
fil_crypt_must_default_encrypt() returns true
Removed the nondeterministic select from
innodb-key-rotation-disable test. By default,
InnoDB adds the tablespace to the rotation list and
background crypt thread does encryption of tablespace.
So these select doesn't give reliable results.
* Make Item_in_optimizer::fix_fields inherit the with_window_func
attribute of the subquery's left expression (the subquery itself
cannot have window functions that are aggregated in this select)
* Make Item_cache_wrapper::Item_cache_wrapper() inherit
with_window_func attribute of the item it is caching.
This commit reduces the likelihood of getting a busy port on
quick restarts with rsync SST (problem MDEV-25818) and fixes
a number of other flaws in SST scripts, adds new functionality,
and also synchronizes the xtrabackup-v2 script with the
mariabackup script (the latter applies only to the 10.2 branch):
1) SST via rsync: rsync and stunnel does not always get the right
time to complete by correctly handling SIGTERM. These utilities
are now given more time to complete normally (via normal SIGTERM
processing) before we move on to using "kill -9";
2) SST via rsync: attempts to terminate an rsync or stunnel process
(via "kill" utility) are only made if it did not terminated on
its own;
3) SST via rsync: if a combination of stunnel and rsync is used,
then we need to wait for both utilities to finish or stop, not
just one of them;
4) The config file and pid file for stunnel are now deleted after
successful completion of SST on the donor node;
5) The configs and pid files from rsync and stunnel should not be
deleted unless these utilities succeed (or are sucessfully
terminated) on the joiner node;
6) The configs and pid files now excluded from transfer via rsync;
7) Spaces in paths are now valid for config files as well (when
used with SST via rsync or mariabackup / xtrabackup[-v2]);
8) SST via mariabackup: added preliminary verification of keys and
certificates that are used when establishing a connection using
SSL (to avoid long timeouts and improve diagnostics) - by analogy
with how it is done for the xtrabackup-v2 (plus check for CA file),
while that check is skipped if the user does not have openssl
installed (or does not have diff utility);
9) Added backup-threads=<n> configuration option which adds
"--parallel=<n>" for mariabackup / xtrabackup at backup and
move-back stages;
10) Added encrypt-threads and encrypt-chunk-size configuration
options for xbcrypt management (when xbcrypt is used);
11) Small optimization: checking the socat version and adding
a file with parameters for 2048-bit Diffie-Hellman (if necessary)
is done only if the user has not specified "dhparam=" in the
"sockopt" option value;
12) SST via rsync now supports "backup-threads" configuration option
(in server-related sections or in the "[sst]");
13) Determining the number of available processors is now supported
for FreeBSD + mariabackup/xtrabackup: before that we might have
problems with "--compact" (rebuild indexes) or qpress on FreeBSD;
14) The check_pid() function should not raise an error state in
the rare cases when the pid file was created, but it is empty,
or if it is deleted right during the check, or when zero is read
from the pid file;
15) Iproved templates that are used to check if a requested socket
is "listening" when using the ss utility;
16) Shortened some other templates for socket state utilities;
17) Temporary files created by mariabackup / xtrabackup are moved
to a separate subdirectory inside tmpdir (so they don't get
mixed with other temporary files, which can make debugging
more difficult);
18) 10.2 only: the script for SST via xtrabackup-v2 has been brought
in full compliance with all the bugfixes made for mariabackup (as
it previously contained many flaws compared to the updated script
for mariabackup).
InnoDB should calculate the MBR for the first field of
spatial index and do the comparison with the clustered
index field MBR. Due to MDEV-25459 refactoring, InnoDB
calculate the length of the first field and fails with
too long column error.
The following features have been added:
1) Automatic addition of the pf = ip6 option for socat
when it can be recognized by the format of the connection
address;
2) Automatically add or remove extra commas at the beginning
and at the end of sockopt, for example, sockopt='pf=ip6'
and sockopt=',pf=ip6' work equally well;
Also, due to interference in the code of the get_transfer()
function, I also refactored it and now:
3) encrypt = 4 is supported not only for xtrabackup-v2,
but also for mariabackup - this can help with migration
from Percona;
4) Improved setting of 'commonname' option for encrypt=3
and encrypt=4 modes;
FTS add index fails
Problem:
========
InnoDB double frees the table if auxiliary fts table
creation fails and fails to set the dict operation
for the transaction. It leads to failure while
dropping newly added index.
Solution:
=========
InnoDB should avoid double freeing and set the
dictionary operation of transaction in
fts_create_common_tables()
report correct error codes in ed25519.
Invalid value stored in the user table or an OpenSSL error is CR_ERROR.
When a user provided incorrect password when logging in -
it's CR_AUTH_USER_CREDENTIALS.
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
Problem:
========
Aborting OPTIMIZE TABLE still logs in binary logs and replicates to the
Slave server. "Optimize table" command under execution, is killed by using
"Ctrl-C" as shown below.
MariaDB [test]> optimize table t2;
^CCtrl-C -- query killed. Continuing normally.
In spite of query execution being interrupted the query gets written to
binary log.
Analysis:
========
Admin command execution logic is not handling KILL command, hence it
ignores the KILL command and completes its execution.
Fix:
===
Check for thread killed notification, during admin command execution and
handle it. If thread kill occurs prior to any table modification the query
will not be written to binary log. If kill happens after at least one table
is modified then the query will be written to binary log. Ex: command in
execution is 'OPTIMIZE TABLE t1,t2' and the thread kill happens after t1
table is modified then 'OPTIMIZE TABLE t1,t2' will be written to binary log
as admin commands will not make the slave to diverge from master.
Problem:
=======
In slave_parallel_mode=optimistic configuration, when admin commands and
DML operation on the same table are scheduled simultaneously for execution,
it results in lock conflict and slave server either hangs due to
deadlock or goes down with an assert.
Analysis:
========
Admin commands OPTIMIZE, REPAIR and ANALYZE are written to binary log as
ordinary transactions. When 'slave_parallel_mode' is 'optimistic' DMLs are
allowed to run in parallel. But these locks are not detected by parallel
replication deadlock detection-and-handling mechanism. At times they result
in deadlock or assertion.
Fix:
===
Flag admin commands as DDL in Gtid_log_event at the time of writing to
binary log. Add a new bit EXECUTED_TABLE_ADMIN_CMD to
'm_unsafe_rollback_flags'. During 'mysql_admin_table' command execution it
accepts a list of tables to be processed and executes them in a loop. Upon
successful execution enable 'EXECUTED_TABLE_ADMIN_CMD' bit in
thd->transaction.stmt_unsafe_rollback_flags. Gtid_log_event constructor
will notice this flag and mark the current transaction with 'FL_DDL' flag.
Gtid_log_events marked as FL_DDL will not be scheduled parallel execution,
on the slave. They will execute in isolation to prevent deadlocks.
Note: Removed the call to 'trans_commit_implicit' from 'mysql_admin_table'
function as 'mysql_execute_command' will take care of invoking
'trans_commit_implicit'.
Problem:- When slave is shutdown, we will get this assertion failure
sql/sql_list.h:642: void ilink::assert_linked(): Assertion `prev != 0
&& next != 0' failed.
Solution:- In close_connections when we call threads.get() it resets to
prev and next to NULL. And in parallel worker thread(handle_rpl_parallel_thread)
calls unlink_not_visible_thd() which assert on prev and next being not NULL.
.unlink_not_visible_thd() should be always called first before threads.get()
is called. To make sure worker calls unlink_not_visible_thd() in
slave_prepare_for_shutdown() we are deactivating the worker thread pool
which in turn will close all worker threads. Since this is already done in 10.4
and 10.5 I am backPorting MDEV-20821 and MDEV-22370 to 10.2. Mdev-22370
is improving the MDEV-20821 patch.
Parallel slave server shutdown found to be hanging in
close_connections() triggered by shutdown due to a slave worker thread
would not be notified to exit in case the worker was sitting idle.
Fixed with destroying the worker pool earlier that is in
slave_prepare_for_shutdown() when all their driver threads have already left.
A test file is added to simulate the bug condition as well as check
multi-sourced and not-idle worker cases.
This commit contains a large set of further bug fixes and
improvements to SST scripts for Galera, continuing the work
that was started in MDEV-24962 to make SST scripts work smoothly
in different network configurations (especially using ipv6) and
with different environment settings:
1) The ipv6 addresses were incorrectly handled in the SST script
for rsync (incorrect address substitution for establishing a
connection, incorrect address substitution for bind, and so on);
2) Checking the locality of the ip-address in SST scripts did not
support ipv6 addresses (such as "[::1]"), which were falsely
identified as non-local ip, which further did not allow running
two SSTs on different local addresses on the same machine.
On the other hand, this bug masked some other errors (related
to handling ipv6 addresses);
3) The code for checking the locality of the ip address was different
in the SST scripts for rsync and for mysqldump, with individual
flaws. This code is now made common and moved to wsrep_sst_common;
4) Waiting for the start of the transport channel (socat, nc, rsync,
stunnel) in the wait_for_listen() and check_pid_and_port() functions
did not process ipv6 addresses correctly in all cases (not for all
branches);
5) Waiting for the start of the transport channel (socat, nc, rsync,
stunnel) in the wait_for_listen() and check_pid_and_port() functions
for some code branches could give a false positive result due to
the textual match of prefixes in the port number and/or PID of
the process;
6) Waiting for the start of the transport channel (socat, nc, rsync,
stunnel) was supported through different utilities in SST scripts
for mariabackup and for rsync, and with various minor flaws in
the code. Now the code is still different in these scripts, but
it supports a common set of utilities (lsof, ss, sockstat) and
is synchronized across patterns that used to check the output
of these utilities;
7) In SST via mariabackup, the signal about readiness to receive data
is sometimes sent too early - immediately after listen(), and not
after accept() (which are called by socat or netcat utility).
8) Checking availability of the some options of some utilities was
done using the grep pattern, which easily gives false positives;
9) Common name (CN) for local addresses, if not explicitly specified,
is now always replaced to "localhost" to avoid the need to generate
many separate certificates for local addresses of one machine and
not to depend on which the local address is currently used in test
(ipv4 or ipv6, etc.);
10) In tests galera_sst_mariabackup_encrypt_with_key_server and
galera_sst_rsync_encrypt_with_key_server the correct certificate
is selected to avoid commonname (CN) mismatch problems;
11) Further refactoring to protect against spaces in file names.
12) Further general refactoring to eliminate bash-specific constructs
or to improve code readability;
13) The code for setting options for the nc (netcat) utility was
different in different scripts for SST - now it is made identical.
14) Fixed long-time broken encryption via xbcrypt in combination with
mariabackup and added support for key-based encryption via openssl
utility, which is now enabled by default for encrypt=1 mode (this
default mode can be changed using a new configuration file option
"encypt-format=openssl|xbcrypt", which can be placed in the [mysqld],
[sst] or in the [xtrabackup] section) - this change will allow us
to use and to test the encypt=1 encryption without installing
non-standard third-party utilities.
This commit contains a large set of further bug fixes and
improvements to SST scripts for Galera, continuing the work
that was started in MDEV-24962 to make SST scripts work smoothly
in different network configurations (especially using ipv6) and
with different environment settings:
1) The ipv6 addresses were incorrectly handled in the SST script
for rsync (incorrect address substitution for establishing a
connection, incorrect address substitution for bind, and so on);
2) Checking the locality of the ip-address in SST scripts did not
support ipv6 addresses (such as "[::1]"), which were falsely
identified as non-local ip, which further did not allow running
two SSTs on different local addresses on the same machine.
On the other hand, this bug masked some other errors (related
to handling ipv6 addresses);
3) The code for checking the locality of the ip address was different
in the SST scripts for rsync and for mysqldump, with individual
flaws. This code is now made common and moved to wsrep_sst_common;
4) Waiting for the start of the transport channel (socat, nc, rsync,
stunnel) in the wait_for_listen() and check_pid_and_port() functions
did not process ipv6 addresses correctly in all cases (not for all
branches);
5) Waiting for the start of the transport channel (socat, nc, rsync,
stunnel) in the wait_for_listen() and check_pid_and_port() functions
for some code branches could give a false positive result due to
the textual match of prefixes in the port number and/or PID of
the process;
6) Waiting for the start of the transport channel (socat, nc, rsync,
stunnel) was supported through different utilities in SST scripts
for mariabackup and for rsync, and with various minor flaws in
the code. Now the code is still different in these scripts, but
it supports a common set of utilities (lsof, ss, sockstat) and
is synchronized across patterns that used to check the output
of these utilities;
7) In SST via mariabackup, the signal about readiness to receive data
is sometimes sent too early - immediately after listen(), and not
after accept() (which are called by socat or netcat utility).
8) Checking availability of the some options of some utilities was
done using the grep pattern, which easily gives false positives;
9) Common name (CN) for local addresses, if not explicitly specified,
is now always replaced to "localhost" to avoid the need to generate
many separate certificates for local addresses of one machine and
not to depend on which the local address is currently used in test
(ipv4 or ipv6, etc.);
10) In tests galera_sst_mariabackup_encrypt_with_key_server and
galera_sst_rsync_encrypt_with_key_server the correct certificate
is selected to avoid commonname (CN) mismatch problems;
11) Further refactoring to protect against spaces in file names.
12) Further general refactoring to eliminate bash-specific constructs
or to improve code readability;
13) The code for setting options for the nc (netcat) utility was
different in different scripts for SST - now it is made identical.
14) Fixed long-time broken encryption via xbcrypt in combination with
mariabackup and added support for key-based encryption via openssl
utility, which is now enabled by default for encrypt=1 mode (this
default mode can be changed using a new configuration file option
"encypt-format=openssl|xbcrypt", which can be placed in the [mysqld],
[sst] or in the [xtrabackup] section) - this change will allow us
to use and to test the encypt=1 encryption without installing
non-standard third-party utilities.
and configuration.
1. Pass joiner's authentication information to donor together with address
in State Transfer Request. This allows joiner to authenticate donor on
connection. Previously joiner would accept data from anywhere.
2. Deprecate custom SSL configuration variables tca, tcert and tkey in favor
of more familiar ssl-ca, ssl-cert and ssl-key. For backward compatibility
tca, tcert and tkey are still supported.
3. Allow falling back to server-wide SSL configuration in [mysqld] if no SSL
configuration is found in [sst] section of the config file.
4. Introduce ssl-mode variable in [sst] section that takes standard values
and has following effects:
- old-style SSL configuration present in [sst]: no effect
otherwise:
- ssl-mode=DISABLED or absent: retains old, backward compatible behavior
and ignores any other SSL configuration
- ssl-mode=VERIFY*: verify joiner's certificate and CN on donor,
verify donor's secret on joiner
(passed to donor via State Transfer Request)
BACKWARD INCOMPATIBLE BEHAVIOR
- anything else enables new SSL configuration convetions but does not
require verification
ssl-mode should be set to VERIFY only in a fully upgraded cluster.
Examples:
[mysqld]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
[sst]
-- server-wide SSL configuration is ignored, SST does not use SSL
[mysqld]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
[sst]
ssl-mode=REQUIRED
-- use server-wide SSL configuration for SST but don't attempt to
verify the peer identity
[sst]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
ssl-mode=VERIFY_CA
-- use SST-specific SSL configuration for SST and require verification
on both sides
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
1. Fix eval command line to correctly pass stunnel option to rsync on donor.
2. Deprecate `tkey`, `tcert` and `tca` options in [sst] section in favor of
conventional `ssl-key`, `ssl-cert` and `ssl-ca`, but keep their precedence
for backward compatibility.
3. Default to require SSL encryption if at least SSL key and cert files are
specified in configuration, either in [sst] or [mysqld] sections.
4. Enable `verify*` option for stunnel on donor only if
a. CA file is specified somewhere in the configuration
b. it is explicitly requested in [sst] section by either specifying
ssl-mode or CA file there. In this case if ssl-mode is not explicitly
given, it defaults to VERIFY_CA.
ssl-mode maps to stunnel options as follows:
VERIFY_CA -> verifyChain = yes
VERIFY_IDENTITY -> verifyPeer = yes
Example to require donor to verify joiner identity:
```
[mysqld]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
[sst]
ssl-mode=VERIFY_IDENTITY
```
5. If SSL verification is requested, joiner verifies donor by checking the
secret passed to donor via SST request.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
After switching to the new mariabackup interface (instead of
the outdated innobackupex interface, which is supported for
compatibility), we need to explicitly pass a path to the datadir
directory as a parameter, since in the new interface the value
of this option is not automatically set in such a way that it
always matches the SST/IST logic. This commit adds passing this
option as an explicit parameter to mariabackup. This commit also
removed unnecessary options that are not used and not supported
by mariabackup.
Also, numerous flaws in the common wsrep_sst_common script have
been fixed:
1) There are many bash-specific constructs in the script that
may not be supported by other interpreters, which can lead
to the most unexpected errors during SST, because failures
in the interpretation of bash-specific constructs lead to
incorrect parsing of arguments;
2) There is parse_cnf() function which is often called by other
scripts for the "mysqld" or "--mysqld" group, but it does not
take into account the default group suffix, which leads to
reading values only from the default group, which then leads
to errors due to reading the default values instead of the
values for a specific group;
3) Some options such as --user, --innodb-data-home-dir or --datadir
are not removed from the --mysqld-args list, although they are
processed inside scripts (and passing of these options funther
may cause problems for mariabackup);
4) If an argument that the script understands is present in
the --mysqld-args list twice, then this causes SST to fail,
instead of reading the most recent value;
5) The "--host" parameter is technically still supported among
the arguments of the SST scripts, but in reality scripts do not
work with it as expected, especially if it has an IPv6 address;
6) If the port number is absent in the --address parameter value,
but the port number is explicitly passed through the --port
argument, then the scripts for mariabackup and xtrabackup-v2
fail;
7) If a new address interface is used (with the --address parameter),
then automatic default port substitution is not performed, although
it is supported for the legacy --host/--port interface.
8) If there are spaces in the parameter values after --mysqld_args,
then their further transfer does not occur correctly, which
causes mariabackup to fail during SST - the space splits
the argument in such a way that it breaks the parsing of the
following parameters;
9) If most of the parameters that are names or paths to the files
or directories contain spaces, then SST scripts fail in an
unpredictable way due to incorrect variable substitutions;
10) If the --log-bin option is passed among the arguments of myqlds
(--mysqld-args) without a parameter, and the --binlog option
is not specified, then the script cannot substitute the default
name for binlog and cannot construct binlog name using the
--log-basename argument (which is against server specifications);
11) Tail slashes are not removed from the directory names, which,
upon further substitution, leads to the appearance of a double
slash in the file paths;
12) The explicit --binlog parameter (which is now always transmitted
from the server side) and the "hidden" --log-bin parameter in the
list of arguments after --mysqld-args are perceived as two different
parameters in different parts of the scripts, and if they are do not
match for some reason, this will lead to failures during SST;
Also, all new changes from the 10.6 branch have been migrated here,
including the latest pull requests for authentication (only the part
that concerns SST scripts).
It also fixes dozens of other bugs in all SST scripts.
Problem:
========
180511 11:07:58 [ERROR] Slave I/O: Unexpected master's heartbeat data:
heartbeat is not compatible with local info;the event's data: log_file_name
mysql-bin.000009 log_pos 1054262041, Error_code: 1623
Analysis:
=========
In replication setup when master server doesn't have any events to send to
slave server it sends an 'Heartbeat_log_event'. This event carries the
current binary log filename and offset details. The offset values is stored
within 4 bytes of event header. When the size of binary log is higher than
UINT32_MAX the log_pos values will not fit in 4 bytes memory. It overflows
and hence slave stops with an error.
Fix:
===
Since we cannot extend the common_header of Log_event class, a greater than
4GB value of Log_event::log_pos is made to be transported with a HeartBeat
event's sub-header. Log_event::log_pos in such case is set to zero to
indicate that the 8 byte sub-header is allocated in the event.
In case of cross version replication following behaviour is expected
OLD - Server without fix
NEW - Server with fix
OLD<->NEW : works bidirectionally as long as the binlog offset is
(normally) within 4GB.
When log_pos > UINT32_MAX
OLD->NEW : The 'log_pos' is bound to overflow and NEW slave may report
an invalid event/incompatible heart beat event error.
NEW->OLD : Since patched server sets log_pos=0 on overflow, OLD slave will
report invalid event error.
InnoDB tries to fetch the deleted doc ids for discarded
tablespace. In i_s_fts_deleted_generic_fill(), InnoDB needs
to check whether the table is discarded or not before fetching
deleted doc ids.
fil_ibd_load(): Remove a message that is basically saying that
everything works as expected. The other "Ignoring data file" message
about the presence of an extraneous file will be retained
(and expected by the test innodb.log_file_name).
The problem is that sharing default expression among set instruction
leads to attempt access result field of function created in
other instruction runtime MEM_ROOT and already freed
(a bit different then MySQL problem).
Fix is the same as in MySQL (but no optimisation for constant), turn
DECLARE a, b, c type DEFAULT expr;
to
DECLARE a type DEFAULT expr, b type DEFAULT a, c type DEFAULT a;
InnoDB startup hangs if a DDL transaction needs to be
rolled back and a recovered transaction on statistics
tables exists. In that case, InnoDB should rollback
the transaction which holds locks on innodb_table_stats
or innodb_index_stats during trx_rollback_or_clean_recovered().
InnoDB fails to fetch the index type when innodb dictionary
doesn't match with frm. InnoDB should return corrupted if it
can't find the index in ha_innobase::index_type().
table->move_fields wasn't undone in case of error.
1. move_fields is unconditionally undone even when error is occurred
2. cherry-pick an assertion in `ptr_in_record`, which is already in 10.5
The assertion is improved: storage engines like myisam always have to store
at least one field, so the assertion does not cover tables with no stored
columns.
So we are having a race condition of three of threads, resulting in a
deadlock backoff in purge, which is unexpected.
More precisely, the following happens:
T1: NOCOPY ALTER TABLE begins, and eventually it holds MDL_SHARED_NO_WRITE
lock;
T2: FLUSH TABLES begins. it sets share->tdc->flushed = true
T3: purge on a record with virtual column begins. it is going to open a
table. MDL_SHARED_READ lock is acquired therefore.
Since share->tdc->flushed is set, it waits for a TDC purge end.
T1: is going to elevate MDL LOCK to exclusive and therefore has to set
other waiters to back off.
T3: receives VICTIM status, reports a DEADLOCK, sets OT_BACKOFF_AND_RETRY
to Open_table_context::m_action
My fix is to allow opening table in purge while flushing. It is already
done the same way in other maintainance facilities like REPAIR TABLE.
Another way would be making an actual backoff, but Open_table_context
does not allow to distinguish it from other failure types, which still
seem to be unexpected. Making this would require hacking into
Open_table_context interface for no benefit, in comparison to passing
MYSQL_OPEN_IGNORE_FLUSH during table open.
innodb_debug_sync was introduced in commit
b393e2cb0c and reverted in
commit fc58c17216 due to memory leak reported
by valgrind, see MDEV-21336.
The leak is now fixed by adding `rw_lock_free(&slot->debug_sync_lock)`
after background thread working loop is finished, and the patch is
reapplied, with respect to c++98 fixes by Marko.
The missing DEBUG_SYNC for MDEV-18546 in row0vers.cc is also reapplied.
Item_func_history (is_history()) is a bool function that checks if the
row is the history row by checking row_end->is_max(). The argument to
this function must be row_end system field.
Added the above function to conjunction with SYSTEM_TIME_BEFORE
versioning condition.
row_merge_is_index_usable(): Allow access to any SEQUENCE, even if it was
created after the read view. SQL sequences are no-rollback tables with no
history at all.
This is a backport of
commit fd9ca2a742 (MDEV-23295) and
commit 9a156e1a23 (MDEV-23345) to 10.3.
An instant ADD/DROP/reorder column could create a dummy table
object with the wrong ROW_FORMAT when innodb_default_row_format
was changed between CREATE TABLE and ALTER TABLE.
prepare_inplace_alter_table_dict(): If we had promised that
ALGORITHM=INPLACE is supported, we must preserve the ROW_FORMAT.
The rest of the changes are related to adding
Alter_inplace_info::inplace_supported to cache the return value of
handler::check_if_supported_inplace_alter().
Back port upstream fix
commit 1800b015a1d487330f7b15f2020b887be348a66b
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date: Fri Sep 8 20:29:22 2017 +0530
Bug#26027024 SLAVE_COMPRESSED_PROTOCOL DOESN'T WORK WITH
SEMI-SYNC REPLICATION IN MYSQL-5.7
Analysis: In mysql-5.6, dump thread (the thread that is created
on Master after Slave requested for a binlog dump) is also used
to receive acknowledgements from the Slave and act on them accordingly.
For performance reasons, a special thread called Ack Receiver thread
is added in mysql-5.7 Semi synchronous replication plugin.
This thread does not have special handling to receive acknowledgements
if Slave has enabled compression in the protocol. Hence Master is
unable to handle any slave if Slave_compressed_protocol is enabled
on it.
Fix: Enable compress flag on the communication channels if the Slave
has Slave_compressed_protocol ON.
row_sel_sec_rec_is_for_clust_rec(): If the field in the
clustered index record stored off page, always fetch it,
also when the secondary index field has been built on the
entire column. This was broken ever since the InnoDB Plugin
for MySQL Server 5.1 introduced ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED for InnoDB tables. That code was first
introduced in this tree in
commit 3945d5e554.
For the original ROW_FORMAT=REDUNDANT and the MySQL 5.0.3
ROW_FORMAT=COMPRESSED, there was no problem, because for
those tables we always stored at least a 768-byte prefix of
each column in the clustered index record.
row_sel_sec_rec_is_for_blob(): Allow prefix_len==0 for matching
the full column.
Before FRM is written walk vcol expressions through
check_table_name_processor() and check if field items match (db,
table_name) qualifier.
We cannot do this in check_vcol_func_processor() as there is already
no table name qualifiers in expressions of written and loaded FRM.
Buffer overflow in ib_push_warning() fixed by using vsnprintf().
InnoDB parser was obsoleted by MDEV-16417.
Thanks to Nikita Malyavin for review and suggestion.
There was race between a committing transaction and the following in binlog
order FLUSH LOGS that could create a 2nd Binlog checkpoint (BCP) event
in the new file *before* the first logged-in-old-binlog transaction gets committed in
Innodb. That would cause the transaction loss at recovery, should
the server stop right after the BCP.
The race is tackled by enforcing the necessary set of mutexes to be acquired
by FLUSH-LOGS handler in the correct order (of the group commit leader
pattern).
Note, there remain two cases where a similar race is still possible:
- the above race as it is when the server is run with ("unlikely")
non-default `--binlog-optimize-thread-scheduling=0` (MDEV-24530), and
- at unlikely event of bin-logging of Incident event (MDEV-24531) that
also triggers binlog rotation,
in both cases though with lesser chances after the current fixes.
Between btr_pcur_store_position() and btr_pcur_restore_position()
it is possible that purge empties a table and enlarges
index->n_core_fields and index->n_core_null_bytes.
Therefore, we must cache index->n_core_fields in
btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be
parsed correctly.
Unfortunately, this is a huge change, because we will replace
"bool leaf" parameters with "ulint n_core"
(passing index->n_core_fields, or 0 for non-leaf pages).
For special cases where we know that index->is_instant() cannot hold,
we may also pass index->n_fields.
This commit removes the mtr test galera_sst_mariabackup_encrypt_with_key
from the list of disabled tests because the problem with it has already
been fixed.
Problem:
========
InnoDB fails to clean the index stub if it fails to add the
virtual index which contains new virtual column. But it clears
the newly virtual column from index in clear_added_indexes()
during inplace_alter_table. On commit, InnoDB evicts and
reload the table. In case of rollback, it doesn't happen.
InnoDB clears the ABORTED index while opening the table
or doing the DDL. In the mean time, InnoDB can access
the dropped virtual index columns while creating prebuilt
or rollback of concurrent DML.
Solution:
==========
(1) InnoDB should maintain newly added virtual column while
rollbacking the newly added virtual index.
(2) InnoDB must not defer the index removal
if the alter table is executed with LOCK=EXCLUSIVE.
(3) For LOCK=SHARED, InnoDB should check whether the table
has any other transaction lock other than alter transaction
before deferring the index stub.
Replaced has_new_v_col with dict_add_vcol_info in dict_index_t to
indicate whether the index has any new virtual column.
dict_index_t::has_new_v_col(): Returns whether the index has
newly added virtual column, it doesn't say which columns are
newly added virtual column
ha_innobase_inplace_ctx::is_new_vcol(): Return whether the
given column is added as a part of the current alter.
ha_innobase_inplace_ctx::clean_new_vcol_index(): Copy the newly
added virtual column to new_vcol_info in dict_index_t. Replace
the column in the index fields with virtual column stored
in new_vcol_info.
dict_index_t::assign_new_v_col(): Store the number of virtual
column added in index as a part of alter table.
dict_index_t::get_n_new_vcol(): Get the number of newly added
virtual column
dict_index_t::assign_drop_v_col(): Allocate the memory for
adding new virtual column in new_vcol_info.
dict_index_t::add_drop_v_col(): Add the newly added virtual
column in new_vcol_info.
dict_table_t::has_lock_for_other_trx(): Whether the table has
any other transaction lock than given transaction.
row_merge_drop_indexes(): Add parameter alter_trx and check
whether the table has any other lock than alter transaction.
The auto-increment variables may change intermittently during
the execution of some tests from the Galera mtr suite, causing
these tests to fail. This patch creates conditions in which
unpredictable changes to these variables are not possible
during the execution of those tests in which this problem
is noticed or their values are restored before the end of
testing.
server failure in different, confusing ways
InnoDB fails to free the buffer pool instance mutex and zip mutex
If the allocation of buffer pool instance chunk fails. So it leads
to freeing of buffer pool before freeing the mutexes and
leads to double freeing of memory while freeing the mutex
during shutdown.
InnoDB should skip the dropped aborted FTS_DOC_ID_INDEX while
checking the existing FTS_DOC_ID_INDEX in the table. InnoDB
should able to create new FTS_DOC_ID_INDEX if the fulltext
index is being added for the first time.
During the prepare phase of restoring backups, "mariabackup" does
not seem to allow (or recognize) the option "innodb_force_recovery"
for the embedded InnoDB server instance that it starts.
If page corruption observed during page recovery, the prepare step
fails. While this is indeed the correct behavior ideally, allowing
this option to be set in case of emergencies might be useful when
the current backup is the only copy available. Some error messages
during "--prepare" suggest to set "innodb_force_recovery" to 1:
[ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
For backwards compatibility, "mariabackup --innobackupex --apply-log"
should also have this option.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Virtual column fields are not found in prebuilt data type, so we should
match InnoDB fields with `get_innobase_type_from_mysql_type` method.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
- Aborting of fulltext index creation fails to remove the
index from sys indexes table. When we try to reload the
table definition, InnoDB fails with index count mismatch
error. InnoDB should remove the index from sys indexes while
rollbacking the secondary index creation.
* Remove usage of wsrep_provider variable in galera_ist_restart_joiner
* Rename galera_load_provider.inc and galera_unload_provider.inc to
galera_stop_replication.inc and galera_start_replication.inc. Their
original names were no longer reflecting what these include files do.
followup for ce3a2a688d
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
As suggested by Vladislav Vaintroub, let us remove misleading
and malformatted startup messages.
Even if the global variable srv_use_atomic_writes were set, we would
still invoke my_test_if_atomic_write() to check if writes are atomic
with a particular page size.
When using the default innodb_page_size=16k, page writes should be
atomic on NTFS when using ROW_FORMAT=COMPRESSED and KEY_BLOCK_SIZE<=4.
Disabling srv_use_atomic_writes when innodb_file_per_table=OFF does
not make sense, because that is a dynamic parameter.
We also correct the documentation string of innodb_use_atomic_writes
and remove the duplicate variable innobase_use_atomic_writes.
The debug parameter innodb_simulate_comp_failures injected compression
failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing
logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent
compressed page overflows. A much better check is already achieved by
defining UNIV_ZIP_COPY at the compilation time.
(Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)
In commit eaeb8ec4b8 (MDEV-24653)
an incorrect debug assertion was introduced.
btr_pcur_store_position(): If the only record in the page is the
instant ALTER TABLE metadata record, we cannot expect there to be
a successor page. The situation could be improved by MDEV-24673 later.
Mariabackup SST fails if "--log-bin" option is added with no value
to command line parameters at server startup. This is because the SST
scripts do not correctly interpret the "--- log-bin" option without a
value. This patch adds correct handling of the "--log-bin" parameter
without value to the general part of the parameter parsing (for SST
scripts) and fixes the problem. Also added a test that checks the
correct operation of the server after the fix.
MDEV-24649 galera.galera_bf_lock_wait MTR failed with sigabrt: Assertion `!is_ow
ned()' failed in sync0policy.ic on MutexDebug with Mutex = TTASEventMutex<GenericPolicy>
Bug was fixed as part of MDEV-23328, this just adds test cases to
regression set.
Keyvalue can be longer than REC_VERSION_56_MAX_INDEX_COL_LEN
and this leads out-of-array reference. Use dynamic memory
allocation using actual max length of key value.
Online log for insert operation of redundant table fails with
index->is_instant() assert. Purge can reset the n_core_fields when
alter is waiting to upgrade MDL for commit phase of DDL. In the
meantime, any insert DML tries to log the operation fails with
index is not being instant.
row_log_get_n_core_fields(): Get the n_core_fields of online log
for the given index.
rec_get_converted_size_comp_prefix_low(): Use n_core_fields of online
log when InnoDB calculates the size of data tuple during redundant
row format table rebuild.
rec_convert_dtuple_to_rec_comp(): Use n_core_fields of online log
when InnoDB does the conversion of data tuple to record during
redudant row format table rebuild.
- Adding the test case which has more than 129 instant columns.
MDEV-25105 (commit 7a4fbb55b0)
in MariaDB 10.6 will refuse the innodb_checksum_algorithm
values none, innodb, strict_none, strict_innodb.
We will issue a deprecation warning if innodb_checksum_algorithm
is set to any of these non-default unsafe values.
innodb_checksum_algorithm=crc32 was made the default in
MySQL 5.7 and MariaDB Server 10.2, and given that older versions
of the server have reached their end of life, there is no valid
reason to use anything else than innodb_checksum_algorithm=crc32
or innodb_checksum_algorithm=strict_crc32 in MariaDB 10.3.
Reviewed by: Sergei Golubchik
InnoDB set the space in dict_table_t as NULL when table
is discarded. So InnoDB shouldn't use the space present
in table to detect whether the given tablespace is
temporary tablespace.
Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
row_number() over () window function can be used without any column in the OVER
clause. Additionally, the item doesn't reference any tables, as it's not
effectively referencing any table. Rather it is specifically built based
on the end temporary table used for window function computation.
This caused remove_const function to wrongly drop it from the ORDER
list. Effectively, we shouldn't be dropping any window function from the
ORDER clause, so adjust remove_const to account for that.
Reviewed by: Sergei Petrunia sergey@mariadb.com
In btr_index_rec_validate(), externally stored column
check is missing while matching the length of the field
with the length of the field data stored in record.
Fetch the length of the externally stored part and compare it
with the fixed field length.
When doing a truncate on an Innodb under lock tables, InnoDB would rename
the old table to #sql-... and recreate a new 't1' table. The table lock
would still be on the #sql-table.
When doing ALTER TABLE, Innodb would do the changes on the #sql table
(which would disappear on close).
When the SQL layer, as part of inline alter table, would close the
original t1 table (#sql in InnoDB) and then reopen the t1 table, Innodb
would notice that this does not match it's own (old) t1 table and
generate an error.
Fixed by adding code in truncate table that if we are under lock tables
and truncating an InnoDB table, we would close, reopen and lock the
table after truncate. This will remove the #sql table and ensure that
lock tables is using the new empty table.
Reviewer: Marko Mäkelä
Before the changes two things could happen:
- "path required name explain_filename path" error
- unit test never finishead (as it tried to execute just /bin/sh as
a test case)
This bug caused crashes of the server when processing queries with table
value constructors (TVC) that contained subqueries and were used itself as
subselects. For such TVCs the following transformation is applied at the
prepare stage:
VALUES (v1), ... (vn) => SELECT * FROM (VALUES (v1), ... (vn)) tvc_x.
This transformation allows to reduce the problem of evaluation of TVCs used
as subselects to the problem of evaluation of regular subselects.
The transformation is implemented in the wrap_tvc(). The code the function
to mimic the behaviour of the parser when processing the result of the
transformation. However this imitation was not free of some flaws. First
the function called the method exclude() that completely destroyed the
select tree structures below the transformed TVC. Second the function
used the procedure mysql_new_select to create st_select_lex nodes for
both wrapping select of the transformation and TVC. This also led to
constructing of invalid select tree structures.
The patch actually re-engineers the code of wrap_tvc().
Approved by Oleksandr Byelkin <sanja@mariadb.com>
1. wait for the binlog thread to reach the certain state, don't use
a debug_sync that's incorrectly placed to detect the state
2. no need to do a (non-deterministic) `show binlog events` to verify
what is guaranteed by the directly preceding line
Let us avoid the excessive allocation of explicit record locks
(a work-around of MDEV-24813) so that the test will execute
much faster under AddressSanitizer, MemorySanitizer, Valgrind.
The test innodb.innodb_bug60049 used to check that the record
(ID,NAME)=(12,'SYS_FOREIGN_COLS') is the last record in the
secondary index of the system table SYS_TABLES.
But, ever since commit 2336558423
or mysql/mysql-server@082d59670f
that record no longer is the last one in the table!
The more recent test innodb.purge_secondary covers the purge
functionality much better.
innobase_rename_column_try(): When renaming SYS_FIELDS records
for secondary indexes, try to use both formats of SYS_FIELDS.POS
as keys, in case the PRIMARY KEY includes a column prefix.
Without this fix, an ALTER TABLE that renames a column followed
by a server restart (or LRU eviction of the table definition
from dict_sys) would make the table inaccessible.
The test requires adaptation for MariaDB, which is done
in this patch. In addition, this patch includes a fix for
the SST script startup code that adds escaping for special
characters on the command line (in case they are contained
in the arguments to mysqld). The fix does not require
separate tests, as the required tests are already part
of the mtr suite for Galera.
A locking SELECT from an InnoDB table is very slow especially
in debug builds. Replacing some INSERT...SELECT should not reduce
the test coverage, because the test will still do DELETE
(which will acquire explicit record locks).
Problem:
========
CHANGE MASTER TO MASTER_USER='root', MASTER_SSL=0, MASTER_SSL_CA='',
MASTER_SSL_CERT='', MASTER_SSL_KEY='', MASTER_SSL_CRL='',
MASTER_SSL_CRLPATH='';
CHANGE MASTER TO MASTER_USER='root', MASTER_PASSWORD='', MASTER_SSL=0;
use-after-poison is reported for lex_mi->ssl_crl
File: sql_repl.cc
if (lex_mi->ssl_crl)
strmake_buf(mi->ssl_crl, lex_mi->ssl_crl);
Analysis:
========
At the end of CHANGE MASTER statement execution, the LEX_MASTER_INFO
parameters are reset so that the next query will have a clean state. But
'ssl_crl' and 'ssl_crl_path' members of LEX_MASTER_INFO object are not
cleared during 'LEX_MASTER_INFO::reset'. Hence when a new CHANGE MASTER
statement is executed, the stale value of lex_mi->ssl_crl is used, so ASAN
reports use-after-poison.
Fix:
===
Clear 'ssl_crl' and 'ssl_crl_path' as part of 'reset'.
Dump sequences first.
This atch made to keep it small and
to keep number of queries to the server the same.
Order of tables in a dump can not be changed
(except sequences first) because mysql_list_tables
uses SHOW TABLES and I used SHOW FULL TABLES.
We may end up with an empty leaf page (containing only an ADD COLUMN
metadata record) that is not the root page.
innobase_add_instant_try(): Disable an optimization for a non-canonical
empty table that contains a metadata record somewhere else than in
the root page.
btr_pcur_store_position(): Tolerate a non-canonical empty table.
Problem:
========
Auto purge of relaylogs stops when relay-log-file is
'slave-relay-log.999999' and slave_parallel_threads is enabled.
Analysis:
=========
The problem is that in Relay_log_info::inc_group_relay_log_pos() function,
when two log names are compared via strcmp() function, it gives correct
result, when log name sequence numbers are of same digits(6 digits), But
when the number goes to 7 digits, a 999999 compares greater than
1000000, which is wrong, hence the bug.
Fix:
====
Extract the numeric extension part of the file name, convert it into
unsigned long and compare.
Thanks to David Zhao for the contribution.