In mariadb-backup --backup there are multiple mechanisms for ensuring that
a sufficient amount of the InnoDB write-ahead log (ib_logfile0) is being
copied at the end of the backup. The backup needs to include the latest
committed transaction. While further transaction commits are blocked by
BACKUP STAGE BLOCK_COMMIT, ongoing transactions may modify the database
contents and write log records. We were unnecessarily copying such log
records, which also caused extra work for rolling back incomplete
transactions after the backup is restored.
backup_wait_for_lsn(): Declare as static, and refactor some code
to separate functions backup_wait_for_lsn_low() and
backup_wait_timeout().
backup_wait_for_commit_lsn(): A new function to determine the current
LSN (within BACKUP STAGE BLOCK_COMMIT) and to wait for the log to be
copied until that. Invoked by BackupStages::stage_block_commit().
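The wait described for backup_wait_for_commit_lsn() can be modeled
roughly as follows. This is a minimal Python sketch and not the
mariadb-backup code: the class LogCopier, its methods, and the timeout
parameter are hypothetical names introduced only for illustration.

```python
import threading


class LogCopier:
    """Toy model of the log-copying progress (hypothetical class)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.copied_lsn = 0  # highest LSN copied so far

    def advance(self, lsn):
        """Called by the copying thread after copying the log up to lsn."""
        with self._cond:
            self.copied_lsn = max(self.copied_lsn, lsn)
            self._cond.notify_all()

    def wait_for_lsn(self, target_lsn, timeout):
        """Block until copied_lsn >= target_lsn; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self.copied_lsn >= target_lsn, timeout=timeout)


def wait_for_commit_lsn(copier, current_lsn, timeout=1.0):
    """Wait for the log to be copied up to the LSN that was sampled
    while commits were blocked (assumed to be passed in by the caller)."""
    return copier.wait_for_lsn(current_lsn, timeout)
```

Because commits are blocked while the LSN is sampled, waiting for exactly
that LSN is enough to cover the latest committed transaction in this model.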
xtrabackup_backup_func(): Remove a condition that had already been
checked by a caller of backup_wait_timeout().
server_lsn_after_lock: Declare as a local variable in
BackupStages::stage_block_ddl().
log_copying_thread(), io_watching_thread(): Use metadata_last_lsn
instead of metadata_to_lsn as the stop condition.
BackupStages::stage_block_commit(): Ensure that the log tables
(in particular, mysql.general_log) will have been copied before
the BACKUP STAGE BLOCK_COMMIT is being followed by any further
SQL statements.
Reviewed by: Debarun Banerjee
Tested by: Matthias Leich
(Variant 4, with @@optimizer_adjust_secondary_key_costs, reuse in two
places, and conditions are replaced with equivalent simpler forms in two more)
In best_access_path(), ReuseRangeEstimateForRef-3, the check
for whether
"all used key_part_i used key_part_i=const"
was incorrect: it could produce a "NO" answer for cases when we
had:
key_part1= const // some key parts are usable
key_part2= value_not_in_join_prefix //present but unusable
key_part3= non_const_value // unusable due to gap in key parts.
This caused the optimizer to fail to apply ReuseRangeEstimateForRef
heuristics. The consequence is poor query plan choice when the index
in question has very skewed data distribution.
The fix is enabled if its @@optimizer_adjust_secondary_key_costs flag
is set.
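The difference between checking all key parts and checking only the
used ones can be illustrated with a small Python model. This is a sketch
under assumptions, not the optimizer code: the constants, the function
names, and in particular naive_check() (a guess at the general shape of
the incorrect check) are hypothetical.

```python
CONST = "const"      # key_part = constant
BOUND = "bound"      # key_part = non-constant value available in the join prefix
UNBOUND = "unbound"  # key_part = value that is not in the join prefix


def used_key_parts(parts):
    """Return the leading key parts that a ref access can actually use."""
    used = []
    for kind in parts:
        if kind not in (CONST, BOUND):
            break  # the first unusable part creates a gap in the key
        used.append(kind)
    return used


def all_used_parts_are_const(parts):
    """Check only the key parts that are really used."""
    used = used_key_parts(parts)
    return bool(used) and all(kind == CONST for kind in used)


def naive_check(parts):
    """Assumed shape of the incorrect check: it looks at every key part
    that has an equality, whether or not that part is usable."""
    present = [kind for kind in parts if kind is not None]
    return bool(present) and all(kind == CONST for kind in present)
```

For the example above (key_part1 constant, key_part2 unusable, key_part3
behind the gap), only key_part1 is used, so the corrected check answers
"YES" while the naive one answers "NO".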
The memory leak happened on the second execution of a prepared statement
that runs an UPDATE statement with a correlated subquery on the right-hand
side of the SET clause. In this case, an invocation of the method
table->stat_records()
could return zero, which leads into the 'if' branch that handles an
impossible WHERE condition. The issue is that this branch missed saving
the leaf tables, which has to be performed as the first condition
optimization activity. Later, the PS memory root is marked as read-only
when the first execution of the prepared statement finishes. The next
time the same statement is executed, it hits an assertion on the attempt
to allocate memory on the PS memory root that is marked as read-only.
This memory allocation takes place via the following sequence of
invocations:
Prepared_statement::execute
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_update::execute_inner
Sql_cmd_update::update_single_table
st_select_lex::save_leaf_tables
List<TABLE_LIST>::push_back
To fix the issue, add the flag SELECT_LEX::leaf_tables_saved to control
whether the method SELECT_LEX::save_leaf_tables() has to be called or
whether it has already been invoked and no further invocation is required.
A similar issue could take place when running a DELETE statement with
the LIMIT clause in PS/SP mode. The reason for the memory leak is the same
as for the UPDATE case, and it is fixed in the same way.
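The guard-flag idea can be sketched in Python as follows. This is an
illustrative model only: the class, the allocations counter, and the
wrapper save_leaf_tables_once() are hypothetical and do not mirror the
server's data structures or the exact place where the flag is set.

```python
class SelectLex:
    """Toy stand-in for SELECT_LEX (hypothetical, for illustration)."""

    def __init__(self, leaf_tables):
        self.leaf_tables = list(leaf_tables)
        self.leaf_tables_exec = []      # saved copy of the leaf tables
        self.leaf_tables_saved = False  # models SELECT_LEX::leaf_tables_saved
        self.allocations = 0            # counts simulated list allocations

    def save_leaf_tables(self):
        """Copy the leaf tables; each push_back models one allocation."""
        for table in self.leaf_tables:
            self.leaf_tables_exec.append(table)
            self.allocations += 1

    def save_leaf_tables_once(self):
        """Save the leaf tables on the first call only."""
        if not self.leaf_tables_saved:
            self.save_leaf_tables()
            self.leaf_tables_saved = True
```

In this model, a second execution of the statement finds the flag set and
performs no further allocation, which is the property that the read-only
memory root requires.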
Since commit e735cf2ed7cefb2af36f10f3cb47dfc060789df3, PCRE_INCLUDES
has been renamed to PCRE_INCLUDE_DIRS for consistency.
The columnstore module depends on the old name.
Create a mapping for the columnstore submodule.
The 10.6+ fix for the submodule is:
* https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3304
Without the call to my_mutex_init, the mutex attributes
my_fast_mutexattr and my_errorcheck_mutexattr are uninitialized.
Linux tolerates this but FreeBSD doesn't (and segfaults).
We fix this for all platforms, since the unit test should be testing
the standard mutexes of the system.
log_t::resize_write(): Advance log_sys.resize_lsn and reset
the resize_log offset to START_OFFSET whenever the memory-mapped buffer
would wrap around.
Previously, in case the initial target offset would be beyond the
requested innodb_log_file_size, we only adjusted the offset but
not the LSN. An incorrect LSN would cause log_sys.buf_free to be out
of bounds when the log resizing completes.
The log_sys.lsn_lock will cover the entire duration of replicating
memory-mapped log for resizing. We just need a mutex that is compatible
with the caller holding log_sys.latch. While the choice of mtr_t::finisher
(for normal log writes) depends on mtr_t::spin_wait_delay,
replicating the log during resizing is a rare operation where we can
afford possible additional context switching overhead.
MySQL-Connector-Net casts SEQ_IN_INDEX to uint and will
raise an exception if the type is a System.Int64.
As we don't support a huge number of columns in a multi-column
index, reducing the type to a uint is sufficient to represent
all values and maintains compatibility with MySQL-Connector-Net.
This matches the type (uint) returned by MySQL-8.3 and 8.0.
Reviewer: Alexander Barkov <bar@mariadb.com>
buf_flush_buffer_pool(): Wait for any pending asynchronous reads
to complete. This assertion failed in a run where buf_read_ahead_linear()
had been triggered in an SQL statement that was executed right
before shutdown.
Reviewed by: Debarun Banerjee
btr_cur_t::search_leaf(): When the index root page is also a leaf page,
we may need to upgrade our existing shared root page latch into an
exclusive latch. Even if we end up waiting, the root page won't be able
to go away while we hold an index()->lock. The index page may be split;
that is all.
btr_latch_prev(): Acquire the page latch while holding a buffer-fix
and an index tree latch. Merge the change buffer if needed. Use
buf_pool_t::page_fix() for this special case instead of complicating
buf_page_get_low() and buf_page_get_gen().
row_merge_read_clustered_index(): Remove some code that does not seem
to be useful. No difference was observed with regard to removing this
code when a CREATE INDEX or OPTIMIZE TABLE statement was run concurrently
with sysbench oltp_update_index --tables=1 --table_size=1000 --threads=16.
buf_pool_t::unzip(): Decompress a ROW_FORMAT=COMPRESSED page.
buf_pool_t::page_fix(): Handle also ROW_FORMAT=COMPRESSED pages
as well as change buffer merge. Optionally return an error.
Add a flag for suppressing a page latch wait and a special return
value -1 to indicate that the call would block.
This is the preferred way of buffer-fixing blocks.
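The "would block" convention described for buf_pool_t::page_fix() can be
modeled loosely in Python. This is a sketch under assumptions and not the
InnoDB implementation: the Block class, the wait parameter, and the
decision to undo the buffer-fix before returning the sentinel are all
hypothetical choices made for the example.

```python
import threading

WOULD_BLOCK = -1  # stands in for the special return value -1


class Block:
    """Toy buffer pool block (hypothetical)."""

    def __init__(self, page_id):
        self.page_id = page_id
        self.latch = threading.Lock()  # models the page latch
        self.fix_count = 0             # models the buffer-fix count


def page_fix(block, wait=True):
    """Buffer-fix a block. With wait=False, return WOULD_BLOCK instead of
    waiting for a page latch that is currently held by someone else."""
    block.fix_count += 1
    if not block.latch.acquire(blocking=wait):
        block.fix_count -= 1  # assumption: the caller retries from scratch
        return WOULD_BLOCK
    block.latch.release()  # the latch was only needed to wait for pending work
    return block
```

A caller that must not wait passes wait=False and handles WOULD_BLOCK,
for example by releasing its own latches and retrying.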
The functions buf_page_get_gen() and buf_page_get_low() are only being
invoked with rw_latch=RW_NO_LATCH in operations on SPATIAL INDEX.
buf_page_t: Define some static functions for interpreting state().
buf_page_get_zip(), buf_read_page(),
buf_read_ahead_random(), buf_read_ahead_linear():
Remove the redundant parameter zip_size. We must look up the
tablespace and can invoke fil_space_t::zip_size() on it.
buf_page_get_low(): Require mtr!=nullptr.
buf_page_get_gen(): Implement some lock downgrading during recovery.
ibuf_page_low(): Use buf_pool_t::page_fix() in a debug check.
We do wait for a page read here, because otherwise a debug assertion in
buf_page_get_low() in the test innodb.ibuf_delete could occasionally fail.
PageConverter::operator(): Invoke buf_pool_t::page_fix() in order
to possibly evict a block. This allows us to remove some
special case code from buf_page_get_low().
It's possible that MDL conflict handling code is called more
than once for a transaction when:
- it holds more than one conflicting MDL lock
- reschedule_waiters() is executed,
which results in repeated attempts to BF-abort an already aborted
transaction.
In such situations, it might be that BF-aborting logic sees
a partially rolled back transaction and erroneously decides
on future actions for such a transaction.
The specific situation tested and fixed is when a SR transaction
applied in the node gets BF-aborted by a started TOI operation.
It's then caught with the server transaction already rolled back,
but with no MDL locks yet released. This caused wrong state
detection for such a transaction during repeated MDL conflict
handling code execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
(Variant 2b: call greedy_search() twice, correct handling for limited
search_depth)
Modify the join optimizer to specifically try to produce join orders that
can short-cut their execution for ORDER BY..LIMIT clause.
The optimization is controlled by @@optimizer_join_limit_pref_ratio.
Default value 0 means don't construct short-cutting join orders.
Other value means construct short-cutting join order, and prefer it only
if it promises speedup of more than #value times.
In Optimizer Trace, look for these names:
* join_limit_shortcut_is_applicable
* join_limit_shortcut_plan_search
* join_limit_shortcut_choice
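The preference rule for @@optimizer_join_limit_pref_ratio described above
can be written down as a small Python sketch. The function name and the
cost arguments are hypothetical; the real optimizer compares its own
internal cost estimates, which are not modeled here.

```python
def choose_join_order(regular_cost, shortcut_cost, pref_ratio):
    """Pick the short-cutting join order only if it promises a speedup
    of more than pref_ratio times over the regular join order."""
    if pref_ratio == 0 or shortcut_cost is None:
        # 0 (the default) means: do not construct short-cutting join orders.
        return "regular"
    if regular_cost > shortcut_cost * pref_ratio:
        return "shortcut"
    return "regular"
```

With a ratio of 5, a short-cutting plan that is estimated to be 10 times
cheaper is chosen, while one that is only about 3 times cheaper is not.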
The problem was that wsrep_schema tables were not marked as
category information. The fix allows access to wsrep_schema
tables even when the node is detached.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Replication of MyISAM and Aria DML is experimental and best
effort only. An earlier change made INSERT SELECT on both
MyISAM and Aria replicate using TOI and STATEMENT
replication. Replication should happen only if the user
has set the required wsrep_mode setting.
Note: This commit contains additional changes compared
to those already made for the 10.5 branch.
+ small refactoring after main fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
* Fixes galera.galera_bf_kill_debug test case.
* Enable galera_ssl_upgrade, galera_ssl_reload, galera_pc_bootstrap
* Add MDEV to disabled tests that miss it
P.S. This commit contains additional changes compared
to the similar commit for 10.5 branch.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
After closing https://github.com/codership/galera-bugs/issues/947,
Galera now correctly certifies table-level keys, which made bulk
insert work again.
The corresponding MTR test is made deterministic and re-enabled.
Requires Galera 26.4.19
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
* Fixes galera.galera_bf_kill_debug test case.
* Enable galera_ssl_upgrade, galera_ssl_reload, galera_pc_bootstrap
* Add MDEV to disabled tests that miss it
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Based on the logs, SST was started before the donor reached the
Primary state. Add wait_conditions to make sure that
nodes reach the Primary state before the next node is started.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
For TOI events specifically we have a situation where in case of the
same error different nodes may generate different messages. This may
be for two reasons:
- different locale setting between the current client session and
server default (we can reasonably require server locales to be
identical on all nodes, but user can change message locale for the
session)
- non-deterministic course of STATEMENT execution e.g. for ALTER TABLE
On the other hand we may reasonably expect TOI event failures since
they are executed after replication, so we must ensure that voting is
consistent. For that purpose error codes should be sufficiently unique
and deterministic for TOI event failures as DDLs normally deal with
a single object, so we can merely use MySQL error codes to vote on.
Notice that this problem does not happen with regular transactional
writesets, since the originator node will always vote success and
replica nodes are assumed to have the same global locale setting.
As such different error messages indicate different errors even if
the error code is the same (e.g. ER_DUP_KEY can happen on different
rows or tables).
Use only MySQL error code (without the error message) for error voting
in case of TOI event failure.
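The voting rule can be illustrated with a short Python sketch. It is a
model only: the function vote_payload(), its tuple result, and the sample
messages are hypothetical and do not reflect the actual wsrep voting
format.

```python
def vote_payload(error_code, error_message, is_toi):
    """Return the value that a node would vote on for a failed event."""
    if is_toi:
        # TOI failure: the message may differ between nodes (session
        # locale, non-deterministic execution), so vote on the code only.
        return (error_code,)
    # Regular writeset: different messages indicate different errors.
    return (error_code, error_message)
```

Two nodes that fail the same DDL with differently worded messages then
cast the same vote, while for regular writesets a different message still
counts as a different error.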
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
When handling fatal signal, shut down Galera networking
before printing out stack trace and writing core file.
This is to achieve fail-silent semantics on crashes which may
keep the process running for a long time, but not fully responding
e.g. due to core dumping or symbol resolving.
Also suppress all Galera/wsrep logging to avoid logging from
background threads to garble crash information from signal handler.
Notice that for fully fail-silent crash, Galera 26.4.19 is needed.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The recent commit 4ca355d863 (MDEV-33894)
caused a serious regression for online InnoDB ib_logfile0 resizing,
breaking crash-safety unless the memory-mapped log file interface is
being used. However, the log resizing was broken also before this.
To prevent such regressions in the future, we extend the test
innodb.log_file_size_online with a kill and restart of the server
and with some writes running concurrently with the log size change.
When run sufficiently many times, this test revealed all the bugs that
are being fixed by the code changes.
log_t::resize_start(): Do not allow the resized log to start before
the current log sequence number. In this way, there is no need to
copy anything to the first block of resize_buf. The previous logic
regarding that was incorrect in two ways. First, we would have to
copy from the last written buffer (buf or flush_buf). Second, we failed
to ensure that the mini-transaction end marker bytes would be 1
in the buffer. If the source ib_logfile0 had wrapped around an odd number
of times, the end marker would be 0. This was occasionally observed
when running the test innodb.log_file_size_online.
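The dependence of the end marker on the number of wrap-arounds can be
modeled as follows. This Python sketch only restates the rule from the
paragraph above; the parameters first_lsn and capacity are assumed names
for the start LSN and the usable size of the circular log.

```python
def end_marker(lsn, first_lsn, capacity):
    """Return the expected mini-transaction end marker (0 or 1) at lsn.

    The marker is 1 while the circular log has wrapped around an even
    number of times (including zero) and 0 after an odd number of wraps.
    """
    wraps = (lsn - first_lsn) // capacity
    return 0 if wraps & 1 else 1
```

A freshly resized log has not wrapped yet, so it expects the marker 1;
bytes copied from a source log that had wrapped an odd number of times
would carry 0 and would therefore not match.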
log_t::resize_write_buf(): To adjust for the resize_start() change,
do not write anything that would be before the resize_lsn.
Take the buffer (resize_buf or resize_flush_buf) as a parameter.
Starting with commit 4ca355d863
we no longer swap buffers when rewriting the last log block.
log_t::append(): Define as a static function; only some debug
assertions need to refer to the log_sys object.
innodb_log_file_size_update(): Wake up the buf_flush_page_cleaner()
if needed, and wait for it to complete a batch while waiting for
the log resizing to be completed. If the current LSN is behind the
resize target LSN, we will write redundant FILE_CHECKPOINT records to
ensure that the log resizing completes. If the buf_pool.flush_list is
empty or the buf_flush_page_cleaner() is stuck for some reason, our wait
will time out in 5 seconds, so that we can periodically check if the
execution of SET GLOBAL innodb_log_file_size was aborted. Previously,
we could get into a busy loop here while the buf_flush_page_cleaner()
would remain idle.
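The replacement of the busy loop with a timed wait can be sketched in
Python. This is an illustrative model and not the InnoDB code: the
function wait_for_resize() and its callback parameters are hypothetical,
and threading.Event stands in for the server's own synchronization.

```python
import threading


def wait_for_resize(done, aborted, wake_flusher, timeout=5.0):
    """Wait until the log resizing completes or the statement is aborted.

    done:         threading.Event set when the resizing has completed
    aborted:      callable returning True if the statement was aborted
    wake_flusher: callable that wakes up the page cleaner if needed
    """
    while not done.is_set():
        if aborted():
            return False
        wake_flusher()
        # Sleep until completion is signalled or the timeout expires,
        # instead of spinning while the page cleaner is idle.
        done.wait(timeout)
    return True
```

The timeout bounds how long an abort can go unnoticed, and the loop no
longer consumes CPU while it waits.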
The problem was that wsrep_schema tables were not marked as
category information. The fix allows access to wsrep_schema
tables even when the node is detached.
This is 10.4-10.9 version of fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that we did not detect that the table was partitioned,
in which case we must determine the actual underlying storage
engine.
We should not use RSU for non-InnoDB tables.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>