We support online log resizing by replicating the current ib_logfile0
to a new file ib_logfile101, which will eventually replace the
ib_logfile0 on the first applicable log checkpoint.
Unless the log is located in a persistent memory file system (PMEM),
an attempt to SET GLOBAL innodb_log_file_size to less than
innodb_log_buffer_size will be refused. (With PMEM, a.k.a. mmap()
based log, that parameter has no meaning.)
Should the server be killed while the log was being resized,
both files ib_logfile0 and ib_logfile101 may exist on startup,
and since commit 3b06415cb8
the extra file ib_logfile101 will be removed.
We will initiate checkpoint flushing by invoking buf_flush_ahead(),
to let buf_flush_page_cleaner() write out pages until the
buf_flush_async_lsn target has been reached.
On a log checkpoint, if the new checkpoint LSN is not older than
log_sys.resize_lsn (the start LSN of the ib_logfile101),
we can switch files and complete the log resizing. Else, we will
attempt to switch files on the next checkpoint.
Log resizing can be aborted by killing the connection that is
executing the SET GLOBAL statement.
If the ib_logfile101 wraps around to the beginning, we must
advance the log_sys.resize_lsn. In the resized log file,
the sequence bit will always be written as 1 (no wrap-around).
The log will be duplicated in log_t::resize_write(), invoked by
mtr_t::finish_write().
When the log is being written via system calls (not PMEM), the initial
log_sys.resize_lsn is the current log_sys.first_lsn, plus an integer
multiple of log_sys.block_size, corresponding to the LSN at the start
of the block that was written by log_sys.write_lsn. The log_sys.resize_buf
will be of the same size as the log_sys.buf. During resizing, the
contents of log_sys.buf and log_sys.resize_buf will be identical,
except that the sequence bit of each mini-transaction will always be 1 in
log_sys.resize_buf. If resizing is in progress, log_t::write_buf()
will write log_sys.resize_buf to log_sys.resize_log (ib_logfile101).
If the file would wrap around, the buffer will be written to
log_sys.START_OFFSET and the log_sys.resize_lsn advanced accordingly.
When using mmap() on /dev/shm or a PMEM mount -o dax file system,
the initial log_sys.resize_lsn will be the log_sys.lsn at the time
the resizing is initiated. If the log file wraps around during resizing,
then the log_sys.resize_lsn will be advanced by
(log_sys.resize_target - log_sys.START_OFFSET).
log_t::resize_start(), log_t::resize_abort(), log_t::write_checkpoint():
Unless the log is mmap() based, acquire flush_lock and write_lock.
In any case, acquire exclusive log_sys.latch to prevent race conditions.
log_t::resize_rename(): Renamed from log_t::rename_resized(),
and moved some code to the previous sole caller srv_start().
Thanks to Vladislav Vaintroub for helpful review comments
and to Matthias Leich for testing this, in particular, testing
crash recovery, multiple concurrent SET GLOBAL innodb_log_file_size
and frequently killed connections.
create_table_info_t::innobase_table_flags(): Ignore page_compressed
and page_compression_level on TEMPORARY tables.
ha_innobase::truncate(): Add a debug assertion that create() must succeed
on temporary tables.
When a partition with engine-defined partition options is added by
ALTER, it shows the correct definition in SHOW CREATE, but the actual
values get inherited from the previous partition.
The cause of the bug is that the server did not parse the options
associated with the newly added partition.
In the case of ALTER, the complete part_info, which represents the
partition structure after ALTER, is built by prep_alter_part_table().
So, we need to parse engine-defined partition options after the call
of the function.
In commit a635c40648 (MDEV-27774)
the logic for waking up the buf_pool_page_cleaner() was revised.
Because ReleaseBlocks was being passed by value to CIterate,
only copies of the data member rb.modified were ever modified
during the iteration.
We must use Iterate instead of CIterate to pass the object by
reference.
Thanks to Krunal Bauskar for pointing this out.
The test innodb.log_file_size would occasionally crash in
a debug assertion failure in srv_prepare_to_delete_redo_log_file().
In the core dump, the assertion would hold.
buf_pool_t::any_io_pending(): Avoid dirty reads by acquiring
buf_pool.mutex. This function is only being invoked in debug builds,
so we do not care about any performance impact.
The follwoing could happen if InnoDB did some background work while
test was running:
@@ -5,6 +5,8 @@
SELECT LOCK_MODE, LOCK_TYPE, TABLE_SCHEMA, TABLE_NAME FROM information_schema.metadata_lock_info;
LOCK_MODE LOCK_TYPE TABLE_SCHEMA TABLE_NAME
MDL_SHARED_HIGH_PRIO Table metadata lock test t1
+MDL_SHARED Table metadata lock mysql innodb_table_stats
+MDL_SHARED Table metadata lock mysql innodb_index_stat
In commit c7d0448797 (MDEV-15132)
MariaDB Server 10.3 stopped writing the latest transaction identifier
to the TRX_SYS page. Instead, the transaction identifier will be
recovered from undo log pages.
Unfortunately, before commit 3926673ce7
and mysql/mysql-server@dc29792ff2
(MySQL 5.1.48 or MariaDB 5.1.48) InnoDB did not always initialize all
data fields, but some garbage could be left behind in unused parts
of data pages.
In undo log pages that are essentially free, but added to a list for
reuse (TRX_UNDO_CACHED) the TRX_UNDO_TRX_NO fields could contain garbage,
instead of 0. As long as such undo pages are being reused and never
marked completely free, the garbage contents may remain forever.
In fact, the function trx_undo_header_create() and the record
MLOG_UNDO_HDR_CREATE will only initialize TRX_UNDO_TRX_ID, but leave
TRX_UNDO_TRX_NO uninitialized.
trx_undo_mem_create_at_db_start(): Only read the TRX_UNDO_TRX_NO
fields of TRX_UNDO_CACHED pages if the TRX_UNDO_PAGE_TYPE is 0,
that is, the page was updated by MariaDB Server 10.3. Earlier versions
would always write the TRX_UNDO_PAGE_TYPE as 1 or 2.
trx_undo_header_create(): Zero out the TRX_UNDO_TRX_NO field.
Strictly speaking, this will change the semantics of the
MLOG_UNDO_HDR_CREATE record, but it should not do any harm to
overwrite a potentially garbage field with zeroes.
Note: This fix will only help future upgrades straight from
MariaDB Server 10.2 or MySQL 5.6 or earlier. If such an upgrade has
already been made, then an earlier server startup could have
fast-forwarded the transaction ID sequence to a large value.
If this large value cannot be represented in 48 bits (the size of
the DB_TRX_ID column in clustered index records), then various
strange things can happen.
DEBUG_SYNC signals can get lost in certain tests due to later
DEBUG_SYNC commands overwriting them. This patch addresses
these issues in three tests: main.query_cache_debug,
main.partition_debug_sync, and
rpl.rpl_dump_request_retry_warning.
Additionally, main.partition_debug_sync needed changes to the
result file (the others did not). The synchronization happened
between two commands, one based on ALTER, the other on DROP.
A new thread/connection was needed to synchronize the DEBUG_SYNC
actions between these commands, thereby changing the result file.
Additional comments were added for clarification.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
When the log is stored in persistent memory, log_sys.buf[] is
a ring buffer that directly maps to the circular ib_logfile0 file.
There were several errors that could occur in the special case
when a log record ends exactly at the end of the log file and the
next record would start at log_sys.buf[log_sys.START_OFFSET].
mariabackup.huge_lsn,strict_full_crc32: Write the first record
at the very end of the circular file, to reproduce the failure
scenarios.
recv_sys_t::parse(): On PMEM, wrap the end offset of the record
from log_sys.file_size to log_sys.START_OFFSET if needed.
Otherwise, both InnoDB recovery and mariadb-backup would try
to parse the next record from an invalid address.
filename_to_spacename(): Remove an assumption about the format
of file names. While the server currently writes file names like
./databasename/tablename.ibd we might want to stop writing the
redundant ./ prefix in the future. The test mariabackup.huge_lsn
is generating such file names.
xtrabackup_copy_logfile(): Correctly copy a record that ends at
the very end of the log_sys.buf[].
The errors in mariadb-backup were reproduced with the test
mariabackup.huge_lsn,strict_full_crc32 and an additional patch
to use the start checkpoint of the test:
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 27dce5fa17d..e17a1692d6f 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -1796,7 +1796,8 @@ dberr_t recv_sys_t::find_checkpoint()
continue;
}
- if (checkpoint_lsn >= log_sys.next_checkpoint_lsn)
+ if (checkpoint_lsn >= log_sys.next_checkpoint_lsn &&
+ checkpoint_lsn != 0x1000fffffe10)
{
log_sys.next_checkpoint_lsn= checkpoint_lsn;
log_sys.next_checkpoint_no= field == log_t::CHECKPOINT_1;
OpenSSL 3.0.0+ does not support EVP_MD_CTX_FLAG_NON_FIPS_ALLOW any longer.
In OpenSSL 1.1.1 the non FIPS allowed flag is context specific, while
in 3.0.0+ it is a different EVP_MD provider.
Fixes#2010
json_lib.c:847:25: runtime error: index 200 out of bounds for type 'json_string_char_classes [128]'
json_lib.c:847:25: runtime error: load of address 0x56286f7175a0 with insufficient space for an object of type 'json_string_char_classes'
fixes main.json_equals and main.json_normalize