This is based on a prototype by
Thirunarayanan Balathandayuthapani <thiru@mariadb.com>.
Binlog and Galera write-set replication information was written into
TRX_SYS page on each commit. Instead of writing to the TRX_SYS during
normal operation, InnoDB can make use of rollback segment header pages,
which are already being written to during a commit.
The following list of fields in rollback segment header page are added:
TRX_RSEG_BINLOG_OFFSET
TRX_RSEG_BINLOG_NAME (NUL-terminated; empty name = not present)
TRX_RSEG_WSREP_XID_FORMAT (0=not present; 1=present)
TRX_RSEG_WSREP_XID_GTRID
TRX_RSEG_WSREP_XID_BQUAL
TRX_RSEG_WSREP_XID_DATA
trx_sys_t: Introduce the fields
recovered_binlog_filename, recovered_binlog_offset, recovered_wsrep_xid.
To facilitate upgrade from older mysql or mariaDB versions, we will read
the information in TRX_SYS page. It will be overridden by the
information that we find in rollback segment header pages.
Mariabackup --prepare will read the metadata from the rollback
segment header pages via trx_rseg_array_init(). It will still
not read any undo log pages or recover any transactions.
trx_sys_t::rseg_history_len: Make private, and clarify the
documentation.
trx_sys_t::history_size(): Read rseg_history_len.
trx_sys_t::history_insert(), trx_sys_t::history_remove(),
trx_sys_t::history_add(): Update rseg_history_len.
Disable the test encryption.innodb_encryption-page-compression
because the wait_condition would seem to time out deterministically.
MDEV-14814 has to be addressed in 10.2 separately.
Datafile::validate_first_page(): Do not invoke
page_size_t::page_size_t(flags) before validating the tablespace flags.
This avoids a crash in MDEV-15333 innodb.restart test case.
FIXME: Reduce the number of error messages. The first one is enough.
This performance regression was introduced in the MariaDB 10.1
file format incompatibility bug fix MDEV-11623 (MariaDB 10.1.21
and MariaDB 10.2.4) and partially fixed in MariaDB 10.1.25 in
MDEV-12610 without adding a regression test case.
On a normal startup (without crash recovery), InnoDB should not read
every .ibd data file, because this is slow. Like in MySQL, for now,
InnoDB will still open every data file (without reading), and it
will read every .ibd file for which an .isl file exists, or the
DATA DIRECTORY attribute has been specified for the table.
The test case shuts down InnoDB, moves data files, replaces them
with garbage, and then restarts InnoDB, expecting no messages to
be issued for the garbage files. (Some messages will for now be
issued for the table that uses the DATA DIRECTORY attribute.)
Finally, the test shuts down the server, restores the old data files,
and restarts again to drop the tables.
fil_open_single_table_tablespace(): Remove the condition on flags,
and only call fsp_flags_try_adjust() if validate==true
(reading the first page has been requested). The only caller with
validate==false is at server startup when we are processing all
records from SYS_TABLES. The flags passed to this function are
actually derived from SYS_TABLES.TYPE and SYS_TABLES.N_COLS,
and there never was any problem with SYS_TABLES in MariaDB 10.1.
The problem that MDEV-11623 was that incorrect tablespace flags
were computed and written to FSP_SPACE_FLAGS.
Note: Linux only
Core dumps of large buffer pool pages take time and space
and pose potential data expose in scenarios where data-at-rest
encryption is deployed.
Here we use madvise(MADV_DONT_DUMP) on large memory allocations
used by the innodb buffer pool, log_sys and recv_sys. The effect
of this system call is that these memory areas will not appear in
a core dump. Data from these buffers is rarely useful in fault
diagnosis.
log_sys and recv_sys structures now use large memory allocations
for their large buffer.
Debug builds don't include the madvise syscall and as such will
include full core dumps.
A function, buf_madvise_do_dump, is added but never called. It
is there to be called from a debugger to re-enable the core
dumping of all of these pages if for some reason the entire
contents of these buffers are needed.
Idea thanks to Hartmut Holzgraefe
fts_cmp_set_sync_doc_id(), fts_load_stopword(): Start the transaction
in read-only mode if innodb_read_only is set.
fts_update_sync_doc_id(), fts_commit_table(), fts_sync(),
fts_optimize_table(): Return DB_READ_ONLY if innodb_read_only is set.
fts_doc_fetch_by_doc_id(), fts_table_fetch_doc_ids():
Remove the code to start an internal transaction or to roll back,
because this is a read-only operation.
If CREATE TABLE...FULLTEXT INDEX was initiated right before shutdown,
then the function fts_load_stopword() could commit modifications
after shutdown was initiated, causing an assertion failure in
the function trx_purge_add_update_undo_to_history().
Mark as internal all the read/write transactions that
modify fulltext indexes, so that they will be ignored by
the assertion that guards against transaction commits
after shutdown has been initiated.
fts_optimize_free(): Invoke trx_commit_for_mysql() just in case,
because in fts_optimize_create() we started the transaction as
internal, and fts_free_for_backgruond() would assert that the
flag is clear. Transaction commit would clear the flag.
Also, allow the MariaDB 10.2 server to link InnoDB dynamically
against ha_innodb.so (which is what mysql-test-run.pl expects
to exist, instead of the default name ha_innobase.so).
wsrep_load_data_split(): Instead of referring to innodb_hton_ptr,
check the handlerton::db_type. This was recently broken by me in
MDEV-11415.
innodb_lock_schedule_algorithm: Define as a weak global symbol,
so that WITH_WSREP will not depend on InnoDB being linked statically.
I tested this manually. Notably, running a test that only does
SET GLOBAL wsrep_on=1;
with a static or dynamic InnoDB and
./mtr --mysqld=--loose-innodb-lock-schedule-algorithm=fcfs
will crash with SIGSEGV at shutdown. With the default VATS
combination the wsrep_on is properly refused for both the
static and dynamic InnoDB.
ha_close_connection(): Do invoke the method also for plugins
for which UNINSTALL PLUGIN was deferred due to open connections.
Thanks to @svoj for pointing this out.
thd_to_trx(): Return a pointer, not a reference to a pointer.
check_trx_exists(): Invoke thd_set_ha_data() for assigning a transaction.
log_write_checkpoint_info(): Remove an unused DEBUG_SYNC point
that would cause an assertion failure on shutdown after deferred
UNINSTALL PLUGIN.
This was tested as follows:
cmake -DWITH_WSREP=1 -DPLUGIN_INNOBASE:STRING=DYNAMIC \
-DWITH_MARIABACKUP:BOOL=OFF ...
make
cd mysql-test
./mtr innodb.innodb_uninstall
fts_cmp_set_sync_doc_id(), fts_load_stopword(): Start the transaction
in read-only mode if innodb_read_only is set.
fts_update_sync_doc_id(), fts_commit_table(), fts_sync(),
fts_optimize_table(): Return DB_READ_ONLY if innodb_read_only is set.
fts_doc_fetch_by_doc_id(), fts_table_fetch_doc_ids():
Remove the code to start an internal transaction or to roll back,
because this is a read-only operation.
If CREATE TABLE...FULLTEXT INDEX was initiated right before shutdown,
then the function fts_load_stopword() could commit modifications
after shutdown was initiated, causing an assertion failure in
the function trx_purge_add_update_undo_to_history().
Mark as internal all the read/write transactions that
modify fulltext indexes, so that they will be ignored by
the assertion that guards against transaction commits
after shutdown has been initiated.
fts_optimize_free(): Invoke trx_commit_for_mysql() just in case,
because in fts_optimize_create() we started the transaction as
internal, and fts_free_for_backgruond() would assert that the
flag is clear. Transaction commit would clear the flag.
Also, allow the MariaDB 10.2 server to link InnoDB dynamically
against ha_innodb.so (which is what mysql-test-run.pl expects
to exist, instead of the default name ha_innobase.so).
wsrep_load_data_split(): Instead of referring to innodb_hton_ptr,
check the handlerton::db_type. This was recently broken by me in
MDEV-11415.
innodb_lock_schedule_algorithm: Define as a weak global symbol,
so that WITH_WSREP will not depend on InnoDB being linked statically.
I tested this manually. Notably, running a test that only does
SET GLOBAL wsrep_on=1;
with a static or dynamic InnoDB and
./mtr --mysqld=--loose-innodb-lock-schedule-algorithm=fcfs
will crash with SIGSEGV at shutdown. With the default VATS
combination the wsrep_on is properly refused for both the
static and dynamic InnoDB.
ha_close_connection(): Do invoke the method also for plugins
for which UNINSTALL PLUGIN was deferred due to open connections.
Thanks to @svoj for pointing this out.
thd_to_trx(): Return a pointer, not a reference to a pointer.
check_trx_exists(): Invoke thd_set_ha_data() for assigning a transaction.
log_write_checkpoint_info(): Remove an unused DEBUG_SYNC point
that would cause an assertion failure on shutdown after deferred
UNINSTALL PLUGIN.
This was tested as follows:
cmake -DWITH_WSREP=1 -DPLUGIN_INNOBASE:STRING=DYNAMIC \
-DWITH_MARIABACKUP:BOOL=OFF ...
make
cd mysql-test
./mtr innodb.innodb_uninstall
This is regression after bc7a1dc1fb of
MDEV-15104 - Optimise MVCC snapshot.
Aforementioned revision removes mutex lock around ReadView creation,
which allows views to be created concurrently. Effectively it
invalidates "oldest view" approach: no single view can be considered
oldest anymore. Instead we have to iterate trx_sys.m_views to find
min(m_low_limit_no), min(m_low_limit_id) and all transaction ids below
min(m_low_limit_id), which would form oldest view.
Second regression comes from c0d5d7c0ef
of MDEV-15104 - Optimise MVCC snapshot.
It removes mutex protection around trx->no assignment, which opens up
a gap between m_max_trx_id increment and transaction serialisation
number becoming visible through rw_trx_hash. While we're in this gap
concurrent thread may come and do MVCC snapshot without seeing allocated
but not yet assigned serialisation number. Then at some point purge
thread may clone this view. As a result it won't see newly allocated
serialisation number and may remove "unnecessary" history data of this
transaction from rollback segments.
lock_trx_release_locks(): Relax a debug assertion to allow
recovered TRX_STATE_COMMITTED_IN_MEMORY transactions.
trx_commit_in_memory(): Add DEBUG_SYNC instrumentation.
trx_undo_insert_cleanup(): Skip persistent changes if innodb_read_only
is set. This should only happen when a recovered committed transaction
would be cleaned up at shutdown.
When Mariabackup gets a bad read of the first page of the system
tablespace file, it would inappropriately try to apply the doublewrite
buffer and write changes back to the data file (to the source file)!
This is very wrong and must be prevented.
The correct action would be to retry reading the system tablespace
as well as any other files whose first page was read incorrectly.
Fixing this was not attempted.
xb_load_tablespaces(): Shorten a bogus message to be more relevant.
The message can be displayed by --backup or --prepare.
xtrabackup_backup_func(), os_file_write_func(): Add a missing space
to a message.
Datafile::restore_from_doublewrite(): Do not even attempt the
operation in Mariabackup.
recv_init_crash_recovery_spaces(): Do not attempt to restore the
doublewrite buffer in Mariabackup (--prepare or --export), because
all pages should have been copied correctly in --backup already,
and because --backup should ignore the doublewrite buffer.
SysTablespace::read_lsn_and_check_flags(): Do not attempt to initialize
the doublewrite buffer in Mariabackup.
innodb_make_page_dirty(): Correct the bounds check.
Datafile::read_first_page(): Correct the name of the parameter.
When code from MySQL 5.7.9 was merged to MariaDB 10.2.2
in commit 2e814d4702
an assignment validate=true was inadvertently added to the function
dict_check_sys_tables().
This causes InnoDB to open every single .ibd file on startup, even
when no crash recovery was needed.
Simply removing the assignment would make some tests fail. We do the
best to retain almost the same level of inconsistency detection.
In the test innodb.table_flags, access to one of the tables will not
be blocked despite inconsistent flags.
dict_check_sys_tables(): Remove the problematic assignment, and skip
validation in normal startup.
dict_load_table_one(): If the .ibd file cannot be opened, mark the
table as corrupted and unreadable.
fil_node_open_file(): Validate FSP_SPACE_FLAGS with the expected
flags. If reading the tablespace fails, invalidate node->handle
instead of letting it remain stale. This bug was caught by a
fil_validate() assertion failure.
fsp_flags_try_adjust(): If the tablespace file is invalid, do nothing.