There is only one lock_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
lock_sys_t::create(): The deferred constructor.
lock_sys_t::close(): The early destructor.
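The pattern described above could be sketched roughly as follows. This is a minimal illustration, not the actual lock_sys_t: the names demo_sys_t, demo_sys, n_cells and the 64-byte CACHE_LINE_SIZE are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Assumed cache line size; real code would use a platform-specific constant.
constexpr std::size_t CACHE_LINE_SIZE = 64;

// Hypothetical, simplified stand-in for a statically allocated subsystem.
class demo_sys_t
{
  bool m_initialised = false;

public:
  // Each frequently contended member starts on its own cache line,
  // so that threads using different members do not cause false sharing.
  alignas(CACHE_LINE_SIZE) std::mutex mutex;
  alignas(CACHE_LINE_SIZE) std::mutex wait_mutex;
  std::size_t n_cells = 0;

  // The normal constructor runs during static initialisation,
  // before any configuration is known, so it does no real work.
  demo_sys_t() = default;

  bool is_initialised() const { return m_initialised; }

  // Deferred constructor: called explicitly during startup.
  void create(std::size_t cells)
  {
    assert(!m_initialised);
    n_cells = cells;
    m_initialised = true;
  }

  // Early destructor: called explicitly during shutdown.
  void close()
  {
    if (!m_initialised)
      return;
    n_cells = 0;
    m_initialised = false;
  }
};

// One static instance; access is demo_sys.member, with no pointer to load.
demo_sys_t demo_sys;
```

A caller would invoke demo_sys.create() once at startup and demo_sys.close() at shutdown; the object itself lives for the whole process lifetime.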
There is only one purge_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.
purge_sys_t::create(): The deferred constructor.
purge_sys_t::close(): The early destructor.
undo::Truncate::create(): The deferred constructor.
Because purge_sys.undo_trunc is constructed before the start-up
parameters are parsed, the normal constructor would copy a
wrong value of srv_purge_rseg_truncate_frequency.
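To illustrate why a deferred constructor is needed here, consider the following sketch. The names demo_truncate_frequency, demo_truncate_t and demo_purge are hypothetical stand-ins, and the default of 128 is only an assumed compiled-in value.

```cpp
#include <cassert>

// Hypothetical stand-in for srv_purge_rseg_truncate_frequency.
// It holds a compiled-in default until start-up parameters are parsed.
unsigned long demo_truncate_frequency = 128;

class demo_truncate_t
{
  unsigned long m_frequency = 0;
  bool m_created = false;

public:
  // Runs during static initialisation. Copying demo_truncate_frequency
  // here would capture the default, not the configured value.
  demo_truncate_t() = default;

  // Deferred constructor: called after parameters have been parsed.
  void create()
  {
    assert(!m_created);
    m_frequency = demo_truncate_frequency;
    m_created = true;
  }

  unsigned long frequency() const
  {
    assert(m_created);
    return m_frequency;
  }
};

// The member is constructed together with its statically allocated owner.
struct demo_purge_t
{
  demo_truncate_t undo_trunc;
};

demo_purge_t demo_purge;
```

Start-up code would first parse the parameters (updating demo_truncate_frequency) and only then call demo_purge.undo_trunc.create().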
TrxUndoRsegsIterator: Do not forward-declare an inline constructor,
because the static construction of purge_sys.rseg_iter would not have
access to it.
trx_purge(): Remove the parameter limit or batch_size, which is
always passed as srv_purge_batch_size.
trx_purge_attach_undo_recs(): Remove the parameters purge_sys, batch_size.
Refer to srv_purge_batch_size.
trx_purge_wait_for_workers_to_complete(): Remove the parameter purge_sys.
Add an InnoDB debug system variable, innodb_buffer_pool_load_pages_abort, to test
the behaviour of innodb_buffer_pool_load_incomplete.
(innodb_buffer_pool_dump_abort_loads.test)
This status variable indicates that an InnoDB buffer pool load never
completed and that dumping at shutdown would result in an incomplete dump file.
It is set to 1 once a buffer pool load starts. Upon a successful
load, it returns to 0.
With this status variable set, the system variable
innodb_buffer_pool_dump_at_shutdown==1 will have no effect as dumping after
an incomplete load will generate a less complete dump file than the current
one.
If a user aborts a buffer pool load by setting the system variable
innodb_buffer_pool_load_abort=1, the status variable
innodb_buffer_pool_load_incomplete will remain set to 1.
A shutdown that occurs while InnoDB is loading the buffer pool will
not save the buffer pool on shutdown.
A user may indirectly set innodb_buffer_pool_load_incomplete
to 0 by:
* Forcing a load, by setting innodb_buffer_pool_load_now=ON, or
* Forcing a dump, by setting innodb_buffer_pool_dump_now=ON
This will enable the next dump on shutdown to complete.
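The state transitions described above can be summarised in a small sketch. The class demo_buf_load_status_t and its method names are hypothetical; only the behaviour of the status flag follows the description.

```cpp
#include <cassert>

// Hypothetical model of the innodb_buffer_pool_load_incomplete status flag.
class demo_buf_load_status_t
{
  bool m_load_incomplete = false;

public:
  bool load_incomplete() const { return m_load_incomplete; }

  // A buffer pool load starts: the flag is set.
  void load_start() { m_load_incomplete = true; }

  // A load ends: the flag is cleared only if the load was not aborted.
  void load_end(bool aborted)
  {
    assert(m_load_incomplete);  // a load must have been started
    if (!aborted)
      m_load_incomplete = false;
  }

  // An explicit dump (innodb_buffer_pool_dump_now=ON) clears the flag.
  void dump_now() { m_load_incomplete = false; }

  // Dump at shutdown only when requested and the last load completed.
  bool should_dump_at_shutdown(bool dump_at_shutdown) const
  {
    return dump_at_shutdown && !m_load_incomplete;
  }
};
```

Forcing a load corresponds to load_start() followed by load_end(false), which also leaves the flag cleared.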
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Before MDEV-12288 in MariaDB 10.3.1, InnoDB used to partition
the persistent transaction undo log into insert_undo and update_undo.
MDEV-12288 repurposes the update_undo as the single undo log.
In order to support an upgrade from earlier MariaDB versions,
the insert_undo is recovered in data structures, called old_insert.
An assertion failure occurred in TrxUndoRsegsIterator::set_next()
when an incomplete transaction was recovered with both insert_undo
and update_undo log. This could be easily demonstrated by starting
./mysql-test-run --manual-gdb innodb.read_only_recovery
in MariaDB 10.2 and, after the first kill, starting up the MariaDB 10.3
server with the same parameters.
The problem is that MariaDB 10.3 would roll back the recovered
transaction, and finally "commit" it twice (with all changes to
data rolled back): once for insert_undo and once for update_undo, both
with the same commit end identifier (trx->no).
Our fix is to introduce a "commit number" that comprises two components:
(trx->no << 1 | !old_insert). In this way, the assertion in the purge
subsystem can be relaxed so that only the trx->no component must match.
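As a sketch of the encoding, the two components could be packed and unpacked as below. The function names demo_commit_number() and demo_commit_trx_no() are made up for this example; only the expression (trx->no << 1 | !old_insert) comes from the fix itself.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t demo_trx_id_t;

// Pack trx->no and the old_insert flag into one ordered "commit number".
// For the same trx_no, the old_insert log (low bit 0) sorts first.
inline uint64_t demo_commit_number(demo_trx_id_t trx_no, bool old_insert)
{
  assert(trx_no < (uint64_t(1) << 63));  // the shift must not lose a bit
  return trx_no << 1 | !old_insert;
}

// Recover the trx->no component, which is what the relaxed assertion compares.
inline demo_trx_id_t demo_commit_trx_no(uint64_t commit)
{
  return commit >> 1;
}
```

With this encoding the two "commits" of one recovered transaction get distinct, adjacent commit numbers, while demo_commit_trx_no() still yields the same trx->no for both.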
TrxUndoRsegs::append(): Remove.
TrxUndoRsegsIterator::set_next(): Add a debug assertion that
demonstrates that the merging of rollback segments never occurs.
Since MDEV-12289 or earlier, MariaDB 10.2 will not make any
temporary undo log accessible to the purge subsystem.
(Also MySQL 5.7 would skip the purge of any undo log for
temporary tables, but not before parsing and buffering those
temporary undo log records.)
Also, remove the field undo_rseg_space.
Apparently its purpose was to avoid problems with
temporary undo logs, which MySQL 5.7 unnecessarily adds to
the purge system. (Temporary undo log records are not purged.)
MariaDB 10.2 fixed this in MDEV-12289 or earlier.
purge_iter_t::operator<=(): Ordering comparison.
This replaces trx_purge_check_limit() with the difference that
we are not comparing undo_rseg_space. (In MariaDB, temporary
undo logs do not enter the purge subsystem at all.)
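A possible shape of such an ordering comparison is sketched below. The struct demo_purge_iter_t and its two fields (trx_no, undo_no) are assumptions for illustration; the point is only that no tablespace field takes part in the comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical purge position: a transaction number and an undo record number.
struct demo_purge_iter_t
{
  uint64_t trx_no;
  uint64_t undo_no;

  // Lexicographic ordering on (trx_no, undo_no).
  bool operator<=(const demo_purge_iter_t &other) const
  {
    if (trx_no != other.trx_no)
      return trx_no < other.trx_no;
    return undo_no <= other.undo_no;
  }
};

// Small self-check showing the intended ordering.
inline void demo_purge_iter_selfcheck()
{
  const demo_purge_iter_t older{5, 9}, newer{6, 0};
  assert(older <= newer);
  assert(!(newer <= older));
  assert(older <= older);
}
```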
purge_sys_t::done: Remove. This was not used for anything.
purge_sys_t::tail: Renamed from purge_sys_t::iter.
purge_sys_t::head: Renamed from purge_sys_t::limit.
This is based on a prototype by
Thirunarayanan Balathandayuthapani <thiru@mariadb.com>.
Binlog and Galera write-set replication information was written into
the TRX_SYS page on each commit. Instead of writing to the TRX_SYS page during
normal operation, InnoDB can make use of rollback segment header pages,
which are already being written to during a commit.
The following fields are added to the rollback segment header page:
TRX_RSEG_BINLOG_OFFSET
TRX_RSEG_BINLOG_NAME (NUL-terminated; empty name = not present)
TRX_RSEG_WSREP_XID_FORMAT (0=not present; 1=present)
TRX_RSEG_WSREP_XID_GTRID
TRX_RSEG_WSREP_XID_BQUAL
TRX_RSEG_WSREP_XID_DATA
trx_sys_t: Introduce the fields
recovered_binlog_filename, recovered_binlog_offset, recovered_wsrep_xid.
To facilitate upgrade from older MySQL or MariaDB versions, we will read
the information in the TRX_SYS page. It will be overridden by the
information that we find in rollback segment header pages.
Mariabackup --prepare will read the metadata from the rollback
segment header pages via trx_rseg_array_init(). It will still
not read any undo log pages or recover any transactions.
trx_sys_t::rseg_history_len: Make private, and clarify the
documentation.
trx_sys_t::history_size(): Read rseg_history_len.
trx_sys_t::history_insert(), trx_sys_t::history_remove(),
trx_sys_t::history_add(): Update rseg_history_len.
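The encapsulation could look roughly like the following sketch. The class name demo_history_t, the use of std::atomic, and the relaxed memory ordering are assumptions; the accessor names mirror the ones listed above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical holder of a private history length with narrow accessors.
class demo_history_t
{
  // Private: callers cannot modify the counter directly.
  std::atomic<int32_t> rseg_history_len{0};

public:
  // Read the current length of the history list.
  int32_t history_size() const
  {
    return rseg_history_len.load(std::memory_order_relaxed);
  }

  // Adjust the length by an arbitrary (possibly negative) amount.
  void history_add(int32_t len)
  {
    rseg_history_len.fetch_add(len, std::memory_order_relaxed);
  }

  // One entry was added to the history list.
  void history_insert() { history_add(1); }

  // One entry was removed from the history list.
  void history_remove()
  {
    assert(history_size() > 0);
    history_add(-1);
  }
};
```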
fsp_fill_free_list(): Correctly determine whether the temporary
tablespace file should be extended in order to respond to a
page allocation request. The inverted condition was noticed
by Thiru when he analyzed MDEV-13013.
For some simple benchmarks, a majority of the time was
spent in find_head(), which tries to find the best
place to put the record.
The result of this patch is a 2x or more speedup for
inserts without keys for format PAGE. All changes
are only related to how rows are stored.
Should fix some of the problems mentioned in:
MDEV-8132 Temporary tables using Aria with very poor performance
MDEV-9079 Aria very slow for internal temporary tables
MDEV-5841 Mariadb very poor temporary performance
The following changes were done:
- For rows with a small row length that fits into
a page (818 bytes with 8192 pages), stop as soon as we
hit a match.
- Added markers full_head_size and full_tail_size that tell
us where to start searching on the bitmap page.
- Ensure that page->used_size is correctly updated when
bitmap grows. This allows us to stop searching at used_size
- Added code to check that the bitmap variables are correct.
- Fixed a wrong test where we set "first_bitmap_with_space".
This shouldn't have caused any notable problems.
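The search strategy in the list above can be illustrated with a deliberately simplified model. This is not Aria's bitmap format: demo_bitmap_t stores free bytes per page in a plain vector, and all names other than full_head_size and used_size are made up for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NOT_FOUND = static_cast<std::size_t>(-1);

// Hypothetical simplified bitmap: one entry of free bytes per data page.
struct demo_bitmap_t
{
  std::vector<unsigned> free_bytes;
  std::size_t used_size = 0;       // entries at or beyond this are unused
  std::size_t full_head_size = 0;  // entries before this are known to be full

  // First-fit search for a page that can hold a small row.
  std::size_t find_head(unsigned row_length)
  {
    assert(used_size <= free_bytes.size());
    // Advance the marker past the prefix of completely full pages,
    // so that later searches can skip it.
    while (full_head_size < used_size && free_bytes[full_head_size] == 0)
      full_head_size++;
    // Stop at the first match instead of scanning for the best one,
    // and never look beyond used_size.
    for (std::size_t i = full_head_size; i < used_size; i++)
      if (free_bytes[i] >= row_length)
        return i;
    return NOT_FOUND;
  }
};
```

The trade-off is that first fit may leave slightly more fragmentation than a best-fit scan, in exchange for a much shorter search.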
Disable the test encryption.innodb_encryption-page-compression
because the wait_condition would seem to time out deterministically.
MDEV-14814 has to be addressed in 10.2 separately.
Datafile::validate_first_page(): Do not invoke
page_size_t::page_size_t(flags) before validating the tablespace flags.
This avoids a crash in MDEV-15333 innodb.restart test case.
FIXME: Reduce the number of error messages. The first one is enough.
This performance regression was introduced in the MariaDB 10.1
file format incompatibility bug fix MDEV-11623 (MariaDB 10.1.21
and MariaDB 10.2.4) and partially fixed in MariaDB 10.1.25 in
MDEV-12610 without adding a regression test case.
On a normal startup (without crash recovery), InnoDB should not read
every .ibd data file, because this is slow. Like in MySQL, for now,
InnoDB will still open every data file (without reading), and it
will read every .ibd file for which an .isl file exists, or the
DATA DIRECTORY attribute has been specified for the table.
The test case shuts down InnoDB, moves data files, replaces them
with garbage, and then restarts InnoDB, expecting no messages to
be issued for the garbage files. (Some messages will for now be
issued for the table that uses the DATA DIRECTORY attribute.)
Finally, the test shuts down the server, restores the old data files,
and restarts again to drop the tables.
fil_open_single_table_tablespace(): Remove the condition on flags,
and only call fsp_flags_try_adjust() if validate==true
(reading the first page has been requested). The only caller with
validate==false is at server startup when we are processing all
records from SYS_TABLES. The flags passed to this function are
actually derived from SYS_TABLES.TYPE and SYS_TABLES.N_COLS,
and there never was any problem with SYS_TABLES in MariaDB 10.1.
The problem in MDEV-11623 was that incorrect tablespace flags
were computed and written to FSP_SPACE_FLAGS.
Note: Linux only
Core dumps of large buffer pool pages take time and space
and pose a potential risk of data exposure in scenarios where data-at-rest
encryption is deployed.
Here we use madvise(MADV_DONTDUMP) on large memory allocations
used by the InnoDB buffer pool, log_sys and recv_sys. The effect
of this system call is that these memory areas will not appear in
a core dump. Data from these buffers is rarely useful in fault
diagnosis.
log_sys and recv_sys structures now use large memory allocations
for their large buffer.
Debug builds do not issue the madvise call and as such will
produce full core dumps.
A function, buf_madvise_do_dump, is added but never called. It
is there to be called from a debugger to re-enable the core
dumping of all of these pages if for some reason the entire
contents of these buffers are needed.
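A minimal Linux-only sketch of the mechanism follows. The helper names demo_large_alloc(), demo_dont_dump(), demo_do_dump() and demo_large_free() are hypothetical; madvise() with MADV_DONTDUMP and MADV_DODUMP is the actual Linux interface, and the calls are guarded in case the constants are not defined.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Map an anonymous, page-aligned region as a stand-in for a large buffer.
// Returns nullptr on failure.
inline void *demo_large_alloc(std::size_t size)
{
  void *ptr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return ptr == MAP_FAILED ? nullptr : ptr;
}

// Ask the kernel to leave this region out of core dumps.
// Returns true if the advice was accepted.
inline bool demo_dont_dump(void *ptr, std::size_t size)
{
  assert(ptr != nullptr);
#ifdef MADV_DONTDUMP
  return madvise(ptr, size, MADV_DONTDUMP) == 0;
#else
  (void) size;
  return false;
#endif
}

// Re-enable core dumping of the region, e.g. when called from a debugger.
inline bool demo_do_dump(void *ptr, std::size_t size)
{
  assert(ptr != nullptr);
#ifdef MADV_DODUMP
  return madvise(ptr, size, MADV_DODUMP) == 0;
#else
  (void) size;
  return false;
#endif
}

// Release the region.
inline void demo_large_free(void *ptr, std::size_t size)
{
  munmap(ptr, size);
}
```

The advice only affects what is written to a core file; the memory remains readable and writable by the process as before.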
Idea thanks to Hartmut Holzgraefe