The InnoDB source code contains quite a few references to a closed-source
hot backup tool which was originally called InnoDB Hot Backup (ibbackup)
and later incorporated in MySQL Enterprise Backup.
The open source backup tool XtraBackup uses the full database for recovery.
So, the references to UNIV_HOTBACKUP are only cluttering the source code.
Replace all exit() calls in InnoDB with abort() [possibly via ut_a()].
Calling exit() in a multi-threaded program is problematic also for
the reason that other threads could see corrupted data structures
while some data structures are being cleaned up by atexit() handlers
or similar.
In the long term, all these calls should be replaced with something
that returns an error all the way up the call stack.
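The long-term direction described above can be sketched as follows. This is only an illustration: the Err enum and the functions read_page() and open_table() are hypothetical names, not InnoDB code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical error codes, loosely in the spirit of an engine error enum.
enum class Err { Success, Corruption };

// Hypothetical low-level routine: reports the failure to its caller
// instead of terminating the process with exit() or abort().
Err read_page(bool page_ok) {
    if (!page_ok) {
        std::fprintf(stderr, "page is corrupted\n");
        return Err::Corruption;
    }
    return Err::Success;
}

// Each caller forwards the error up the call stack.
Err open_table(bool page_ok) {
    Err err = read_page(page_ok);
    if (err != Err::Success) {
        return err;  // no atexit() handlers run, no other thread is disturbed
    }
    return Err::Success;
}
```

The top-level caller can then decide whether to refuse the operation or shut down in an orderly way.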
MySQL 5.7 supports only one shared temporary tablespace.
MariaDB 10.2 does not support any other shared InnoDB tablespaces than
the two predefined tablespaces: the persistent InnoDB system tablespace
(default file name ibdata1) and the temporary tablespace
(default file name ibtmp1).
InnoDB is unnecessarily allocating a tablespace ID for the predefined
temporary tablespace on every startup, and it is in several places
testing whether a tablespace ID matches this dynamically generated ID.
We should use a compile-time constant to reduce code size and to avoid
unnecessary updates to the DICT_HDR page at every startup.
Using a hard-coded tablespace ID should make it easier to remove the
TEMPORARY flag from FSP_SPACE_FLAGS in MDEV-11202.
Reduce the number of calls to encryption_get_key_get_latest_version
when doing key rotation, using two different methods:
(1) We need to fetch key information when the tablespace does not yet
have encryption information; invalid keys are now handled
differently (see below). There was an extra call to detect
whether key_id is not found on key rotation.
(2) If key_id is not found in the encryption plugin, do not
try fetching a new key_version for it, as it will fail anyway.
We store the return value of the encryption_get_key_get_latest_version
call, and if it returns ENCRYPTION_KEY_VERSION_INVALID there
is no need to call it again.
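A minimal sketch of idea (2), assuming a simplified per-tablespace structure. CryptData, get_latest_key_version(), latest_version() and kKeyVersionInvalid are hypothetical stand-ins for the real plugin interface and ENCRYPTION_KEY_VERSION_INVALID.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ENCRYPTION_KEY_VERSION_INVALID.
constexpr std::uint32_t kKeyVersionInvalid = ~0U;

// Hypothetical per-tablespace encryption state.
struct CryptData {
    std::uint32_t key_id;
    std::uint32_t cached_version;  // last value returned by the plugin
    bool key_found;                // false once the plugin reported "not found"
};

static int plugin_calls = 0;  // counts lookups, for illustration only

// Hypothetical plugin lookup: in this sketch only key_id 1 exists.
std::uint32_t get_latest_key_version(std::uint32_t key_id) {
    ++plugin_calls;
    return key_id == 1 ? 5U : kKeyVersionInvalid;
}

// Store the plugin's answer; once it was invalid, do not ask again.
std::uint32_t latest_version(CryptData& c) {
    if (!c.key_found) {
        return kKeyVersionInvalid;  // skip the call, it would fail anyway
    }
    std::uint32_t v = get_latest_key_version(c.key_id);
    c.cached_version = v;
    if (v == kKeyVersionInvalid) {
        c.key_found = false;
    }
    return v;
}
```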
WL#7682 in MySQL 5.7 introduced the possibility to create light-weight
temporary tables in InnoDB. These are called 'intrinsic temporary tables'
in InnoDB, and in MySQL 5.7, they can be created by the optimizer for
sorting or buffering data in query processing.
In MariaDB 10.2, the optimizer temporary tables cannot be created in
InnoDB, so we should remove the dead code and related data structures.
innodb_buffer_pool_dump_pct now refers to a percentage of the entire buffer
pool size rather than to a percentage of the hot data in the buffer pool. This
means that a completed load followed by a shutdown will write the exact same data.
The problem was:
With innodb_buffer_pool_dump_pct say 25% (the default since 10.2.2), a server
started will restore 25% of the buffer pool size with the expectation that over
time the rest of the buffer pool will be populated. Then on shutdown 25% will
be saved.
If a server is started and then shut down either a) without much activity
occurring or b) after being started as a hot spare and shut down before being
used, then only 6.25% (25% of 25%) of the buffer pool is saved.
This will generate bigger dump files for users who don't have a full
innodb_buffer_pool; however, a realistic scenario is that a buffer pool is
completely used.
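The arithmetic above can be worked through in a small sketch. Both functions are hypothetical helpers, and capping the new limit at the number of pages actually present is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>

// Old interpretation: the percentage is applied to the pages that are
// currently in the buffer pool.
std::size_t pages_to_dump_old(std::size_t pages_in_pool, unsigned pct) {
    return pages_in_pool * pct / 100;
}

// New interpretation: the percentage is applied to the total buffer pool
// size. Assumption: we cannot dump more pages than are actually present.
std::size_t pages_to_dump_new(std::size_t pool_size,
                              std::size_t pages_in_pool, unsigned pct) {
    std::size_t limit = pool_size * pct / 100;
    return pages_in_pool < limit ? pages_in_pool : limit;
}
```

With a pool of 10000 pages of which 2500 were loaded at startup, the old rule dumps 625 pages (6.25% of the pool), while the new rule dumps the same 2500 pages that were loaded.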
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
buf_block_init(): Initialize buf_page_t::flush_type.
For some reason, Valgrind 3.12.0 would seem to flag some
bits in adjacent bitfields as uninitialized, even though only
the two bits of flush_type were left uninitialized. Initialize
the field to get rid of many warnings.
buf_page_init_low(): Initialize buf_page_t::old.
For some reason, Valgrind 3.12.0 would seem to flag all 32
bits uninitialized when buf_page_init_for_read() invokes
buf_LRU_add_block(bpage, TRUE). This would trigger bogus warnings
for buf_page_t::freed_page_clock being uninitialized.
(The V-bits would later claim that only "old" is initialized
in the 32-bit word.) Perhaps recent compilers
(GCC 6.2.1 and clang 4.0.0) generate more optimized x86_64 code
for bitfield operations, confusing Valgrind?
mach_write_to_1(), mach_write_to_2(), mach_write_to_3():
Rewrite the assertions that ensure that the most significant
bits are zero. Apparently, clang 4.0.0 would optimize expressions
of the form ((n | 0xFF) <= 0x100) to (n <= 0x100). The redundant
0xFF was added in the first place in order to suppress a
Valgrind warning. (Valgrind would warn about comparing uninitialized
values even in the case when the uninitialized bits do not affect
the result of the comparison.)
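One way to express such an assertion without a comparison that the compiler can fold is to mask the high bits. The sketch below assumes that form; write_to_2() and high_bits_clear_2() are hypothetical names, not the actual mach_write_to_2() source.

```cpp
#include <cassert>
#include <cstdint>

// True if n fits in two bytes, that is, all more significant bits are zero.
inline bool high_bits_clear_2(std::uint64_t n) {
    return (n & ~std::uint64_t{0xFFFF}) == 0;
}

// Store the two least significant bytes of n, most significant byte first.
inline void write_to_2(unsigned char* b, std::uint64_t n) {
    assert(high_bits_clear_2(n));
    b[0] = static_cast<unsigned char>(n >> 8);
    b[1] = static_cast<unsigned char>(n);
}
```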
Issue:
======
Currently the approach we take to find the chunk corresponding to a given
pointer uses srv_buf_pool_chunk_unit based on the assumption that
srv_buf_pool_chunk_unit is the total size of all pages in a buffer pool
chunk. We first step back by srv_buf_pool_chunk_unit bytes and use
std::map::upper_bound() to find the first chunk in the map whose key > the
resulting pointer.
However, the real size of a chunk (and thus, the total size of its pages)
may differ from the value configured with innodb_buffer_pool_chunk_size
due to rounding up to the OS page size. So, in some cases the above logic
gives us the wrong chunk.
Fix:
====
We find out the chunk corresponding to the given pointer without using
srv_buf_pool_chunk_unit. This is done by using std::map::upper_bound()
to find the next chunk in the map which appears right after the pointer and
decrementing the iterator, which would give us the chunk the pointer
belongs to.
Contribution by Alexey Kopytov.
RB: 13347
Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
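The lookup described in the fix above can be sketched as follows. Chunk, ChunkMap and find_chunk() are hypothetical names, addresses are modeled as integers, and the final bounds check is an addition of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical chunk descriptor; the map is keyed by chunk start address.
struct Chunk {
    std::size_t size;  // real size in bytes, possibly rounded up by the OS
    int id;
};
using ChunkMap = std::map<std::uintptr_t, Chunk>;

// Return the chunk containing ptr, or nullptr if no chunk contains it.
const Chunk* find_chunk(const ChunkMap& chunks, std::uintptr_t ptr) {
    // First chunk whose start address is strictly greater than ptr.
    auto it = chunks.upper_bound(ptr);
    if (it == chunks.begin()) {
        return nullptr;  // ptr lies before every chunk
    }
    --it;  // the last chunk starting at or before ptr
    if (ptr - it->first >= it->second.size) {
        return nullptr;  // ptr lies past the end of that chunk
    }
    return &it->second;
}
```

Because the lookup relies only on the start addresses stored in the map, it does not depend on the configured chunk size matching the real one.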
Analysis: The problem is that the page is encrypted but the encryption
information on page 0 has already been changed.
Fix: If the page header contains key_version != 0, then even if the
tablespace is not encrypted according to the current encryption information,
we need to check whether the page is corrupted. If it is not, then we know that
the page is not encrypted. If the page is corrupted, we need to try to
decrypt it and then compare the stored and calculated checksums
to see whether the page is corrupted or not.
Two problems:
(1) When pushing a warning to the SQL layer we need to check that thd != NULL
to avoid a NULL pointer dereference.
(2) At tablespace key rotation, if the used key_id is not found in the
encryption plugin, the tablespace should not be rotated.
MDEV-10394: Innodb system table space corrupted
Analysis: After we have read the page in buf_page_io_complete, we try to
find out whether the page is encrypted or corrupted. Encryption was determined
by reading the FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION field from the FIL header
as a key_version. However, this field is not always zero even when
encryption is not used. Thus, an incorrect key_version could lead to a situation
where decryption is attempted on a page that is not encrypted.
Fix: We still read key_version information from the FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
field but also check whether the tablespace has encryption information before trying
to decrypt the page.
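The decision described in this fix can be reduced to a small predicate. This is a simplified sketch: Tablespace and should_decrypt() are hypothetical names, and the real check involves more state than shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a tablespace: only records whether it carries
// encryption information.
struct Tablespace {
    bool has_crypt_data;
};

// Attempt decryption only if the header field is nonzero AND the
// tablespace has encryption information. A nonzero field alone is not
// enough, because the field may hold other data in unencrypted pages.
bool should_decrypt(std::uint32_t key_version_field, const Tablespace& space) {
    return key_version_field != 0 && space.has_crypt_data;
}
```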
No point to issue RELEASE memory barrier in os_thread_create_func(): thread
creation is full memory barrier.
No point to issue os_wmb in rw_lock_set_waiter_flag() and
rw_lock_reset_waiter_flag(): this is dead code and it is unlikely to be
operational anyway. If atomic builtins are unavailable, memory barriers are
most certainly unavailable too.
RELEASE memory barrier is definitely abused in buf_pool_withdraw_blocks(): most
probably it was supposed to commit volatile variable update, which is not what
memory barriers actually do. To operate properly it needs corresponding ACQUIRE
barrier without an associated atomic operation anyway.
ACQUIRE memory barrier is definitely abused in log_write_up_to(): most probably
it was supposed to synchronize dirty read of log_sys->write_lsn. To operate
properly it needs corresponding RELEASE barrier without an associated atomic
operation anyway.
Removed a bunch of ACQUIRE memory barriers from InnoDB rwlocks. They're
meaningless without corresponding RELEASE memory barriers.
Valid usage example of memory barriers without an associated atomic operation:
http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence
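For illustration, here is a minimal sketch of correctly paired standalone fences, in the spirit of the reference above. The names payload, ready, producer(), consumer() and run_once() are hypothetical and unrelated to InnoDB.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag accessed with relaxed operations

void producer() {
    payload = 42;
    // RELEASE fence: the write to payload is ordered before the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // ACQUIRE fence: pairs with the RELEASE fence in producer().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Run one producer and one consumer and return what the consumer observed.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Removing either fence breaks the pairing, which is why an ACQUIRE barrier without a corresponding RELEASE barrier (or vice versa) is meaningless.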
Replaced InnoDB atomic operations with server atomic operations.
Moved INNODB_RW_LOCKS_USE_ATOMICS - it is always defined (code won't compile
otherwise).
NOTE: InnoDB uses thread identifiers as a target for atomic operations.
Thread identifiers should be considered opaque: any attempt to use a
thread ID other than in pthreads calls is nonportable and can lead to
unspecified results.
Using numa_all_nodes_ptr was excessively optimistic. Due to
constraints in systemd, containers or otherwise, mysqld could have been
limited to a smaller set of CPUs. Use the numa_get_mems_allowed
library function to see what we can interleave between before doing
so. The alternative is to fail interleaving overall.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
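The effect of the signature mismatch can be reproduced in a few lines. Handler, OldEngine, FixedEngine and ask() are hypothetical stand-ins for the real classes; only the method name is taken from the text above.

```cpp
#include <cassert>

struct Handler {
    virtual ~Handler() = default;
    virtual bool primary_key_is_clustered() const { return false; }
};

struct OldEngine : Handler {
    // Missing "const": this declares a new function that hides the base
    // one instead of overriding it.
    bool primary_key_is_clustered() { return true; }
};

struct FixedEngine : Handler {
    // Matching signature; "override" makes the compiler verify it.
    bool primary_key_is_clustered() const override { return true; }
};

// Calls through the base interface, as the server layer would.
bool ask(const Handler& h) { return h.primary_key_is_clustered(); }
```

Calling ask() on an OldEngine returns the base class answer, which is the kind of silent mismatch described above. Marking overrides with the override keyword turns it into a compile-time error.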
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
Analysis: When pages in the doublewrite buffer are analyzed, compressed
pages do not have a correct checksum.
Fix: Decompress the page before the checksum is compared. If decompression
fails, we still check the checksum, so corrupted pages are found.
If decompression succeeds, the page now contains the original
checksum.
There were two problems. Firstly, if a page in ibuf is encrypted but
decryption failed, we should not allow InnoDB to start, because
this means that the system tablespace is encrypted and not usable.
Secondly, if page decrypt is detected we should return false
from buf_page_decrypt_after_read.
Backport pull request #125 from grooverdan/MDEV-8923_innodb_buffer_pool_dump_pct to 10.0
WL#6504 InnoDB buffer pool dump/load enhancements
This patch consists of two parts:
1. Dump only the hottest N% of the buffer pool(s)
2. Prevent hogging the server during BP load
From MySQL - commit b409342c43ce2edb68807100a77001367c7e6b8e
Add testcases for innodb_buffer_pool_dump_pct_basic.
Part of the code authored by Daniel Black