MySQL 5.7 supports only one shared temporary tablespace.
MariaDB 10.2 does not support any other shared InnoDB tablespaces than
the two predefined tablespaces: the persistent InnoDB system tablespace
(default file name ibdata1) and the temporary tablespace
(default file name ibtmp1).
InnoDB is unnecessarily allocating a tablespace ID for the predefined
temporary tablespace on every startup, and it is in several places
testing whether a tablespace ID matches this dynamically generated ID.
We should use a compile-time constant to reduce code size and to avoid
unnecessary updates to the DICT_HDR page at every startup.
Using a hard-coded tablespace ID will should make it easier to remove the
TEMPORARY flag from FSP_SPACE_FLAGS in MDEV-11202.
WL#7682 in MySQL 5.7 introduced the possibility to create light-weight
temporary tables in InnoDB. These are called 'intrinsic temporary tables'
in InnoDB, and in MySQL 5.7, they can be created by the optimizer for
sorting or buffering data in query processing.
In MariaDB 10.2, the optimizer temporary tables cannot be created in
InnoDB, so we should remove the dead code and related data structures.
buf_block_init(): Initialize buf_page_t::flush_type.
For some reason, Valgrind 3.12.0 would seem to flag some
bits in adjacent bitfields as uninitialized, even though only
the two bits of flush_type were left uninitialized. Initialize
the field to get rid of many warnings.
buf_page_init_low(): Initialize buf_page_t::old.
For some reason, Valgrind 3.12.0 would seem to flag all 32
bits uninitialized when buf_page_init_for_read() invokes
buf_LRU_add_block(bpage, TRUE). This would trigger bogus warnings
for buf_page_t::freed_page_clock being uninitialized.
(The V-bits would later claim that only "old" is initialized
in the 32-bit word.) Perhaps recent compilers
(GCC 6.2.1 and clang 4.0.0) generate more optimized x86_64 code
for bitfield operations, confusing Valgrind?
mach_write_to_1(), mach_write_to_2(), mach_write_to_3():
Rewrite the assertions that ensure that the most significant
bits are zero. Apparently, clang 4.0.0 would optimize expressions
of the form ((n | 0xFF) <= 0x100) to (n <= 0x100). The redundant
0xFF was added in the first place in order to suppress a
Valgrind warning. (Valgrind would warn about comparing uninitialized
values even in the case when the uninitialized bits do not affect
the result of the comparison.)
Issue:
======
Currently the approach we take to find the chunk corresponding to a given
pointer uses srv_buf_pool_chunk_unit based on the assumption that
srv_buf_pool_chunk_unit is the total size of all pages in a buffer pool
chunk. We first step back by srv_buf_pool_chunk_unit bytes and use
std::map::upper_bound() to find the first chunk in the map whose key >= the
resulting pointer.
However, the real size of a chunk (and thus, the total size of its pages)
may differ from the value configured with innodb_buffer_pool_chunk_size
due to rounding up to the OS page size. So, in some cases the above logic
gives us the wrong chunk.
Fix:
====
We find out the chunk corresponding to the give pointer without using
srv_buf_pool_chunk_unit. This is done by using std::map::upper_bound()
to find the next chunk in the map which appears right after the pointer and
decrementing the iterator, which would give us the chunk the pointer
belongs to.
Contribution by Alexey Kopytov.
RB: 13347
Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
No point to issue RELEASE memory barrier in os_thread_create_func(): thread
creation is full memory barrier.
No point to issue os_wmb in rw_lock_set_waiter_flag() and
rw_lock_reset_waiter_flag(): this is deadcode and it is unlikely operational
anyway. If atomic builtins are unavailable - memory barriers are most certainly
unavailable too.
RELEASE memory barrier is definitely abused in buf_pool_withdraw_blocks(): most
probably it was supposed to commit volatile variable update, which is not what
memory barriers actually do. To operate properly it needs corresponding ACQUIRE
barrier without an associated atomic operation anyway.
ACQUIRE memory barrier is definitely abused in log_write_up_to(): most probably
it was supposed to synchronize dirty read of log_sys->write_lsn. To operate
properly it needs corresponding RELEASE barrier without an associated atomic
operation anyway.
Removed a bunch of ACQUIRE memory barriers from InnoDB rwlocks. They're
meaningless without corresponding RELEASE memory barriers.
Valid usage example of memory barriers without an associated atomic operation:
http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence
Replaced InnoDB atomic operations with server atomic operations.
Moved INNODB_RW_LOCKS_USE_ATOMICS - it is always defined (code won't compile
otherwise).
NOTE: InnoDB uses thread identifiers as a target for atomic operations.
Thread identifiers should be considered opaque: any attempt to use a
thread ID other than in pthreads calls is nonportable and can lead to
unspecified results.
Using numa_all_nodes_ptr was excessively optimistic. Due to
constraints in systemd, containers or otherwise mysqld could of been
limited to a smaller set of cpus. Use the numa_get_mems_allowed
library function to see what we can interleave between before doing
so. The alternative is to fail interleaving overall.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
There was two problems. Firstly, if page in ibuf is encrypted but
decrypt failed we should not allow InnoDB to start because
this means that system tablespace is encrypted and not usable.
Secondly, if page decrypt is detected we should return false
from buf_page_decrypt_after_read.
Analysis: When a page is read from encrypted table and page can't be
decrypted because of bad key (or incorrect encryption algorithm or
method) page was incorrectly left on buffer pool.
Fix: Remove page from buffer pool and from pending IO.
Analysis: Problem was that in fil_read_first_page we do find that
table has encryption information and that encryption service
or used key_id is not available. But, then we just printed
fatal error message that causes above assertion.
Fix: When we open single table tablespace if it has encryption
information (crypt_data) store this crypt data to the table
structure. When we open a table and we find out that tablespace
is not available, check has table a encryption information
and from there is encryption service or used key_id is not available.
If it is, add additional warning for SQL-layer.
MDEV-8409: Changing file-key-management-encryption-algorithm causes crash and no real info why
Analysis: Both bugs has two different error cases. Firstly, at startup
when server reads latest checkpoint but requested key_version,
key management plugin or encryption algorithm or method is not found
leading corrupted log entry. Secondly, similarly when reading system
tablespace if requested key_version, key management plugin or encryption
algorithm or method is not found leading buffer pool page corruption.
Fix: Firsly, when reading checkpoint at startup check if the log record
may be encrypted and if we find that it could be encrypted, print error
message and do not start server. Secondly, if page is buffer pool seems
corrupted but we find out that there is crypt_info, print additional
error message before asserting.
Analysis: Problem was that actual payload size (page size) after compression
was handled incorrectly on encryption. Additionally, some of the variables
were not initialized.
Fixed by encrypting/decrypting only the actual compressed page size.
Analysis: Problem is that there is not enough temporary buffer slots
for pending IO requests.
Fixed by allocating same amount of temporary buffer slots as there
are max pending IO requests.
Analysis: Problem is that both encrypted tables and compressed tables use
FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION to store
required metadata. Furhermore, for only compressed tables currently
code skips compression.
Fixes:
- Only encrypted pages store key_version to FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
no need to fix
- Only compressed pages store compression algorithm to FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
no need to fix as they have different page type FIL_PAGE_PAGE_COMPRESSED
- Compressed and encrypted pages now use a new page type FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED and
key_version is stored on FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION and compression
method is stored after FIL header similar way as compressed size, so that first
FIL_PAGE_COMPRESSED_SIZE is stored followed by FIL_PAGE_COMPRESSION_METHOD
- Fix buf_page_encrypt_before_write function to really compress pages if compression is enabled
- Fix buf_page_decrypt_after_read function to really decompress pages if compression is used
- Small style fixes
Make sure that when we publish the crypt_data we access the
memory cache of the tablespace crypt_data. Make sure that
crypt_data is stored whenever it is really needed.
All this is not yet enough in my opinion because:
sql/encryption.cc has DBUG_ASSERT(scheme->type == 1) i.e.
crypt_data->type == CRYPT_SCHEME_1
However, for InnoDB point of view we have global crypt_data
for every tablespace. When we change variables on crypt_data
we take mutex. However, when we use crypt_data for
encryption/decryption we use pointer to this global
structure and no mutex to protect against changes on
crypt_data.
Tablespace encryption starts in fil_crypt_start_encrypting_space
from crypt_data that has crypt_data->type = CRYPT_SCHEME_UNENCRYPTED
and later we write page 0 CRYPT_SCHEME_1 and finally whe publish
that to memory cache.
Analysis: Problem was that tablespaces not encrypted might not have
crypt_data stored on disk.
Fixed by always creating crypt_data to memory cache of the tablespace.
MDEV-8138: strange results from encrypt-and-grep test
Analysis: crypt_data->type is not updated correctly on memory
cache. This caused problem with state tranfer on
encrypted => unencrypted => encrypted.
Fixed by updating memory cache of crypt_data->type correctly based on
current srv_encrypt_tables value to either CRYPT_SCHEME_1 or
CRYPT_SCHEME_UNENCRYPTED.
Analysis: Problem was that we did create crypt data for encrypted table but
this new crypt data was not written to page 0. Instead a default crypt data
was written to page 0 at table creation.
Fixed by explicitly writing new crypt data to page 0 after successfull
table creation.
With changes:
* update tests to pass (new encryption/encryption_key_id syntax).
* not merged the code that makes engine aware of the encryption mode
(CRYPT_SCHEME_1_CBC, CRYPT_SCHEME_1_CTR, storing it on disk, etc),
because now the encryption plugin is handling it.
* compression+encryption did not work in either branch before the
merge - and it does not work after the merge. it might be more
broken after the merge though - some of that code was not merged.
* page checksumming code was not moved (moving of page checksumming
from fil_space_encrypt() to fil_space_decrypt was not merged).
* restored deleted lines in buf_page_get_frame(), otherwise
innodb_scrub test failed.