Issue:
======
Currently the approach we take to find the chunk corresponding to a given
pointer uses srv_buf_pool_chunk_unit based on the assumption that
srv_buf_pool_chunk_unit is the total size of all pages in a buffer pool
chunk. We first step back by srv_buf_pool_chunk_unit bytes and use
std::map::upper_bound() to find the first chunk in the map whose key >= the
resulting pointer.
However, the real size of a chunk (and thus, the total size of its pages)
may differ from the value configured with innodb_buffer_pool_chunk_size
due to rounding up to the OS page size. So, in some cases the above logic
gives us the wrong chunk.
Fix:
====
We find out the chunk corresponding to the give pointer without using
srv_buf_pool_chunk_unit. This is done by using std::map::upper_bound()
to find the next chunk in the map which appears right after the pointer and
decrementing the iterator, which would give us the chunk the pointer
belongs to.
Contribution by Alexey Kopytov.
RB: 13347
Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
No point to issue RELEASE memory barrier in os_thread_create_func(): thread
creation is full memory barrier.
No point to issue os_wmb in rw_lock_set_waiter_flag() and
rw_lock_reset_waiter_flag(): this is deadcode and it is unlikely operational
anyway. If atomic builtins are unavailable - memory barriers are most certainly
unavailable too.
RELEASE memory barrier is definitely abused in buf_pool_withdraw_blocks(): most
probably it was supposed to commit volatile variable update, which is not what
memory barriers actually do. To operate properly it needs corresponding ACQUIRE
barrier without an associated atomic operation anyway.
ACQUIRE memory barrier is definitely abused in log_write_up_to(): most probably
it was supposed to synchronize dirty read of log_sys->write_lsn. To operate
properly it needs corresponding RELEASE barrier without an associated atomic
operation anyway.
Removed a bunch of ACQUIRE memory barriers from InnoDB rwlocks. They're
meaningless without corresponding RELEASE memory barriers.
Valid usage example of memory barriers without an associated atomic operation:
http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence
Replaced InnoDB atomic operations with server atomic operations.
Moved INNODB_RW_LOCKS_USE_ATOMICS - it is always defined (code won't compile
otherwise).
NOTE: InnoDB uses thread identifiers as a target for atomic operations.
Thread identifiers should be considered opaque: any attempt to use a
thread ID other than in pthreads calls is nonportable and can lead to
unspecified results.
Using numa_all_nodes_ptr was excessively optimistic. Due to
constraints in systemd, containers or otherwise mysqld could of been
limited to a smaller set of cpus. Use the numa_get_mems_allowed
library function to see what we can interleave between before doing
so. The alternative is to fail interleaving overall.
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
There was two problems. Firstly, if page in ibuf is encrypted but
decrypt failed we should not allow InnoDB to start because
this means that system tablespace is encrypted and not usable.
Secondly, if page decrypt is detected we should return false
from buf_page_decrypt_after_read.
Analysis: When a page is read from encrypted table and page can't be
decrypted because of bad key (or incorrect encryption algorithm or
method) page was incorrectly left on buffer pool.
Fix: Remove page from buffer pool and from pending IO.
Analysis: Problem was that in fil_read_first_page we do find that
table has encryption information and that encryption service
or used key_id is not available. But, then we just printed
fatal error message that causes above assertion.
Fix: When we open single table tablespace if it has encryption
information (crypt_data) store this crypt data to the table
structure. When we open a table and we find out that tablespace
is not available, check has table a encryption information
and from there is encryption service or used key_id is not available.
If it is, add additional warning for SQL-layer.
MDEV-8409: Changing file-key-management-encryption-algorithm causes crash and no real info why
Analysis: Both bugs has two different error cases. Firstly, at startup
when server reads latest checkpoint but requested key_version,
key management plugin or encryption algorithm or method is not found
leading corrupted log entry. Secondly, similarly when reading system
tablespace if requested key_version, key management plugin or encryption
algorithm or method is not found leading buffer pool page corruption.
Fix: Firsly, when reading checkpoint at startup check if the log record
may be encrypted and if we find that it could be encrypted, print error
message and do not start server. Secondly, if page is buffer pool seems
corrupted but we find out that there is crypt_info, print additional
error message before asserting.
Analysis: Problem was that actual payload size (page size) after compression
was handled incorrectly on encryption. Additionally, some of the variables
were not initialized.
Fixed by encrypting/decrypting only the actual compressed page size.
Analysis: Problem is that there is not enough temporary buffer slots
for pending IO requests.
Fixed by allocating same amount of temporary buffer slots as there
are max pending IO requests.
Analysis: Problem is that both encrypted tables and compressed tables use
FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION to store
required metadata. Furhermore, for only compressed tables currently
code skips compression.
Fixes:
- Only encrypted pages store key_version to FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
no need to fix
- Only compressed pages store compression algorithm to FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
no need to fix as they have different page type FIL_PAGE_PAGE_COMPRESSED
- Compressed and encrypted pages now use a new page type FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED and
key_version is stored on FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION and compression
method is stored after FIL header similar way as compressed size, so that first
FIL_PAGE_COMPRESSED_SIZE is stored followed by FIL_PAGE_COMPRESSION_METHOD
- Fix buf_page_encrypt_before_write function to really compress pages if compression is enabled
- Fix buf_page_decrypt_after_read function to really decompress pages if compression is used
- Small style fixes
Make sure that when we publish the crypt_data we access the
memory cache of the tablespace crypt_data. Make sure that
crypt_data is stored whenever it is really needed.
All this is not yet enough in my opinion because:
sql/encryption.cc has DBUG_ASSERT(scheme->type == 1) i.e.
crypt_data->type == CRYPT_SCHEME_1
However, for InnoDB point of view we have global crypt_data
for every tablespace. When we change variables on crypt_data
we take mutex. However, when we use crypt_data for
encryption/decryption we use pointer to this global
structure and no mutex to protect against changes on
crypt_data.
Tablespace encryption starts in fil_crypt_start_encrypting_space
from crypt_data that has crypt_data->type = CRYPT_SCHEME_UNENCRYPTED
and later we write page 0 CRYPT_SCHEME_1 and finally whe publish
that to memory cache.
Analysis: Problem was that tablespaces not encrypted might not have
crypt_data stored on disk.
Fixed by always creating crypt_data to memory cache of the tablespace.
MDEV-8138: strange results from encrypt-and-grep test
Analysis: crypt_data->type is not updated correctly on memory
cache. This caused problem with state tranfer on
encrypted => unencrypted => encrypted.
Fixed by updating memory cache of crypt_data->type correctly based on
current srv_encrypt_tables value to either CRYPT_SCHEME_1 or
CRYPT_SCHEME_UNENCRYPTED.
Analysis: Problem was that we did create crypt data for encrypted table but
this new crypt data was not written to page 0. Instead a default crypt data
was written to page 0 at table creation.
Fixed by explicitly writing new crypt data to page 0 after successfull
table creation.
With changes:
* update tests to pass (new encryption/encryption_key_id syntax).
* not merged the code that makes engine aware of the encryption mode
(CRYPT_SCHEME_1_CBC, CRYPT_SCHEME_1_CTR, storing it on disk, etc),
because now the encryption plugin is handling it.
* compression+encryption did not work in either branch before the
merge - and it does not work after the merge. it might be more
broken after the merge though - some of that code was not merged.
* page checksumming code was not moved (moving of page checksumming
from fil_space_encrypt() to fil_space_decrypt was not merged).
* restored deleted lines in buf_page_get_frame(), otherwise
innodb_scrub test failed.
Step 3:
-- Make encrytion_algorithm changeable by SUPER
-- Remove AES_ECB method from encryption_algorithms
-- Support AES method change by storing used method on InnoDB/XtraDB objects
-- Store used AES method to crypt_data as different crypt types
-- Store used AES method to redo/undo logs and checkpoint
-- Store used AES method on every encrypted page after key_version
-- Add test