Cherry-pick this fix from the upstream:
commit 6ddedd8f1e0ddcbc24e8f9a005636c5463799ab7
Author: Sergei Petrunia <psergey@askmonty.org>
Date: Tue Apr 10 11:43:01 2018 -0700
[mysql-5.6][PR] Issue #802: MyRocks: Statement rollback doesnt work correctly for nesâ¦
Summary:
â¦ted statements
Variant #1: When the statement fails, we should roll back to the latest
savepoint taken at the top level.
Closes https://github.com/facebook/mysql-5.6/pull/804
Differential Revision: D7509380
Pulled By: hermanlee
fbshipit-source-id: 9a6f414
The error occurs because of how the character set and collation are chosen for
stored procedure parameters that have a character data type. If the character
set and collation are not explicitly stated in the declaration, the server
chooses the database character set and collation in effect at routine creation
time.
To fix the problem, I added explicit character set and collation attributes
for the stored procedure parameters in the install_spider.sql script.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit ff0bf451db on bb-10.3-MDEV-15692
Problem was that if tablespace was encrypted we try to copy
also page 0 from read buffer to write buffer that are in
that case the same memory area.
fil_iterate
When tablespace is encrypted or compressed its
first page (i.e. page 0) is not encrypted or
compressed and there is no need to copy buffer.
If innodb_fast_shutdown<2, all transactions of active connections
will be rolled back on shutdown. This can take a long time, and
the systemd shutdown timeout should be extended during the wait.
logs_empty_and_mark_files_at_shutdown(): Extend the timeout when
waiting for other threads to complete.
This reverts commit 76ec37f522.
This behaviour change will be done separately in:
MDEV-15832 With innodb_fast_shutdown=3, skip the rollback
of connected transactions
In async IO completion code, after reading a page,Innodb can wait for
completion of other bufferpool reads.
This is for example what happens if change-buffering is active.
Innodb on Windows could deadlock, as it did not have dedicated threads
for processing change buffer asynchronous reads.
The fix for that is to have windows now has the same background threads,
including dedicated thread for ibuf, and log AIOs.
The ibuf/read completions are now dispatched to their threads with
PostQueuedCompletionStatus(), the write and log completions are processed
in thread where they arrive.
fil_crypt_read_crypt_data(): Do not attempt to read the tablespace
if the file is unaccessible due to a pending DDL operation, such as
renaming the file or DROP TABLE or TRUNCATE TABLE. This is only
reducing the probability of the race condition, not completely
preventing it.
When "FLUSH TABLE ... FOR EXPORT" fails, the SQL layer should rollback
the statement. Otherwise we hit an assert when we try to close the
tables while having a non-empty list of statement transaction participants.
Problem was that key rotation from encrypted to unecrypted was skipped
when encryption is disabled (i.e. set global innodb-encrypt-tables=OFF).
fil_crypt_needs_rotation
If encryption is disabled (i.e. innodb-encrypt-tables=off)
and there is tablespaces using default encryption (e.g.
system tablespace) that are still encrypted state we need
to rotate them from encrypted state to unencrypted state.
Boost includes sys/param.h on FreeBSD, which in turn defines setbit()
macro. This macro is conflicting with open_query::judy_bitset::setbit().
Reordered includes such that oqgraph_judy.h never sees this macro.
Also removed duplicate includes of graphcore-config.h, which is included
by graphcore-graph.h/oqgraph_shim.h/oqgraph_thunk.h.
buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().
log_write_up_to(): Use correct format.
srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress
Move towards progress measures rather than pure time based measures.
Progress reporting at numberious shutdown/startup locations incuding:
* For innodb_fast_shutdown=0 trx_roll_must_shutdown() for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge
Thanks Marko for feedback and suggestions.
Suggested by Marko on github pr #576
buf_all_freed only needs to be called once, not 3 times.
buf_all_freed will always return TRUE if it returns.
It will crash if any page was not flushed so its effectively
an assert anyway.
The following calls are likely redundant and could be removed:
fil_flush_file_spaces(FIL_TYPE_TABLESPACE);
fil_flush_file_spaces(FIL_TYPE_LOG);
row_undo_step(): If fast shutdown has been requested, abort the
rollback of any non-DDL transactions. Starting with MDEV-12323,
we aborted the rollback of recovered transactions. These
transactions would be rolled back on subsequent server startup.
trx_roll_report_progress(): Renamed from trx_roll_must_shutdown(),
now that the shutdown check has been moved to the only caller.
- compile_flags already include from top CMakeLists.txt
- MY_CHECK_CXX_COMPILER_FLAG() accepts only one parameter
- output variable of MY_CHECK_CXX_COMPILER_FLAG() is have_CXX__Wa__nH
- same check for mariabackup
Based on contribution by satanson (PR#466).
dict_foreign_qualify_index(): Avoid a redundant and harmful
computation of col_name of a virtual column. This fixes the
assertion failure.
dict_foreign_push_index_error(): Do not call dict_table_get_col_name()
on a virtual column. (It is unclear if this condition is actually
reachable.)
It does not hurt to delete non-existing records from SYS_TABLESPACES
and SYS_DATAFILES. Because MariaDB does not support CREATE TABLESPACE,
only the system tablespace (space_id=0) can contain multiple tables.
But, there are no entries for the system tablespace in these tables
(which actually are stored inside the system tablespace).
MariaDB differs from the upstream for "DDL-like" command. For these,
it sets binlog_format=STATEMENT for the duration of the statement.
This doesn't play well with MyRocks, which tries to prevent DML
commands with binlog_format!=ROW.
Also, if Locked_tables_list::reopen_tables() returned an error, then
close_cached_tables should propagate the error condition and not silently
consume it (it's difficult to have test coverage for this because this
error condition is rare)
The purpose of the InnoDB buffer pool dump is to allow InnoDB to be
restarted with the same persistent data pages in the buffer pool.
The InnoDB temporary tablespace that was introduced in MariaDB 10.2.2
is always reinitialized on restart. Therefore, it does not make sense
to attempt to dump or restore any pages of the temporary tablespace.
ha_innobase::check_if_supported_inplace_alter(): Only check for
high_level_read_only. Do not unnecessarily refuse
ALTER TABLE...ALGORITHM=INPLACE if innodb_force_recovery was
specified as 1, 2, or 3.
innobase_start_or_create_for_mysql(): Block all writes from SQL
if the system tablespace was initialized with 'newraw'.
- Disallow loading of MyRocks (or any auxilary) plugins after it has been
unloaded.
- Do it carefully - Plugin's system variables may be accesssed (e.g. default
value is set) after the first rocksdb_done_func() call but before
the secon rocksdb_init_func() call.
Problem:
=======
During validation of missing tablespace, missing tablespace id is
being compared with hash table of redo logs (recv_sys->addr_hash). But if the
hash table ran out of memory then there is a possibility that it will not contain
the redo logs of all tablespace. In that case, Server will load the InnoDB
even though there is a missing tablespace.
Solution:
========
If the recv_sys->addr_hash hash table ran out of memory then InnoDB needs
to scan the remaining redo log again to validate the missing tablespace.
When the plugin is unloaded, walk the s_trx_list and delete the left over
Rdb_transaction objects.
It is responsibility of the SQL layer to make sure that the storage engine
has no open tables when the plugin is being unloaded.
MDEV--15609 engines/funcs.crash_manytables_number crashes with error 24
(too many open files)
MDEV-10286 Adjustment of table_open_cache according to system limits
does not work when open-files-limit option is provided
Fixed by adjusting tc_size downwards if there is not enough file
descriptors to use.
Other changes:
- Ensure that there is 30 (was 10) extra file descriptors for other usage
- Decrease TABLE_OPEN_CACHE_MIN to 200 as it's better to have a smaller
table cache than getting error 24
- Increase minimum of max_connections and table_open_cache from 1 to 10
as 1 is not usable for any real application, only for testing.
The following INFORMATION_SCHEMA views were unnecessarily retrieving
the data from the SYS_TABLESPACES table instead of directly fetching
it from the fil_system cache:
information_schema.innodb_tablespaces_encryption
information_schema.innodb_tablespaces_scrubbing
InnoDB always loads all tablespace metadata into memory at startup
and never evicts it while the tablespace exists.
With this fix, accessing these views will be much faster and use less
memory, and include data about all tablespaces, including undo
tablespaces.
The view information_schema.innodb_sys_tablespaces will still reflect
the contents of the SYS_TABLESPACES table.
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().
os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. There used to be an infinite amount
of retries. Because InnoDB extends both data and log files upfront,
partial reads should be impossible during normal operation.
Initialize block.page.zip only once.
PageConverter::update(): Initialize m_page_zip_ptr
as late as possible.
(We should really remove it at some point.)
PageConverter::operator(): Refer to block->page.zip instead of
m_page_zip_ptr.
AbstractCallback::get_frame(): Define static. Refer
to block->page.zip.data directly.
fil_iterate(): Refer to block->page.zip.data directly.
fil_tablespace_iterate(): Initialize block.page.zip.data as soon
as possible.