If the tablespace is dropped or truncated after the
space->is_stopping() check in fil_crypt_get_page_throttle_func(),
we would proceed to request the page, and eventually report a fatal
error.
buf_page_get_gen(): Do not retry reading if mode==BUF_GET_POSSIBLY_FREED.
lock_rec_block_validate(): Be prepared for a NULL return value when
invoking buf_page_get_gen() with mode=BUF_GET_POSSIBLY_FREED.
Issue:
------
Prefix for externally stored columns were being stored in online_log when a
table is altered and alter causes table to be rebuilt. Space in online_log is
limited and if length of prefix of externally stored columns is very big, then
it is being written to online log without making sure if it fits. This leads to
memory corruption.
Fix:
----
After fix for Bug#16544143, there is no need to store prefixes of externally
stored columnd in online_log. Thus remove the code which stores column prefixes
for externally stored columns. Also, before writing anything on online_log,
make sure it fits to available memory to avoid memory corruption.
Read RB page for more details.
Reviewed-by: Annamalai Gurusami <annamalai.gurusami@oracle.com>
RB: 18239
In commit e7f4e61f6e
the call fil_flush_file_spaces(FIL_LOG) is necessary.
Tablespaces will be flushed as part of the redo log
checkpoint, but the redo log will not necessarily
be flushed, depending on innodb_flush_method.
These test can sporadically show mutex deadlock warnings between LOCK_wsrep_thd
and LOCK_thd_data mutexes. This means that these mutexes can be locked in opposite
order by different threads, and thus result in deadlock situation.
To fix such issue, the locking policy of these mutexes should be revised and
enforced to be uniform. However, a quick code review shows that the number of
lock/unlock operations for these mutexes combined is between 100-200, and all these
mutex invocations should be checked/fixed.
On the other hand, it turns out that LOCK_wsrep_thd is used for protecting access to
wsrep variables of THD (wsrep_conflict_state, wsrep_query_state), whereas LOCK_thd_data
protects query, db and mysys_var variables in THD. Extending LOCK_thd_data to protect
also wsrep variables looks like a viable solution, as there should not be a use case
where separate threads need simultaneous access to wsrep variables and THD data variables.
In this commit LOCK_wsrep_thd mutex is refactored to be replaced by LOCK_thd_data.
By bluntly replacing LOCK_wsrep_thd by LOCK_thd_data, will result in double locking
of LOCK_thd_data, and some adjustements have been performed to fix such situations.
dict_load_table_low(): When flagging an error, assign *table = NULL.
Failure to do so could cause a crash if an error was flagged when
accessing INFORMATION_SCHEMA.INNODB_SYS_TABLES.
While the test case crashes a MariaDB 10.2 debug build only,
let us apply the fix to the earliest applicable MariaDB series (10.0)
to avoid any data corruption on a table-rebuilding ALTER TABLE
using ALGORITHM=INPLACE.
innobase_create_key_defs(): Use altered_table->s->primary_key
when a new primary key is being created.
disable online alter add primary key for innodb, if the
table is opened/locked more than once in the current connection
(see assert in ha_innobase::add_index())
fil_crypt_rotate_pages
If tablespace is marked as stopping stop also page rotation
fil_crypt_flush_space
If tablespace is marked as stopping do not try to read
page 0 and write it back.
Problem was that if tablespace was encrypted we try to copy
also page 0 from read buffer to write buffer that are in
that case the same memory area.
fil_iterate
When tablespace is encrypted or compressed its
first page (i.e. page 0) is not encrypted or
compressed and there is no need to copy buffer.
If innodb_fast_shutdown<2, all transactions of active connections
will be rolled back on shutdown. This can take a long time, and
the systemd shutdown timeout should be extended during the wait.
logs_empty_and_mark_files_at_shutdown(): Extend the timeout when
waiting for other threads to complete.
This reverts commit 76ec37f522.
This behaviour change will be done separately in:
MDEV-15832 With innodb_fast_shutdown=3, skip the rollback
of connected transactions
fil_crypt_read_crypt_data(): Do not attempt to read the tablespace
if the file is unaccessible due to a pending DDL operation, such as
renaming the file or DROP TABLE or TRUNCATE TABLE. This is only
reducing the probability of the race condition, not completely
preventing it.
Problem was that key rotation from encrypted to unecrypted was skipped
when encryption is disabled (i.e. set global innodb-encrypt-tables=OFF).
fil_crypt_needs_rotation
If encryption is disabled (i.e. innodb-encrypt-tables=off)
and there is tablespaces using default encryption (e.g.
system tablespace) that are still encrypted state we need
to rotate them from encrypted state to unencrypted state.
Boost includes sys/param.h on FreeBSD, which in turn defines setbit()
macro. This macro is conflicting with open_query::judy_bitset::setbit().
Reordered includes such that oqgraph_judy.h never sees this macro.
Also removed duplicate includes of graphcore-config.h, which is included
by graphcore-graph.h/oqgraph_shim.h/oqgraph_thunk.h.
buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().
log_write_up_to(): Use correct format.
srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress
Move towards progress measures rather than pure time based measures.
Progress reporting at numberious shutdown/startup locations incuding:
* For innodb_fast_shutdown=0 trx_roll_must_shutdown() for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge
Thanks Marko for feedback and suggestions.
Suggested by Marko on github pr #576
buf_all_freed only needs to be called once, not 3 times.
buf_all_freed will always return TRUE if it returns.
It will crash if any page was not flushed so its effectively
an assert anyway.
The following calls are likely redundant and could be removed:
fil_flush_file_spaces(FIL_TYPE_TABLESPACE);
fil_flush_file_spaces(FIL_TYPE_LOG);
row_undo_step(): If fast shutdown has been requested, abort the
rollback of any non-DDL transactions. Starting with MDEV-12323,
we aborted the rollback of recovered transactions. These
transactions would be rolled back on subsequent server startup.
trx_roll_report_progress(): Renamed from trx_roll_must_shutdown(),
now that the shutdown check has been moved to the only caller.
MDEV--15609 engines/funcs.crash_manytables_number crashes with error 24
(too many open files)
MDEV-10286 Adjustment of table_open_cache according to system limits
does not work when open-files-limit option is provided
Fixed by adjusting tc_size downwards if there is not enough file
descriptors to use.
Other changes:
- Ensure that there is 30 (was 10) extra file descriptors for other usage
- Decrease TABLE_OPEN_CACHE_MIN to 200 as it's better to have a smaller
table cache than getting error 24
- Increase minimum of max_connections and table_open_cache from 1 to 10
as 1 is not usable for any real application, only for testing.
fil_iterate(), fil_tablespace_iterate(): Replace os_file_read()
with os_file_read_no_error_handling().
os_file_read_func(), os_file_read_no_error_handling_func():
Do not retry partial reads. There used to be an infinite amount
of retries. Because InnoDB extends both data and log files upfront,
partial reads should be impossible during normal operation.