If InnoDB or XtraDB recovered committed transactions at server
startup, but the processing of recovered transactions was
prevented by innodb_read_only or by innodb_force_recovery,
an assertion would fail at shutdown.
This bug was originally reproduced when Mariabackup executed
InnoDB shutdown after preparing (applying redo log into) a backup.
trx_free_prepared(): Allow TRX_STATE_COMMITTED_IN_MEMORY.
trx_undo_free_prepared(): Allow any undo log state. For transactions
that were resurrected in TRX_STATE_COMMITTED_IN_MEMORY
the undo log state would have been reset by trx_undo_set_state_at_finish().
Replace all references in InnoDB and XtraDB error log messages
to bugs.mysql.com with references to https://jira.mariadb.org/.
The original merge
commit 4274d0bf57
was accidentally reverted by the subsequent merge
commit 3b35d745c3
InnoDB was writing unnecessary information to the
update undo log records. Most notably, if an indexed column is updated,
the old value of the column would be logged twice: first as part of
the update vector, and then another time because it is an indexed column.
Because the InnoDB undo log record must fit in a single page,
this would cause unnecessary failure of certain updates.
Even after this fix, InnoDB still seems to be unnecessarily logging
indexed column values for non-updated columns. It seems that non-updated
secondary index columns only need to be logged when a PRIMARY KEY
column is updated. To reduce risk, we are not fixing this remaining flaw
in GA versions.
trx_undo_page_report_modify(): Log updated indexed columns only once.
Problem was incorrect definition of wsrep_recovery,
trx_sys_update_wsrep_checkpoint and
trx_sys_read_wsrep_checkpoint functions causing
innodb_plugin not to load as there was undefined symbols.
Problem was incorrect definition of wsrep_recovery,
trx_sys_update_wsrep_checkpoint and
trx_sys_read_wsrep_checkpoint functions causing
innodb_plugin not to load as there was undefined symbols.
When MySQL 5.0.3 introduced InnoDB support for two-phase commit,
it also introduced the questionable logic to roll back XA PREPARE
transactions on startup when innodb_force_recovery is 1 or 2.
Remove this logic in order to avoid unwanted side effects when
innodb_force_recovery is being set for other reasons. That is,
XA PREPARE transactions will always remain in that state until
InnoDB receives an explicit XA ROLLBACK or XA COMMIT request
from the upper layer.
At the time the logic was introduced in MySQL 5.0.3, there already
was a startup parameter that is the preferred way of achieving
the behaviour: --tc-heuristic-recover=ROLLBACK.
Problem was that 4k page size is not really supported in
Galera. For reference see:
codership/galera#398
Page size 4k is problematic because WSREP XID info location
that was set to constant UNIV_PAGE_SIZE - 3500 and that is conflicting
with rseg undo slots location if there is lot of undo tablespaces.
Undo tablespace identifiers and page numbers require
at least 126*8=1024 bytes starting from offset 56. Therefore,
WSREP XID startig from offset 596 would overwrite several
space_id,page_no pairs starting from 72th undo log tablespace
space_id,page_no pair at offset 594.
This will cause InnoDB startup failure seen as
[ERROR] InnoDB: Unable to open undo tablespace './undo30579'.
Originally, the undo tablespace ID would always be between
0 and 127. Starting with MySQL 5.6.36 which introduced
Bug #25551311 BACKPORT BUG #23517560 REMOVE SPACE_ID RESTRICTION
FOR UNDO TABLESPACES (merged to MariaDB 10.0.31)
it is possible for an undo tablespace ID to be 0x7773. But in
this case, the page number should be 3, not 0x72650003.
This is just the first collision. The WSREP XID data would
overwrite subsequent slots.
trx0sys.h
trx0sys.cc
Code formatting and add comments.
When a slow shutdown is performed soon after spawning some work for
background threads that can create or commit transactions, it is possible
that new transactions are started or committed after the purge has finished.
This is violating the specification of innodb_fast_shutdown=0, namely that
the purge must be completed. (None of the history of the recent transactions
would be purged.)
Also, it is possible that the purge threads would exit in slow shutdown
while there exist active transactions, such as recovered incomplete
transactions that are being rolled back. Thus, the slow shutdown could
fail to purge some undo log that becomes purgeable after the transaction
commit or rollback.
srv_undo_sources: A flag that indicates if undo log can be generated
or the persistent, whether by background threads or by user SQL.
Even when this flag is clear, active transactions that already exist
in the system may be committed or rolled back.
innodb_shutdown(): Renamed from innobase_shutdown_for_mysql().
Do not return an error code; the operation never fails.
Clear the srv_undo_sources flag, and also ensure that the background
DROP TABLE queue is empty.
srv_purge_should_exit(): Do not allow the purge to exit if
srv_undo_sources are active or the background DROP TABLE queue is not
empty, or in slow shutdown, if any active transactions exist
(and are being rolled back).
srv_purge_coordinator_thread(): Remove some previous workarounds
for this bug.
innobase_start_or_create_for_mysql(): Set buf_page_cleaner_is_active
and srv_dict_stats_thread_active directly. Set srv_undo_sources before
starting the purge subsystem, to prevent immediate shutdown of the purge.
Create dict_stats_thread and fts_optimize_thread immediately
after setting srv_undo_sources, so that shutdown can use this flag to
determine if these subsystems were started.
dict_stats_shutdown(): Shut down dict_stats_thread. Backported from 10.2.
srv_shutdown_table_bg_threads(): Remove (unused).
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high level and did not cover all cases.
innodb.log_data_file_size: Test all innodb_page_size combinations.
fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.
btr_create(): Clean up the logic. Remove the error log messages.
buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.
buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.
recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.
fil_report_invalid_page_access(): Simplify the message.
fseg_create_general(): Do not emit messages to the error log.
innobase_init(): Revert the changes.
trx_rseg_create(): Refactor (no functional change).
Problem was that all doublewrite buffer pages must fit to first
system datafile.
Ported commit 27a34df7882b1f8ed283f22bf83e8bfc523cbfde
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date: Wed Aug 12 15:55:19 2015 +0800
BUG#21551464 - SEGFAULT WHILE INITIALIZING DATABASE WHEN
INNODB_DATA_FILE SIZE IS SMALL
To 10.1 (with extended error printout).
btr_create(): If ibuf header page allocation fails report error and
return FIL_NULL. Similarly if root page allocation fails return a error.
dict_build_table_def_step: If fsp_header_init fails return
error code.
fsp_header_init: returns true if header initialization succeeds
and false if not.
fseg_create_general: report error if segment or page allocation fails.
innobase_init: If first datafile is smaller than 3M and could not
contain all doublewrite buffer pages report error and fail to
initialize InnoDB plugin.
row_truncate_table_for_mysql: report error if fsp header init
fails.
srv_init_abort: New function to report database initialization errors.
srv_undo_tablespaces_init, innobase_start_or_create_for_mysql: If
database initialization fails report error and abort.
trx_rseg_create: If segment header creation fails return.
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
Fix a -fsanitizer=undefined warning that trx_undo_report_row_operation()
was being passed thr=NULL when the BTR_NO_UNDO_LOG_FLAG flag was set.
trx_undo_report_row_operation(): Remove the first two parameters.
The parameter clust_entry!=NULL distinguishes inserts from updates.
This should be a non-functional change (no observable change in
behaviour; slightly smaller code).
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.
Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.
Solution:
=========
Convert the innodb last record of page to mysql format and compare
with end range if the traversal of row_search_mvcc() exceeds 100,
no ICP involved. If it is out of range then InnoDB can avoid the
whole table traversal. Need to refactor the code little bit to
make it compile.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
Problem was that bpage was referenced after it was already freed
from LRU. Fixed by adding a new variable encrypted that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced these with dict_table_t::file_unreadable. Table
ibd file is missing if fil_get_space(space_id) returns NULL
and encrypted if not. Removed dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases when enrypted page could be read while doing
redo log crash recovery. Also added test case for row compressed
blobs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we hare freed it.
buf_mark_space_corrupt(): remove bpage from LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if page has
been read and is not corrupted,
DB_PAGE_CORRUPTED if page based on checksum check is corrupted,
DB_DECRYPTION_FAILED if page post encryption checksum matches but
after decryption normal page checksum does not match. In read
case only DB_SUCCESS is possible.
buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable()
that will return true if tablespace exists and pages read from
tablespace are not corrupted or page decryption failed.
Removed buf_page_t::key_version. After page decryption the
key version is not removed from page frame. For unencrypted
pages, old key_version is removed at buf_page_encrypt_before_write()
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that key read from redo log entry is found from
encryption plugin and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown
to innodb-bad-key-change2. Removed innodb-bad-key-change5 test.
Decreased unnecessary complexity on some long lasting tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed bug found using ASAN
where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed from row compressed tables. Fixed out of page frame
bug for row compressed tables in
fil_space_verify_crypt_checksum() found using ASAN. Incorrect
function was called for compressed table.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added test for restart with innodb-force-recovery=1 when page read on
redo-recovery cant be decrypted. Added test for corrupted table where
both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.
Adjusted the test case innodb_bug14147491 so that it does not anymore
expect crash. Instead table is just mostly not usable.
fil0fil.h: fil_space_acquire_low is not visible function
and fil_space_acquire and fil_space_acquire_silent are
inline functions. FilSpace class uses fil_space_acquire_low
directly.
recv_apply_hashed_log_recs() does not return anything.
Introduced a new wsrep_trx_print_locking() which may be called
under lock_sys->mutex if the trx has locks.
Signed-off-by: Sachin Setiya <sachin.setiya@mariadb.com>
Problem was that trx_sys->mutex was acquired to print trx info
even when we already hold trx_sys->mutex. Fixed similarly as
in InnoDB, i.e. with wsrep_trx_print_locking() function that
does not acquire trx_sys->mutex.
Introduced a new wsrep_trx_print_locking() which may be called
under lock_sys->mutex if the trx has locks.
Signed-off-by: Sachin Setiya <sachinsetia1001@gmail.com>
In the 10.1 InnoDB Plugin, a call os_event_free(buf_flush_event) was
misplaced. The event could be signalled by rollback of resurrected
transactions while shutdown was in progress. This bug was caught
by cmake -DWITH_ASAN testing. This call was only present in the
10.1 InnoDB Plugin, not in other versions, or in XtraDB.
That said, the bug affects all InnoDB versions. Shutdown assumes the
cessation of any page-dirtying activity, including the activity of
the background rollback thread. InnoDB only waited for the background
rollback to finish as part of a slow shutdown (innodb_fast_shutdown=0).
The default is a clean shutdown (innodb_fast_shutdown=1). In a scenario
where InnoDB is killed, restarted, and shut down soon enough, the data
files could become corrupted.
logs_empty_and_mark_files_at_shutdown(): Wait for the
rollback to finish, except if innodb_fast_shutdown=2
(crash-like shutdown) was requested.
trx_rollback_or_clean_recovered(): Before choosing the next
recovered transaction to roll back, terminate early if non-slow
shutdown was initiated. Roll back everything on slow shutdown
(innodb_fast_shutdown=0).
srv_innodb_monitor_mutex: Declare as static, because the mutex
is only used within one module.
After each call to os_event_free(), ensure that the freed event
is not reachable via global variables, by setting the relevant
variables to NULL.
The function trx_purge_stop() was calling os_event_reset(purge_sys->event)
before calling rw_lock_x_lock(&purge_sys->latch). The os_event_set()
call in srv_purge_coordinator_suspend() is protected by that X-latch.
It would seem a good idea to consistently protect both os_event_set()
and os_event_reset() calls with a common mutex or rw-lock in those
cases where os_event_set() and os_event_reset() are used
like condition variables, tied to changes of shared state.
For each os_event_t, we try to document the mutex or rw-lock that is
being used. For some events, frequent calls to os_event_set() seem to
try to avoid hangs. Some events are never waited for infinitely, only
timed waits, and os_event_set() is used for early termination of these
waits.
os_aio_simulated_put_read_threads_to_sleep(): Define as a null macro
on other systems than Windows. TODO: remove this altogether and disable
innodb_use_native_aio on Windows.
os_aio_segment_wait_events[]: Initialize only if innodb_use_native_aio=0.
If InnoDB is started in innodb_read_only mode such that
recovered incomplete transactions exist at startup
(but the redo logs are clean), an assertion will fail at shutdown,
because there would exist some non-prepared transactions.
logs_empty_and_mark_files_at_shutdown(): Do not wait for incomplete
transactions to finish if innodb_read_only or innodb_force_recovery>=3.
Wait for purge to finish in only one place.
trx_sys_close(): Relax the assertion that would fail first.
trx_free_prepared(): Also free recovered TRX_STATE_ACTIVE transactions
if innodb_read_only or innodb_force_recovery>=3.
MY_THREAD_INIT IN BACKGROUND THREAD
Description:
===========
Add my_thread_init() and my_thread_exit() for background threads which
initializes and frees the st_my_thread_var structure.
Reviewed-by: Jimmy Yang<jimmy.yang@oracle.com>
RB: 15003