InnoDB could return the same list again and again if the buffer
passed to trx_recover_for_mysql() is smaller than the number of
transactions that InnoDB recovered in XA PREPARE state.
We introduce the transaction state TRX_PREPARED_RECOVERED, which
is like TRX_PREPARED, but will be set during trx_recover_for_mysql()
so that each transaction will only be returned once.
Because init_server_components() is invoking ha_recover() twice,
we must reset the state of the transactions back to TRX_PREPARED
after returning the complete list, so that repeated traversals
will see the complete list again, instead of seeing an empty list.
Without this tweak, the test main.tc_heuristic_recover would hang
in MariaDB 10.1.
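
The mark-and-reset idea can be illustrated with a minimal sketch. The types and the function recover_prepared() below are hypothetical simplifications, not the InnoDB code; the sketch assumes that the caller keeps calling until it gets fewer entries than the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the InnoDB transaction states.
enum class TrxState { Active, Prepared, PreparedRecovered };

struct Trx {
  int id;
  TrxState state;
};

// Copy up to len not-yet-returned prepared transaction ids into out.
// Each returned transaction is marked PreparedRecovered so that the
// next call does not return it again. When the traversal reaches the
// end of the list, the marks are reset so that a later traversal
// sees the complete list again.
std::size_t recover_prepared(std::vector<Trx>& trx_list, int* out,
                             std::size_t len) {
  assert(len > 0);
  std::size_t count = 0;
  for (Trx& trx : trx_list) {
    if (trx.state != TrxState::Prepared) continue;
    trx.state = TrxState::PreparedRecovered;
    out[count++] = trx.id;
    if (count == len) return count;  // buffer full; caller calls again
  }
  // The complete list has been returned: restore the original state.
  for (Trx& trx : trx_list) {
    if (trx.state == TrxState::PreparedRecovered) {
      trx.state = TrxState::Prepared;
    }
  }
  return count;
}
```

With five prepared transactions and a buffer of two, successive calls in this sketch return 2, 2 and 1 entries, and the next traversal starts over from the first transaction.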
failed in compare_order_elements function
The function compare_order_lists() is called on the ORDER BY lists of the window functions
so that those window functions that can be computed together are adjacent.
It iterates over all the elements in the ORDER BY lists of the two functions and
calls compare_order_elements() for each item in the ORDER BY clause.
compare_order_elements() assumed that all the items in an ORDER BY list are of the type
Item::FIELD_ITEM.
This assumption fails when the ORDER BY clause contains constants. We should ignore the constants and only compare
items of the type Item::FIELD_ITEM in compare_order_elements().
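
A minimal sketch of a comparison that skips constants, assuming a simplified ORDER BY element. OrderElem, skip_constants() and compare_order_lists_sketch() are hypothetical names for illustration, not the server's data structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an ORDER BY element.
enum class ItemType { Field, Constant };

struct OrderElem {
  ItemType type;
  int field_index;  // meaningful only when type == ItemType::Field
  bool descending;
};

// Advance pos past any constants in the list.
static std::size_t skip_constants(const std::vector<OrderElem>& list,
                                  std::size_t pos) {
  while (pos < list.size() && list[pos].type == ItemType::Constant) ++pos;
  return pos;
}

// Three-way comparison of two ORDER BY lists that ignores constants.
// Returns a negative value, 0, or a positive value.
int compare_order_lists_sketch(const std::vector<OrderElem>& a,
                               const std::vector<OrderElem>& b) {
  std::size_t i = skip_constants(a, 0);
  std::size_t j = skip_constants(b, 0);
  while (i < a.size() && j < b.size()) {
    if (a[i].field_index != b[j].field_index) {
      return a[i].field_index < b[j].field_index ? -1 : 1;
    }
    if (a[i].descending != b[j].descending) {
      return a[i].descending ? 1 : -1;
    }
    i = skip_constants(a, i + 1);
    j = skip_constants(b, j + 1);
  }
  if (i < a.size()) return 1;   // a has more field elements
  if (j < b.size()) return -1;  // b has more field elements
  return 0;
}
```

In this sketch, two lists that differ only in constants compare as equal, so the corresponding window functions would be placed next to each other.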
Problem:
========
The mysqlbinlog tool is leaking memory, causing failures in various tests when
compiling and testing with AddressSanitizer or LeakSanitizer like this:
cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_ASAN:BOOL=ON /path/to/source
make -j$(nproc)
cd mysql-test
ASAN_OPTIONS=abort_on_error=1 ./mtr --parallel=auto rpl.rpl_row_mysqlbinlog
CURRENT_TEST: rpl.rpl_row_mysqlbinlog
Direct leak of 112 byte(s) in 1 object(s) allocated from:
#0 0x4eff87 in __interceptor_malloc (/dev/shm/5.5/client/mysqlbinlog+0x4eff87)
#1 0x60eaab in my_malloc /mariadb/5.5/mysys/my_malloc.c:41:10
#2 0x5300dd in Log_event::read_log_event(char const*, unsigned int, char const**,
Format_description_log_event const*, char) /mariadb/5.5/sql/log_event.cc:1568:
#3 0x564a9c in dump_remote_log_entries(st_print_event_info*, char const*)
/mariadb/5.5/client/mysqlbinlog.cc:1978:17
Analysis:
========
The 'mysqlbinlog' tool is being used to read binary log events from a remote server.
While reading the binary log, if a fake rotate event is found, the following actions are
taken:
If the 'to-last-log' option is specified, the fake rotate event is processed.
In the absence of 'to-last-log', the fake rotate event is skipped.
In the skipped case, the fake rotate event object is not cleaned up,
resulting in a memory leak.
Fix:
===
Clean up the fake rotate event.
This issue is already fixed in MariaDB 10.0.23 and higher versions as part of
commit c3018b0ff4
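
The leak pattern can be sketched as follows. Event and handle_event() are hypothetical; the sketch uses std::unique_ptr to free the skipped event, which is a swapped-in technique for illustration and not necessarily how mysqlbinlog.cc releases the object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified event type; live counts objects not yet freed.
struct Event {
  static int live;
  bool fake_rotate;
  explicit Event(bool fake) : fake_rotate(fake) { ++live; }
  ~Event() { --live; }
};
int Event::live = 0;

// Returns true if the event was processed, false if it was skipped.
// Taking ownership through std::unique_ptr ensures that the event is
// freed on every path, including the skip path.
bool handle_event(std::unique_ptr<Event> ev, bool to_last_log) {
  assert(ev != nullptr);
  if (ev->fake_rotate && !to_last_log) {
    return false;  // skipped; ev is destroyed when it goes out of scope
  }
  // ... process the event here ...
  return true;
}
```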
with GROUP BY + ORDER BY
The method JOIN::create_postjoin_aggr_table() should not
call JOIN::add_sorting_to_table() unless the first non-constant join
table is passed as the first parameter to the method.
dict_create_foreign_constraints_low(): Tolerate the keywords
IGNORE and ONLINE between the keywords ALTER and TABLE.
We should really remove the hacky FOREIGN KEY constraint parser
from InnoDB.
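
As an illustration of tolerating optional keywords, here is a minimal sketch. starts_with_alter_table() is a hypothetical helper that splits on whitespace only; it is not the InnoDB parser and ignores comments and quoting.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Upper-case a copy of s (ASCII only).
static std::string to_upper(std::string s) {
  for (char& c : s) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  return s;
}

// Returns true if sql starts with ALTER, followed by any number of the
// keywords IGNORE or ONLINE, followed by TABLE.
bool starts_with_alter_table(const std::string& sql) {
  std::istringstream in(sql);
  std::string word;
  if (!(in >> word) || to_upper(word) != "ALTER") return false;
  while (in >> word) {
    const std::string kw = to_upper(word);
    if (kw == "TABLE") return true;
    if (kw != "IGNORE" && kw != "ONLINE") return false;
  }
  return false;
}
```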
The command SHOW INDEXES ignored the setting of the system variable
use_stat_tables to the value 'preferably' and showed statistical
data received from the engine. Similarly, queries over the table
STATISTICS from INFORMATION_SCHEMA ignored this setting. This happened
because the function fill_schema_table_by_open() did not read any data
from statistical tables.
The compile-time option IBUF_COUNT_DEBUG has not been used for years.
It would only work with up to 3 created .ibd files, with no buffered
changes existing while InnoDB is started up.
Archive storage engine assumed that any query that attempts to read from
the table will call ha_archive::info() beforehand. ha_archive would flush
un-written data in that call (this would make it visible for the reads).
Break this assumption. Flush the data when the table is opened for reading.
This way, one can do multiple write statements without causing a flush, but
as soon as we might need the data, we flush it.
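
The flush-on-open-for-read policy can be sketched as below. BufferedTableSketch is a hypothetical in-memory model, not ha_archive.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a table that buffers written rows.
class BufferedTableSketch {
 public:
  // Writes only append to the buffer; they never flush.
  void write_row(const std::string& row) { pending_.push_back(row); }

  // Opening for reading flushes first, so that all rows are visible.
  // Returns the number of visible rows.
  std::size_t open_for_read() {
    flush();
    return stored_.size();
  }

  int flush_count() const { return flushes_; }

 private:
  void flush() {
    if (pending_.empty()) return;  // nothing buffered, nothing to do
    stored_.insert(stored_.end(), pending_.begin(), pending_.end());
    pending_.clear();
    ++flushes_;
  }

  std::vector<std::string> pending_;
  std::vector<std::string> stored_;
  int flushes_ = 0;
};
```

In this model, several consecutive writes cause no flush at all, and the first open for reading causes exactly one.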
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.
To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will also fix that problem.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.
Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).
The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.
For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107c.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.
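
The decision of whether a page must be read before applying redo log can be sketched roughly as follows. LogRec and must_read_page() are hypothetical simplifications; in particular, the sketch assumes a plain table name without a database prefix and does not model the interaction between a page initialization and a later MLOG_INDEX_LOAD record.

```cpp
#include <cstdint>
#include <string>
#include <vector>

typedef std::uint64_t lsn_t;

// Hypothetical, simplified buffered redo log record for one page.
struct LogRec {
  lsn_t lsn;
  bool initializes_page;  // e.g. a record such as MLOG_INIT_FILE_PAGE2
};

// Decide whether the page must be read from the data file before the
// buffered records are applied.
// enable_lsn: LSN of the latest MLOG_INDEX_LOAD record seen for the
// tablespace (0 if none was seen).
bool must_read_page(const std::vector<LogRec>& recs, lsn_t enable_lsn,
                    const std::string& table_name) {
  // Assumption: internal FULLTEXT INDEX tables are recognized by the
  // FTS_ prefix and never get the optimizations.
  if (table_name.compare(0, 4, "FTS_") == 0) return true;

  bool all_predate_index_load = !recs.empty();
  for (const LogRec& rec : recs) {
    if (rec.initializes_page) return false;  // rebuilt from the log
    if (rec.lsn >= enable_lsn) all_predate_index_load = false;
  }
  // If every record predates MLOG_INDEX_LOAD, the page need not be loaded.
  return !all_predate_index_load;
}
```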
This is joint work with Thirunarayanan Balathandayuthapani.
fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.
mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.
recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.
recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.
recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
the file_name_t::enable_lsn.
recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn
to fil_space_t::enable_lsn.
recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.
buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.
recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.
ibuf_merge_or_delete_for_page(): Relax a debug assertion.
innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
Similar to what was done in commit aa3f7a107c
for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
be written if redo logging was disabled.
row_merge_build_indexes(): Invoke row_merge_write_redo() also when
online operation is not being executed or an error occurs.
In case of an error, invoke flush_observer->interrupted() so that
the pages will not be flushed but merely evicted from the buffer pool.
Before resuming redo logging, it is crucial for the correctness of
mariabackup and InnoDB crash recovery to flush or evict all affected pages
and to write MLOG_INDEX_LOAD records.
innobase_init(): Add a missing space to a warning message.
Apparently, this message was corrupted in MariaDB 10.2.2 in
commit fec844aca8 related to a
conflict resolution when applying a change from MySQL 5.7.12.
For single-table and multi-table updates, engine-independent statistics were not being
read even if the statistics had been collected.
Fixed it, so that when optimizer_use_condition_selectivity > 2, the available
statistics are read for update queries.
The regression that was reported in MDEV-19212 occurred due to use
of macros that did not ensure that the arguments have compatible
types.
ut_2pow_remainder(), ut_2pow_round(), ut_calc_align(): Define as
inline function templates.
UT_CALC_ALIGN(): Define as a macro, because this is used in
compile_time_assert(). Only starting with C++11 (MariaDB 10.4)
we could define the inline functions as constexpr.
os_mem_alloc_large(): Invoke the macro ut_2pow_round() with the
correct argument type.
innobase_large_page_size, innobase_use_large_pages,
os_use_large_pages, os_large_page_size: Remove.
Simply refer to opt_large_page_size, my_use_large_pages.
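
The benefit of function templates over macros can be sketched as below. The function names are taken from this message, but the bodies are a reconstruction under the assumption that m is a power of 2; they are not copied from the InnoDB source.

```cpp
#include <cassert>

// A single template parameter T forces both arguments to have the same
// type; a call with mismatching argument types fails to compile instead
// of silently truncating a mask, as a macro could.

// n modulo m, where m is a power of 2.
template <typename T>
inline T ut_2pow_remainder(T n, T m) {
  assert(m != 0 && (m & (m - 1)) == 0);
  return static_cast<T>(n & (m - 1));
}

// n rounded down to a multiple of m, where m is a power of 2.
template <typename T>
inline T ut_2pow_round(T n, T m) {
  assert(m != 0 && (m & (m - 1)) == 0);
  return static_cast<T>(n & ~(m - 1));
}

// n rounded up to a multiple of m, where m is a power of 2.
template <typename T>
inline T ut_calc_align(T n, T m) {
  assert(m != 0 && (m & (m - 1)) == 0);
  return static_cast<T>((n + m - 1) & ~(m - 1));
}
```

With these definitions, a call such as ut_2pow_round(13ULL, 8U) is rejected at compile time because T cannot be deduced consistently, which is the kind of mismatch the macros let through.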
xtrabackup_backup_func(): If the log checkpoint header changed
since we last read it, search for the most recent checkpoint again.
Otherwise, we could corrupt the backup of the redo log, because the
least significant bits of checkpoint_lsn_start would not match
log_sys->log.lsn.
btr_page_free(): Renamed from btr_page_free_low().
If scrubbing is enabled, zero out the page with proper redo logging.
Only pass ahi=true to fseg_free_page() if the page is actually indexed.
fil_space_t::modify_check(): Renamed from fsp_space_modify_check().
fsp_init_file_page(): Define inline.
Even if a Makefile was for some reason checked in to a submodule,
it is still a generated file: it will be cleaned, and it won't be in a source
package. One cannot draw any conclusions from its absence.
This is a follow-up to MDEV-18733. As part of that fix, we made
dict_check_sys_tables() skip tables that would be dropped by
row_mysql_drop_garbage_tables().
DICT_ERR_IGNORE_DROP: A new mode in which no attempt is made
to open the file.
dict_load_tablespace(): Do not try to load the tablespace if
DICT_ERR_IGNORE_DROP has been specified.
row_mysql_drop_garbage_tables(): Pass the DICT_ERR_IGNORE_DROP mode.
fil_space_for_table_exists_in_mem(): Remove a parameter.
The only caller that passed print_error_if_does_not_exist=true
was row_drop_single_table_tablespace().
The recv_sys data structures are accessed not only from the thread
that executes InnoDB plugin initialization, but also from the
InnoDB I/O threads, which can invoke recv_recover_page().
Assert that sufficient concurrency control is in place.
Some code was accessing recv_sys data structures without
holding recv_sys->mutex.
recv_recover_page(bpage): Refactor the call from buf_page_io_complete()
into a separate function that performs necessary steps. The
main thread was unnecessarily releasing and reacquiring recv_sys->mutex.
recv_recover_page(block,mtr,recv_addr): Pass more parameters from
the caller. Avoid redundant lookups and computations. Eliminate some
redundant variables.
recv_get_fil_addr_struct(): Assert that recv_sys->mutex is being held.
That was not always the case!
recv_scan_log_recs(): Acquire recv_sys->mutex for the whole duration
of the function. (While we are scanning and buffering redo log records,
no pages can be read in.)
recv_read_in_area(): Properly protect access with recv_sys->mutex.
recv_apply_hashed_log_recs(): Check recv_addr->state only once,
and continuously hold recv_sys->mutex. The mutex will be released
and reacquired inside recv_recover_page() and recv_read_in_area(),
allowing concurrent processing by buf_page_io_complete() in I/O threads.
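
An ownership assertion of the kind described here can be sketched with a small wrapper. OwnedMutex is a hypothetical class for illustration; it is not the InnoDB mutex implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that remembers which thread holds it, so
// that functions can assert that their caller holds the mutex.
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());  // default id: no owner
    m_.unlock();
  }
  // True if the calling thread currently holds the mutex.
  bool is_owned() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical shared structure whose accessor checks the protocol.
struct SharedCounter {
  OwnedMutex mutex;
  int value = 0;
  void increment() {
    assert(mutex.is_owned());  // the caller must hold the mutex
    ++value;
  }
};
```

Because OwnedMutex provides lock() and unlock(), it can also be used with std::lock_guard.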
The record MLOG_INDEX_LOAD is supposed to be written to indicate that
some page modifications bypassed redo logging, and that redo logging
is now re-enabled. It was not written for fulltext indexes during
ALTER TABLE.
row_merge_write_redo(): Declare globally. Assert that the index
is neither a spatial nor fulltext index.
recv_mlog_index_load(): Observe a MLOG_INDEX_LOAD operation.
recv_parse_log_recs(): Handle MLOG_INDEX_LOAD also in multi-record
mini-transactions. Because of this omission, we should keep writing
MLOG_INDEX_LOAD in single-record mini-transactions, because older
versions of Mariabackup would fail.
row_fts_merge_insert(): Write MLOG_INDEX_LOAD for the auxiliary
tables of fulltext indexes.
The record MLOG_ZIP_PAGE_COMPRESS is similar to MLOG_INIT_FILE_PAGE2
in that it contains all the information needed to initialize the page.
As with the other record, do initialize the entire page on recovery.
The page_size argument to buf_page_get_gen() only matters when the
page is going to be loaded into the buffer pool. Allow callers to
pass a dummy parameter when using BUF_GET_IF_IN_POOL (which would
return NULL if the block is not in the buffer pool).
btr_root_get(): Ignore the root->page.encrypted flag.
The purpose of this flag is questionable since
commit 8c43f96388.
btr_validate_index(): Avoid crash if btr_root_get() returns NULL.
Normally, InnoDB is not in the process of executing crash recovery.
Provide a hint to the compiler that the recovery-related code paths
are rarely executed.
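
A branch hint of this kind can be sketched as follows. SKETCH_UNLIKELY, recovery_in_progress and read_page_sketch() are hypothetical names; the sketch assumes a GCC-compatible compiler for __builtin_expect and falls back to the plain condition elsewhere.

```cpp
#if defined(__GNUC__) || defined(__clang__)
// Tell the compiler that cond is expected to be false.
# define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
# define SKETCH_UNLIKELY(cond) (cond)
#endif

// Hypothetical flag standing in for "crash recovery is in progress".
static bool recovery_in_progress = false;

// Returns 1 if the rarely executed recovery path was taken, else 0.
int read_page_sketch() {
  if (SKETCH_UNLIKELY(recovery_in_progress)) {
    // Rare path: buffered redo log would be applied here.
    return 1;
  }
  return 0;  // common path
}
```

The hint only affects code layout and branch prediction hints; the result of the condition is unchanged.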
With INFORMATION_SCHEMA set as the default database, the check that a table
referred to in the processed query is defined in INFORMATION_SCHEMA must
be postponed until all CTE names can be identified.
Always set SERVER_MORE_RESULTS_EXIST when executing stored procedure statements.
If a statement produces a result, the EOF packet needs this flag (the SP ends
with an OK packet). If a statement does not produce a result, the affected rows
count is part of the final OK packet.
within stored procedure
Always set SERVER_MORE_RESULTS_EXIST when executing stored procedure
statements.
If a statement produces a result, the EOF packet needs this flag (the SP ends with
an OK packet). If a statement does not produce a result, the affected rows
count is part of the final OK packet.