The fix of MDEV-23456 (commit b1009ae5c1)
introduced a livelock between page flushing and a thread that is
executing buf_page_create().
buf_page_create(): If the current mini-transaction is holding
an exclusive latch on the page, do not attempt to acquire another
one, and do not care about any I/O fix.
mtr_t::have_x_latch(): Replaces mtr_t::get_fix_count().
dyn_buf_t::for_each_block(const Functor&) const: A new variant.
rw_lock_own(): Add a const qualifier.
Reviewed by: Thirunarayanan Balathandayuthapani
buf_page_create() is invoked when page is initialized. So that
previous contents of the page ignored. In few cases, it calls
buf_page_get_gen() is called to fetch the page from buffer pool.
It should take x-latch on the page. If other thread uses the block
or block io state is different from BUF_IO_NONE then release the
mutex and check the state and buffer fix count again. For compressed
page, use the existing free block from LRU list to create new page.
Retry to fetch the compressed page if it is in flush list
fseg_create(), fseg_create_general(): Introduce block as a parameter
where segment header is placed. It is used to avoid repetitive
x-latch on the same page
Change the assert to check whether the page has SX latch and
X latch in all callee function of buf_page_create()
mtr_t::get_fix_count(): Get the buffer fix count of the given
block added by the mtr
FindBlock is added to find the buffer fix count of the given
block acquired by the mini-transaction
When InnoDB is extending a data file, it is updating the FSP_SIZE
field in the first page of the data file.
In commit 8451e09073 (MDEV-11556)
we removed a work-around for this bug and made recovery stricter,
by making it track changes to FSP_SIZE via redo log records, and
extend the data files before any changes are being applied to them.
It turns out that the function fsp_fill_free_list() is not crash-safe
with respect to this when it is initializing the change buffer bitmap
page (page 1, or generally, N*innodb_page_size+1). It uses a separate
mini-transaction that is committed (and will be written to the redo
log file) before the mini-transaction that actually extended the data
file. Hence, recovery can observe a reference to a page that is
beyond the current end of the data file.
fsp_fill_free_list(): Initialize the change buffer bitmap page in
the same mini-transaction.
The rest of the changes are fixing a bug that the use of the separate
mini-transaction was attempting to work around. Namely, we must ensure
that no other thread will access the change buffer bitmap page before
our mini-transaction has been committed and all page latches have been
released.
That is, for read-ahead as well as neighbour flushing, we must avoid
accessing pages that might not yet be durably part of the tablespace.
fil_space_t::committed_size: The size of the tablespace
as persisted by mtr_commit().
fil_space_t::max_page_number_for_io(): Limit the highest page
number for I/O batches to committed_size.
MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.
mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.
mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK,
copy space->size to space->committed_size. In this way, read-ahead
or flushing will never be invoked on pages that do not yet exist
according to FSP_SIZE.
Apart from page latches (buf_block_t::lock), mini-transactions
are keeping track of at most one dict_index_t::lock and
fil_space_t::latch at a time, and in a rare case, purge_sys.latch.
Let us introduce interfaces for acquiring an index latch
or a tablespace latch.
In a later version, we may want to introduce mtr_t members
for holding a latched dict_index_t* and fil_space_t*,
and replace the remaining use of mtr_t::m_memo
with std::set<buf_block_t*> or with a map<buf_block_t*,byte*>
pointing to log records.
mtr_t::Impl, mtr_t::Command: Merge to mtr_t.
MTR_MAGIC_N: Remove.
MTR_STATE_COMMITTING: Remove. This state was only being set
internally during mtr_t::commit().
mtr_t::Command::m_locks_released: Remove (set-and-never-read member).
mtr_t::Command::m_start_lsn: Replaced with the return value of
finish_write() and a parameter to release_blocks().
mtr_t::Command::m_end_lsn: Removed as a duplicate of mtr_t::m_commit_lsn.
mtr_t::Command::prepare_write(): Replace a switch () with a
comparison against 0. Only 2 m_log_mode are allowed.
Some places didn't match the previous rules, making the Floor
address wrong.
Additional sed rules:
sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
Also, related to MDEV-15522, MDEV-17304, MDEV-17835,
remove the Galera xtrabackup tests, because xtrabackup never worked
with MariaDB Server 10.3 due to InnoDB redo log format changes.
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
The parameters bool sync=true, bool read_only=false of mtr_t::start()
were added in
eca5b0fc17
(MySQL 5.7.3).
The parameter read_only was never used anywhere.
The parameter sync was only copied around, and would be returned
by the unused function mtr_t::is_async().
We do not need this dead code in MariaDB.
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".
innodb_change_buffering: Declare as ENUM, not STRING.
innodb_flush_method: Declare as ENUM, not STRING.
innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.
LOG_BUFFER_SIZE: Remove.
SysTablespace::normalize_size(): Renamed from normalize().
innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.
innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.
srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.
SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.
xb_normalize_init_values(): Merge to innodb_init_param().
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
Add fil_system_t::sys_space, fil_system_t::temp_space.
These will replace lookups for TRX_SYS_SPACE or SRV_TMP_SPACE_ID.
mtr_t::m_undo_space, mtr_t::m_sys_space: Remove.
mtr_t::set_sys_modified(): Remove.
fil_space_get_type(), fil_space_get_n_reserved_extents(): Remove.
fsp_header_get_tablespace_size(), fsp_header_inc_size():
Merge to the only caller, innobase_start_or_create_for_mysql().
Problem:
========
During checkpoint, we are writing all MLOG_FILE_NAME records in one mtr
and parse buffer can't be processed till MLOG_MULTI_REC_END. Eventually parse
buffer exceeds the RECV_PARSING_BUF_SIZE and eventually it overflows.
Fix:
===
1) Break the large mtr if it exceeds LOG_CHECKPOINT_FREE_PER_THREAD into multiple mtr during checkpoint.
2) Move the parsing buffer if we are encountering only MLOG_FILE_NAME
records. So that it will never exceed the RECV_PARSING_BUF_SIZE.
Reviewed-by: Debarun Bannerjee <debarun.bannerjee@oracle.com>
Reviewed-by: Rahul M Malik <rahul.m.malik@oracle.com>
RB: 14743
Also, remove empty .ic files that were not removed by my MySQL commit.
Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.
Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
InnoDB keeps track of buffer-fixed buf_block_t or acquired rw_lock_t
within a mini-transaction. There are some memo_contains assertions
in the code that document when certain blocks or rw_locks must be held.
But, these assertions only check the mini-transaction memo, not the fact
whether the rw_lock_t are actually being held by the caller.
btr_pcur_store_position(): Remove #ifdef, and assert that the block
is always buffer-fixed.
rtr_pcur_getnext_from_path(), rtr_pcur_open_low(),
ibuf_rec_get_page_no_func(), ibuf_rec_get_space_func(),
ibuf_rec_get_info_func(), ibuf_rec_get_op_type_func(),
ibuf_build_entry_from_ibuf_rec_func(), ibuf_rec_get_volume_func(),
ibuf_get_merge_page_nos_func(), ibuf_get_volume_buffered_count_func()
ibuf_get_entry_counter_low_func(), page_set_ssn_id(),
row_vers_old_has_index_entry(), row_vers_build_for_consistent_read(),
row_vers_build_for_semi_consistent_read(),
trx_undo_prev_version_build():
Make use of mtr_memo_contains_page_flagged().
mtr_t::memo_contains(): Take a const memo. Assert rw_lock_own().
FindPage, FlaggedCheck: Assert rw_lock_own_flagged().
WL#7682 in MySQL 5.7 introduced the possibility to create light-weight
temporary tables in InnoDB. These are called 'intrinsic temporary tables'
in InnoDB, and in MySQL 5.7, they can be created by the optimizer for
sorting or buffering data in query processing.
In MariaDB 10.2, the optimizer temporary tables cannot be created in
InnoDB, so we should remove the dead code and related data structures.
Contains also:
MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
enable tests that were fixed in MDEV-10549
MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
Contains also
MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7
The failure happened because 5.7 has changed the signature of
the bool handler::primary_key_is_clustered() const
virtual function ("const" was added). InnoDB was using the old
signature which caused the function not to be used.
MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7
Fixed mutexing problem on lock_trx_handle_wait. Note that
rpl_parallel and rpl_optimistic_parallel tests still
fail.
MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
Reason: incorrect merge
MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
Reason: incorrect merge
Merge Facebook commit 25295d003cb0c17aa8fb756523923c77250b3294
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This adds a pointer to the trx to each mtr.
This allows the trx to be accessed in parts of the code
where it was otherwise not available. This is needed later.
Update InnoDB to 5.6.14
Apply MySQL-5.6 hack for MySQL Bug#16434374
Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB)
Fix InnoDB memory leak
Get rid of O(n^2) scan in dyn array (mtr->memo) operations, accessing
the dyn array blocks directly.
dyn_array_get_last_block(), dyn_array_get_next_block(),
dyn_array_get_prev_block(): Define as a constness-preserving macro.
Add const qualifiers to many dyn_array functions.
mtr_memo_slot_release_func(): Renamed from mtr_memo_slot_release():
Make mtr_t* a debug-only parameter. Assume that slot->object != NULL.
mtr_memo_pop_all(): Access the dyn_array blocks directly, replacing
O(n^2) operation with O(n).
mtr_memo_release(): Access the dyn_array blocks directly, replacing
O(n^2) operation with O(n). This caused the performance problem.
rb#1540 approved by Jimmy Yang