trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
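A minimal sketch of the idea, assuming a hypothetical trx_sketch_t
stand-in rather than the real trx_t (the member and method names below
are illustrative only, and the memory orderings are one reasonable
choice, not necessarily what the patch uses):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a transaction object; not the real trx_t.
struct trx_sketch_t {
  std::atomic<int32_t> n_ref{0};

  // Take a reference without holding any mutex.
  void reference() { n_ref.fetch_add(1, std::memory_order_relaxed); }

  // Drop a reference and return how many references remain.
  int32_t release_reference() {
    const int32_t old = n_ref.fetch_sub(1, std::memory_order_release);
    assert(old > 0);  // releasing an unreferenced object is a bug
    return old - 1;
  }

  // Readers get a well-defined value instead of an unprotected read.
  bool is_referenced() const {
    return n_ref.load(std::memory_order_acquire) > 0;
  }
};
```

Both the update and the read go through the atomic, so neither side
needs the mutex and the read is no longer a data race.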
trx_sys_t::rw_trx_set is implemented as std::set, which performs a few quite
expensive operations under trx_sys_t::mutex protection, e.g. malloc/free
when adding/removing elements. Traversing the balanced tree is not that cheap
either.
This has negative scalability impact, which is especially visible when running
oltp_update_index.lua benchmark on a ramdisk.
To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None
of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex)
protection.
Another interesting issue observed with std::set is a reproducible ~2%
performance decline after the benchmark has run for ~60 seconds. With LF_HASH,
results are stable.
All in all this patch optimises away one of three trx_sys->mutex locks per
oltp_update_index.lua query. The other two critical sections became smaller.
Relevant clean-ups:
Replaced rw_trx_set iteration at startup with local set. The latter is needed
because values inserted to rw_trx_list must be ordered by trx->id.
Removed redundant conditions from trx_reference(): it is not (and never was)
called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY.
do_ref_count no longer makes sense (and probably never did): trx_reference()
is now called only when a reference counter increment is actually requested.
Moved condition out of mutex in trx_erase_lists().
trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were
greatly simplified and replaced by appropriate trx_rw_hash_t methods.
Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or
ACTIVE states. Transactions in COMMITTED state were required to be found
at InnoDB startup only. They are now looked up in the local set.
Removed unused trx_assert_recovered().
Removed unused innobase_get_trx() declaration.
Removed rather semantically incorrect trx_sys_rw_trx_add().
Moved information printout from trx_sys_init_at_db_start() to
trx_lists_init_at_db_start().
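The startup clean-up above (collecting transactions in a local set so
that they can be inserted ordered by trx->id) could be sketched roughly
as follows. recovered_trx_t, by_trx_id and ordered_ids() are
hypothetical names, and ascending order is chosen purely for
illustration:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a recovered transaction; not the real trx_t.
struct recovered_trx_t {
  uint64_t id;
};

// Comparator that orders transactions by their id.
struct by_trx_id {
  bool operator()(const recovered_trx_t* a, const recovered_trx_t* b) const {
    return a->id < b->id;
  }
};

// Transactions are discovered in arbitrary order; the local set yields
// them ordered by id, which is the order the list insertion needs.
std::vector<uint64_t> ordered_ids(const std::vector<recovered_trx_t>& found) {
  std::set<const recovered_trx_t*, by_trx_id> local_set;
  for (const recovered_trx_t& trx : found) {
    local_set.insert(&trx);
  }
  std::vector<uint64_t> ids;
  for (const recovered_trx_t* trx : local_set) {
    ids.push_back(trx->id);
  }
  return ids;
}
```

The set lives only for the duration of startup, so its allocation cost
does not matter for runtime scalability.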
The problem was a timing issue between the termination of the killed thread
and the reading of the binary log.
Updated the test to wait until the killed thread was properly terminated
before checking what's in the binary log.
To make the check safe, I changed "threads_connected" to be updated after
thd::cleanup() is done, to ensure that all binary log updates are done
before the variable is changed. This was mainly done to make the
test deterministic and has no other real influence on how the server
works.
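The ordering described above can be sketched as follows. This is a toy
model under stated assumptions: server_sketch_t and all of its members
are hypothetical and do not correspond to the real server code; only the
order of the two steps in disconnect() is the point.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical model of a server; none of these names exist in the real code.
class server_sketch_t {
  std::mutex log_mutex;
  std::vector<std::string> binlog;
  std::atomic<int> threads_connected{0};

public:
  void connect() { threads_connected.fetch_add(1, std::memory_order_relaxed); }

  // Step 1: finish all cleanup work, including the last binary log write.
  // Step 2: only then publish that the connection is gone.
  void disconnect(const std::string& final_event) {
    {
      std::lock_guard<std::mutex> guard(log_mutex);
      binlog.push_back(final_event);
    }
    threads_connected.fetch_sub(1, std::memory_order_release);
  }

  int connected() const {
    return threads_connected.load(std::memory_order_acquire);
  }

  std::size_t binlog_events() {
    std::lock_guard<std::mutex> guard(log_mutex);
    return binlog.size();
  }
};
```

An observer that waits for connected() to drop can then inspect the log
knowing that the final event of the departed connection is already in it.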
This bug affected tables where the PRIMARY KEY contains variable-length
columns, and ROW_FORMAT is COMPACT or DYNAMIC.
rec_init_offsets_comp_ordinary(): Do not short-cut the parsing
of the record header for records that contain explicit values
for instantly added columns.
rec_copy_prefix_to_buf(): Copy more header for records that
contain explicit values for instantly added columns.
in JOIN::inject_best_splitting_cond
The value of SplM_opt_info::last_plan should be set to NULL
before any search for a splitting plan for a splittable
materialized table.
If libaio is required but not found,
remove the variables HAVE_LIBAIO_H and HAVE_LIBAIO from the cache, so that
rerunning cmake after installing libaio succeeds.
debug_key_management
encrypt_and_grep
innodb_encryption
If the real table count differs from what the test expects, the test
just hangs waiting for the hardcoded number to be reached, and then exits
with **failed** after a 10-minute wait: quite unfriendly, and hard to
figure out what's going on.
Fix and enable some of the tests; some remain disabled.
The tests innodb_gis.rtree_old and innodb_gis.row_format
duplicated some versions of the test main.gis-rtree.
Instead of duplicating, source that test, in a new test
innodb_gis.innodb_gis_rtree.
Introduce innodb_row_format.combinations. Due to this,
ROW_FORMAT=COMPRESSED will not be covered in some tests
where it is covered in MySQL 5.7.
The function rtr_update_mbr_field_in_place() is generating
MLOG_REC_UPDATE_IN_PLACE or MLOG_COMP_REC_UPDATE_IN_PLACE records
on non-leaf pages, even though MLOG_WRITE_STRING would perfectly
suffice for updating a fixed-length data field.
btr_cur_parse_update_in_place(): If flags==7, the record may be
from rtr_update_mbr_field_in_place(), and we must check if the
page is a leaf page. Otherwise, assume that it is.
btr_cur_update_in_place(): Assert that the page is a leaf page.
While insert direction makes no sense for SPATIAL INDEX (R-tree),
the field is apparently being used (and basically contains garbage).
Relax the debug assertion that was added in MDEV-11369.
Other things, mainly to get
create_mysqld_error_find_printf_error tool to work:
- Added protection to not include mysqld_error.h twice
- Include "unireg.h" instead of "mysqld_error.h" in server
- Added protection if ER_XX messages are already defined
- Removed wrong calls to my_error(ER_OUTOFMEMORY) as
my_malloc() and my_alloc will do this automatically
- Added missing %s to ER_DUP_QUERY_NAME
- Removed old and wrong calls to my_strerror() when using
MY_ERROR_ON_RENAME (wrong merge)
- Fixed deadlock error message from Galera. Previously, the extra
information passed to ER_LOCK_DEADLOCK was lost, because the
ER_LOCK_DEADLOCK message doesn't print any extra information.
I kept #ifdef mysqld_error_find_printf_error_used in sql_acl.h
to make it easy to do this kind of check again in the future.
While the redo log format was changed in MariaDB 10.3.2 and 10.3.3
due to MDEV-12288 and MDEV-11369, it should be technically possible
to upgrade from a crashed MariaDB 10.2 instance.
On a related note, it should be possible for Mariabackup 10.3
to create a backup from a running MariaDB Server 10.2.
mlog_id_t: Put back the 10.2 specific redo log record types
MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT,
MLOG_UNDO_HDR_REUSE.
trx_undo_parse_add_undo_rec(): Parse or apply MLOG_UNDO_INSERT.
trx_undo_erase_page_end(): Apply MLOG_UNDO_ERASE_END.
trx_undo_parse_page_init(): Parse or apply MLOG_UNDO_INIT.
trx_undo_parse_page_header_reuse(): Parse or apply MLOG_UNDO_HDR_REUSE.
recv_log_recover_10_2(): Remove. Always parse the redo log from 10.2.
recv_find_max_checkpoint(), recv_recovery_from_checkpoint_start():
Always parse the redo log from MariaDB 10.2.
recv_parse_or_apply_log_rec_body(): Parse or apply
MLOG_UNDO_INSERT, MLOG_UNDO_ERASE_END, MLOG_UNDO_INIT.
srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Upgrade from a previous (supported)
redo log format.
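As a rough illustration of the dispatch described above, the sketch
below maps each restored 10.2 record type to the function named for it
in this message. mlog_sketch_t and handler_for() are hypothetical; the
real mlog_id_t values and the real parser signatures are deliberately
not reproduced here.

```cpp
#include <string>

// Hypothetical subset of redo log record types; the numeric values of the
// real mlog_id_t are intentionally left unspecified.
enum class mlog_sketch_t {
  UNDO_INSERT,
  UNDO_ERASE_END,
  UNDO_INIT,
  UNDO_HDR_REUSE,
  OTHER
};

// Return the name of the function that handles the given record type,
// or an empty string for types outside this sketch.
std::string handler_for(mlog_sketch_t type) {
  switch (type) {
  case mlog_sketch_t::UNDO_INSERT:
    return "trx_undo_parse_add_undo_rec";
  case mlog_sketch_t::UNDO_ERASE_END:
    return "trx_undo_erase_page_end";
  case mlog_sketch_t::UNDO_INIT:
    return "trx_undo_parse_page_init";
  case mlog_sketch_t::UNDO_HDR_REUSE:
    return "trx_undo_parse_page_header_reuse";
  case mlog_sketch_t::OTHER:
    break;
  }
  return "";
}
```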
trx_undo_page_report_rename(): Return a pointer to the start of the
undo log record, not to the start of the (not yet written) next free
record. The wrong return value would sometimes cause ROLLBACK to crash
in an assertion failure (trying to parse garbage from the free area at
the end of the insert_undo log page) if the TRX_UNDO_RENAME_TABLE record
was the very last thing that was written to the insert_undo log. This
would occasionally happen when an ALTER TABLE operation is rolled
back due to invalid FOREIGN KEY constraints in the innodb.innodb test.
In these tests, the error ER_ERROR_ON_RENAME (1025) would be returned
at the end of the ALGORITHM=COPY operation of ALTER TABLE.
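The class of bug fixed in trx_undo_page_report_rename() can be shown
with a toy append-only page. undo_page_sketch_t, HEADER_SIZE and
report() are hypothetical and do not mirror the real undo page layout;
the sketch only shows why the start offset, not the new free offset,
must be returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical append-only page; the layout is invented for illustration.
struct undo_page_sketch_t {
  static constexpr std::size_t HEADER_SIZE = 8;  // assumed, keeps offset 0 free
  std::vector<uint8_t> frame;
  std::size_t free_offset = HEADER_SIZE;

  explicit undo_page_sketch_t(std::size_t size) : frame(size, 0) {
    assert(size > HEADER_SIZE);
  }

  // Append one record. Returns the offset of the record's first byte,
  // or 0 if the record is empty or does not fit.
  std::size_t report(const uint8_t* rec, std::size_t len) {
    if (len == 0 || len > frame.size() - free_offset) {
      return 0;
    }
    const std::size_t start = free_offset;
    std::memcpy(&frame[start], rec, len);
    free_offset += len;
    // Returning free_offset here would point at the unwritten free area;
    // a caller parsing from there would read garbage.
    return start;
  }
};
```

With the wrong return value, the mistake only shows up when the record
is the last one on the page, because otherwise a later record happens to
start at the returned offset.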
trx_undo_page_report_modify(): For SPATIAL INDEX, keep logging
updated off-page columns twice, so that
the minimum bounding rectangle (MBR) will be logged.
Avoiding the redundant logging would require larger changes
to the undo log format.
row_build_index_entry_low(): Handle SPATIAL_UNKNOWN more robustly,
by refusing to purge the record from the spatial index.
We can get this code when processing old undo log from 10.2.10 or
10.2.11 (the releases affected by MDEV-14799, which was a regression
from MDEV-14051).
The InnoDB background tasks can modify tables while LOCK TABLES...WRITE
is in effect. The purge of InnoDB history always worked like this in
MariaDB, but in MySQL 5.7 it sometimes yields to LOCK TABLES.
Also, make gcol.innodb_virtual_index run the purge for an UPDATE
before DROP TABLE is executed.