This is another follow-up fix to
commit b393e2cb0c
which turned out to be still broken.
Replace the C++11 keyword 'constexpr' with #define.
debug_sync_t::str: Remove the zero-length array.
Replace sync->str with reinterpret_cast<char*>(&sync[1]).
Remove unused variables and type mismatch that was introduced
in commit b393e2cb0c
Also, fix a typo in the documentation of the parameter, and
update the test.
In MDEV-11369 (instant ADD COLUMN) in MariaDB Server 10.3,
we introduced the hidden metadata record that must be the
first record in the clustered index if and only if
index->is_instant() holds.
To catch MDEV-19783, in
commit ed0793e096 and
commit 99dc40d6ac
we added some assertions to find cases where
the metadata record is missing while it should not be, or a
record exists when it should not. Those assertions were invalid
when traversing the PAGE_FREE list. That list can contain anything;
we must only be able to determine the successor and the size of
each garbage record in it.
page_validate(), page_simple_validate_old(), page_simple_validate_new():
Do not invoke page_rec_get_next_const() for traversing the PAGE_FREE
list, but instead use a lower-level accessor that does not attempt to
validate the REC_INFO_MIN_REC_FLAG.
page_copy_rec_list_end_no_locks(),
page_copy_rec_list_start(), page_delete_rec_list_start():
Add assertions.
btr_page_get_split_rec_to_left(): Remove a redundant return value,
and make the output parameter the return value.
btr_page_get_split_rec_to_right(), btr_page_split_and_insert(): Clean up.
btr_cur_pessimistic_delete(): code changed in a way that allows
to put more REC_INFO_MIN_REC_FLAG assertions inside btr_set_min_rec_mark().
Without that change tests innodb.innodb-table-online,
innodb.temp_table_savepoint and innodb_zip.prefix_index_liftedlimit fail.
Removed basically duplicated page_zip_validate() calls
which fails because of temporary(!) invariant violation.
That fixed innodb_zip.wl5522_debug_zip and
innodb_zip.prefix_index_liftedlimit
Until now, InnoDB inefficiently compared the aligned fields
FIL_PAGE_PREV, FIL_PAGE_NEXT to the byte-order-agnostic value FIL_NULL.
This is a backport of 32170f8c6d
from MariaDB Server 10.3.
Problem:
=======
During dropping of fts index, InnoDB waits for fts_optimize_remove_table()
and it holds dict_sys->mutex and dict_operaiton_lock even though the
table id is not present in the queue. But fts_optimize_thread does wait
for dict_sys->mutex to process the unrelated table id from the slot.
Solution:
========
Whenever table is added to fts_optimize_wq, update the fts_status
of in-memory fts subsystem to TABLE_IN_QUEUE. Whenever drop index
wants to remove table from the queue, it can check the fts_status
to decide whether it should send the MSG_DELETE_TABLE to the queue.
Removed the following functions because these are all deadcode.
dict_table_wait_for_bg_threads_to_exit(),
fts_wait_for_background_thread_to_start(),fts_start_shutdown(), fts_shudown().
Backport the applicable part of Sergey Vojtovich's commit
0ca2ea1a65 from MariaDB Server 10.3.
trx reference counter was updated under mutex and read without any
protection. This is both slow and unsafe. Use atomic operations for
reference counter accesses.
MySQL 5.7.9 (and MariaDB 10.2.2) introduced a race condition
between InnoDB transaction commit and the conversion of implicit
locks into explicit ones.
The assertion failure can be triggered with a test that runs
3 concurrent single-statement transactions in a loop on a simple
table:
CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
thread1: INSERT INTO t SET a=1;
thread2: DELETE FROM t;
thread3: SELECT * FROM t FOR UPDATE; -- or DELETE FROM t;
The failure scenarios are like the following:
(1) The INSERT statement is being committed, waiting for lock_sys->mutex.
(2) At the time of the failure, both the DELETE and SELECT transactions
are active but have not logged any changes yet.
(3) The transaction where the !other_lock assertion fails started
lock_rec_convert_impl_to_expl().
(4) After this point, the commit of the INSERT removed the transaction from
trx_sys->rw_trx_set, in trx_erase_lists().
(5) The other transaction consulted trx_sys->rw_trx_set and determined
that there is no implicit lock. Hence, it grabbed the lock.
(6) The !other_lock assertion fails in lock_rec_add_to_queue()
for the lock_rec_convert_impl_to_expl(), because the lock was 'stolen'.
This assertion failure looks genuine, because the INSERT transaction
is still active (trx->state=TRX_STATE_ACTIVE).
The problematic step (4) was introduced in
mysql/mysql-server@e27e0e0bb7
which fixed something related to MVCC (covered by the test
innodb.innodb-read-view). Basically, it reintroduced an error
that had been mentioned in an earlier commit
mysql/mysql-server@a17be6963f:
"The active transaction was removed from trx_sys->rw_trx_set prematurely."
Our fix goes along the following lines:
(a) Implicit locks will released by assigning
trx->state=TRX_STATE_COMMITTED_IN_MEMORY as the first step.
This transition will no longer be protected by lock_sys_t::mutex,
only by trx->mutex. This idea is by Sergey Vojtovich.
(b) We detach the transaction from trx_sys before starting to release
explicit locks.
(c) All callers of trx_rw_is_active() and trx_rw_is_active_low() must
recheck trx->state after acquiring trx->mutex.
(d) Before releasing any explicit locks, we will ensure that any activity
by other threads to convert implicit locks into explicit will have ceased,
by checking !trx_is_referenced(trx). There was a glitch
in this check when it was part of lock_trx_release_locks(); at the end
we would release trx->mutex and acquire lock_sys->mutex and trx->mutex,
and fail to recheck (trx_is_referenced() is protected by trx_t::mutex).
(e) Explicit locks can be released in batches (LOCK_RELEASE_INTERVAL=1000)
just like we did before.
trx_t::state: Document that the transition to COMMITTED is only
protected by trx_t::mutex, no longer by lock_sys_t::mutex.
trx_rw_is_active_low(), trx_rw_is_active(): Document that the transaction
state should be rechecked after acquiring trx_t::mutex.
trx_t::commit_state(): New function to change a transaction to committed
state, to release implicit locks.
trx_t::release_locks(): New function to release the explicit locks
after commit_state().
lock_trx_release_locks(): Move much of the logic to the caller
(which must invoke trx_t::commit_state() and trx_t::release_locks()
as needed), and assert that the transaction will have locks.
trx_get_trx_by_xid(): Make the parameter a pointer to const.
lock_rec_other_trx_holds_expl(): Recheck trx->state after acquiring
trx->mutex, and avoid a redundant lookup of the transaction.
lock_rec_queue_validate(): Recheck impl_trx->state while holding
impl_trx->mutex.
row_vers_impl_x_locked(), row_vers_impl_x_locked_low():
Document that the transaction state must be rechecked after
trx_mutex_enter().
trx_free_prepared(): Adjust for the changes to lock_trx_release_locks().
Revert part of fa2a74e08d.
trx_reference(): Remove, and merge the relevant part to the only caller
trx_rw_is_active(). If the statements trx = NULL; were ever executed,
the function would have dereferenced a NULL pointer and crashed in
trx_mutex_exit(trx). Hence, those statements must have been unreachable,
and they can be replaced with debug assertions.
trx_rw_is_active(): Avoid unnecessary acquisition and release of trx->mutex
when do_ref_count=false.
lock_trx_release_locks(): Do not reset trx->id=0. Had the statement been
necessary, we would have experienced crashes in trx_reference().
ha_innobase::open(): Always ignore problems with FOREIGN KEY constraints
(pass DICT_ERR_IGNORE_FK_NOKEY), no matter whether foreign_key_checks
is enabled. Instead, we must report errors when enforcing the FOREIGN KEY
constraints. As a result of ignoring these errors, the tables will be
loaded with dict_foreign_t objects whose foreign_index or referenced_index
will be NULL.
Also, pass DICT_ERR_IGNORE_FK_NOKEY instead of DICT_ERR_IGNORE_NONE
to dict_table_open_on_id_low() in many other cases. Notably, on
CREATE TABLE and ALTER TABLE, we will keep validating the FOREIGN KEY
constraints as before.
dict_table_open_on_name(): If no other flags than
DICT_ERR_IGNORE_FK_NOKEY are set, refuse access to unreadable tables.
Some encryption tests rely on this code path.
For the DML code path, we used to have the problem that when
one of the indexes was missing in dict_foreign_t, we would ignore
the FOREIGN KEY constraint altogether. The following changes
address that.
row_ins_check_foreign_constraints(): Add the parameter pk.
For the primary key, consider also foreign key constraints for which
foreign->foreign_index=NULL (no underlying index is available).
row_ins_check_foreign_constraint(): Report errors also for !check_ref.
Remove a redundant check for srv_read_only_mode.
row_ins_foreign_report_add_err(): Tolerate foreign->foreign_index=NULL.
fkerr_t: Errors for the foreign key checks. Replaces ulint,
which used #define that looked like dberr_t literals.
wsrep_dict_foreign_find_index(): Remove. Use
dict_foreign_find_index() instead, with default parameters.
dict_foreign_push_index_error(): Do not add redundant quotes
around quoted table names.
========
During ibd file creation, InnoDB flushes the page0 without crypt
information. During recovery, InnoDB encounters encrypted page read
before initialising the crypt data of the tablespace. So it leads t
corruption of page and doesn't allow innodb to start.
Solution:
=========
Write crypt_data information in page0 while creating .ibd file creation.
During recovery, crypt_data will be initialised while processing
MLOG_FILE_NAME redo log record.
MDEV-17614 flags INSERT…ON DUPLICATE KEY UPDATE unsafe for statement-based
replication when there are multiple unique indexes. This correctly fixes
something whose attempted fix in MySQL 5.7
in mysql/mysql-server@c93b0d9a97
caused lock conflicts. That change was reverted in MySQL 5.7.26
in mysql/mysql-server@066b6fdd43
(with a substantial amount of other changes).
In MDEV-17073 we already disabled the unfortunate MySQL change when
statement-based replication was not being used. Now, thanks to MDEV-17614,
we can actually remove the change altogether.
This reverts commit 8a346f31b9 (MDEV-17073)
and mysql/mysql-server@c93b0d9a97 while
keeping the test cases.
The general reason why innodb redo log file is limited by 512G is that
log_block_convert_lsn_to_no() returns value limited by 1G. But there is no
need to have unique log block numbers in log group. The fix removes 512G
limit and limits log group size by
(uint32_t maximum value) * (minimum page size), which, in turns, can be
removed if fil_io() is no longer used for innodb redo log io.
Non-owning reference to elements.
Use it as function argument instead of pointer+size pair or instead of
const std::vector<T>.
Do not use it for strings!
More info is here http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
Or just google about it.
fts_sync(): Remove the constant parameter has_dict=false.
fts_sync_table(): Remove the constant parameter has_dict=false,
and the redundant parameter unlock_cache = !wait.
Make wait=true the default parameter.
The function pointer ut_timer() was only used by the
InnoDB defragmenting thread. Let InnoDB use a single monotonic
high-precision timer, my_interval_timer() [in nanoseconds],
occasionally wrapped by microsecond_interval_timer().
srv_defragment_interval: Change from "timer" units to nanoseconds.
This concludes the InnoDB time function cleanup that was
motivated by MDEV-14154. Only ut_time_ms() will remain for now,
wrapping my_interval_timer().
lock_t::requested_time: Document what the field is used for.
lock_t::wait_time: Document that the field is only used for
diagnostics and may be garbage if the system time is being adjusted.
srv_slot_t::suspend_time: Document that this is duplicating
trx_lock_t::wait_started.
lock_table_print(), lock_rec_print(): Declare in static scope.
Add a parameter for the current time.
lock_deadlock_check_and_resolve(), lock_deadlock_lock_print(),
lock_deadlock_joining_trx_print():
Add a parameter for the current time.
srv_slot_t::suspend_time, os_aio_slot_t::reservation_time,
sync_cell_t::reservation_time: Explain what could happen
if the system time has is being adjusted.
fts_sync_t::start_time: Document that the field is mostly unused.
Replace ut_usectime() with my_interval_timer(),
which is equivalent, but monotonically counting nanoseconds
instead of counting the microseconds of real time.
os_event_wait_time_low(): Use my_hrtime() instead of ut_usectime().
FIXME: Set a clock attribute on the condition variable that allows
a monotonic clock to be chosen as the time base, so that the wait
is immune to adjustments of the system clock.
There is one directly applicable change to InnoDB:
commit 739f5239f1 in the
5.5 branch will be merged before the next MariaDB releases.
Another potentially applicable change will be tracked
separately as MDEV-20126.
Thus, here we only update the InnoDB version number and do
not change anything else.
This is a regression due to MDEV-16515 that affects some versions in
the MariaDB 10.1 server series starting with 10.1.35, and possibly
all versions starting with 10.2.17, 10.3.8, and 10.4.0.
The idea of MDEV-16515 is to allow DROP TABLE to be interrupted,
in case it was stuck due to some concurrent activity. We already
made some cases of internal DROP TABLE immune to kill in MDEV-18237,
MDEV-16647, MDEV-17470. We must include the cleanup of
CREATE TABLE...SELECT in the list of such internal DROP TABLE.
ha_innobase::delete_table(): Pass create_failed=true if the current
SQL statement is CREATE, so that the table will be dropped.
row_drop_table_for_mysql(): If create_failed=true, do not allow
the operation to be interrupted.
This is the race between DELETE and INSERT (or other any two operations accessing to the table).
What should happen in good case:
1. ALTER TABLE is issued. vc_templ->default_rec is initialized with temporary share's default_fields
2. temporary share is freed, but datadict is still there, with garbage in vc_templ->default_rec
3. DELETE is issued. It is first after ALTER TABLE finished.
4. ha_innobase::open() is called, ib_table->get_ref_count() should be one
5. we reinitialize vc_templ, so no garbage anymore
What actually happens:
3. DELETE is issued.
4. ha_innobase::open() is called and ib_table->get_ref_count() is 1
5. INSERT (or SELECT etc.) is issued in parallel
6. ha_innobase::open() is called and ib_table->get_ref_count() is 1
7. we check ib_table->get_ref_count() and it is 2 in both threads when we want reinitialize vc_templ
8. garbage is there
Fix:
* Do not store pointers to SHARE memory in table dict, copy it instead.
* But then we don't need to refresh it each time when refcount=1.