In the SUX_LOCK_GENERIC implementation, we can remember at most
one pending exclusive lock request. If multiple exclusive lock
requests are pending, the WRITER_WAITING flag will be cleared when
the first waiting writer acquires the exclusive lock.
ssux_lock_low::update_lock(): If WRITER_WAITING is set, wake up
the writer even if the UPDATER flag is set, because the waiting
writer may be in the process of upgrading its U lock to X.
rw_lock::read_unlock(): Also indicate that an X lock waiter must
be woken up if an U lock exists.
This fix may cause unnecessary wake-ups and system calls, but this
is the best that we can do. Ideally we would use the MDEV-25404
idea of a separate 'writer' mutex, but there is no portable way to
request that a non-recursive mutex be created, and InnoDB requires
the ability to transfer buf_block_t::lock ownership to an I/O thread.
To allow problems like this to be caught more reliably in the future,
we add a unit test for srw_mutex, srw_lock, ssux_lock, sux_lock.
row_ins_clust_index_entry_low(): Do not enable bulk insert if
any record locks exist on the table. Bulk insert is assumed to
be covered only by an exclusive table lock, with no row-level
locking or undo logging.
While MDEV-24589 causes the regression
MDEV-25491 Race condition between DROP TABLE and purge of
SYS_INDEXES record
we must not revert it from 10.6, because MDEV-24589 is a prerequisite for
MDEV-25180 Atomic ALTER TABLE
that is targeting 10.6. The regression will be dealt with later.
The U-to-X upgrade turned out to be incorrect. A debug assertion
failed in wr_wait(), called from mtr_defer_drop_ahi() in a stress
test with innodb_adaptive_hash_index=ON.
A correct upgrade procedure ought to be readers.fetch_add(WRITER-1)
to register ourselves as a WRITER (or waiting writer) and to release
the reference that was being held for the U lock.
Thanks to Matthias Leich for catching the problem.
This reverts commit e731a28394.
A crash occurred during the test stress.ddl_innodb when
fil_delete_tablespace() for DROP TABLE was waiting in
fil_check_pending_operations() and a purge thread for handling
an earlier DROP INDEX was attempting to load the index root page
in btr_free_if_exists() and btr_free_root_check(). The function
buf_page_get_gen() would write out several times
"trying to read...being-dropped tablespace"
before giving up and committing suicide.
It turns out that during any page access in btr_free_if_exists(),
fil_space_t::set_stopping() could have been invoked by
fil_check_pending_operations(), as part of dropping the tablespace.
Preventing this race condition would require extensive changes
to the allocation code or some locking mechanism that would ensure
that we only set the flag if btr_free_if_exists() is not in progress.
Either way, that could be a too risky change in a GA release.
Because MDEV-24589 is not strictly necessary in the 10.5 release
series and it only is a requirement for MDEV-25180 in a later
major release, we will revert the change from 10.5.
Issue:- Since there is no waiting for the actual disconnection of the con_tmp
(which does XA prepare of test1), We can have a issue when test1 is not
prepared and we are calling rollback on test1 , giving XAER_NOTA: Unknown XID
error
Solution:- Wait for the complete disconnection of con_tmp
buf_resize_callback(): Correct an invalid assertion, and enable
the assertion in debug builds only.
Between buf_resize_start() and buf_resize_shutdown(),
srv_shutdown_state must be less than SRV_SHUTDOWN_CLEANUP.
The incorrect assertion had been introduced in
commit 5e62b6a5e0 (MDEV-16264).
As a result, the server could crash if shutdown was initiated
concurrently with initiating a change of innodb_buffer_pool_size.
to mysql interpreter
InnoDB returns uninitialized statistics to mysql interpreter
when background thread is opening the table. So it leads to
assertion failure. In that case, InnoDB avoid sending
innodb statistics information to mysql interpreter.
There was race between a committing transaction and the following in binlog
order FLUSH LOGS that could create a 2nd Binlog checkpoint (BCP) event
in the new file *before* the first logged-in-old-binlog transaction gets committed in
Innodb. That would cause the transaction loss at recovery, should
the server stop right after the BCP.
The race is tackled by enforcing the necessary set of mutexes to be acquired
by FLUSH-LOGS handler in the correct order (of the group commit leader
pattern).
Note, there remain two cases where a similar race is still possible:
- the above race as it is when the server is run with ("unlikely")
non-default `--binlog-optimize-thread-scheduling=0` (MDEV-24530), and
- at unlikely event of bin-logging of Incident event (MDEV-24531) that
also triggers binlog rotation,
in both cases though with lesser chances after the current fixes.
Pushing LIMIT to temp aggregation table is possible, but not when WITH
TIES is used. In a degenerate case with constant ORDER BY, the constant
gets removed and the code assumed the limit is push-able.
Ensure that if WITH TIES is present, that this does not happen.
This commit implements the standard SQL extension
OFFSET start { ROW | ROWS }
[FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }]
To achieve this a reserved keyword OFFSET is introduced.
The general logic for WITH TIES implies:
1. The number of rows a query returns is no longer known during optimize
phase. Adjust optimizations to no longer consider this.
2. During end_send make use of an "order Cached_item"to compare if the
ORDER BY columns changed. Keep returning rows until there is a
change. This happens only after we reached the row limit.
3. Within end_send_group, the order by clause was eliminated. It is
still possible to keep the optimization of using end_send_group for
producing the final result set.
The function was originally introduced by eb0804ef5e
MDEV-18553: MDEV-16327 pre-requisits part 1: isolation of LIMIT/OFFSET handling
set_unlimited had an overloaded notion of both clearing the offset value
and the limit value. The code is used for SQL_CALC_ROWS option to
disable the limit clause after the limit is reached, while at the same
time the calling code suppreses sending of rows.
Proposed solution:
Dedicated clear method for query initialization (to ensure no garbage
remains between executions).
Dedicated set_unlimited that only alters the limit value.
Replace
* select_lex::offset_limit
* select_lex::select_limit
* select_lex::explicit_limit
with select_lex::Lex_select_limit
The Lex_select_limit already existed with the same elements and was used in
by the yacc parser.
This commit is in preparation for FETCH FIRST implementation, as it
simplifies a lot of the code.
Additionally, the parser is simplified by making use of the stack to
return Lex_select_limit objects.
Cleanup of init_query() too. Removes explicit_limit= 0 as it's done a bit later
in init_select() with limit_params.empty()
Address review input: switch Name_resolution_context::ignored_tables from
table_map to a list of TABLE_LIST objects. The rationale is that table
bits may be changed due to query rewrites, etc, which may potentially
require updating ignored_tables.