Analysis: When we have INSERT/REPLACE returning with qualified asterisk in the
RETURNING clause, '*' is not resolved properly because of wrong context.
context->table_list is NULL or has incorrect table because context->table_list
has tables from the FROM clause. For INSERT/REPLACE...SELECT...RETURNING,
context->table_list has table we are inserting from. While in other
INSERT/REPLACE syntax, context->table_list is NULL because there is no FROM
clause.
Fix: If filling fields instead of '*' for qualified asterisk in RETURNING,
use first_name_resolution_table for correct resolution of item.
row_ins_index_entry_set_vals(): Remove an assertion that trivially
holds because the 16-bit dict_col_t::len cannot represent the value
UNIV_PAGE_SIZE_MAX.
It is implementation-defined whether alignment requirements
that are larger than std::max_align_t (typically 8 or 16 bytes)
will be honored by the compiler and linker.
It turns out that on IBM AIX, both alignas() and MY_ALIGNED()
only guarantees alignment up to 16 bytes.
For some data structures, specifying alignment to the CPU
cache line size (typically 64 or 128 bytes) is a mere performance
optimization, and we do not really care whether the requested
alignment is guaranteed.
But, for the correct operation of direct I/O, we do require that
the buffers be aligned at a block size boundary.
field_ref_zero: Define as a pointer, not an array.
For innochecksum, we can make this point to unaligned memory;
for anything else, we will allocate an aligned buffer from the heap.
This buffer will be used for overwriting freed data pages when
innodb_immediate_scrub_data_uncompressed=ON. And exactly that code
hit an assertion failure on AIX, in the test innodb.innodb_scrub.
log_sys.checkpoint_buf: Define as a pointer to aligned memory
that is allocated from heap.
log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf
instead of trying to allocate an aligned buffer from the stack.
The lazy deletion of clean blocks from buf_pool.flush_list that was
introduced in commit 6441bc614a (MDEV-25113)
introduced a race condition around the function
buf_flush_relocate_on_flush_list().
The test innodb_zip.wl5522_debug_zip as well as the buffer pool
resizing tests would occasionally fail in debug builds due to
buf_pool.flush_list.count disagreeing with the actual length of the
doubly-linked list.
The safe procedure for relocating a block in buf_pool.flush_list should be
as follows, now that we implement lazy deletion from buf_pool.flush_list:
1. Acquire buf_pool.mutex.
2. Acquire the exclusive buf_pool.page_hash.latch.
3. Acquire buf_pool.flush_list_mutex.
4. Copy the block descriptor.
5. Invoke buf_flush_relocate_on_flush_list().
6. Release buf_pool.flush_list_mutex.
buf_flush_relocate_on_flush_list(): Assert that
buf_pool.flush_list_mutex is being held. Invoke
buf_page_t::oldest_modification() only once, using
std::memory_order_relaxed, now that the mutex protects us.
buf_LRU_free_page(), buf_LRU_block_remove_hashed(): Avoid
an unlock-lock cycle on hash_lock. (We must not acquire hash_lock
while already holding buf_pool.flush_list_mutex, because that
could lead to a deadlock due to latching order violation.)
SHOW PLUGINS has a more complete view of the installed
plugins into the server.
The mysql.user is a compatibility view that doesn't
show the complete authentication picture. Use global_priv.
Add `show create user` for default users to more clearly
represent its contents.
The performance_schema data related to instrument "wait/synch/mutex/threadpool/group_mutex" was incorrect. The root cause is the group_mutex was initialized in thread_group_init before registerd in PSI_register(mutex).
The fix is to put PSI_register before thread_group_init, which ensures the right order.
The problem is that array binding uses net buffer to read parameters for each
execution while each execiting with RETURNING write in the same buffer.
Solution is to allocate new net buffer to avoid changing buffer we are reading
from.
The root cause of the bug MDEV-26139 is the lack of NULL checking
on the variable `dq`.
Comments on if (dq && (!sq || sq > dq)) {...} else {...}:
* The if block corresponds to the case where parameters are
quoted by double quotes. In that case, a single quote doesn't
appear at all or only appears in the middle of double quotes.
* The else block corresponds to the case where parameters are
quoted by single quotes. In that case, a double quote doesn't
appear at all or only appears in the middle of single quotes.
* If the program reaches the if-else statement, `sq || dq` holds.
Thus, the negation of `dq && (!sq || sq > dq)` is equivalent to
`sq && (!dq || sq <= dq)`.
This bug could affect queries that had at least two references to a CTE that
used an embedded recursive CTE.
Starting from version 10.4 some code in With_element::clone_parsed_spec()
that assumed a certain order of selects after parsing the specification of
a CTE became not valid anymore. It could lead to global select lists where
some selects were missing. If a missing CTE happened to belong to the
recursive part of a recursive CTE some recursive table references were not
set as references to materialized derived tables and this caused a crash of
the server.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Port the following patch from MySQL:
commit 1b2e8ea269c80cb93cc79d8be934c40b1c58e947
Author: Kailasnath Nagarkar <kailasnath.nagarkar@oracle.com>
Date: Fri Nov 30 16:43:13 2018 +0530
Bug #20939184: INNODB: UNLOCK ROW COULD NOT FIND A 2 MODE
LOCK ON THE RECORD
Issue:
------
Consdier tables t1 and t2 such that t1 has multiple rows
and join condition for t1 left join t2 results in only
single row from t2.
In this case, access to table t2 is const since there
is a single row that qualifies the join condition.
However, while executing the query, attempt is made to
unlock t2's row multiple times.
The current algorithm to fetch rows approximates to:
1) Retrieve the row for t1.
2) Retrieve the row for t2.
3) Apply the join conditions.
a) If condition evaluates to true:
Project the row to the result.
b) If condition evaluates to false:
i) If t2's qep_tab->not_null_complement is true,
unlock t2's row.
ii) Null-complement the row by calling
"evaluate_null_complemented_join_record()". In
this function qep_tab->not_null_complement is
set to false.
The t2's only one row, that qualifies join condition,
is unlocked in Step i) when t1's row is evaluated to
false.
When t1's next row is also evaluated to false, another
attempt is made to unlock t2's already unlocked row.
This results in following error being logged in error.log:
"[ERROR] InnoDB: Unlock row could not find a 3 mode lock on
the record. Current statement:
select * from t1 left join t2 ......"
Solution:
---------
When a table's access method is "const", set record unlock
method for this table to do no operation.
buf_flush_relocate_on_flush_list(): Use dpage->physical_size()
because bpage->zip.ssize may already have been zeroed in
page_zip_set_size() invoked by buf_pool_t::realloc().
This would cause occasional failures of the test
innodb.innodb_buffer_pool_resize, which creates a
ROW_FORMAT=COMPRESSED table.
The replacement of buf_pool.page_hash with a different type of
hash table in commit 5155a300fa (MDEV-22871)
introduced a race condition with buffer pool resizing.
We have an execution trace where buf_pool.page_hash.array is changed
to point to something else while page_hash_latch::read_lock() is
executing. The same should also affect page_hash_latch::write_lock().
We fix the race condition by never resizing (and reallocating) the
buf_pool.page_hash. We assume that resizing the buffer pool is
a rare operation. Yes, there might be a performance regression if a
server is first started up with a tiny buffer pool, which is later
enlarged. In that case, the tiny buf_pool.page_hash.array could cause
increased use of the hash bucket lists. That problem can be worked
around by initially starting up the server with a larger buffer pool
and then shrinking that, until changing to a larger size again.
buf_pool_t::resize_hash(): Remove.
buf_pool_t::page_hash_table::lock(): Do not attempt to deal with
hash table resizing. If we really wanted that in a safe manner,
we would probably have to introduce a global rw-lock around the
operation, or at the very least, poll buf_pool.resizing, both of
which would be detrimental to performance.
In other ROW_FORMAT than REDUNDANT, the InnoDB record header
size calculation depends on dict_index_t::n_core_null_bytes.
In ROW_FORMAT=REDUNDANT, the record header always is 6 bytes
plus n_fields or 2*n_fields bytes, depending on the maximum
record size. But, during online ALTER TABLE, the log records
in the temporary file always use a format similar to
ROW_FORMAT=DYNAMIC, even omitting the 5-byte fixed-length part
of the header.
While creating a temporary file record for a ROW_FORMAT=REDUNDANT
table, InnoDB must refer to dict_index_t::n_nullable.
The field dict_index_t::n_core_null_bytes is only valid for
other than ROW_FORMAT=REDUNDANT tables.
The bug does not affect MariaDB 10.3, because only
commit 7a27db778e (MDEV-15563)
allowed an ALGORITHM=INSTANT change of a NOT NULL column to
NULL in a ROW_FORMAT=REDUNDANT table.
The fix was developed by Thirunarayanan Balathandayuthapani
and tested by Matthias Leich. The test case was simplified by me.
This is a backport of 161e4bfafd.
trans_rollback_to_savepoint(): Only release metadata locks (MDL)
if the storage engines agree, after the changes were already rolled back.
Ever since commit 3792693f31
and mysql/mysql-server@55ceedbc3f
we used to cheat here and always release MDL if the binlog is disabled.
MDL are supposed to prevent race conditions between DML and DDL also
when no replication is in use. MDL are supposed to be a superset of
InnoDB table locks: InnoDB table lock may only exist if the thread
also holds MDL on the table name.
In the included test case, ROLLBACK TO SAVEPOINT would wrongly release
the MDL on both tables and let ALTER TABLE proceed, even though the DML
transaction is actually holding locks on the table.
Until commit 1bd681c8b3 (MDEV-25506)
in MariaDB 10.6, InnoDB would often work around the locking violation
in a blatantly non-ACID way: If locks exist on a table that is being
dropped (in this case, actually a partition of a table that is being
rebuilt by ALTER TABLE), InnoDB could move the table (or partition)
into a queue, to be dropped after the locks and references had been
released. If the lock is not released and the original copy of the
table not dropped quickly enough, a name conflict could occur on
a subsequent ALTER TABLE.
The scenario of commit 3792693f31
is unaffected by this fix, because mysqldump
would use non-locking reads, and the transaction would not be holding
any InnoDB locks during the execution of ROLLBACK TO SAVEPOINT.
MVCC reads inside InnoDB are only covered by MDL and page latches,
not by any table or record locks.
FIXME: It would be nice if storage engines were specifically asked
which MDL can be released, instead of only offering a choice
between all or nothing. InnoDB should be able to release any
locks for tables that are no longer in trx_t::mod_tables, except
if another transaction had converted some implicit record locks
to explicit ones, before the ROLLBACK TO SAVEPOINT had been completed.
Reviewed by: Sergei Golubchik
A table rebuild that would truncate the default value of a
DATE column is expected to issue data truncation warnings.
But, these warnings are not being issued if the ADD COLUMN
is being executed with ALGORITHM=INSTANT. InnoDB sets the
warning of the field while assigning the default value
of the field during check_if_supported_inplace_alter().