193,262 commits

Author SHA1 Message Date
Marko Mäkelä
ed6b230744 MDEV-25919 preparation: Remove trx_t::internal
With commit 1bd681c8b3 (MDEV-25506)
it no longer is necessary to run DDL and DML operations in
separate transactions. Let us remove the flag trx_t::internal.
Dictionary transactions will be distinguished by trx_t::dict_operation.
2021-07-01 17:51:55 +03:00
Marko Mäkelä
0a67b15a9d Cleanup: Remove pointer indirection for trx_t::xid
The trx_t::xid is always allocated, so we might as well allocate it
directly in the trx_t object to improve the locality of reference.
2021-07-01 16:38:24 +03:00
Marko Mäkelä
83234719f1 MDEV-24671 fixup: Fix an off-by-one error
In commit e71e613353 we
accidentally made innodb_lock_wait_timeout=100000000
a "literal" value, not the smallest special value that
would mean "infinite" timeout.
2021-07-01 16:37:01 +03:00
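
Illustration for 83234719f1 (not part of the commit): a minimal sketch of the boundary check, assuming, per the message above, that 100000000 is the smallest innodb_lock_wait_timeout value that means an infinite wait. The names kInfiniteLockWait and lock_wait_is_infinite() are hypothetical and do not appear in the server source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: the smallest timeout value (in seconds) that the
// commit message describes as meaning "infinite".
constexpr uint32_t kInfiniteLockWait = 100000000;

// Returns true if the configured timeout means "wait forever".
// Writing '>' instead of '>=' here would reproduce the off-by-one error:
// 100000000 itself would then be treated as a literal number of seconds.
constexpr bool lock_wait_is_infinite(uint32_t timeout_seconds)
{
  return timeout_seconds >= kInfiniteLockWait;
}
```

With this check, 99999999 remains an ordinary (if very long) timeout, while 100000000 is treated as infinite.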
Marko Mäkelä
161e4bfafd MDEV-25902 Unexpected ER_LOCK_WAIT_TIMEOUT and result
trans_rollback_to_savepoint(): Only release metadata locks (MDL)
if the storage engines agree, after the changes were already rolled back.

Ever since commit 3792693f31
and mysql/mysql-server@55ceedbc3f
we used to cheat here and always release MDL if the binlog is disabled.

MDL are supposed to prevent race conditions between DML and DDL also
when no replication is in use. MDL are supposed to be a superset of
InnoDB table locks: an InnoDB table lock may only exist if the thread
also holds MDL on the table name.

In the included test case, ROLLBACK TO SAVEPOINT would wrongly release
the MDL on both tables and let ALTER TABLE proceed, even though the DML
transaction is actually holding locks on the table.

Until commit 1bd681c8b3 (MDEV-25506)
InnoDB worked around the locking violation in a blatantly non-ACID way:
If locks exist on a table that is being dropped (in this case, actually
a partition of a table that is being rebuilt by ALTER TABLE), InnoDB
would move the table (or partition) into a queue, to be dropped after
the locks and references had been released.

The scenario of commit 3792693f31
is unaffected by this fix, because mariadb-dump (a.k.a. mysqldump)
would use non-locking reads, and the transaction would not be holding
any InnoDB locks during the execution of ROLLBACK TO SAVEPOINT.
MVCC reads inside InnoDB are only covered by MDL and page latches,
not by any table or record locks.

FIXME: It would be nice if storage engines were specifically asked
which MDL can be released, instead of only offering a choice
between all or nothing. InnoDB should be able to release any
locks for tables that are no longer in trx_t::mod_tables, except
if another transaction had converted some implicit record locks
to explicit ones, before the ROLLBACK TO SAVEPOINT had been completed.

Reviewed by: Sergei Golubchik
2021-07-01 10:35:32 +03:00
Marko Mäkelä
8c5c3a4594 MDEV-26067 innodb_lock_wait_timeout values above 100,000,000 are useless
The practical maximum value of the parameter innodb_lock_wait_timeout
is 100,000,000. Any value larger than that specifies an infinite timeout.

Therefore, we should make 100,000,000 the maximum value of the parameter.
2021-07-01 10:31:08 +03:00
Marko Mäkelä
ce1c957ab1 Speed up the test innodb.lock_insert_into_empty
Let us use innodb_lock_wait_timeout=0 for an immediate timeout.
Also, do not override the timeout in the default connection,
so that further tests will use the default setting.
2021-07-01 10:04:47 +03:00
Sergei Golubchik
add782a13e fix JSON_ARRAYAGG not to over-quote json in joins
This replaces 8711adb786

if a temptable field is created for some json expression (is_json_type()
returns true), make this temptable field a proper json field.

A field is a json field (see Item_field::is_json_type()) if it
has a CHECK constraint of JSON_VALID(field).

Note that it will never be actually checked for temptable fields,
so it won't cause a run-time slowdown.
2021-06-30 22:09:19 +02:00
Sergei Golubchik
b62672af72 MDEV-26054 Server crashes in Item_func_json_arrayagg::get_str_from_field
Revert "fix JSON_ARRAYAGG not to over-quote json in joins"
This removes 8711adb786 but keeps the test case.
A different fix is coming up.

The reverted fix was unsafe because args can be Item_field's that are
later replaced by Item_direct_view_ref to the actual field, while the
Item_field preserved in orig_args will stay unfixed,
with item->field==NULL and no metadata.
2021-06-30 22:08:53 +02:00
Sergei Golubchik
83684fc9a4 MDEV-23004 When using GROUP BY with JSON_ARRAYAGG with joint table, the square brackets are not included
make test results stable

followup for 98c7916f0f
2021-06-30 09:34:27 +02:00
Sergei Golubchik
8711adb786 fix JSON_ARRAYAGG not to over-quote json in joins
use metadata (in particular is_json() property) of the original
argument item, even if the actual argument was later replaced
with an Item_temptable_field
2021-06-30 09:34:27 +02:00
Sergei Golubchik
c8fb911e9c fix main.lock_kill crashes in --ps --embed
when checking whether thd wasn't killed before this
emb_advanced_command(), take into account that it
could've been killed before the *previous*
emb_advanced_command(). That is, the previous one has
already set thd to NULL and this one only wanted a COM_STMT_RESET
after a failure.
2021-06-30 09:34:26 +02:00
Sergei Golubchik
771f3cf995 make --rr work with InnoDB again
Since 420f8e24ab InnoDB uses O_DIRECT by default
2021-06-30 09:34:26 +02:00
Sergei Golubchik
6fab256bc8 disable spider/bugfix.wait_timeout 2021-06-30 09:34:26 +02:00
Sergei Golubchik
fa5c314377 fix spider tests for --ps in 10.6
see also c3a1ba0fd9, 068246c006, 690ae1de45
2021-06-30 09:34:26 +02:00
Marko Mäkelä
ff9150f3c5 MDEV-25942: Assertion failure in trx_t::drop_table()
trx_t::drop_table(): Relax also another assertion that would fail
due to an AUTO_INCREMENT lock that is being held by the current
test case. This should have been part of
commit 63e9a05440.
2021-06-30 09:00:52 +03:00
Alexey Botchkov
c2ebe8147d MDEV-25837 Assertion `thd->locked_tables_mode == LTM_NONE' failed in Locked_tables_list::init_locked_tables.
don't do prelocking for the FLUSH command.
2021-06-29 16:03:26 +04:00
Marko Mäkelä
0237e9bb65 MDEV-26041 Recovery failure due to delete-marked SYS_FIELDS record
trx_t::drop_table(): Delete-mark the SYS_TABLES and SYS_INDEXES
record before delete-marking any SYS_COLUMNS or SYS_FIELDS records.
Otherwise, dict_load_indexes() could fail on recovery. This fixes up
commit 1bd681c8b3 (MDEV-25506).
2021-06-29 15:20:33 +03:00
Marko Mäkelä
e04bbf73dc MDEV-25496 Assertion 'trx->bulk_insert' failed on INSERT
row_get_prebuilt_insert_row(): Remove some fallback code that had been
added in commit 8ea923f55b (MDEV-24818).
It seems that after all, statement boundaries are being reliably
indicated by ha_innobase::start_stmt() or
(for partitioned tables) ha_innobase::external_lock().
2021-06-29 15:20:33 +03:00
Marko Mäkelä
4b0070f642 MDEV-26029: Implement my_test_if_thinly_provisioned() for ScaleFlux
This is based on code that was contributed by Ning Zheng and Ray Kuan
from ScaleFlux.
2021-06-29 15:20:16 +03:00
Marko Mäkelä
30edd5549d MDEV-26029: Sparse files are inefficient on thinly provisioned storage
The MariaDB implementation of page_compressed tables for InnoDB used
sparse files. In the worst case, in the data file, every data page
will consist of some data followed by a hole. This may be extremely
inefficient in some file systems.

If the underlying storage device is thinly provisioned (can compress
data on the fly), it would be good to write regular files (with sequences
of NUL bytes at the end of each page_compressed block) and let the
storage device take care of compressing the data.

For reads, sparse file regions and regions containing NUL bytes will be
indistinguishable.

my_test_if_disable_punch_hole(): A new predicate for detecting thinly
provisioned storage. (Not implemented yet.)

innodb_atomic_writes: Correct the comment.

buf_flush_page(): Support all values of fil_node_t::punch_hole.
On a thinly provisioned storage device, we will always write
NUL-padded innodb_page_size bytes also for page_compressed tables.

buf_flush_freed_pages(): Remove a redundant condition.

fil_space_t::atomic_write_supported: Remove. (This was duplicating
fil_node_t::atomic_write.)

fil_space_t::punch_hole: Remove. (Duplicated fil_node_t::punch_hole.)

fil_node_t: Remove magic_n, and consolidate flags into bitfields.
For punch_hole we introduce a third value that indicates a
thinly provisioned storage device.

fil_node_t::find_metadata(): Detect all attributes of the file.
2021-06-29 15:18:22 +03:00
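
Illustration for 30edd5549d (not part of the commit): a minimal sketch of the NUL padding described above for thinly provisioned storage. The function pad_page_sketch() and its parameters are hypothetical. The real logic lives in buf_flush_page() and depends on fil_node_t::punch_hole, which is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the compressed payload of a page_compressed
// block, return the bytes that would be written on a thinly provisioned
// device: the payload followed by NUL bytes up to the full page size.
inline std::vector<unsigned char>
pad_page_sketch(const std::vector<unsigned char> &payload,
                std::size_t page_size)
{
  std::vector<unsigned char> out(payload);
  // Pad only when the payload is shorter than a page; never truncate.
  if (out.size() < page_size)
    out.resize(page_size, 0);
  return out;
}
```

A reader of the file cannot tell such a padded region from a punched hole, because both read back as NUL bytes. The storage device is then left to compress the padding.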
Marko Mäkelä
b11aa0df85 Merge 10.5 into 10.6 2021-06-29 15:18:18 +03:00
Marko Mäkelä
617dee3488 MDEV-26042 Atomic write capability is not detected correctly
my_init_atomic_write(): Detect all forms of SSD, in case multiple
types of devices are installed in the same machine.
This was broken in commit ed008a74cf
and further in commit 70684afef2.

SAME_DEV(): Match block devices, ignoring partition numbers.

Let us use stat() instead of lstat(), in case someone has a symbolic
link in /dev.

Instead of reporting errors with perror(), let us use fprintf(stderr)
with the file name, the impact of the error, and the strerror(errno).
Because this code is specific to Linux, we may depend on the
GNU libc/uClibc/musl extension %m for strerror(errno).
2021-06-29 15:04:27 +03:00
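
Illustration for 617dee3488 (not part of the commit): a minimal POSIX sketch of the reporting style described above, using stat() rather than lstat() so that symbolic links are followed, and printing the file name, the impact and the error text. The function device_of_sketch() and the wording of the message are hypothetical. The sketch uses strerror(errno) explicitly, whereas the commit message says that the Linux-specific code may rely on the %m extension.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>

// Hypothetical helper: look up the device number behind `path`.
// Returns false (after reporting to stderr) if the path cannot be examined.
inline bool device_of_sketch(const char *path, dev_t *dev)
{
  struct stat st;
  // stat() follows symbolic links, unlike lstat().
  if (stat(path, &st) != 0)
  {
    std::fprintf(stderr,
                 "Warning: cannot examine %s; "
                 "atomic write detection is skipped for it: %s\n",
                 path, std::strerror(errno));
    return false;
  }
  // For a block device node, st_rdev identifies the device itself;
  // for any other file, st_dev identifies the device that contains it.
  *dev = S_ISBLK(st.st_mode) ? st.st_rdev : st.st_dev;
  return true;
}
```

Unlike perror(), this form tells the reader which file failed and what the consequence is.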
Marko Mäkelä
3d15e3c085 MDEV-22640 fixup: clang -Winconsistent-missing-override 2021-06-29 15:02:10 +03:00
Andrei Elkin
390014781b MDEV-26031 unnecessary xid logging in one phase commit case
The bug was originally observed as a hanging binlog background thread at
shutdown, similar to one of MDEV-21120.
It occurred through unnecessary xid logging in 1pc execution.

Two parts of the issue are fixed. The per-engine loop, which attempts
to mark a group as requiring xid unlogging, is corrected in two ways.
It is no longer executed when the termination event is irrelevant
for recovery, in particular when it does not have an xid. The loop is
no longer broken unconditionally at the end of the first iteration.
2021-06-29 14:13:37 +03:00
Vicențiu Ciorbaru
c29f45ce77 MDEV-25481 Memory leak in Cached_item_str::Cached_item_str WITH TIES involving a blob
Make sure to call cached item's destructors.
2021-06-29 00:13:57 +03:00
Marko Mäkelä
63e9a05440 MDEV-25942: Assertion !table.n_waiting_or_granted_auto_inc_locks
trx_t::drop_table(): Remove a bogus debug assertion.
The current transaction may hold an AUTO_INCREMENT
lock on the table while
CREATE TABLE t2 (pk INT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB SELECT...
is being rolled back due to lock wait timeout.
Remaining debug assertions will check that only this transaction
is holding locks on the table, and that one of them is an exclusive lock.
2021-06-28 15:37:29 +03:00
Alexey Botchkov
98c7916f0f MDEV-23004 When using GROUP BY with JSON_ARRAYAGG with joint table, the
square brackets are not included.

Item_func_json_arrayagg::copy_or_same() should be implemented.
2021-06-28 11:14:18 +04:00
Marko Mäkelä
891a927e80 Merge 10.5 into 10.6 2021-06-26 11:53:28 +03:00
Marko Mäkelä
fc2ff46469 MDEV-26017: Assertion stat.flush_list_bytes <= curr_pool_size
buf_flush_relocate_on_flush_list(): If we are removing the block from
buf_pool.flush_list, subtract its size from buf_pool.stat.flush_list_bytes.
This fixes a regression that was introduced in
commit 22b62edaed (MDEV-25113).
2021-06-26 11:52:25 +03:00
Marko Mäkelä
aa95c42360 Cleanup: Remove unused mtr_block_dirtied 2021-06-26 11:17:05 +03:00
Marko Mäkelä
759deaa0a2 MDEV-26010 fixup: Use acquire/release memory order
In commit 5f22511e35 we depend on
Total Store Ordering. For correct operation on ISAs that implement
weaker memory ordering, we must explicitly use release/acquire stores
and loads on buf_page_t::oldest_modification_ to prevent a race condition
when buf_page_t::list does not happen to be on the same cache line.

buf_page_t::clear_oldest_modification(): Assert that the block is
not in buf_pool.flush_list, and use std::memory_order_release.

buf_page_t::oldest_modification_acquire(): Read oldest_modification_
with std::memory_order_acquire. In this way, if the return value is 0,
the caller may safely assume that it will not observe the buf_page_t
as being in buf_pool.flush_list, even if it is not holding
buf_pool.flush_list_mutex.

buf_flush_relocate_on_flush_list(), buf_LRU_free_page():
Invoke buf_page_t::oldest_modification_acquire().
2021-06-26 11:16:40 +03:00
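
Illustration for 759deaa0a2 (not part of the commit): a minimal sketch of the release/acquire pairing described above, using a simplified stand-in for buf_page_t. The type page_sketch, the flag in_flush_list and the helpers insert_sketch() and remove_sketch() are hypothetical. The real code manipulates buf_page_t::list under buf_pool.flush_list_mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified stand-in for buf_page_t; not the real InnoDB class.
struct page_sketch
{
  // Stand-in for the buf_page_t::list linkage (plain, non-atomic data).
  bool in_flush_list = false;
  std::atomic<uint64_t> oldest_modification_{0};

  // Writer side: called only after the block has been unlinked.
  void clear_oldest_modification()
  {
    assert(!in_flush_list);
    // Release: the unlinking above becomes visible to any thread that
    // later observes the value 0 through an acquire load.
    oldest_modification_.store(0, std::memory_order_release);
  }

  // Reader side: a result of 0 implies that the unlinking is visible.
  uint64_t oldest_modification_acquire() const
  {
    return oldest_modification_.load(std::memory_order_acquire);
  }
};

// Hypothetical helper: enter the flush list with the given LSN.
inline void insert_sketch(page_sketch &p, uint64_t lsn)
{
  p.in_flush_list = true;
  p.oldest_modification_.store(lsn, std::memory_order_relaxed);
}

// Hypothetical helper: unlink first and only then clear the field,
// which is the order that MDEV-26010 established.
inline void remove_sketch(page_sketch &p)
{
  p.in_flush_list = false;
  p.clear_oldest_modification();
}
```

On ISAs with Total Store Ordering, plain stores and loads already behave this way. The explicit memory orders matter on weakly ordered ISAs, where the two fields may sit on different cache lines.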
Marko Mäkelä
a8350cfb5e Merge 10.5 into 10.6 2021-06-24 21:56:44 +03:00
Marko Mäkelä
5f22511e35 MDEV-26010: Assertion lsn > 2 failed in buf_pool_t::get_oldest_modification
In commit 22b62edaed (MDEV-25113)
we introduced a race condition. buf_LRU_free_page() would read
buf_page_t::oldest_modification() as 0 and assume that
buf_page_t::list can be used (for attaching the block to the
buf_pool.free list). In the observed race condition,
buf_pool_t::delete_from_flush_list() had cleared the field,
and buf_pool_t::delete_from_flush_list_low() was executing
concurrently with buf_LRU_block_free_non_file_page(),
which resulted in buf_pool.flush_list.end becoming corrupted.

buf_pool_t::delete_from_flush_list(), buf_flush_relocate_on_flush_list():
First remove the block from buf_pool.flush_list, and only then
invoke buf_page_t::clear_oldest_modification(), to ensure that
reading oldest_modification()==0 really implies that the block
no longer is in buf_pool.flush_list.
2021-06-24 21:55:10 +03:00
Marko Mäkelä
e329dc8d86 MDEV-25948 fixup: Demote a warning to a note
buf_dblwr_t::recover(): Issue a note, not a warning, about
pages whose FIL_PAGE_LSN is in the future. This was supposed to be
part of commit 762bcb81b5 (MDEV-25948)
but had been accidentally omitted.
2021-06-24 18:51:05 +03:00
Marko Mäkelä
82fe83a34c MDEV-26012 InnoDB purge and shutdown hangs after failed ALTER TABLE
ha_innobase::commit_inplace_alter_table(): Invoke
purge_sys.resume_FTS() on all error handling paths
if purge_sys.stop_FTS() had been called.

This fixes a regression that had been introduced in
commit 1bd681c8b3 (MDEV-25506).
2021-06-24 16:07:27 +03:00
Marko Mäkelä
033e29b6a1 MDEV-26007 Rollback unnecessarily initiates redo log write
trx_t::commit_in_memory(): Do not initiate a redo log write if
the transaction has no visible effect. If anything for this
transaction had been made durable, crash recovery will roll back
the transaction just fine even if the end of ROLLBACK is not
durably written.

Rollbacks of transactions that are associated with XA identifiers
(possibly internally via the binlog) will always be persisted.
The test rpl.rpl_gtid_crash covers this.
2021-06-24 15:00:34 +03:00
Marko Mäkelä
b4c9cd201b Merge 10.5 into 10.6 2021-06-24 12:39:34 +03:00
Marko Mäkelä
60ed479711 MDEV-26004 Excessive wait times in buf_LRU_get_free_block()
buf_LRU_get_free_block(): Initially wait for a single block to be
freed, signaled by buf_pool.done_free. Only if that fails and no
LRU eviction flushing batch is already running, we initiate a
flushing batch that should serve all threads that are currently
waiting in buf_LRU_get_free_block().

Note: In an extreme case, this may introduce a performance regression
at larger numbers of connections. We observed this in sysbench
oltp_update_index with 512MiB buffer pool, 4GiB of data on fast NVMe,
and 1000 concurrent connections, on a 20-thread CPU. The contention point
appears to be buf_pool.mutex, and the improvement would turn into a
regression somewhere beyond 32 concurrent connections.

On slower storage, such regression was not observed; instead, the
throughput was improving and maximum latency was reduced.

The excessive waits were pointed out by Vladislav Vaintroub.
2021-06-24 11:01:18 +03:00
Marko Mäkelä
101da87228 Merge 10.5 into 10.6 2021-06-23 19:36:45 +03:00
Marko Mäkelä
6441bc614a MDEV-25113: Introduce a page cleaner mode before 'furious flush'
MDEV-23855 changed how the page cleaner is signaled by
user threads. If a threshold is exceeded, a mini-transaction commit
would invoke buf_flush_ahead() in order to initiate page flushing
before all writers would eventually grind to a halt in
log_free_check(), waiting for the checkpoint age to reduce.

However, buf_flush_ahead() would always initiate 'furious flushing',
making the buf_flush_page_cleaner thread write innodb_io_capacity_max
pages per batch, without sleeping between batches, until the
limit LSN is reached. Because this could saturate the I/O subsystem,
system throughput could drop significantly during these
'furious flushing' spikes.

With this change, we introduce a gentler version of flush-ahead,
which would write innodb_io_capacity_max pages per second until
the 'soft limit' is reached.

buf_flush_ahead(): Add a parameter to specify whether furious flushing
is requested.

buf_flush_async_lsn: Similar to buf_flush_sync_lsn, a limit for
the less intrusive flushing.

buf_flush_page_cleaner(): Keep working until buf_flush_async_lsn
has been reached.

log_close(): Suppress a warning message in the event that a new log
is being created during startup, when old logs did not exist.
Return what type of page cleaning will be needed.

mtr_t::finish_write(): Also when m_log.is_small(), invoke log_close().
Return what type of page cleaning will be needed.

mtr_t::commit(): Invoke buf_flush_ahead() based on the return value of
mtr_t::finish_write().
2021-06-23 19:06:52 +03:00
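
Illustration for 6441bc614a (not part of the commit): a minimal sketch of a two-level flush-ahead decision like the one described above. The enum flush_ahead_sketch, the struct limits_sketch and both threshold fields are hypothetical. The message only states that log_close() returns the type of page cleaning needed and that mtr_t::commit() passes it on to buf_flush_ahead().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: what kind of page cleaning is needed.
enum class flush_ahead_sketch { NONE, ASYNC, FURIOUS };

// Hypothetical thresholds on the checkpoint age, in bytes of log.
struct limits_sketch
{
  uint64_t soft_limit; // beyond this, start the gentler flush-ahead
  uint64_t hard_limit; // beyond this, request 'furious flushing'
};

// Decide the flushing mode from the current checkpoint age.
inline flush_ahead_sketch flush_needed_sketch(uint64_t checkpoint_age,
                                              const limits_sketch &limits)
{
  if (checkpoint_age >= limits.hard_limit)
    return flush_ahead_sketch::FURIOUS;
  if (checkpoint_age >= limits.soft_limit)
    return flush_ahead_sketch::ASYNC;
  return flush_ahead_sketch::NONE;
}
```

The point of the intermediate level is that flushing can begin at a bounded rate before writers are forced to stall.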
Marko Mäkelä
22b62edaed MDEV-25113: Make page flushing faster
buf_page_write_complete(): Reduce the buf_pool.mutex hold time,
and do not acquire buf_pool.flush_list_mutex at all.
Instead, mark blocks clean by setting oldest_modification to 1.
Dirty pages of temporary tables will be identified by the special
value 2 instead of the previous special value 1.
(By design of the ib_logfile0 format, actual LSN values smaller
than 2048 are not possible.)

buf_LRU_free_page(), buf_pool_t::get_oldest_modification()
and many other functions will remove the garbage (clean blocks)
from buf_pool.flush_list while holding buf_pool.flush_list_mutex.

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaced with non-atomic variables, protected by buf_pool.mutex,
to avoid unnecessary synchronization when modifying the counts.

export_vars: Remove unnecessary indirection for
innodb_pages_created, innodb_pages_read, innodb_pages_written.
2021-06-23 19:06:52 +03:00
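
Illustration for 22b62edaed (not part of the commit): a minimal sketch of how the special values of oldest_modification described above could be classified. The enum and the function name are hypothetical. The meaning of 0 (not in buf_pool.flush_list) is taken from the MDEV-26010 message earlier in this log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of buf_page_t::oldest_modification values.
enum class page_state_sketch
{
  NOT_IN_FLUSH_LIST,   // 0
  CLEAN_IN_FLUSH_LIST, // 1: written out, awaiting removal from the list
  DIRTY_TEMPORARY,     // 2: dirty page of a temporary table
  DIRTY                // anything larger: an actual LSN
};

inline page_state_sketch classify_sketch(uint64_t oldest_modification)
{
  switch (oldest_modification)
  {
  case 0:
    return page_state_sketch::NOT_IN_FLUSH_LIST;
  case 1:
    return page_state_sketch::CLEAN_IN_FLUSH_LIST;
  case 2:
    return page_state_sketch::DIRTY_TEMPORARY;
  default:
    return page_state_sketch::DIRTY;
  }
}
```

The small values are free for this use because, as the message notes, actual LSN values smaller than 2048 are not possible in the ib_logfile0 format.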
Marko Mäkelä
8af538979b MDEV-25801: buf_flush_dirty_pages() is very slow
In commit 7cffb5f6e8 (MDEV-23399)
the implementation of buf_flush_dirty_pages() was replaced with
a slow one, which would perform excessive scans of the
buf_pool.flush_list and make little progress.

buf_flush_list(), buf_flush_LRU(): Split from buf_flush_lists().
Vladislav Vaintroub noticed that we will not need to invoke
log_flush_task.wait() for the LRU eviction flushing.

buf_flush_list_space(): Replaces buf_flush_dirty_pages().
This is like buf_flush_list(), but operating on a single
tablespace at a time. Writes at most innodb_io_capacity
pages. Returns whether some of the tablespace might remain
in the buffer pool.
2021-06-23 19:06:52 +03:00
Marko Mäkelä
762bcb81b5 MDEV-25948 Remove log_flush_task
Vladislav Vaintroub suggested that invoking log_flush_up_to()
for every page could perform better than invoking a log write
between buf_pool.flush_list batches, like we started doing in
commit 3a9a3be1c6 (MDEV-23855).
This could depend on the sequence in which pages are being
modified. The buf_pool.flush_list is ordered by
oldest_modification, while the FIL_PAGE_LSN of the pages is
theoretically independent of that. In the pathological case,
we will wait for a log write before writing each individual page.

It turns out that we can defer the call to log_flush_up_to()
until just before submitting the page write. If the doublewrite
buffer is being used, we can submit a write batch of "future" pages
to the doublewrite buffer, and only wait for the log write right
before we are writing an already doublewritten page.
The next doublewrite batch will not be initiated before the last
page write from the current batch has completed.

When a future version introduces asynchronous writes of the log,
we could initiate a write at the start of a flushing batch, to
reduce waiting further.
2021-06-23 19:06:52 +03:00
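
Illustration for 762bcb81b5 (not part of the commit): a minimal sketch of why the number of log writes depends on the order in which pages are written. The type log_sketch and the function write_batch_sketch() are hypothetical. Pages are represented only by their FIL_PAGE_LSN, and the doublewrite buffer is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the redo log: tracks how far it is durable.
struct log_sketch
{
  uint64_t flushed_lsn = 0;
  unsigned writes = 0;

  // Make the log durable up to `lsn`; a no-op if it already is.
  void flush_up_to(uint64_t lsn)
  {
    if (lsn > flushed_lsn)
    {
      flushed_lsn = lsn;
      ++writes;
    }
  }
};

// Write pages in the given order, flushing the log just before each
// page write. Returns the total number of log writes performed so far.
inline unsigned write_batch_sketch(const std::vector<uint64_t> &page_lsns,
                                   log_sketch &log)
{
  for (uint64_t lsn : page_lsns)
  {
    // Write-ahead logging: the log must be durable up to the page's
    // FIL_PAGE_LSN before the page itself may be written.
    log.flush_up_to(lsn);
    // (the page write would be submitted here)
  }
  return log.writes;
}
```

If FIL_PAGE_LSN happens to increase along the batch, every page triggers its own log write, which is the pathological case mentioned above. If the first page has the largest LSN, a single log write covers the whole batch.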
Marko Mäkelä
6dfd44c828 MDEV-25954: Trim os_aio_wait_until_no_pending_writes()
It turns out that we had some unnecessary waits until no write
requests were outstanding. They were basically working around a
bug that was fixed in MDEV-25953.

On write completion callback, blocks will be marked clean.
So, it is sufficient to consult buf_pool.flush_list to determine
which writes have not been completed yet.

On FLUSH TABLES...FOR EXPORT we must still wait for all pending
asynchronous writes to complete, because buf_flush_file_space()
would merely guarantee that writes will have been initiated.
2021-06-23 19:06:49 +03:00
Marko Mäkelä
6e12ebd4a7 MDEV-25062: Reduce trx_rseg_t::mutex contention
redo_rseg_mutex, noredo_rseg_mutex: Remove the PERFORMANCE_SCHEMA keys.
The rollback segment mutex will be uninstrumented.

trx_sys_t: Remove pointer indirection for rseg_array, temp_rseg.
Align each element to the cache line.

trx_sys_t::rseg_id(): Replaces trx_rseg_t::id.

trx_rseg_t::ref: Combines needs_purge, trx_ref_count, skip_allocation
into a single std::atomic<uint32_t>.

trx_rseg_t::latch: Replaces trx_rseg_t::mutex.

trx_rseg_t::history_size: Replaces trx_sys_t::rseg_history_len.

trx_sys_t::history_size_approx(): Replaces trx_sys.rseg_history_len
in those places where the exact count does not matter. We must not
acquire any trx_rseg_t::latch while holding index page latches, because
normally the trx_rseg_t::latch is acquired before any page latches.

trx_sys_t::history_exists(): Replaces trx_sys.rseg_history_len!=0
with an approximation.

We remove some unnecessary trx_rseg_t::latch acquisition around
trx_undo_set_state_at_prepare() and trx_undo_set_state_at_finish().
Those operations will only access fields that remain constant
after trx_rseg_t::init().
2021-06-23 13:42:11 +03:00
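
Illustration for 6e12ebd4a7 (not part of the commit): a minimal sketch of packing two flags and a reference count into one std::atomic<uint32_t>, in the spirit of trx_rseg_t::ref. The class name, the bit layout and all member function names are hypothetical; the commit message does not specify them.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class rseg_ref_sketch
{
  // Hypothetical layout: bit 0 = skip_allocation, bit 1 = needs_purge,
  // remaining bits = reference count (changed in steps of REF).
  static constexpr uint32_t SKIP = 1, NEEDS_PURGE = 2, REF = 4;
  std::atomic<uint32_t> ref{0};

public:
  void set_skip_allocation() { ref.fetch_or(SKIP, std::memory_order_relaxed); }
  void set_needs_purge() { ref.fetch_or(NEEDS_PURGE, std::memory_order_relaxed); }

  bool needs_purge() const
  { return ref.load(std::memory_order_relaxed) & NEEDS_PURGE; }

  uint32_t ref_count() const
  { return ref.load(std::memory_order_relaxed) / REF; }

  // Take a reference unless allocation is being skipped. The flag check
  // and the increment form one atomic step via compare-and-swap.
  bool try_acquire()
  {
    uint32_t old = ref.load(std::memory_order_relaxed);
    do
    {
      if (old & SKIP)
        return false;
    } while (!ref.compare_exchange_weak(old, old + REF,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed));
    return true;
  }

  // Drop a reference previously taken with try_acquire().
  void release() { ref.fetch_sub(REF, std::memory_order_release); }
};
```

Keeping the flags and the count in one word lets a caller test a flag and adjust the count without holding a separate mutex, which matches the stated goal of reducing contention.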
Marko Mäkelä
b3e8788009 MDEV-25967: Correctly extend deferred-recovery files
recv_sys_t::recover_deferred(): Set the file size to match the number
of pages. Mariabackup might copy the file while it was being extended.
2021-06-23 13:37:11 +03:00
Marko Mäkelä
592a925c0c MDEV-25996 sux_lock::s_lock(): Assertion !have_s() failed on startup
dict_check_sys_tables(): Correctly advance the cursor position.
This fixes a regression that was caused by
commit 49e2c8f0a6 (MDEV-25743).
2021-06-23 13:36:04 +03:00
Marko Mäkelä
3a566de22d Merge 10.5 into 10.6 2021-06-23 09:24:32 +03:00
Marko Mäkelä
344e59904d Merge 10.4 into 10.5 2021-06-23 08:17:49 +03:00
Marko Mäkelä
09b03ff31b Merge 10.3 into 10.4 2021-06-23 08:05:27 +03:00