Commit graph

3,011 commits

Author SHA1 Message Date
Marko Mäkelä
126725421e MDEV-25121: innodb_flush_method=O_DIRECT fails on compressed tables
Tests with 4096-byte sector size confirm that it is
safe to use O_DIRECT with page_compressed tables.
That had been disabled on Linux, in an attempt to fix MDEV-21584
which had been filed for the O_DIRECT problems earlier.

The fil_node_t::block_size was being set mostly correctly until
commit 10dd290b4b (MDEV-17380)
introduced a regression in MariaDB Server 10.4.4.

fil_node_open_file(): Only avoid setting O_DIRECT on
ROW_FORMAT=COMPRESSED tables that use KEY_BLOCK_SIZE=1 or 2
(1024 or 2048 bytes).

fil_ibd_create(): Avoid setting O_DIRECT on ROW_FORMAT=COMPRESSED tables
that use KEY_BLOCK_SIZE=1 or 2 (1024 or 2048 bytes).

fil_node_t::find_metadata(): Require fstat() to be always invoked
outside Microsoft Windows, so that fil_node_t::block_size can be set.

fil_node_t::read_page0(): Rely on find_metadata() to assign block_size.

Thanks to Vladislav Vaintroub for testing this on Microsoft Windows
using an old-fashioned rotational hard disk with 4KiB sector size.

Reviewed by: Vladislav Vaintroub

This is a port of commit 00f620b27e
and commit 6505662c23 from 10.2.
2021-03-18 14:43:08 +02:00
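The KEY_BLOCK_SIZE rule in commit 126725421e can be sketched as a small predicate. This is a minimal illustration, not the actual fil_node_open_file() logic: the names `o_direct_is_safe` and `key_block_size_to_bytes` are hypothetical, and the sketch assumes that O_DIRECT I/O must be a whole multiple of the block size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not InnoDB code): O_DIRECT I/O is assumed to be
// safe only when the physical page size is a whole multiple of the
// block size reported for the file.
constexpr bool o_direct_is_safe(std::size_t physical_page_size,
                                std::size_t block_size)
{
  return block_size != 0 && physical_page_size >= block_size &&
         physical_page_size % block_size == 0;
}

// KEY_BLOCK_SIZE is expressed in KiB, so 1 and 2 mean 1024 and 2048 bytes.
constexpr std::size_t key_block_size_to_bytes(std::size_t key_block_size)
{
  return key_block_size * 1024;
}

// With 4096-byte sectors, only the 1 KiB and 2 KiB page sizes are rejected.
static_assert(!o_direct_is_safe(key_block_size_to_bytes(1), 4096), "1 KiB");
static_assert(!o_direct_is_safe(key_block_size_to_bytes(2), 4096), "2 KiB");
static_assert(o_direct_is_safe(key_block_size_to_bytes(4), 4096), "4 KiB");
```

Note that the commit message describes avoiding O_DIRECT for KEY_BLOCK_SIZE=1 or 2 outright; the sketch instead compares against a given block size, which is a simplification.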
Marko Mäkelä
39c015b77e Merge 10.3 into 10.4 2021-03-18 14:17:58 +02:00
Thirunarayanan Balathandayuthapani
eb7c5530ec MDEV-24730 Insert log operation fails after purge resets n_core_fields
Online log for insert operation of redundant table fails with
index->is_instant() assert. Purge can reset the n_core_fields when
alter is waiting to upgrade MDL for commit phase of DDL. In the
meantime, any insert DML that tries to log the operation fails
because the index is no longer instant.

row_log_get_n_core_fields(): Get the n_core_fields of online log
for the given index.

rec_get_converted_size_comp_prefix_low(): Use n_core_fields of online
log when InnoDB calculates the size of data tuple during redundant
row format table rebuild.

rec_convert_dtuple_to_rec_comp(): Use n_core_fields of online log
when InnoDB does the conversion of data tuple to record during
redundant row format table rebuild.

- Adding the test case which has more than 129 instant columns.
2021-03-12 16:56:47 +05:30
Marko Mäkelä
a26e7a3726 Merge 10.3 into 10.4 2021-03-08 09:39:54 +02:00
Marko Mäkelä
bcd160753c Merge 10.2 into 10.3 2021-03-05 10:06:42 +02:00
Marko Mäkelä
7759991a06 fixup 58b56f14a0: Remove dead code
row_prebuilt_t::m_no_prefetch: Remove (it was always false).
row_prebuilt_t::m_read_virtual_key: Remove (it was always false).

Only ha_innopart ever set these fields.
2021-03-04 18:11:25 +02:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Marko Mäkelä
3467f63764 Merge 10.3 into 10.4 2021-01-25 11:02:07 +02:00
Sergei Golubchik
5d1db34585 cleanup: void hton::abort_transaction()
and void wsrep_innobase_kill_one_trx()

as their return values are never used.
Also remove redundant cast and checks that are always true
2021-01-24 11:35:55 +01:00
Marko Mäkelä
0e10d7ea14 MDEV-22351 InnoDB may recover wrong information after RESET MASTER
Ever since commit 947efe17ed
InnoDB no longer writes binlog position in one place.
It will not at all be written to the TRX_SYS page, and
instead it will be written to the undo log header page that
changes the transaction state.

trx_rseg_mem_restore(): Recover the information from the latest
written page.
2021-01-22 16:44:17 +02:00
Marko Mäkelä
3caccc7bcd Update InnoDB version number to 5.7.33
There are only two InnoDB changes between mysql-5.7.32 and mysql-5.7.33:

mysql/mysql-server@95dc4f5f08
duplicates commit 8e8e65ed1c (MDEV-10829).

mysql/mysql-server@26e849762f
could be an attempt to fix something that was fixed in
commit dc58987eb7 (MDEV-22765).
2021-01-19 14:43:04 +02:00
sjaakola
beaea31ab1 MDEV-23851 BF-BF Conflict issue because of UK GAP locks
Some DML operations on tables having unique secondary keys cause scanning
in the secondary index, for instance to find potential unique key violations
in the secondary index. This scanning may involve GAP locking in the index.
As this locking happens also when applying replication events in high priority
applier threads, there is a probability for lock conflicts between two wsrep
high priority threads.

This PR avoids lock conflicts of high priority wsrep threads, which do
secondary index scanning e.g. for duplicate key detection.

The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where
we allow relaxed GAP locking protocol between wsrep high priority threads.
wsrep high priority threads (replication appliers, replayers and TOI processors)
are ordered by the replication provider, and they will not need serializability
support gained by secondary index GAP locks.

PR contains also a mtr test, which exercises a scenario where two replication
applier threads have a false positive conflict in GAP of unique secondary index.
The conflicting local committing transaction has to replay, and the test verifies
also that the replaying phase will not conflict with the latter replication applier.
Commit also contains new test scenario for galera.galera_UK_conflict.test,
where replayer starts applying after a slave applier thread, with later seqno,
has advanced to commit phase. The applier and replayer have false positive GAP
lock conflict on secondary unique index, and replayer should ignore this.
This test scenario caused crash with earlier version in this PR, and to fix this,
the secondary index uniqueness checking has been relaxed even further.

Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to
true, when high priority thread is performing unique secondary index scanning.
The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make
it possible to prepare a MariaDB build where this additional trx_t member is
not present and is not used in the code base. trx->wsrep_UK_scan is set to true
only for the duration of the lock_rec_lock() function call. trx->wsrep_UK_scan
is used only in the lock_rec_has_to_wait() function, to relax the need to wait if
wsrep_UK_scan is set and the conflicting transaction is also high priority.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-01-18 08:09:06 +02:00
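The relaxed waiting rule of commit beaea31ab1 can be sketched roughly as follows. This is not the InnoDB implementation: `trx_sketch`, `rec_lock_has_to_wait` and `uk_scan_guard` are hypothetical names, and the sketch assumes the two lock requests already conflict, so only the relaxation itself is modelled.

```cpp
#include <cassert>

// Hypothetical, simplified transaction descriptor (not trx_t).
struct trx_sketch {
  bool high_priority = false;  // wsrep applier, replayer or TOI processor
  bool uk_scan = false;        // set only during a unique secondary index scan
};

// Assumes the requested lock conflicts with one held by `holder`.
// Returns false when the wait can be skipped under the relaxed protocol.
inline bool rec_lock_has_to_wait(const trx_sketch &requester,
                                 const trx_sketch &holder)
{
  if (requester.uk_scan && holder.high_priority)
    return false;  // both are ordered by the replication provider
  return true;
}

// Sets the flag only for the duration of a scope, mirroring "set to true
// only for the duration of the function call" in the commit message.
class uk_scan_guard {
  trx_sketch &trx_;
public:
  explicit uk_scan_guard(trx_sketch &trx) : trx_(trx)
  {
    trx_.uk_scan = trx_.high_priority;
  }
  ~uk_scan_guard() { trx_.uk_scan = false; }
  uk_scan_guard(const uk_scan_guard &) = delete;
  uk_scan_guard &operator=(const uk_scan_guard &) = delete;
};
```

An RAII guard is used here only to keep the flag from leaking past the scan; whether the real code uses such a guard is not stated in the commit message.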
Marko Mäkelä
ea9cd97f85 MDEV-24536 innodb_idle_flush_pct has no effect
The parameter innodb_idle_flush_pct that was introduced in
MariaDB Server 10.1.2 by MDEV-6932 has no effect ever since
the InnoDB changes from MySQL 5.7.9 were applied in
commit 2e814d4702.

Let us declare the parameter as deprecated and having no effect.
2021-01-13 18:55:56 +02:00
Marko Mäkelä
fd5e103aa4 Merge 10.3 into 10.4 2021-01-11 10:35:06 +02:00
Marko Mäkelä
5a1a714187 Merge 10.2 into 10.3 (except MDEV-17556)
The fix of MDEV-17556 (commit e25623e78a
and commit 61a362c949) has been
omitted due to conflicts and will have to be applied separately later.
2021-01-11 09:41:54 +02:00
Marko Mäkelä
18254c18d9 Cleanup: Remove unused symbol QUE_THR_PROCEDURE_WAIT 2021-01-08 16:14:26 +02:00
Marko Mäkelä
0aa02567dd Merge 10.3 into 10.4 2020-12-23 14:52:59 +02:00
Aleksey Midenkov
932ec586aa MDEV-23644 Assertion on evaluating foreign referential action for self-reference in system versioned table
First part of the fix (row0mysql.cc) addresses external columns when adding history
row on referential action. The full data must be retrieved before the
row is inserted.

Second part of the fix (the rest) avoids duplicate primary key error between
the history row generated on referential action and the history row
generated by SQL command. Both command and referential action can
happen on same table since foreign key can be self-reference (parent
and child tables are same). Moreover, the self-reference can refer
multiple rows when the key is non-unique. In such a case, history is
generated by the referential action that occurred on the first row but
processed all rows matching the key. The second round is when the next row is
processed by a command but history already exists. In such case we
check TRX_ID of existing history row and if it is the same we assume
the above situation and skip adding one more history row or failing
the command.
2020-12-22 03:33:53 +03:00
Aleksey Midenkov
7410ff436e MDEV-21138 Assertion `col->ord_part' or `f.col->ord_part' failed in row_build_index_entry_low
First part (row0mysql.cc) fixes ins_node_set_new_row() usage workflow
as it is designed to operate on empty row (see row_get_prebuilt_insert_row()
for example).

Second part (row0ins.cc) fixes duplicate key error in FTS_DOC_ID_INDEX
since history rows must not generate entries in that index. We detect
FTS_DOC_ID_INDEX by a number of attributes and skip it if the row is
historical.

Misc fixes:

row_build_index_entry_low() does not accept non-NULL tuple
for FTS index (subject assertion fails), assertion (index->type !=
DICT_FTS) adds code understanding.

Now as historical_row is copied in row_update_vers_insert() there is
no need to copy the row twice: ROW_COPY_POINTERS is used to build
historical_row initially.

dbug_print_rec() debug functions.
2020-12-22 03:33:53 +03:00
Eugene Kosov
a50cb4867a MDEV-24334 make monitor_set_tbl global variable thread-safe
Atomic_relaxed<T>: add fetch_or() and fetch_and()

innodb_init(): rely on a zero-initialization of a global variable

monitor_set_tbl: make Atomic_relaxed<ulint> array and use proper operations
for setting bit, unsetting bit and reading bit

Reviewed by: Marko Mäkelä
2020-12-03 11:55:36 +03:00
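The bit operations described in commit a50cb4867a can be sketched with std::atomic. This is a minimal sketch, not the MariaDB Atomic_relaxed<T> wrapper: the array name, the size N_MONITORS and the helper functions are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sizes and names; only the technique follows the commit.
constexpr std::size_t N_MONITORS = 256;
constexpr std::size_t BITS_PER_WORD = 8 * sizeof(unsigned long);

// A global with static storage duration is zero-initialized, so no
// explicit initialization pass is needed.
static std::atomic<unsigned long>
    monitor_bits[(N_MONITORS + BITS_PER_WORD - 1) / BITS_PER_WORD];

inline void monitor_on(std::size_t id)
{
  assert(id < N_MONITORS);
  monitor_bits[id / BITS_PER_WORD].fetch_or(1UL << (id % BITS_PER_WORD),
                                            std::memory_order_relaxed);
}

inline void monitor_off(std::size_t id)
{
  assert(id < N_MONITORS);
  monitor_bits[id / BITS_PER_WORD].fetch_and(~(1UL << (id % BITS_PER_WORD)),
                                             std::memory_order_relaxed);
}

inline bool monitor_is_on(std::size_t id)
{
  assert(id < N_MONITORS);
  return (monitor_bits[id / BITS_PER_WORD].load(std::memory_order_relaxed) >>
          (id % BITS_PER_WORD)) & 1UL;
}
```

Relaxed ordering makes each bit update atomic without imposing ordering on surrounding memory accesses, which is sufficient for independent on/off flags.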
Eugene Kosov
fccd810404 MDEV-24333 Data race in os_file_pread at os/os0file.cc:3308 on os_n_file_reads
os_n_file_reads: make Atomic_counter and correct the semantics of an imprecise
counter.

Reviewed by: Marko Mäkelä
2020-12-03 11:55:21 +03:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Marko Mäkelä
81ab9ea63f Merge 10.2 into 10.3 2020-12-01 14:55:46 +02:00
Marko Mäkelä
1c9833c511 Cleanup: row_log_free()
The nonnull attribute is not applicable to parameters that are
passed by reference, at least not in the Intel compiler.
Let us remove the reference indirection, which was only there
so that the pointer could be assigned to NULL, and let the
callers perform that task.

row_log_allocate(): Fix a bug in out-of-memory error handling
that would leave a pointer to freed memory.
2020-11-25 10:54:38 +02:00
Marko Mäkelä
749ecedfec MDEV-24188: Merge 10.3 into 10.4 2020-11-13 20:45:28 +02:00
Marko Mäkelä
f9f2f37495 MDEV-24188: Merge 10.2 into 10.3 2020-11-13 20:41:48 +02:00
Marko Mäkelä
bb328a2a27 MDEV-24188 Hang in buf_page_create() after reusing a previously freed page
The fix of MDEV-23456 (commit b1009ae5c1)
introduced a livelock between page flushing and a thread that is
executing buf_page_create().

buf_page_create(): If the current mini-transaction is holding
an exclusive latch on the page, do not attempt to acquire another
one, and do not care about any I/O fix.

mtr_t::have_x_latch(): Replaces mtr_t::get_fix_count().

dyn_buf_t::for_each_block(const Functor&) const: A new variant.

rw_lock_own(): Add a const qualifier.

Reviewed by: Thirunarayanan Balathandayuthapani
2020-11-13 20:16:39 +02:00
Marko Mäkelä
972dc6ee98 Merge 10.3 into 10.4 2020-11-12 11:18:04 +02:00
Marko Mäkelä
150f447af1 Merge 10.2 into 10.3 2020-11-12 10:37:21 +02:00
Marko Mäkelä
d6ee28582a Cleanup: Remove dict_space_is_empty(), dict_space_get_id()
As noted in commit 0b66d3f70d,
MariaDB does not support CREATE TABLESPACE for InnoDB.
Hence, some code that was added in
commit fec844aca8
and originally in
mysql/mysql-server@c71dd213bd
is unused in MariaDB and should be removed.
2020-11-11 15:48:43 +02:00
Marko Mäkelä
bd528b0c93 MDEV-24182 ibuf_merge_or_delete_for_page() contains dead code
The function ibuf_merge_or_delete_for_page() was always being
invoked with update_ibuf_bitmap=true ever since
commit cd623508df
fixed up something after MDEV-9566.

Furthermore, the parameter page_size is never being passed as a
null pointer, and therefore it should better be a reference to
a constant object.
2020-11-11 15:48:43 +02:00
Marko Mäkelä
d01a034ac6 MDEV-7620: Remove the data structures
The instrumentation that was added in
commit 90635c6fb5 (MDEV-7620)
was effectively reverted in MariaDB Server 10.2.2, in
commit 2e814d4702
(which stopped reporting the statistics) and
commit fec844aca8
(which stopped updating the statistics).

Let us remove the orphan data members to reduce the memory footprint.
2020-11-09 15:50:37 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Marko Mäkelä
2b6f804490 Merge 10.2 into 10.3 2020-10-28 10:44:40 +02:00
Marko Mäkelä
a8de8f261d Merge 10.2 into 10.3 2020-10-28 10:01:50 +02:00
Marko Mäkelä
cc5f4428b8 MDEV-23693 fixup: Remove unused btr_search_t::withdraw_clock 2020-10-28 08:13:06 +02:00
Marko Mäkelä
527ade2590 MDEV-23163 Merge new release of InnoDB 5.7.32 to 10.2
All relevant InnoDB changes from MySQL 5.7.32 have been applied
in preceding commits.
2020-10-28 07:27:18 +02:00
Eugene Kosov
afc9d00c66 MDEV-23991 dict_table_stats_lock() has unnecessarily long scope
Patch removes dict_index_t::stats_latch. Table/index statistics are now
protected with dict_sys->mutex. That way statistics computation can
happen in parallel in several threads and dict_sys->mutex will be locked
only for a short period of time.

This patch is a joint work with Marko Mäkelä

dict_index_t::lock: make mutable, which allows passing a const pointer
when only the lock is touched in an object

btr_height_get()
btr_get_size(): make index argument const for better type safety

btr_estimate_number_of_different_key_vals(): now returns computed values
instead of setting fields in dict_index_t directly

remove everything related to dict_index_t::stats_latch

dict_stats_index_set_n_diff(): now returns computed values instead
of setting fields in dict_index_t directly

dict_stats_analyze_index():  now returns computed values instead
of setting fields in dict_index_t directly

Reviewed by: Marko Mäkelä
2020-10-27 19:09:20 +03:00
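The pattern in commit afc9d00c66, where analysis functions return computed values and a mutex is held only while publishing them, might look like the sketch below. All types and names here (`index_sketch`, `analyze_index`, `stats_mutex`) are hypothetical stand-ins, and the "analysis" is a trivial distinct-key count rather than InnoDB's sampling.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <vector>

struct index_stats {
  std::uint64_t n_rows = 0;
  std::uint64_t n_distinct = 0;
};

struct index_sketch {
  std::vector<int> keys;  // stand-in for the index contents
  index_stats stats;      // protected by stats_mutex
};

// Stand-in for dict_sys->mutex in this sketch.
static std::mutex stats_mutex;

// Pure computation: returns values instead of writing to shared fields,
// so several threads can analyze different indexes in parallel.
static index_stats analyze_index(const index_sketch &index)
{
  index_stats s;
  s.n_rows = index.keys.size();
  s.n_distinct = std::set<int>(index.keys.begin(), index.keys.end()).size();
  return s;
}

static void update_stats(index_sketch &index)
{
  const index_stats s = analyze_index(index);     // slow part, no mutex held
  std::lock_guard<std::mutex> guard(stats_mutex); // short critical section
  index.stats = s;
}
```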
Thirunarayanan Balathandayuthapani
bc540b8706 MDEV-23693 Failing assertion: my_atomic_load32_explicit(&lock->lock_word, MY_MEMORY_ORDER_RELAXED) == X_LOCK_DECR
InnoDB frees the block lock during buffer pool shrinking when other
thread is yet to release the block lock.  While shrinking the
buffer pool, InnoDB allows the page to be freed unless it is buffer
fixed. In some cases, InnoDB releases the latch after unfixing the
block.

Fix:
====
- InnoDB should unfix the block after releasing the latch.

- Add more assertions to check buffer fix while accessing the page.

- Introduced block_hint structure to store buf_block_t pointer
and allow accessing the buf_block_t pointer only by passing a
functor. It returns original buf_block_t* pointer if it is valid
or nullptr if the pointer becomes stale.

- Replace buf_block_is_uncompressed() with
buf_pool_t::is_block_pointer()

This change is motivated by a change in mysql-5.7.32:
mysql/mysql-server@46e60de444
Bug #31036301 ASSERTION FAILURE: SYNC0RW.IC:429:LOCK->LOCK_WORD
2020-10-27 18:30:00 +05:30
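The block_hint idea from commit bc540b8706 (store a pointer, hand it out only through a functor, and pass nullptr once it is stale) can be sketched as follows. This is an assumption-laden illustration: `pool_sketch`, `block_sketch` and `run_with_hint` are hypothetical, validity is tracked with a simple set, and buffer-fixing and latching are not modelled at all.

```cpp
#include <cassert>
#include <unordered_set>

struct block_sketch { int page_no; };

// Hypothetical pool that knows which block pointers are currently valid.
class pool_sketch {
  std::unordered_set<const block_sketch *> live_;
public:
  void add(const block_sketch *b) { live_.insert(b); }
  void remove(const block_sketch *b) { live_.erase(b); }
  bool is_block_pointer(const block_sketch *b) const
  {
    return live_.count(b) != 0;
  }
};

class block_hint_sketch {
  block_sketch *block_ = nullptr;
public:
  void store(block_sketch *b) { block_ = b; }
  void clear() { block_ = nullptr; }

  // Invokes f with the stored pointer if it is still valid,
  // or with nullptr if the pointer has become stale.
  template <typename F>
  auto run_with_hint(const pool_sketch &pool, F &&f)
      -> decltype(f(static_cast<block_sketch *>(nullptr)))
  {
    if (block_ && !pool.is_block_pointer(block_))
      block_ = nullptr;  // forget the stale pointer
    return f(block_);
  }
};
```

Because the raw pointer is never exposed outside the functor call, callers cannot accidentally dereference it after the pool has released the block.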
Thirunarayanan Balathandayuthapani
3ba8f619e4 MDEV-23370 innodb_fts.innodb_fts_misc failed in buildbot, server crashed in dict_table_autoinc_destroy
This issue is caused by MDEV-22456 ad6171b91c. Fix involves the backported version of 10.4 patch
MDEV-22778 5f2628d1ee and few parts of
MDEV-17441 (e9a5f288f2).

dict_table_t::stats_latch_created: Removed

dict_table_t::stats_latch: make value member and always lock it for
simplicity even for stats cloned table.

zip_pad_info_t::mutex_created: Removed

zip_pad_info_t::mutex: make member value instead of pointer

os0once.h: Removed

dict_table_remove_from_cache_low(): Ensure that fts_free() is always
called, even if dict_mem_table_free() is deferred until
btr_search_lazy_free().

InnoDB would always initialize zip_pad_info_t::mutex and
dict_table_t::autoinc_mutex, even for tables that are not in
ROW_FORMAT=COMPRESSED and do not include any AUTO_INCREMENT column.
2020-10-25 15:53:17 +05:30
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Marko Mäkelä
e3d692aa09 Merge 10.2 into 10.3 2020-10-22 08:26:28 +03:00
Marko Mäkelä
620ea816ad Merge 10.1 into 10.2 2020-10-21 14:02:04 +03:00
Marko Mäkelä
65b7f72b51 InnoDB 5.6.50
The only applicable InnoDB change to MariaDB that was made
between MySQL 5.6.49 and MySQL 5.6.50 is MDEV-23999.
2020-10-21 10:16:06 +03:00
Marko Mäkelä
c7552969d0 MDEV-23999 Potential stack overflow in InnoDB fulltext search
fts_query_t::nested_sub_exp: Keep track of nested
fts_ast_visit_sub_exp() calls.

fts_ast_visit_sub_exp(): Return DB_OUT_OF_MEMORY if the
maximum recursion depth is exceeded.

This is motivated by a change in MySQL 5.6.50:
mysql/mysql-server@e2a46b4834
Bug #29929684 USING MANY NESTED ARGUMENTS WITH BOOLEAN FTS CAN LEAD
TO TERMINATE SERVER
2020-10-21 10:04:44 +03:00
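The depth tracking described in commit c7552969d0 can be illustrated with a toy recursive visitor over nested parentheses. This is only a sketch: the limit of 31, the `visit_status` enum and the function signature are assumptions for illustration, whereas the commit itself reports DB_OUT_OF_MEMORY from fts_ast_visit_sub_exp().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class visit_status { success, out_of_memory, syntax_error };

struct query_state {
  unsigned nested_sub_exp = 0;  // current nesting depth
  unsigned max_depth = 31;      // assumed limit, chosen for this sketch
};

// Visits zero or more parenthesized sub-expressions starting at pos.
static visit_status visit_sub_exp(const std::string &expr, std::size_t &pos,
                                  query_state &query)
{
  // Refuse to recurse further once the limit is reached.
  if (query.nested_sub_exp >= query.max_depth)
    return visit_status::out_of_memory;

  ++query.nested_sub_exp;
  visit_status status = visit_status::success;

  while (status == visit_status::success && pos < expr.size() &&
         expr[pos] == '(') {
    ++pos;  // consume '('
    status = visit_sub_exp(expr, pos, query);
    if (status == visit_status::success) {
      if (pos < expr.size() && expr[pos] == ')')
        ++pos;  // consume the matching ')'
      else
        status = visit_status::syntax_error;
    }
  }

  assert(query.nested_sub_exp > 0);
  --query.nested_sub_exp;  // always undone, also on error paths
  return status;
}
```

Bounding the depth turns a potential stack overflow on adversarial input into an ordinary error return.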
Marko Mäkelä
832a6acb72 MDEV-23996 Race conditions in SHOW ENGINE INNODB MUTEX
The function innodb_show_mutex_status() is the only ultimate caller of
LatchCounter::iterate() via MutexMonitor::iterate(). Because the call
is not protected by LatchCounter::m_mutex, bad things such as a crash
could happen if any mutex_create() or mutex_free() is invoked
concurrently during the execution.

The most likely way for this to happen is buffer pool resizing,
which could cause buf_block_t::mutex (which existed before MDEV-15053)
to be created or freed. We could also register InnoDB mutexes in
TrxFactory::init() if trx_pools needs to grow.

The view INFORMATION_SCHEMA.INNODB_MUTEXES is not affected, because it
only displays information about rw-locks, not mutexes.

This commit intentionally touches also MutexMonitor::iterate()
and the only code that interfaces with LatchCounter::iterate()
to make it clearer for future readers that the scattered code
that is obfuscated by templates belongs together.

This is based on
mysql/mysql-server@273a93396f
2020-10-20 19:55:16 +03:00
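The race fixed by commit 832a6acb72 is of a common kind: a collection is iterated without the mutex that guards insertions and removals. A minimal sketch of the protected form, using a hypothetical `latch_registry` class rather than the real LatchCounter, could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical registry; only the locking discipline mirrors the commit.
class latch_registry {
  mutable std::mutex m_mutex;
  std::vector<std::string> m_names;
public:
  void register_latch(const std::string &name)
  {
    std::lock_guard<std::mutex> guard(m_mutex);
    m_names.push_back(name);
  }

  void deregister_latch(const std::string &name)
  {
    std::lock_guard<std::mutex> guard(m_mutex);
    auto it = std::find(m_names.begin(), m_names.end(), name);
    if (it != m_names.end())
      m_names.erase(it);
  }

  // Holds the same mutex as registration for the whole traversal.
  // The callback must not call back into the registry, or it would
  // deadlock on the non-recursive mutex.
  template <typename F>
  void iterate(F &&f) const
  {
    std::lock_guard<std::mutex> guard(m_mutex);
    for (const std::string &name : m_names)
      f(name);
  }
};
```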
Aleksey Midenkov
7eda556196 MDEV-23672 Assertion `v.v_indexes.empty()' failed in dict_table_t::instant_column
dict_v_idx_t node was shared between two dict_v_col_t objects because
of wrong object copy. Replace memory plain copy with copy constructor.

The patch also removes n_v_indexes property and improves "page full"
judgements for trx_undo_log_v_idx().
2020-10-20 10:57:57 +03:00
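The difference between a plain memory copy and a copy constructor, which is the core of commit 7eda556196, can be shown with a small sketch. The types `v_idx_sketch` and `v_col_sketch` are hypothetical simplifications and the list type is an assumption; the point is only that copy construction gives each object its own list nodes, whereas a byte-wise copy would leave two objects pointing at the same nodes.

```cpp
#include <cassert>
#include <forward_list>

struct v_idx_sketch {
  int index_id;
  int nth_field;
};

struct v_col_sketch {
  int pos = 0;
  std::forward_list<v_idx_sketch> v_indexes;  // owns its nodes
};

// Copy construction duplicates every list node, so the source and the
// clone can be modified independently. A memcpy() of the struct would
// only duplicate the internal head pointer.
inline v_col_sketch clone_v_col(const v_col_sketch &src)
{
  return v_col_sketch(src);
}
```

After cloning, clearing one object's v_indexes leaves the other untouched, which is the property that the assertion in the bug title depends on.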
Eugene Kosov
52dad6fd26 MDEV-21584 Linux aio returned OS error 22
Sometimes blockdev --getss returns 4096.
In that case ROW_FORMAT=COMPRESSED tables might violate
that 4096 bytes alignment.

This patch disables O_DIRECT for COMPRESSED tables.

OS_DATA_FILE_NO_O_DIRECT: new possible value for os_file_create() argument

fil_node_open_file(): do not use O_DIRECT for
ROW_FORMAT=COMPRESSED tables

AIO::is_linux_native_aio_supported(): minimal alignment in a general case
is 4096 and not 512.
2020-10-14 14:42:54 +03:00
Thirunarayanan Balathandayuthapani
6504d3d229 MDEV-23722 InnoDB: Assertion: result != FTS_INVALID in fts_trx_row_get_new_state
Marking of deletion of row in fts index happens twice in
self-referential foreign key relation. So while performing
referential checks of foreign key, InnoDB can avoid updating
of fts index if the foreign key has self-referential relationship.

Reviewed-by: Marko Mäkelä
2020-10-08 19:41:03 +05:30