Commit graph

3,652 commits

Author SHA1 Message Date
Marko Mäkelä
0ff90b3b94 MDEV-25506: Kill during DDL leaves orphan .ibd file
Before we create an InnoDB data file, we must have persistently
started a DDL transaction and written a record in SYS_INDEXES
as well as a FILE_CREATE record for creating the file.
In that way, if InnoDB is killed before the DDL transaction is
committed, the rollback will be able to delete the file in
dict_drop_index_tree().

dict_build_table_def_step(): Do not create the tablespace.
At this point, we have not written any log, not even for
inserting the SYS_TABLES record.

dict_create_sys_indexes_tuple(): Relax an assertion to tolerate
a missing tablespace before the first index has been created in
dict_create_index_step().

dict_build_index_def_step(): Relax the dict_table_open_on_name()
parameter, because no tablespace may be available yet.

tab_create_graph_create(), row_create_table_for_mysql(), tab_node_t:
Remove key_id, mode.

ind_create_graph_create(), row_create_index_for_mysql(), ind_node_t:
Add key_id, mode.

dict_create_index_space(): New function, to create the tablespace
during clustered index creation.

dict_create_index_step(): After the SYS_INDEXES record has been
written, invoke dict_create_index_space() to create the tablespace
if needed.

fil_ibd_create(): Before creating the file, persistently write a
FILE_CREATE record. This will also ensure that an incomplete DDL
transaction will be recovered. After creating the file, invoke
fsp_header_init().
2021-05-04 13:48:55 +03:00
Marko Mäkelä
52aac131e3 MDEV-18518 Multi-table CREATE and DROP transactions for InnoDB
InnoDB used to support at most one CREATE TABLE or DROP TABLE
per transaction. This caused complications for DDL operations on
partitioned tables (where each partition is treated as a separate
table by InnoDB) and FULLTEXT INDEX (where each index is maintained
in a number of internal InnoDB tables).

dict_drop_index_tree(): Extend the MDEV-24589 logic and treat
the purge or rollback of SYS_INDEXES records of clustered indexes
specially: by dropping the tablespace if it exists. This is the only
form of recovery that we will need.

trx_undo_ddl_type: Document the DDL undo log record types better.

trx_t::dict_operation: Change the type to bool.

trx_t::ddl: Remove.

trx_t::table_id, trx_undo_t::table_id: Remove.

dict_build_table_def_step(): Remove trx_t::table_id logging.

dict_table_close_and_drop(), row_merge_drop_table(): Remove.

row_merge_lock_table(): Merged to the only callers, which can
call lock_table_for_trx() directly.

fts_aux_table_t, fts_aux_id, fts_space_set_t: Remove.

fts_drop_orphaned_tables(): Remove.

row_merge_rename_index_to_drop(): Remove. Thanks to MDEV-24589,
we can simply delete the to-be-dropped indexes from SYS_INDEXES,
while still being able to roll back the operation.

ha_innobase_inplace_ctx: Make a few data members const.
Preallocate trx.

prepare_inplace_alter_table_dict(): Simplify the logic. Let the
normal rollback take care of some cleanup.

row_undo_ins_remove_clust_rec(): Simplify the parsing of SYS_COLUMNS.

trx_rollback_active(): Remove the special DROP TABLE logic.

trx_undo_mem_create_at_db_start(), trx_undo_reuse_cached():
Always write TRX_UNDO_TABLE_ID as 0.
2021-05-04 13:48:55 +03:00
Marko Mäkelä
54e2e70194 MDEV-25524 heap-use-after-free in fil_space_t::rename()
In commit 91599701d0 (MDEV-25312)
some recovery code for TRUNCATE TABLE was broken
causing a regression in a case where undo log for a RENAME TABLE
operation had been durably written but the tablespace had not been
renamed yet.

row_rename_table_for_mysql(): Add a DEBUG_SYNC point for the
test case, and simplify the logic and trim the error messages.

fil_space_t::rename(): Simplify the operation. Merge the necessary
part of fil_rename_tablespace_check(). If there is no change to
the file name, do nothing.

dict_table_t::rename_tablespace(): Refactored from
dict_table_rename_in_cache().

row_undo_ins_parse_undo_rec(): On rolling back TRX_UNDO_RENAME_TABLE,
invoke dict_table_t::rename_tablespace() even if the table name matches.

os_file_rename_func(): Temporarily relax an assertion that would
fail during the recovery in the test innodb.truncate_crash.
2021-04-29 18:31:59 +03:00
Marko Mäkelä
a29618f3bd MDEV-25522: Purge of aborted ADD INDEX leaves orphan locks behind
lock_discard_for_index(): New function, to discard locks for an
index whose index tree has been purged. By definition, such indexes
must be ones for which the MDL upgrade failed in inplace ALTER TABLE
and the ADD INDEX operation was never committed.
Note: Because we do not support online ADD SPATIAL INDEX, we only
have to traverse the lock_sys.rec_hash for B-trees and not the
hash tables for R-trees.

row_purge_remove_clust_if_poss_low(): Invoke lock_discard_for_index()
if necessary before dropping a B-tree for a SYS_INDEXES record.
2021-04-28 17:24:44 +03:00
Marko Mäkelä
f619d79bda MDEV-25491: Race condition between DROP TABLE and purge of SYS_INDEXES record
btr_free_if_exists(): Always use the BUF_GET_POSSIBLY_FREED mode
when accessing pages, because due to MDEV-24589 the function
fil_space_t::set_stopping(true) can be called at any time during
the execution of this function.

mtr_t::m_freeing_tree: New data member for debugging purposes.

buf_page_get_low(): Assert that the BUF_GET mode is not being used
anywhere during the execution of btr_free_if_exists().

In all code related to freeing or allocating pages, we will add some
robustness, by making more use of BUF_GET_POSSIBLY_FREED and by
reporting an error instead of crashing in some cases of corruption.
2021-04-28 17:24:44 +03:00
Marko Mäkelä
a81aec1505 MDEV-25491 preparation: Clean up tablespace destruction
fil_check_pending_ops(), fil_check_pending_io(): Remove.
These functions were actually duplicating each other ever since
commit 118e258aaa (MDEV-23855).

fil_space_t::check_pending_operations(): Replaces
fil_check_pending_operations() and incorporates the logic of
fil_check_pending_ops(). Avoid unnecessary lookups for the tablespace.
Just wait for the reference count to drop to zero.

fil_space_t::io(): Remove an unnecessary condition. We can (and
probably better should) refuse asynchronous reads of undo tablespaces
that are being truncated.

fil_truncate_prepare(): Remove.

trx_purge_truncate_history(): Implement the necessary steps that used
to be in fil_truncate_prepare().
2021-04-28 17:24:44 +03:00
Thirunarayanan Balathandayuthapani
b862377c3e MDEV-25503 InnoDB hangs on startup during recovery
InnoDB startup hangs if a DDL transaction needs to be
rolled back and a recovered transaction on statistics
tables exists. In that case, InnoDB should rollback
the transaction which holds locks on innodb_table_stats
or innodb_index_stats during trx_rollback_or_clean_recovered().
2021-04-27 17:07:37 +05:30
Nikita Malyavin
300253acf1 revive innodb_debug_sync
innodb_debug_sync was introduced in commit
b393e2cb0c and reverted in
commit fc58c17216 due to memory leak reported
by valgrind, see MDEV-21336.

The leak is now fixed by adding `rw_lock_free(&slot->debug_sync_lock)`
after background thread working loop is finished, and the patch is
reapplied, with respect to c++98 fixes by Marko.

The missing DEBUG_SYNC for MDEV-18546 in row0vers.cc is also reapplied.
2021-04-27 11:51:17 +03:00
Marko Mäkelä
b81382887c MDEV-25512 Deadlock between sux_lock::u_x_upgrade() and sux_lock::u_lock()
In the SUX_LOCK_GENERIC implementation, we can remember at most
one pending exclusive lock request. If multiple exclusive lock
requests are pending, the WRITER_WAITING flag will be cleared when
the first waiting writer acquires the exclusive lock.

ssux_lock_low::update_lock(): If WRITER_WAITING is set, wake up
the writer even if the UPDATER flag is set, because the waiting
writer may be in the process of upgrading its U lock to X.

rw_lock::read_unlock(): Also indicate that an X lock waiter must
be woken up if an U lock exists.

This fix may cause unnecessary wake-ups and system calls, but this
is the best that we can do. Ideally we would use the MDEV-25404
idea of a separate 'writer' mutex, but there is no portable way to
request that a non-recursive mutex be created, and InnoDB requires
the ability to transfer buf_block_t::lock ownership to an I/O thread.

To allow problems like this to be caught more reliably in the future,
we add a unit test for srw_mutex, srw_lock, ssux_lock, sux_lock.
2021-04-25 12:58:16 +03:00
Marko Mäkelä
8121d03fd4 MDEV-25404 fixup: Fix ssux_lock_low::u_wr_upgrade()
The U-to-X upgrade turned out to be incorrect. A debug assertion
failed in wr_wait(), called from mtr_defer_drop_ahi() in a stress
test with innodb_adaptive_hash_index=ON.

A correct upgrade procedure ought to be readers.fetch_add(WRITER-1)
to register ourselves as a WRITER (or waiting writer) and to release
the reference that was being held for the U lock.

Thanks to Matthias Leich for catching the problem.
2021-04-22 15:43:30 +03:00
Marko Mäkelä
1636db5443 Revert "MDEV-24589 DROP TABLE is not crash-safe"
This reverts commit e731a28394.

A crash occurred during the test stress.ddl_innodb when
fil_delete_tablespace() for DROP TABLE was waiting in
fil_check_pending_operations() and a purge thread for handling
an earlier DROP INDEX was attempting to load the index root page
in btr_free_if_exists() and btr_free_root_check(). The function
buf_page_get_gen() would write out several times
"trying to read...being-dropped tablespace"
before giving up and committing suicide.

It turns out that during any page access in btr_free_if_exists(),
fil_space_t::set_stopping() could have been invoked by
fil_check_pending_operations(), as part of dropping the tablespace.
Preventing this race condition would require extensive changes
to the allocation code or some locking mechanism that would ensure
that we only set the flag if btr_free_if_exists() is not in progress.

Either way, that could be a too risky change in a GA release.
Because MDEV-24589 is not strictly necessary in the 10.5 release
series and it only is a requirement for MDEV-25180 in a later
major release, we will revert the change from 10.5.
2021-04-22 14:57:23 +03:00
Marko Mäkelä
4930f9c94b Merge 10.5 into 10.6 2021-04-21 11:45:00 +03:00
Marko Mäkelä
80ed136e6d Merge 10.4 into 10.5 2021-04-21 09:01:01 +03:00
Marko Mäkelä
a0588d54a2 Merge 10.3 into 10.4 2021-04-21 07:58:42 +03:00
Marko Mäkelä
75c01f39b1 Merge 10.2 into 10.3 2021-04-21 07:25:48 +03:00
Marko Mäkelä
922e676b43 MDEV-25466 Merge new release of InnoDB 5.7.34 to 10.2 2021-04-20 17:33:36 +03:00
Monty
031f11717d Fix all warnings given by UBSAN
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.

The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
  complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
  memory access of integers.  Fixed by using byte_order_generic.h when
  compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
  disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
  suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
  safe to have overflows (two cases, in item_func.cc).

Things fixed:
- Don't left shift signed values
  (byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
  constructors.  This was needed as UBSAN checks that these types has
  correct values when one copies an object.
  (gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
  deleted objects.
  (events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
  on Query_arena object.
- Fixed several cast of objects to an incompatible class!
  (Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
   sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
  This includes also ++ and -- of integers.
  (Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
  value_type is initialized to this instead of to -1, which is not a valid
  enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
  instead of a null string (safer as it ensures we do not do arithmetic
  on null strings).

Other things:

- Changed struct st_position to an OBJECT and added an initialization
  function to it to ensure that we do not copy or use uninitialized
  members. The change to a class was also motived that we used "struct
  st_position" and POSITION randomly trough the code which was
  confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
  the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
  avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr.  (This variable was before
  only in 10.5 and up).  It can now have one of two values:
  ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
  it virtual. This was an effort to get UBSAN to work with loaded storage
  engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
  in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
  server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
  to integer arithmetic.

Changes that should not be needed but had to be done to suppress warnings
from UBSAN:

- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
  compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
  some compile time warnings.

Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
2021-04-20 12:30:09 +03:00
Marko Mäkelä
8751aa7397 MDEV-25404: ssux_lock_low: Introduce a separate writer mutex
Having both readers and writers use a single lock word in
futex system calls caused performance regression compared to
SRW_LOCK_DUMMY (mutex and 2 condition variables).
A contributing factor is that we did not accurately keep
track of the number of waiting threads and thus had to invoke
system calls to wake up any waiting threads.

SUX_LOCK_GENERIC: Renamed from SRW_LOCK_DUMMY. This is the
original implementation, with rw_lock (std::atomic<uint32_t>),
a mutex and two condition variables. Using a separate writer
mutex (as described below) is not possible, because the mutex ownership
in a buf_block_t::lock must be able to transfer from a write submitter
thread to an I/O completion thread, and pthread_mutex_lock() may assume
that the submitter thread is recursively acquiring the mutex that it
already holds, while in reality the I/O completion thread is the real
owner. POSIX does not define an interface for requesting a mutex to
be non-recursive.

On Microsoft Windows, srw_lock_low will remain a simple wrapper of
SRWLOCK. On 32-bit Microsoft Windows, sizeof(SRWLOCK)=4 while
sizeof(srw_lock_low)=8.

On other platforms, srw_lock_low is an alias of ssux_lock_low,
the Simple (non-recursive) Shared/Update/eXclusive lock.

In the futex-based implementation of ssux_lock_low (Linux, OpenBSD,
Microsoft Windows), we shall use a dedicated mutex for exclusive
requests (writer), and have a WRITER flag in the 'readers' lock word
to inform that a writer is holding the lock or waiting for the lock to
be granted. When the WRITER flag is set, all lock requests must acquire
the writer mutex. Normally, shared (S) lock requests simply perform a
compare-and-swap on the 'readers' word.

Update locks are implemented as a combination of writer mutex
and a normal counter in the 'readers' lock word. The conflict between
U and X locks is guaranteed by the writer mutex.
Unlike SUX_LOCK_GENERIC, wr_u_downgrade() will not wake up any pending
rd_lock() waits. They will wait until u_unlock() releases the writer mutex.

The ssux_lock_low is always wrapped by sux_lock (with a recursion count
of U and X locks), used for dict_index_t::lock and buf_block_t::lock.
Their memory footprint for the futex-based implementation will increase
by sizeof(srw_mutex), or 4 bytes.

This change addresses a performance regression in read-only benchmarks,
such as sysbench oltp_read_only. Also write performance was improved.

On 32-bit Linux and OpenBSD, lock_sys_t::hash_table will allocate
two hash table elements for each srw_lock (14 instead of 15 hash
table cells per 64-byte cache line on IA-32). On Microsoft Windows,
sizeof(SRWLOCK)==sizeof(void*) and there is no change.

Reviewed by: Vladislav Vaintroub
Tested by: Axel Schwenke and Vladislav Vaintroub
2021-04-19 18:15:49 +03:00
Marko Mäkelä
040c16ab8b MDEV-25404: Optimize srw_mutex on Linux, OpenBSD, Windows
On Linux, OpenBSD and Microsoft Windows, srw_mutex was an alias for a
rw-lock while we only need mutex functionality. Let us implement a
futex-based mutex with one bit for HOLDER and 31 bits for counting
waiting requests.

srw_lock::wr_unlock() can avoid waking up a waiter when no waiting
requests exist. (Previously, we only had 1-bit rw_lock::WRITER_WAITING
flag that could be wrongly cleared if multiple waiting wr_lock() exist.
Now we have no problem with up to 2,147,483,648 conflicting threads.)

On 64-bit Microsoft Windows, the advantage is that
sizeof(srw_mutex) is 4, while sizeof(SRWLOCK) would be 8.

Reviewed by: Vladislav Vaintroub
2021-04-19 18:03:17 +03:00
Eugene Kosov
a3871cd283 MDEV-22255 SIGABRT: Assertion id' failed in trx_write_trx_id on INSERT | Assertion id > 0' failed in trx_write_trx_id | Assertion val > 0' failed in row_upd_index_entry_sys_field | Assertion thr_get_trx(thr)->id || index->table->no_rollback()' failed. 2021-04-15 17:53:33 +03:00
Marko Mäkelä
d2e2d32933 Merge 10.5 into 10.6 2021-04-14 12:32:27 +03:00
Marko Mäkelä
6c3e860cbf Merge 10.4 into 10.5 2021-04-14 11:35:39 +03:00
Marko Mäkelä
5008171b05 Merge 10.3 into 10.4 2021-04-14 10:33:59 +03:00
sjaakola
a1e70388c4 MDEV-24966 Galera multi-master regression
After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.

This pull request has following fixes to handle concurrent write load from
multiple nodes:

In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.

Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.

wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.

BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.

The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().

MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
 DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.

With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
2021-04-13 14:58:54 +03:00
Marko Mäkelä
b8c8692fd9 MDEV-24620 ASAN heap-buffer-overflow in btr_pcur_restore_position()
Between btr_pcur_store_position() and btr_pcur_restore_position()
it is possible that purge empties a table and enlarges
index->n_core_fields and index->n_core_null_bytes.
Therefore, we must cache index->n_core_fields in
btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be
parsed correctly.

Unfortunately, this is a huge change, because we will replace
"bool leaf" parameters with "ulint n_core"
(passing index->n_core_fields, or 0 for non-leaf pages).
For special cases where we know that index->is_instant() cannot hold,
we may also pass index->n_fields.
2021-04-13 10:28:13 +03:00
Marko Mäkelä
6e6318b29b Merge 10.2 into 10.3 2021-04-13 10:26:01 +03:00
Thirunarayanan Balathandayuthapani
cf2c6b7f8d MDEV-24971 InnoDB access freed virtual column after rollback of secondary index
Problem:
========
 InnoDB fails to clean the index stub if it fails to add the
virtual index which contains new virtual column. But it clears
the newly virtual column from index in clear_added_indexes()
during inplace_alter_table. On commit, InnoDB evicts and
reload the table. In case of rollback, it doesn't happen.
InnoDB clears the ABORTED index while opening the table
or doing the DDL. In the mean time, InnoDB can access
the dropped virtual index columns while creating prebuilt
or rollback of concurrent DML.

Solution:
==========
(1) InnoDB should maintain newly added virtual column while
rollbacking the newly added virtual index.
(2) InnoDB must not defer the index removal
if the alter table is executed with LOCK=EXCLUSIVE.
(3) For LOCK=SHARED, InnoDB should check whether the table
has any other transaction lock other than alter transaction
before deferring the index stub.

Replaced has_new_v_col with dict_add_vcol_info in dict_index_t to
indicate whether the index has any new virtual column.

dict_index_t::has_new_v_col(): Returns whether the index has
newly added virtual column, it doesn't say which columns are
newly added virtual column

ha_innobase_inplace_ctx::is_new_vcol(): Return whether the
given column is added as a part of the current alter.

ha_innobase_inplace_ctx::clean_new_vcol_index(): Copy the newly
added virtual column to new_vcol_info in dict_index_t. Replace
the column in the index fields with virtual column stored
in new_vcol_info.

dict_index_t::assign_new_v_col(): Store the number of virtual
column added in index as a part of alter table.

dict_index_t::get_n_new_vcol(): Get the number of newly added
virtual column

dict_index_t::assign_drop_v_col(): Allocate the memory for
adding new virtual column in new_vcol_info.

dict_index_t::add_drop_v_col(): Add the newly added virtual
column in new_vcol_info.

dict_table_t::has_lock_for_other_trx(): Whether the table has
any other transaction lock than given transaction.

row_merge_drop_indexes(): Add parameter alter_trx and check
whether the table has any other lock than alter transaction.
2021-04-12 16:06:06 +05:30
Thirunarayanan Balathandayuthapani
1ac4d0c168 BtrBulk::table_name(): Return the table name while displaying
table name for fts diagnostics
2021-04-09 17:38:21 +05:30
Marko Mäkelä
450c017c2d Merge 10.2 into 10.3 2021-04-09 14:32:06 +03:00
Thirunarayanan Balathandayuthapani
5a3151bcda Improve diagnostics in order to catch MDEV-18868 and similar bugs 2021-04-09 12:01:42 +05:30
Marko Mäkelä
de119fa2b6 MDEV-25297 Assertion: trx->roll_limit <= trx->undo_no in ROLLBACK TO SAVEPOINT
In commit 8ea923f55b (MDEV-24818)
when we optimized multi-statement INSERT transactions into empty tables,
we would roll back the entire transaction on any error. But, we would
fail to invalidate any SAVEPOINT that had been requested in the past.

trx_t::savepoints_discard(): Renamed from trx_roll_savepoints_free().

row_mysql_handle_errors(): If we were in bulk insert, invoke
trx_t::savepoints_discard(). In this way, a future attempt of
ROLLBACK TO SAVEPOINT will return an error.
2021-04-09 09:18:07 +03:00
Marko Mäkelä
1fde581237 MDEV-25315 Crash in SHOW ENGINE INNODB STATUS
In commit 8ea923f55b (MDEV-24818)
when we optimized multi-statement INSERT into an empty table,
we would sometimes wrongly enable bulk insert into a table that
is actually already using row-level locking and undo logging.

trx_has_lock_x(): New predicate, to check if the transaction of
the current thread is holding an exclusive lock on a table.

trx_undo_report_row_operation(): Only invoke
trx_mod_table_time_t::start_bulk_insert() if
trx_has_lock_x() holds.
2021-04-08 18:01:27 +03:00
Marko Mäkelä
7c524d4414 MDEV-25371 Potential hang in wsrep_is_BF_lock_timeout()
In commit e71e613353 (MDEV-24671),
lock_sys.wait_mutex was moved above lock_sys.mutex
(which was later replaced with lock_sys.latch) in the latching order.

In commit 7cf4419fc4 (MDEV-24789),
a potential hang was introduced to Galera. The function lock_wait()
would hold lock_sys.wait_mutex while invoking wsrep_is_BF_lock_timeout(),
which in turn could acquire LockMutexGuard for some diagnostic printout.

wsrep_is_BF_lock_timeout(): Do not invoke trx_print_latched() or
LockMutexGuard.

lock_sys_t::wr_lock(), lock_sys_t::rd_lock(): Assert that the current
thread is not holding lock_sys.wait_mutex.
Unfortunately, RW-locks are not covered by SAFE_MUTEX.

Reviewed by: Jan Lindström
2021-04-08 13:32:16 +03:00
Daniel Black
553ef1a78b MDEV-13115: Implement SELECT SKIP LOCKED
Adds an implementation for SELECT ... FOR UPDATE SKIP LOCKED /
SELECT ... LOCK IN SHARED MODE SKIP LOCKED

This is implemented only InnoDB at the moment, not in RockDB yet.

This adds a new hander flag HA_CAN_SKIP_LOCKED than
will be used when the storage engine advertises the flag.

When a storage engine indicates this flag it will get
TL_WRITE_SKIP_LOCKED and TL_READ_SKIP_LOCKED transaction types.

The Lex structure has been updated to store both the FOR UPDATE/LOCK IN
SHARE as well as the SKIP LOCKED so the SHOW CREATE VIEW
implementation is simplier.

"SELECT FOR UPDATE ... SKIP LOCKED" combined with CREATE TABLE AS or
INSERT.. SELECT on the result set is not safe for STATEMENT based
replication. MIXED replication will replicate this as row based events."

Thanks to guidance from Facebook commit
193896c466
This helped verify basic test case, and components that need implementing
(even though every part was implemented differently).

Thanks Marko for guidance on simplier InnoDB implementation.

Reviewers: Marko, Monty
2021-04-08 16:51:36 +10:00
Marko Mäkelä
cf552f5886 MDEV-25312 Replace fil_space_t::name with fil_space_t::name()
A consistency check for fil_space_t::name is causing recovery failures
in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field
altogether.

fil_space_t::name was more or less a copy of dict_table_t::name
(except for some special cases), and it was not being used for
anything useful.

There used to be a name_hash, but it had been removed already in
commit a75dbfd718 (MDEV-12266).

We will also remove os_normalize_path(), OS_PATH_SEPARATOR,
OS_PATH_SEPATOR_ALT. On Microsoft Windows, we will treat \ and /
roughly in the same way. The intention is that for per-table
tablespaces, the filenames will always follow the pattern
prefix/databasename/tablename.ibd. (Any \ in the prefix must not
be converted.)

ut_basename_noext(): Remove (unused function).

read_link_file(): Replaces RemoteDatafile::read_link_file().
We will ensure that the last two path component separators are
forward slashes (converting up to 2 trailing backslashes on
Microsoft Windows), so that everywhere else we can
assume that data file names end in "/databasename/tablename.ibd".

Note: On Microsoft Windows, path names that start with \\?\ must
not contain / as path component separators. Previously, such paths
did work in the DATA DIRECTORY argument of InnoDB tables.

Reviewed by: Vladislav Vaintroub
2021-04-07 18:01:13 +03:00
Marko Mäkelä
75db05c599 Merge 10.5 into 10.6 (except MDEV-24630)
Commit 76d2846a71 was for 10.5 only.
It caused some performance regression on 10.6 in some cases,
likely related to the removal of ib_mutex_t in MDEV-21452.
2021-03-30 13:28:05 +03:00
Marko Mäkelä
8c2e3259c1 MDEV-24302 follow-up: RESET MASTER hangs
As pointed out by Andrei Elkin, the previous fix did not fix one
race condition that may have caused the observed hang.

innodb_log_flush_request(): If we are enqueueing the very first
request at the same time the log write is being completed,
we must ensure that a near-concurrent call to log_flush_notify()
will not result in a missed notification. We guarantee this by
release-acquire operations on log_requests.start and
log_sys.flushed_to_disk_lsn.

log_flush_notify_and_unlock(): Cleanup: Always release the mutex.

log_sys_t::get_flushed_lsn(): Use acquire memory order.

log_sys_t::set_flushed_lsn(): Use release memory order.

log_sys_t::set_lsn(): Use release memory order.

log_sys_t::get_lsn(): Use relaxed memory order by default, and
allow the caller to specify acquire memory order explicitly.
Whenever the log_sys.mutex is being held or when log writes are
prohibited during startup, we can use a relaxed load. Likewise,
in some assertions where reading a stale value of log_sys.lsn
should not matter, we can use a relaxed load.

This will cause some additional instructions to be emitted on
architectures that do not implement Total Store Ordering (TSO),
such as POWER, ARM, and RISC-V Weak Memory Ordering (RVWMO).
2021-03-30 10:29:11 +03:00
Marko Mäkelä
2ad61c6782 Merge 10.5 into 10.6 2021-03-29 16:16:12 +03:00
Krunal Bauskar
0f6f72965b MDEV-25281: Switch to use non-atomic (vs atomic) distributed counter to
track page-access counter

As part of MDEV-21212, n_page_gets that is meant to track page access,
is ported to use distributed counter that default uses atomic sub-counters.

n_page_gets originally was a non-atomic counter that represented an approximate
value of pages tracked. Using the said analogy it doesn't need to be
an atomic distributed counter.

This patch introduces an interface that allows distributed counter to be
used with atomic and non-atomic sub-counter (through template) and also
port n_page_gets to use non-atomic distributed counter using the said
updated interface.
2021-03-29 16:15:28 +03:00
Marko Mäkelä
e8b7fceb82 MDEV-24302: RESET MASTER hangs
Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we are observing sporadic hangs during the execution of the
RESET MASTER statement. We are hoping to fix the hangs with these
changes, but due to the rather infrequent occurrence of the hangs
and our inability to reliably reproduce the hangs, we cannot be
sure of this.

What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. That is demonstrated by the new
test case that was developed by Andrei Elkin. We fix this case by
implementing a special case for it.

This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.

handlerton::commit_checkpoint_request,
commit_checkpoint_notify_ha(): Remove the unused parameter hton.

log_requests.start: Replaces pending_checkpoint_list.
log_requests.end: Replaces pending_checkpoint_list_end.
log_requests.mutex: Replaces pending_checkpoint_mutex.

log_flush_notify_and_unlock(), log_flush_notify(): Replaces
innobase_mysql_log_notify().  The new implementation should be
functionally equivalent to the old one.

innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.

log_write(): Invoke log_flush_notify() when the write is already durable.
This was missing WITH_PMEM when the log is in persistent memory.

Reviewed by: Vladislav Vaintroub
2021-03-29 15:16:23 +03:00
Marko Mäkelä
e538cb095f Merge 10.5 into 10.6 2021-03-27 18:03:03 +02:00
Marko Mäkelä
80459bcbd4 Merge 10.4 into 10.5 2021-03-27 17:37:42 +02:00
Marko Mäkelä
7ae37ff74f Merge 10.3 into 10.4 2021-03-27 17:12:28 +02:00
Marko Mäkelä
3157fa182a Merge 10.2 into 10.3 2021-03-27 16:11:26 +02:00
Marko Mäkelä
356c149603 Merge 10.5 into 10.6 2021-03-26 11:50:32 +02:00
Daniel Black
bcb9ca4105 MEM_CHECK_DEFINED: replace HAVE_valgrind
HAVE_valgrind_or_MSAN to HAVE_valgrind was incorrect in
af784385b4.

In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer),
and -DWITH_VALGRIND=1, but without memcheck.h, we end up with a MEM_CHECK_DEFINED
being empty.

If we are also doing a CMAKE_BUILD_TYPE=Debug this results a number of
[-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty.
With MEM_CHECK_DEFINED empty, there becomes no uses of this of the
fixed field and innodb variables in this patch.

So we stop using HAVE_valgrind as catchall and use the name
HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists.

Reviewer: Monty

Corrects: af784385b4
2021-03-26 07:58:49 +11:00
Eugene Kosov
1274bb7729 MDEV-25223 change fil_system_t::space_list and fil_system_t::named_spaces from UT_LIST to ilist
Mostly a refactoring. Also, debug functions were added for ease of life while debugging
2021-03-24 15:15:18 +03:00
Marko Mäkelä
e731a28394 MDEV-24589 DROP TABLE is not crash-safe
row_upd_clust_step(): Remove the "trigger" on DELETE SYS_INDEXES
that would invoke dict_drop_index_tree(). Let us do it on purge.

row_purge_remove_clust_if_poss_low(): Invoke
dict_drop_index_tree() when purging a delete-marked SYS_INDEXES record.
2021-03-23 16:20:15 +02:00
Marko Mäkelä
8f23aab068 MDEV-15528 fixup: Remove dict_table_open_on_index_id()
dict_table_open_on_index_id(): Remove. This function was used by
the background scrubbing, which was removed in
commit a5584b13d1.
2021-03-23 14:47:04 +02:00
Marko Mäkelä
0f8caadc96 MDEV-22653: Remove the useless parameter innodb_simulate_comp_failures
The debug parameter innodb_simulate_comp_failures injected compression
failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing
logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent
compressed page overflows. A much better check is already achieved by
defining UNIV_ZIP_COPY at the compilation time.
(Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)
2021-03-22 18:12:44 +02:00