row_merge_is_index_usable(): Allow access to any SEQUENCE, even if it was
created after the read view. SQL sequences are no-rollback tables with no
history at all.
This is a backport of
commit fd9ca2a742 (MDEV-23295) and
commit 9a156e1a23 (MDEV-23345) to 10.3.
An instant ADD/DROP/reorder column could create a dummy table
object with the wrong ROW_FORMAT when innodb_default_row_format
was changed between CREATE TABLE and ALTER TABLE.
prepare_inplace_alter_table_dict(): If we had promised that
ALGORITHM=INPLACE is supported, we must preserve the ROW_FORMAT.
The rest of the changes are related to adding
Alter_inplace_info::inplace_supported to cache the return value of
handler::check_if_supported_inplace_alter().
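A minimal sketch of the caching idea, with simplified stand-in types (the real Alter_inplace_info and the handler check differ):

    // Sketch: compute the handler's answer once and reuse it in later
    // ALTER phases instead of re-evaluating it after settings change.
    enum inplace_support_sk { SUPPORT_UNKNOWN, SUPPORT_NO, SUPPORT_YES };

    struct alter_info_sk {
        inplace_support_sk inplace_supported = SUPPORT_UNKNOWN;
    };

    inplace_support_sk check_if_supported_sk(alter_info_sk& ha_alter_info)
    {
        if (ha_alter_info.inplace_supported == SUPPORT_UNKNOWN) {
            // stand-in for handler::check_if_supported_inplace_alter()
            ha_alter_info.inplace_supported = SUPPORT_YES;
        }
        return ha_alter_info.inplace_supported;
    }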
It is possible that an object that was originally created by
open_purge_table() will remain cached and reused for SQL execution.
Our previous fix wrongly assumed that ha_innobase::open() would
always be called before SQL execution starts. Therefore, we must
invoke dict_stats_init() in ha_innobase::info_low() instead of
only doing it in ha_innobase::open().
Note: Concurrent execution of dict_stats_init() on the same table
is possible, but it also was possible between two calls to
ha_innobase::open(), with no ill effects observed.
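A minimal sketch of this lazy initialization, with simplified stand-in names for stat_initialized and dict_stats_init():

    #include <atomic>

    struct table_stats_sk {
        std::atomic<bool> stat_initialized{false};
    };

    void dict_stats_init_sk(table_stats_sk& t)
    {
        // ... compute and store statistics; concurrent callers write
        // the same values, matching the benign race noted above ...
        t.stat_initialized.store(true);
    }

    // info_low(): initialize lazily here, because open() is not
    // guaranteed to run again before SQL execution starts.
    void info_low_sk(table_stats_sk& t)
    {
        if (!t.stat_initialized.load()) {
            dict_stats_init_sk(t);
        }
    }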
This should fix the assertion failure on stat_initialized.
A possibly easy way to reproduce it would have been
to run the server with innodb_force_recovery=2 (disable the purge of
history), update a table so that an indexed virtual column will be
affected, and finally restart the server normally (purge enabled),
to observe a crash when the table is accessed from SQL.
The problem was first observed, and this fix verified, by
Elena Stepanova. Thirunarayanan Balathandayuthapani also
reproduced the problem.
row_sel_sec_rec_is_for_clust_rec(): If the field in the
clustered index record is stored off-page, always fetch it,
even when the secondary index field has been built on the
entire column. This was broken ever since the InnoDB Plugin
for MySQL Server 5.1 introduced ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED for InnoDB tables. That code was first
introduced in this tree in
commit 3945d5e554.
For the original ROW_FORMAT=REDUNDANT and the MySQL 5.0.3
ROW_FORMAT=COMPACT, there was no problem, because for
those tables we always stored at least a 768-byte prefix of
each column in the clustered index record.
row_sel_sec_rec_is_for_blob(): Allow prefix_len==0 for matching
the full column.
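A compilable sketch of the corrected decision, with assumed names standing in for the real InnoDB code:

    #include <cstddef>

    // An externally stored (off-page) clustered-index field must be
    // fetched before comparison, regardless of whether the secondary
    // index was built on a prefix or on the entire column; here
    // prefix_len == 0 denotes matching the full column.
    bool must_fetch_off_page(bool field_stored_off_page,
                             std::size_t prefix_len)
    {
        (void) prefix_len;             // no longer part of the decision
        return field_stored_off_page;  // always fetch off-page fields
    }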
Buffer overflow in ib_push_warning() fixed by using vsnprintf().
InnoDB parser was obsoleted by MDEV-16417.
Thanks to Nikita Malyavin for review and suggestion.
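A minimal sketch of the bounded-formatting pattern, with a simplified signature (the real ib_push_warning() also takes a trx and an error code):

    #include <cstdarg>
    #include <cstdio>

    void push_warning_sk(const char* format, ...)
    {
        char msg[512];              // fixed-size message buffer
        va_list args;
        va_start(args, format);
        // vsnprintf() truncates at the buffer size instead of
        // writing past the end of msg
        vsnprintf(msg, sizeof msg, format, args);
        va_end(args);
        // ... hand msg over to the warning mechanism ...
    }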
InnoDB returns uninitialized statistics to the mysql interpreter
when a background thread is opening the table, which leads to an
assertion failure. In that case, InnoDB avoids sending InnoDB
statistics information to the mysql interpreter.
PROBLEM
-------
1. The customer had presented a stack which had many threads waiting on
multiple mutexes like LOCK_status, srv_innodb_monitor_mutex, ibuf_mutex etc.
2. The root cause was that the AHI latch was held in S (shared) mode by a
thread which was doing a truncate of a large table.
3. Another thread was trying to acquire the AHI latch in X (exclusive) mode.
4. With our lock implementation, any thread requesting an X lock blocks the
rest of the threads requesting S (shared) locks; this caused many threads
to wait for this shared lock.
5. The main reason why we hold the latches during truncate is to avoid
disabling the AHI during truncate.
FIX
---
InnoDB purge thread locks the root page of the clustered index
while accessing the undo log records, and later the same thread
tries to open the table, initialize statistics, and lock the
clustered index root page again while doing virtual
column computation.
Solution:
=========
InnoDB should prevent statistics initialization when the
table is being opened by the purge thread.
Between btr_pcur_store_position() and btr_pcur_restore_position()
it is possible that purge empties a table and enlarges
index->n_core_fields and index->n_core_null_bytes.
Therefore, we must cache index->n_core_fields in
btr_pcur_t::old_n_core_fields so that btr_pcur_t::old_rec can be
parsed correctly.
Unfortunately, this is a huge change, because we will replace
"bool leaf" parameters with "ulint n_core"
(passing index->n_core_fields, or 0 for non-leaf pages).
For special cases where we know that index->is_instant() cannot hold,
we may also pass index->n_fields.
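A simplified sketch of the caching, assuming pared-down struct definitions (the real btr_pcur_t and dict_index_t carry far more state):

    struct index_sk {
        unsigned n_core_fields;  // purge may enlarge this concurrently
    };

    struct pcur_sk {
        const unsigned char* old_rec = nullptr;
        unsigned old_n_core_fields = 0;

        // store_position(): snapshot n_core_fields together with the
        // record image, so that old_rec can later be parsed with the
        // value that was valid when the position was stored.
        void store_position(const index_sk& index, const unsigned char* rec)
        {
            old_rec = rec;
            old_n_core_fields = index.n_core_fields;
        }
    };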
Problem:
========
InnoDB fails to clean up the index stub if it fails to add a
virtual index which contains a new virtual column, but it clears
the newly added virtual column from the index in clear_added_indexes()
during inplace_alter_table. On commit, InnoDB evicts and
reloads the table; in case of rollback, that does not happen.
InnoDB clears the ABORTED index while opening the table
or doing DDL. In the meantime, InnoDB can access
the dropped virtual index columns while creating the prebuilt
structure or while rolling back a concurrent DML.
Solution:
==========
(1) InnoDB should maintain the newly added virtual column while
rolling back the newly added virtual index.
(2) InnoDB must not defer the index removal
if the ALTER TABLE is executed with LOCK=EXCLUSIVE.
(3) For LOCK=SHARED, InnoDB should check whether the table
has any transaction lock other than the ALTER transaction's lock
before deferring the removal of the index stub.
Replaced has_new_v_col with dict_add_vcol_info in dict_index_t to
indicate whether the index has any new virtual column.
dict_index_t::has_new_v_col(): Return whether the index has a
newly added virtual column; it does not say which columns are
the newly added ones.
ha_innobase_inplace_ctx::is_new_vcol(): Return whether the
given column was added as a part of the current ALTER.
ha_innobase_inplace_ctx::clean_new_vcol_index(): Copy the newly
added virtual columns to new_vcol_info in dict_index_t. Replace
the columns in the index fields with the virtual columns stored
in new_vcol_info.
dict_index_t::assign_new_v_col(): Store the number of virtual
columns added to the index as a part of ALTER TABLE.
dict_index_t::get_n_new_vcol(): Get the number of newly added
virtual columns.
dict_index_t::assign_drop_v_col(): Allocate the memory for
adding a new virtual column in new_vcol_info.
dict_index_t::add_drop_v_col(): Add the newly added virtual
column to new_vcol_info.
dict_table_t::has_lock_for_other_trx(): Return whether the table
has any transaction lock other than the given transaction's.
row_merge_drop_indexes(): Add the parameter alter_trx and check
whether the table has any lock other than the ALTER transaction's.
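A rough sketch of the new check, using simplified stand-in types:

    #include <cstdint>
    #include <vector>

    using trx_id_sk = std::uint64_t;

    // dict_table_t::has_lock_for_other_trx(), roughly: return true if
    // any lock on the table belongs to a transaction other than the
    // ALTER transaction.
    bool has_lock_for_other_trx_sk(const std::vector<trx_id_sk>& lock_owners,
                                   trx_id_sk alter_trx)
    {
        for (trx_id_sk owner : lock_owners) {
            if (owner != alter_trx) {
                return true;  // another transaction still holds a lock
            }
        }
        return false;
    }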
When a table has been evicted from dict_sys and reloaded internally by
InnoDB for FOREIGN KEY processing, statistics may not be initialized,
but nevertheless row_update_cascade_for_mysql() could invoke
dict_stats_update_if_needed(). In that case, we cannot really update
the statistics. For tables that have STATS_PERSISTENT=1 and
STATS_AUTO_RECALC=1, ANALYZE TABLE might have to be executed later.
dict_stats_update_if_needed(): Replace the assertion with
a conditional early return.
trx_sys_any_active_transactions(): Remove a bogus debug assertion.
In trx_commit_in_memory() and trx_erase_lists(), we will remove
the transaction from trx_sys->rw_trx_list and set the state to
TRX_STATE_COMMITTED_IN_MEMORY.
Memory allocation failures during startup could cause
server failure in different, confusing ways.
If the allocation of a buffer pool instance chunk fails, InnoDB
fails to free the buffer pool instance mutex and zip mutex.
The buffer pool is thus freed before the mutexes, which leads
to a double free of memory when the mutexes are freed
during shutdown.
InnoDB should skip the dropped aborted FTS_DOC_ID_INDEX while
checking for an existing FTS_DOC_ID_INDEX in the table. InnoDB
should be able to create a new FTS_DOC_ID_INDEX if the fulltext
index is being added for the first time.
During the prepare phase of restoring backups, "mariabackup" does
not seem to allow (or recognize) the option "innodb_force_recovery"
for the embedded InnoDB server instance that it starts.
If page corruption is observed during page recovery, the prepare step
fails. While this is indeed the correct behavior ideally, allowing
this option to be set in case of emergencies might be useful when
the current backup is the only copy available. Some error messages
during "--prepare" suggest setting "innodb_force_recovery" to 1:
[ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
For backwards compatibility, "mariabackup --innobackupex --apply-log"
should also have this option.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Virtual column fields are not found in the prebuilt data type, so we
should match InnoDB fields using the `get_innobase_type_from_mysql_type`
method.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
- Aborting of fulltext index creation fails to remove the
index from the SYS_INDEXES table. When we try to reload the
table definition, InnoDB fails with an index count mismatch
error. InnoDB should remove the index from SYS_INDEXES while
rolling back the secondary index creation.
- The import table operation fails to punch a hole in
the filesystem when a page compressed table is involved.
To achieve that, InnoDB first punches a hole for
the I/O buffer size (1 MB). After that, InnoDB should write
page by page when page compression is involved.
Add condition on trx->state == TRX_STATE_COMMITTED_IN_MEMORY in order to
avoid unnecessary work. If a transaction has already been committed or
rolled back, it will release its locks in lock_release() and let
the waiting thread(s) continue execution.
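A minimal sketch of the added early-exit condition, with a simplified state enum:

    enum trx_state_sk {
        TRX_STATE_ACTIVE_SK,
        TRX_STATE_PREPARED_SK,
        TRX_STATE_COMMITTED_IN_MEMORY_SK
    };

    // Skip conflict-resolution work for a victim that has already been
    // committed or rolled back: it will release its locks in
    // lock_release() and let the waiting threads continue.
    bool skip_victim_sk(trx_state_sk state)
    {
        return state == TRX_STATE_COMMITTED_IN_MEMORY_SK;
    }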
Let BF wait on lock_rec_has_to_wait, and if necessary the other BF
is replayed.
wsrep_trx_order_before
    If BF is not even replicated yet, then they are ordered
    correctly.
bg_wsrep_kill_trx
    Make sure victim_trx is found, and check also its state. If the
    state is TRX_STATE_COMMITTED_IN_MEMORY, the transaction is
    already committed or rolled back and will release its locks
    soon.
wsrep_assert_no_bf_bf_wait
    A transaction requesting a new record lock should be in
    TRX_STATE_ACTIVE. The conflicting transaction can be in
    TRX_STATE_ACTIVE, TRX_STATE_COMMITTED_IN_MEMORY, or
    TRX_STATE_PREPARED. If the conflicting transaction is already
    committed in memory or prepared, we should wait. When a
    transaction is committed in memory, we hold the trx mutex but
    not lock_sys->mutex. Therefore, we could end up here before the
    transaction has had time to do the lock_release() that is
    protected by lock_sys->mutex.
lock_rec_has_to_wait
    We can very well let BF wait normally, as the other BF will be
    replayed in case of conflict. For debug builds we do additional
    sanity checks to catch any unsupported BF wait.
wsrep_kill_victim
    Check whether the victim is already in the
    TRX_STATE_COMMITTED_IN_MEMORY state; if it is, we can return.
lock_rec_dequeue_from_page
lock_rec_unlock
    Remove unnecessary wsrep_assert_no_bf_bf_wait function calls.
    We can very well let BF wait here.
The function row_upd_clust_step() is invoking several static functions,
some of which used to commit the mini-transaction in some cases.
If innobase_get_computed_value() would fail for some reason,
we would fail to invoke mtr_t::commit() and release buffer pool
page latches. This would likely lead to a hanging server later.
This regression was introduced in
commit 97db6c15ea (MDEV-20618).
row_upd_index_is_referenced(), row_upd_sec_index_entry():
Cleanup: Replace some ibool with bool.
row_upd_clust_rec_by_insert(), row_upd_clust_rec(): Guarantee that
the mini-transaction will always remain in active state.
row_upd_del_mark_clust_rec(): Guarantee that
the mini-transaction will always remain in active state.
This fixes one "leak" of mini-transaction on DB_COMPUTE_VALUE_FAILED.
row_upd_clust_step(): Use only one return path, which will always
invoke mtr.commit(). After a failed row_upd_store_row() call, we
will no longer "leak" the mini-transaction.
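A compilable sketch of the single-return-path shape, with a stand-in mini-transaction type:

    struct mtr_sk {
        bool active = false;
        void start()  { active = true; }
        void commit() { active = false; }
    };

    // Every outcome funnels through the same mtr.commit(), so a failed
    // step (e.g. DB_COMPUTE_VALUE_FAILED) can no longer leak an active
    // mini-transaction and its buffer pool page latches.
    int clust_step_sk(bool store_row_fails)
    {
        int err = 0;        // stand-in for dberr_t
        mtr_sk mtr;
        mtr.start();
        if (store_row_fails) {
            err = 1;        // record the error; do not return early
        }
        mtr.commit();       // the only exit path
        return err;
    }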
This fix was verified by RQG on 10.6 (depending on MDEV-371 that
was introduced in 10.4). Unfortunately, it is challenging to
create a regression test for this, and a test case could soon become
invalid as more bugs in virtual column evaluation are fixed.
As suggested by Vladislav Vaintroub, let us remove misleading
and malformatted startup messages.
Even if the global variable srv_use_atomic_writes were set, we would
still invoke my_test_if_atomic_write() to check if writes are atomic
with a particular page size.
When using the default innodb_page_size=16k, page writes should be
atomic on NTFS when using ROW_FORMAT=COMPRESSED and KEY_BLOCK_SIZE<=4.
Disabling srv_use_atomic_writes when innodb_file_per_table=OFF does
not make sense, because that is a dynamic parameter.
We also correct the documentation string of innodb_use_atomic_writes
and remove the duplicate variable innobase_use_atomic_writes.
The debug parameter innodb_simulate_comp_failures injected compression
failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing
logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent
compressed page overflows. A much better check is already achieved by
defining UNIV_ZIP_COPY at compilation time.
(Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)
In commit eaeb8ec4b8 (MDEV-24653)
an incorrect debug assertion was introduced.
btr_pcur_store_position(): If the only record in the page is the
instant ALTER TABLE metadata record, we cannot expect there to be
a successor page. The situation could be improved by MDEV-24673 later.
Before MDEV-14638, there was no race condition between the
execution of fetch_data_into_cache() and transaction commit.
fetch_data_into_cache(): Acquire trx_t::mutex before checking
trx_t::state, to prevent a concurrent transition from
TRX_STATE_COMMITTED_IN_MEMORY to TRX_STATE_NOT_STARTED
in trx_commit_in_memory().
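A self-contained sketch of the ordering fix, with std::mutex standing in for trx_t::mutex:

    #include <mutex>

    enum state_sk { NOT_STARTED_SK, ACTIVE_SK, COMMITTED_IN_MEMORY_SK };

    struct trx_sk {
        std::mutex mutex;
        state_sk   state = ACTIVE_SK;
    };

    // Acquire trx.mutex before reading the state, so that a concurrent
    // trx_commit_in_memory() cannot move the state from
    // COMMITTED_IN_MEMORY to NOT_STARTED while it is being examined.
    state_sk read_state_sk(trx_sk& trx)
    {
        std::lock_guard<std::mutex> guard(trx.mutex);
        return trx.state;
    }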
ha_innobase::info_low(): While collecting statistics for
ANALYZE TABLE, ensure that dict_stats_process_entry_from_recalc_pool()
is not executing on the same table.
We observed result differences for the test innodb.innodb_stats because
dict_stats_empty_index() was being invoked by the background statistics
calculation while ha_innobase::analyze() was executing
dict_stats_analyze_index_level().
Tests with 4096-byte sector size confirm that it is
safe to use O_DIRECT with page_compressed tables.
That had been disabled on Linux, in an attempt to fix MDEV-21584
which had been filed for the O_DIRECT problems earlier.
The fil_node_t::block_size was being set mostly correctly until
commit 10dd290b4b (MDEV-17380)
introduced a regression in MariaDB Server 10.4.4.
fil_node_t::read_page0(): Initialize fil_node_t::block_size.
This will probably make similar code in fil_space_extend_must_retry()
redundant, but we play it safe and will not remove that code.
Thanks to Vladislav Vaintroub for testing this on Microsoft Windows
using an old-fashioned rotational hard disk with 4KiB sector size.
Reviewed by: Vladislav Vaintroub
This had been originally added in
mysql/mysql-server@192bb153b6
with the motivation to disable O_DIRECT for the dedicated tablespace
for temporary tables. In MariaDB Server,
commit 5eb539555b (MDEV-12227)
should be a better solution.
The code became orphaned later in
mysql/mysql-server@c61244c0e6
and it had been applied to MariaDB Server 10.2.2 in
commit 2e814d4702 and
commit fec844aca8.
Thanks to Vladislav Vaintroub for spotting this.
The key value can be longer than REC_VERSION_56_MAX_INDEX_COL_LEN,
and this leads to an out-of-bounds array reference. Use dynamic
memory allocation based on the actual maximum length of the key value.
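A small sketch of the change, with a simplified buffer type:

    #include <cstddef>
    #include <vector>

    // Size the buffer from the actual maximum key value length instead
    // of the fixed REC_VERSION_56_MAX_INDEX_COL_LEN constant.
    std::vector<unsigned char> make_key_buffer_sk(std::size_t max_key_len)
    {
        return std::vector<unsigned char>(max_key_len);
    }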
The online log for an insert operation on a ROW_FORMAT=REDUNDANT
table fails with an index->is_instant() assertion. Purge can reset
n_core_fields while ALTER is waiting to upgrade the MDL for the
commit phase of the DDL. In the meantime, any insert DML that tries
to log the operation fails because the index is no longer instant.
row_log_get_n_core_fields(): Get the n_core_fields of the online
log for the given index.
rec_get_converted_size_comp_prefix_low(): Use the n_core_fields of
the online log when InnoDB calculates the size of a data tuple
during a redundant row format table rebuild.
rec_convert_dtuple_to_rec_comp(): Use the n_core_fields of the
online log when InnoDB converts a data tuple to a record during a
redundant row format table rebuild.
- Add a test case which has more than 129 instant columns.
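A rough sketch of the idea behind the two conversion changes above, with heavily simplified stand-in structs:

    struct online_log_sk { unsigned n_core_fields; };

    struct index_sk2 {
        unsigned             n_core_fields;  // purge may reset this
        const online_log_sk* online_log;     // non-null during online DDL
    };

    // Prefer the online log's snapshot of n_core_fields over the
    // index's current value, which purge can change while the ALTER
    // waits for the commit-phase MDL upgrade.
    unsigned n_core_for_rebuild_sk(const index_sk2& index)
    {
        return index.online_log ? index.online_log->n_core_fields
                                : index.n_core_fields;
    }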