Problem:
=========
One of the purge thread access the corrupted page and tries to remove from
LRU list. In the mean time, other purge threads are waiting for same page
in buf_wait_for_read(). Assertion(buf_fix_count == 0) fails for the
purge thread which tries to remove the page from LRU list.
Solution:
========
- Set the page id as FIL_NULL to indicate the page is corrupted before
removing the block from LRU list. Acquire hash lock for the particular
page id and wait for the other threads to release buf_fix_count
for the block.
- Added the error check for btr_cur_open() in row_search_on_row_ref().
Explicitly mention every options in .clang-format to protect us from possible
future changes.
Remove separate InnoDB style.
Change style to look more like this script:
for x in $@
do
indent -kr -bl -bli0 -l79 -i2 -nut -c48 -dj -cp0 $x
sed -ri -e 's/ = /= /g'\
-e '/switch.*\)$/{N;s/\n[ ]+/ /}' $x
done
Significant different is that 'switch' and '{' are put on different lines
because it's impossible in clang-format to set formatting rules just for
'switch' statement.
GCC 9.1.1 noticed that sd_notifyf() was always being invoked with
str=NULL argument for "%s". This code was added in
commit 2e814d4702
but not mentioned in the commit comment.
The STATUS messages for systemd matter during startup and shutdown,
and should not be emitted during normal operation.
ib_senderrf(): Remove the potentially harmful sd_notifyf() calls.
- If InnoDB encounters garbage or incomplete written log block during
recovery then don't throw the error. Treat it as end of the log.
- This kind of incomplete or empty block can be result of killing
InnoDB when writing the redo log.
Some I/O functions and macros that are declared in os0file.h used to
return a Boolean status code (nonzero on success). In MySQL 5.7, they
were changed to return dberr_t instead. Alas, in MariaDB Server 10.2,
some uses of functions were not adjusted to the changed return value.
Until MDEV-19231, the valid values of dberr_t were always nonzero.
This means that some code that was incorrectly checking for a zero
return value from the functions would never detect a failure.
After MDEV-19231, some tests for ALTER ONLINE TABLE would fail with
cmake -DPLUGIN_PERFSCHEMA=NO. It turned out that the wrappers
pfs_os_file_read_no_error_handling_int_fd_func() and
pfs_os_file_write_int_fd_func() were wrongly returning
bool instead of dberr_t. Also the callers of these functions were
wrongly expecting bool (nonzero on success) instead of dberr_t.
This mistake had been made when the addition of these functions was
merged from MySQL 5.6.36 and 5.7.18 into MariaDB Server 10.2.7.
This fix also reverts commit 40becbc3c7
which attempted to work around the problem.
Problem:
=======
fil_iterate() writes imported tablespace page0 as it is to discarded
tablespace. Space id wasn't even changed. While opening the tablespace,
tablespace fails with space id mismatch error.
Fix:
====
fil_iterate() copies the page0 with discarded space id to imported
tablespace.
os_file_write_func() and os_file_read_no_error_handling_func() returned
different result values depending on if UNIV_PFS_IO was defined or not.
Other things:
- Added some comments about return values for some functions
The issue is that two MARIA_HA instances shares the same MARIA_STATUS_INFO
object during UNION execution, so the second MARIA_HA instance state pointer
MARIA_HA::state points to the MARIA_HA::state_save of the first MARIA instance.
This happens in
thr_multi_lock(...) {
...
for (first_lock=data, pos= data+1 ; pos < end ; pos++)
{
...
if (pos[0]->lock == pos[-1]->lock && pos[0]->lock->copy_status)
(pos[0]->lock->copy_status)((*pos)->status_param,
(*first_lock)->status_param);
...
}
...
}
Usually the state is restored from ha_maria::external_lock(...):
\#0 _ma_update_status (param=0x6290000e6270) at ./storage/maria/ma_state.c:309
\#1 0x00005555577ccb15 in _ma_update_status_with_lock (info=0x6290000e6270) at ./storage/maria/ma_state.c:361
\#2 0x00005555577c7dcc in maria_lock_database (info=0x6290000e6270, lock_type=2) at ./storage/maria/ma_locking.c:66
\#3 0x0000555557802ccd in ha_maria::external_lock (this=0x61d0001b1308, thd=0x62a000048270, lock_type=2) at ./storage/maria/ha_maria.cc:2727
But _ma_update_status() does not take into account the case when
MARIA_HA::status points to the MARIA_HA::state_save of the other MARIA_HA
instance.
The fix is to restore MARIA_HA::state in ha_maria::external_lock() after
maria_lock_database() call for transactional tables.
Reverted incorrect change introduced by 548d03d7.
As result is char**, third qsort() parameter must be sizeof(char*).
Not sizeof(result[0] + 2), which is same as sizeof(result[0]).
Not even sizeof(result[0]) + 2, which would cause invalid memory access.
Proper sorting is responsibility of logfilenamecompare() callback.
Problem:
=======
rpl_blackhole.test fails when executed with following options
mysqld=--binlog_annotate_row_events=1, mysqld=--replicate_annotate_row_events=1
Test output:
------------
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
rpl.rpl_blackhole_bug 'mix' [ pass ] 791
rpl.rpl_blackhole_bug 'row' [ fail ]
Replicate_Wild_Ignore_Table
Last_Errno 1032
Last_Error Could not execute Update_rows_v1 event on table test.t1; Can't find
record in 't1', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's
master log master-bin.000001, end_log_pos 1510
Analysis:
=========
Enabling "replicate_annotate_row_events" on slave, Tells the slave to write
annotate rows events received from the master to its own binary log. The
received annotate events are applied after the Gtid event as shown below.
thd->query() will be set to the actual query received from the master, through
annotate event. Annotate_rows event should not be deleted after the event is
applied as the thd->query will be used to generate new Annotate_rows event
during applying the subsequent Rows events. After the last Rows event has been
applied, the saved Annotate_rows event (if any) will be deleted.
In balckhole engine all the DML operations are noops as they donot store any
data. They simply return success without doing any operation. But the existing
strictly expects thd->query() to be 'NULL' to identify that row based
replication is in use. This assumption will fail when row annotations are
enabled as the query is not 'NULL'. Hence various row based operations like
'update', 'delete', 'index lookup' will fail when row annotations are enabled.
Fix:
===
Extend the row based replication check to include row annotations as well.
i.e Either the thd->query() is NULL or thd->query() points to query and row
annotations are in use.
row_search_mvcc(): Duplicate the logic of btr_pcur_move_to_next()
so that an infinite loop can be avoided when advancing to the next
page fails due to a corrupted page.
At higher levels of innodb_force_recovery, the InnoDB transaction
subsystem will not be set up at all.
At slightly lower levels, recovered transactions will not be rolled back,
and DDL operations could hang due to locks being held at all.
Let us consistently refuse all writes if the predicate
high_level_read_only holds. We failed to refuse DROP TABLE
and DROP DATABASE. (Refusing DROP TABLE is a partial backport
from MDEV-19570 in the 10.5 branch.)
- If one of the encryption threads already started the initialization
of the tablespace then don't remove the other uninitialized tablespace
from the rotation list.
- If there is a change in innodb_encrypt_tables then
don't remove the processed tablespace from rotation list.
- Don't apply redo log for the corrupted page when innodb_force_recovery > 0.
- Allow the table to be dropped when index root page is
corrupted when innodb_force_recovery > 0.
The update callback functions for several settable global InnoDB variables
are acquiring InnoDB latches while holding LOCK_global_system_variables.
On the other hand, some InnoDB code is invoking THDVAR() while holding
InnoDB latches. An example of this is thd_lock_wait_timeout() that is
called by lock_rec_enqueue_waiting(). In some cases, the
intern_sys_var_ptr() that is invoked by THDVAR() may acquire
LOCK_global_system_variables, via sync_dynamic_session_variables().
In lock_rec_enqueue_waiting(), we really must be holding some InnoDB
latch while invoking THDVAR(). This implies that
LOCK_global_system_variables must conceptually reside below any InnoDB
latch in the latching order. That in turns implies that the various
update callback functions must release LOCK_global_system_variables
before acquiring any InnoDB mutexes or rw-locks, and reacquire
LOCK_global_system_variables later. The validate functions are being
invoked while not holding LOCK_global_system_variables and thus they
do not need any changes.
The following statements are affected by this:
SET GLOBAL innodb_adaptive_hash_index = …;
SET GLOBAL innodb_cmp_per_index_enabled = 1;
SET GLOBAL innodb_old_blocks_pct = …;
SET GLOBAL innodb_fil_make_page_dirty_debug = …; -- debug builds only
SET GLOBAL innodb_buffer_pool_evict = uncompressed; -- debug builds only
SET GLOBAL innodb_purge_run_now = 1; -- debug builds only
SET GLOBAL innodb_purge_stop_now = 1; -- debug builds only
SET GLOBAL innodb_log_checkpoint_now = 1; -- debug builds only
SET GLOBAL innodb_buf_flush_list_now = 1; -- debug builds only
SET GLOBAL innodb_buffer_pool_dump_now = 1;
SET GLOBAL innodb_buffer_pool_load_now = 1;
SET GLOBAL innodb_buffer_pool_load_abort = 1;
SET GLOBAL innodb_status_output = …;
SET GLOBAL innodb_status_output_locks = …;
SET GLOBAL innodb_encryption_threads = …;
SET GLOBAL innodb_encryption_rotate_key_age = …;
SET GLOBAL innodb_encryption_rotation_iops = …;
SET GLOBAL innodb_encrypt_tables = …;
SET GLOBAL innodb_disallow_writes = …;
buf_LRU_old_ratio_update(): Correct the return type.
The INFORMATION_SCHEMA plugin INNODB_SYS_VIRTUAL, which was introduced
in MariaDB 10.2.2 along with the dictionary table SYS_VIRTUAL,
is similar to other, much older and already stable plugins that
provide access to InnoDB dictionary tables.
Bootstrapping a new cluster from a backup created from a MariaDB
version prior to 10.3.5 may result in error "SST position can't be
set in past" when attempting to join additional nodes.
The problem stems from the fact that when reading the wsrep position
from InnoDB, the position is looked up in two places:
the TRX_SYS page, where versions prior to 10.3.5 used to store
WSREP's position; and rollback segments, this is where newer versions
store the position.
When starting a new cluster, the starting seqno is 0 and a new cluster
UUID is generated. This is persisted in rollback segments, but the old
UUID and seqno are not cleared from TRX_SYS page.
Subsequently, when reading back the position,
trx_rseg_read_wsrep_checkpoint() is going to return the maximum seqno
found in both TRX_SYS page and rollback segments. So in the case of a
newly bootstrapped cluster, it's always going to return the old
cluster information.
The fix consists of changing trx_rseg_read_wsrep_checkpoint() so that
only rollback segments are looked up. On startup, position is read
from the TRX_SYS page, and if present, it is copied to rollback
segments (unless a newer position is already present in the rollback
segments).
Finally the position stored in TRX_SYS page is cleared.
row_insert_for_mysql(): InnoDB sets values for row_start and row_end.
And this function used to return those values to server in
ha_innobase::write_row(). This buggy behavior was removed. Also,
a piece of code in this function was reformatted.
upd_node_t::make_versioned_helper(): Assert that the preallocated size
of the update vector is not exceeded.
Fix both code paths:
- Change the test source code so it doesn't cause the "Unused variable"
warning (which -Werror converted into error and caused CMake not to set
HAVE_THREAD_LOCAL)
- If the system doesn't seem to support HAVE_THREAD_LOCAL, refuse to
compile (rather than producing a binary that crashes for some tests)
Originally submitted at https://github.com/facebook/mysql-5.6/pull/905
Use thd_get_ha_data()/thd_set_ha_data() which protect against plugin
removal until it has THD ha_data.
Do not reset THD ha_data in rocksdb_close_connection(), cleaner approach
is to let ha_close_connection() do it.
Removed transaction objects cleanup from rocksdb_done_func(). As we lock
plugin properly, there must be no transaction objects during RocksDB
shutdown.
log_buffer_extend(): Do not write to disk. Just allocate new bigger
buffer and copy contents of old one to it. Do not acquire write_mutex.
log_t::is_extending: Removed as unneeded now.
LOG_BUFFER_SIZE: Removed to make the dependence on srv_log_buffer_size
visible.