This is happening because they are declared as packed,
and clang warns (-Waddress-of-packed-member) when the
address of a packed member is passed around, a legitimate concern on
some architectures. The easiest way to get rid of the errors is to
remove the packed attribute from said structs.
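For illustration, a minimal example (not one of the affected structs) that
triggers the warning:

    // Taking the address of a member of a packed struct can yield a
    // misaligned pointer, which may fault on strict-alignment CPUs.
    struct __attribute__((packed)) wire_header {
      char type;
      int  length;            // at offset 1, i.e. misaligned
    };

    int read_length(wire_header *h) {
      int *p = &h->length;    // clang: -Waddress-of-packed-member
      return *p;
    }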
Fix build on macOS 10.13:
39dceaae60 MDEV-10983: TokuDB does not compile on OS X 10.12
Make use of a different function to get the current tid.
Additionally, librt doesn't exist on OS X. Use System library instead.
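A hedged sketch of the tid lookup (the replacement function is not named
above; pthread_threadid_np() is assumed here as the usual macOS choice):

    #include <cstdint>
    #include <pthread.h>     // macOS declares pthread_threadid_np() here
    #if !defined(__APPLE__)
    #include <sys/syscall.h>
    #include <unistd.h>
    #endif

    static uint64_t current_tid() {
    #if defined(__APPLE__)
      uint64_t tid;
      pthread_threadid_np(nullptr, &tid);  // NULL = the calling thread
      return tid;
    #else
      return static_cast<uint64_t>(syscall(SYS_gettid));  // Linux
    #endif
    }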
storage/tokudb/PerconaFT/cmake_modules/TokuFeatureDetection.cmake | 4 +++-
storage/tokudb/PerconaFT/portability/portability.cc | 9 ++++++++-
storage/tokudb/PerconaFT/portability/tests/test-xid.cc | 9 ++++++++-
storage/tokudb/PerconaFT/portability/toku_config.h.in | 1 +
4 files changed, 20 insertions(+), 3 deletions(-)
Add variable rocksdb_remove_mariabackup_checkpoint.
If set, it will remove $rocksdb_datadir/mariabackup-checkpoint directory.
The variable is to be used exclusively by mariabackup,
to remove temporary checkpoints.
fil_iterate(): Invoke fil_encrypt_buf() correctly when
a ROW_FORMAT=COMPRESSED table with a physical page size of
innodb_page_size is being imported. Also, validate the page checksum
before decryption, and reduce the scope of some variables.
AbstractCallback::operator()(): Remove the parameter 'offset'.
The check for it in FetchIndexRootPages::operator() was basically
redundant and dead code since the previous refactoring.
InnoDB insisted on closing the file handle before renaming a file.
Renaming a file should never be a problem on POSIX systems. Also on
Windows it should work if the file was opened in FILE_SHARE_DELETE
mode.
fil_space_t::stop_ios: Remove. We no longer need to stop file access
during rename operations.
fil_mutex_enter_and_prepare_for_io(): Remove the wait for stop_ios.
fil_rename_tablespace(): Remove the retry logic; do not close the
file handle. Remove the unused fault injection that was added along
with the DATA DIRECTORY functionality (MySQL WL#5980).
os_file_create_simple_func(), os_file_create_func(),
os_file_create_simple_no_error_handling_func(): Include FILE_SHARE_DELETE
in the share_mode. (We will still prevent multiple InnoDB instances
from using the same files by not setting FILE_SHARE_WRITE.)
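A minimal sketch of the Windows open mode described above (illustrative,
not the actual os_file_create_func() code):

    #include <windows.h>

    HANDLE open_data_file(const wchar_t *name) {
      // FILE_SHARE_DELETE lets the file be renamed while this handle is
      // open; leaving out FILE_SHARE_WRITE still prevents another InnoDB
      // instance from writing to the same file.
      return CreateFileW(name, GENERIC_READ | GENERIC_WRITE,
                         FILE_SHARE_READ | FILE_SHARE_DELETE,
                         nullptr, OPEN_EXISTING,
                         FILE_ATTRIBUTE_NORMAL, nullptr);
    }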
ha_innobase::optimize(): If both innodb_defragment and
innodb_optimize_fulltext_only are at their default settings (OFF),
fall back to ALTER TABLE. Else process one or both options.
ha_innobase::set_partition_owner_stats(): Remove (unused function).
ha_innobase::ha_partition_stats: Remove (the variable is never read).
Remove unused ut_timer functions.
After a failed ADD INDEX, dict_index_remove_from_cache_low()
could iterate the index fields and dereference a freed virtual
column object when trying to remove the index from the v_indexes
of the virtual column.
This regression was caused by a merge of
MDEV-16119 InnoDB lock->index refers to a freed object.
ha_innobase_inplace_ctx::clear_added_indexes(): Detach the
indexes of uncommitted indexes from virtual columns, so that
the iteration in dict_index_remove_from_cache_low() can be avoided.
ha_innobase::prepare_inplace_alter_table(): Ignore uncommitted
corrupted indexes when rejecting ALTER TABLE. (This minor bug was
revealed by the extension of the test case.)
dict_index_t::detach_columns(): Detach an index from virtual columns.
Invoked by both dict_index_remove_from_cache_low() and
ha_innobase_inplace_ctx::clear_added_indexes().
dict_col_t::detach(const dict_index_t& index): Detach an index from
a column.
dict_col_t::is_virtual(): Replaces dict_col_is_virtual().
dict_index_t::has_virtual(): Replaces dict_index_has_virtual().
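A minimal sketch of the assumed shape of these helpers (member names and
the v_indexes representation are approximated, not the exact MariaDB code):

    #include <vector>

    struct dict_index_t;

    struct dict_col_t {
      bool virtual_col = false;
      std::vector<const dict_index_t*> v_indexes;  // indexes using this vcol
      bool is_virtual() const { return virtual_col; }
      void detach(const dict_index_t &index) {     // forget one index
        for (auto it = v_indexes.begin(); it != v_indexes.end(); ++it)
          if (*it == &index) { v_indexes.erase(it); return; }
      }
    };

    struct dict_field_t { dict_col_t *col; };

    struct dict_index_t {
      unsigned n_fields;
      dict_field_t *fields;
      bool has_virtual() const {
        for (unsigned i = 0; i < n_fields; i++)
          if (fields[i].col && fields[i].col->is_virtual()) return true;
        return false;
      }
      void detach_columns() {  // drop *this from each vcol's v_indexes
        if (!has_virtual()) return;
        for (unsigned i = 0; i < n_fields; i++)
          if (fields[i].col && fields[i].col->is_virtual())
            fields[i].col->detach(*this);
      }
    };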
log_crypt_101_read_block(): Mimic MariaDB 10.1, and use the first
encryption key if the key for the checkpoint cannot be found.
Redo log encryption key rotation was ultimately disabled
in MDEV-9422 (MariaDB 10.1.13) due to design issues. So, from
MariaDB 10.1.13 onwards only one log encryption key should matter.
recv_log_format_0_recover(): Add the parameter 'bool crypt'.
Indicate when the log cannot be decrypted for upgrade, instead of
making a possibly false claim that the log requires crash recovery.
init_crypt_key(): Remove extra space from a message.
Also fixes MDEV-14727, MDEV-14491
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0
by replacing the flawed wait logic in dict_index_remove_from_cache_low().
On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, the adaptive hash index entries
to orphaned pages would not do any harm. They would be dropped when
buffer pool pages are reused for something else.
btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().
buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.
buf_LRU_drop_page_hash_for_tablespace(): Drop the adaptive hash index
even if the tablespace is inaccessible.
buf_LRU_drop_page_hash_for_tablespace(): New global function, to drop
the adaptive hash index.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.
dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.
row_import_for_mysql(): Drop any adaptive hash index entries for the table.
row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do not want to drop the
adaptive hash index for other tables than the one that is being dropped.)
row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
When the transaction isolation level is SERIALIZABLE, or when
a locking read is performed in the REPEATABLE READ isolation level,
InnoDB must lock delete-marked records in order to prevent another
transaction from inserting something.
However, at READ UNCOMMITTED or READ COMMITTED isolation level or
when the parameter innodb_locks_unsafe_for_binlog is set, the
repeatability of the reads does not matter, and there is no need
to lock any records.
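The rule can be summarized in a small self-contained sketch (names
approximated; innodb_locks_unsafe_for_binlog folds into the READ COMMITTED
case):

    enum trx_isolation_t {
      TRX_ISO_READ_UNCOMMITTED,
      TRX_ISO_READ_COMMITTED,   // innodb_locks_unsafe_for_binlog behaves alike
      TRX_ISO_REPEATABLE_READ,
      TRX_ISO_SERIALIZABLE
    };

    // Must a delete-marked committed record be locked by this read?
    bool must_lock_delete_marked(trx_isolation_t iso, bool locking_read) {
      if (iso == TRX_ISO_SERIALIZABLE) return true;  // plain reads lock too
      if (iso == TRX_ISO_REPEATABLE_READ && locking_read) return true;
      return false;  // READ UNCOMMITTED / READ COMMITTED: skip the lock
    }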
row_search_for_mysql(): Skip locks on delete-marked committed records
upfront, instead of invoking row_unlock_for_mysql() afterwards.
The unlocking never worked for secondary index records.
infos[]: Allocate enough entries to accommodate all keys from both
checkpoint pages.
infos_used: The size of infos[].
get_crypt_info(): Merge to the only caller, log_crypt_101_read_block().
log_crypt_101_read_block(): Do not validate the log block checksum,
because it will not be valid when upgrading from MariaDB 10.1.10.
Instead, check that the encryption key exists.
log_crypt_101_read_checkpoint(): Append to infos[] instead of overwriting.
thd_destructor_proxy(): Ensure that purge actually exits,
as the logic should have ensured ever since MDEV-14080.
srv_purge_shutdown(): A new function to wait for the
purge coordinator to exit. Before exiting, the
purge coordinator will ensure that all purge workers have exited.
This is the MariaDB 10.2 version of the patch.
field_store_string(): Simplify the code.
field_store_index_name(): Remove, and use field_store_string()
instead. Starting with MariaDB 10.2.2, there is the predicate
dict_index_t::is_committed(), and dict_index_t::name never
contains the magic byte 0xff.
Correct some comments to refer to TEMP_INDEX_PREFIX_STR.
i_s_cmp_per_index_fill_low(): Use the appropriate value NULL to
identify that an index was not found. Check that storing each
column value succeeded.
i_s_innodb_buffer_page_fill(), i_s_innodb_buf_page_lru_fill():
Only invoke Field::set_notnull() if the index was found.
(This fixes the bug.)
i_s_dict_fill_sys_indexes(): Adjust the index->name that was
directly loaded from SYS_INDEXES.NAME (which can start with
the 0xff byte). This was the only function that depended
on the translation in field_store_index_name().
MDEV-10130 Assertion `share->in_trans == 0' failed in storage/maria/ma_close.c
MDEV-10378 Assertion `trn' failed in virtual int ha_maria::start_stmt
The problem was that maria_handler->trn was not properly reset
at commit/rollback, and ha_maria::external_lock() could get confused
because of this.
There was some old code in ha_maria::implicit_commit() that tried
to take care of this, but it was not bullet proof.
Fixed by adding a list of all tables that are part of the maria transaction
to TRN.
A nice side effect of the fix is that the loops in
ha_maria::implicit_commit() became much simpler.
Other things:
- Fixed a bug in mysql_admin_table() where the argument open_for_modify
  was wrongly reset for the next table in the chain
- Roll back the admin command also in case of a fatal error.
- Split _ma_set_trn_for_table() into three versions to simplify code
  and debugging.
- Several new asserts to detect the original problem (that the file was
  not properly removed from trn before calling ma_close())
Step#1: RocksDB files require a special #define when they are compiled
with valgrind. Without it, valgrind fails with an 'unimplemented syscall'
error for an fcntl call.
The failures with valgrind occur as a result of Spider sometimes using the
wrong transaction for operations in background threads that send requests to
the data nodes. The use of the wrong transaction caused the networking to the
data nodes to use the wrong thread in some cases. Valgrind eventually
detects this when such a thread is destroyed before it is used to disconnect
from the data node by that wrong transaction when it is freed.
I have fixed the problem by correcting the transaction used in each of these
cases.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit afe5a51 on branch 10.2
The failures with valgrind occur as a result of Spider sometimes using the
wrong transaction for operations in background threads that send requests to
the data nodes. The use of the wrong transaction caused the networking to the
data nodes to use the wrong thread in some cases. Valgrind eventually
detects this when such a thread is destroyed before it is used to disconnect
from the data node by that wrong transaction when it is freed.
I have fixed the problem by correcting the transaction used in each of these
cases.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged:
Commit 4d576d9 on branch bb-10.3-MDEV-12900
Description:- MyISAM table gets corrupted with concurrent
executions of INSERT, DELETE statements in a particular
sequence.
Analysis:- Due to the inappropriate manipulation of w_lock
and r_lock associated with a MyISAM table, there arises a
scenario where the table's state information becomes
invalid.
Fix:- A lock is introduced to resolve this issue.
The crash occurs when a thread that is closing its connection attempts to
access Spider transaction information when another thread has freed that memory
while processing Spider plugin deinit. This occurs because Spider does not
adjust the plugin's reference count when it sets a transaction information
pointer for the plugin.
The fix I implemented changes the way Spider sets the transaction information
pointer to use thd_set_ha_data() so that Spider's plugin reference counter is
adjusted as well.
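A hedged sketch of the change (SPIDER_TRX stands for Spider's
per-connection transaction object; the wrapper name is hypothetical):

    #include <mysql/plugin.h>   // declares thd_set_ha_data()

    struct SPIDER_TRX;          // per-connection transaction state

    static void spider_attach_trx(MYSQL_THD thd,
                                  struct handlerton *spider_hton,
                                  SPIDER_TRX *trx) {
      // Unlike writing thd->ha_data[] directly, thd_set_ha_data() also
      // adjusts the plugin reference count, so the plugin cannot be
      // deinitialized while a connection still points into its memory.
      thd_set_ha_data(thd, spider_hton, trx);
    }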
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged From:
Commit ab9d420 on branch 10.2
Fix two issues:
1. Rdb_ddl_manager::rename() loses the value of m_hidden_pk_val. The new
object used to get 0, which means "not loaded from the db yet".
2. ha_rocksdb::load_hidden_pk_value() uses current transaction (and its
snapshot) when loading hidden PK value from disk. This may cause it to
load an out-of-date value.
The crash occurs when a thread that is closing its connection attempts to
access Spider transaction information when another thread has freed that memory
while processing Spider plugin deinit. This occurs because Spider does not
adjust the plugin's reference count when it sets a transaction information
pointer for the plugin.
The fix I implemented changes the way Spider sets the transaction information
pointer to use thd_set_ha_data() so that Spider's plugin reference counter is
adjusted as well.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Merged From:
Commit eabfadc on branch bb-10.3-MDEV-7914
jemalloc > 5.0.0 doesn't like to be linked with
a dlopen-ed module.
Don't link tokudb with jemalloc on Fedora 28,
LD_PRELOAD it instead with mysqld_safe
and with systemd.
srv_purge_coordinator_thread(): Wait for all purge worker threads
to actually exit. An analysis of a core dump of a hung 10.3 server
revealed that one srv_worker_thread did not exit, even though the
purge coordinator had exited. This caused kill_server_thread and
mysqld_main to wait indefinitely. The main InnoDB shutdown was
never called, because unireg_end() was never called.
Problem was that copy_data_between_tables() didn't do proper
cleanup in case of failures:
- copy object was not properly freed
- end_bulk_insert() was not called
- mysql_trans_prepare_alter_copy_data() set THD->transaction.on to
false which was not properly restored
The last part caused a crash in Aria, as Aria depends on this THD
state being correct.
Other things:
- Reset info->switched_transactional after usage (safety)
- Reset bulk_insert_single_undo (safety)
RocksDB now supports "iterator bounds" which are min and max keys
that an iterator is interested in.
The iterator initialization function doesn't copy the keys, though; it keeps
pointers to them.
So if the buffer space for the keys is reused for another iterator (the one
checking for a UNIQUE constraint violation in ha_rocksdb::ha_update_row),
one can get an incorrect query result.
Fixed by using a separate buffer for iterator bounds in the unique constraint
violation check.
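A hedged sketch of the pitfall in terms of the RocksDB API: ReadOptions
keeps only pointers to the bound slices, so each iterator needs its own
live buffer, which is what the separate buffer in the fix provides:

    #include <rocksdb/db.h>

    rocksdb::Iterator *make_bounded_iter(rocksdb::DB *db,
                                         const rocksdb::Slice *lower,
                                         const rocksdb::Slice *upper) {
      rocksdb::ReadOptions ro;
      ro.iterate_lower_bound = lower;  // pointer retained, key not copied
      ro.iterate_upper_bound = upper;  // buffers must outlive the iterator
      return db->NewIterator(ro);      // caller must delete the iterator
    }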
Problem was that the bitmap needs to be flushed before disabling
logging of redo entries, as writing the bitmap to disk by the
background checkpoint may cause redo entries.
Merge a test case and a code change from MySQL 5.7.22.
There was no commit message, but a test case was included.
d3ec326bcd
There is no Bug 25899959 mentioned in the MySQL 8.0.11 history.
Based on the number, it should have been filed before August 2017.
Maybe it was initially fixed in a not-yet-public MySQL 9.0 branch?
The code change differs from MySQL 5.7, because the mbminmaxlen
were split in MariaDB in MDEV-7049.
MariaDB has a scaled-down version of the test, so we need to set
@@rocksdb_max_row_locks lower to trigger the desired error
(this wasn't caught on the BB test run because the test is marked as "big").
The bloom filter is only used when reading the data from disk. If the data
happens to still be in the memtable, the bloom filter won't be used.
Stabilize the test case by making sure the data is on disk before we
read it.
trx_purge_stop(): Release purge_sys->latch before attempting to
wake up the purge threads, so that they can actually wake up.
This is a regression of commit a13a636c74
which attempted to fix MDEV-11802 by ensuring that srv_purge_wakeup()
will actually wait until all purge threads wake up.
Due to the purge_sys->latch, the threads might never wake up,
because some purge threads could end up waiting for purge_sys->latch
in trx_undo_prev_version_build() while holding dict_operation_lock
in shared mode. This in turn would block any DDL operations, the
InnoDB master thread, and InnoDB shutdown.
Backport the fix from the upstream and add our testcase.
Backported cset:
commit 997a979bf5e2f75ab88781d9d3fd22dddc1fc21f
Author: Manuel Ung <mung@fb.com>
Date: Thu Feb 15 08:38:12 2018 -0800
Fix crashes in autoincrement code paths
Summary:
There are two issues related to autoincrement that can lead to crashes:
1. The max value for double/float type for autoincrement was not implemented in MyRocks, and can lead to assertions. The fix is to add them in.
2. If we try to set auto_increment via alter table on a table without an auto_increment column defined, we segfault because there is no index from which to read the last value. The fix is to perform a check to see if autoincrement exists before reading from index (similar to code ha_rocksdb::open).
Fixes https://github.com/facebook/mysql-5.6/issues/792
Closes https://github.com/facebook/mysql-5.6/pull/794
Differential Revision: D6995096
Pulled By: lth
fbshipit-source-id: 1130ce1
Problem:
The fix for Bug #21348684 (#Rb9581) introduced a conditional debug execute
'buf_pool_resize_chunk_null', which causes the new chunk memory for the 2nd
buffer pool instance to be freed.
The buffer pool resize function removes all old chunk entries from
'buf_chunk_map_reg' and adds new chunk entries into it. But when
'buf_pool_resize_chunk_null' is set true, the 2nd buffer pool
instance's chunk entries are not added into 'buf_chunk_map_reg'.
When the purge thread tries to access such a buffer chunk, it leads to a
debug assertion.
Fix:
Added the old chunk entries into 'buf_chunk_map_reg' for the 2nd buffer pool
instance when the 'buf_pool_resize_chunk_null' debug condition is set to true.
Reviewed by: Jimmy <Jimmy.Yang@oracle.com>
RB: 18664
PROBLEM
The issue found during the ntest run is a regression of Bug #27141613. The
issue is basically that when an index is being freed due to an error during
its creation, and the index hasn't been added to the dictionary cache, its
field columns are not set; dereferencing the null column pointer while
cleaning the index from the virtual columns leads to a crash.
NOTE: Also the test i_innodb.virtual_debug was failing on 32k page size and
above for the newly added scenario. Fixed that.
FIX
Added a check so that the index is removed from the virtual columns' index
list only if the index was added to the dictionary cache.
Reviewed by: Satya Bodapati<satya.bodapati@oracle.com>
RB: 18670
PROBLEM
=======
When adding a virtual index fails with DB_TOO_BIG_RECORD, the virtual
index being freed isn't removed from the list of indexes of the virtual
column (which is part of the index). When the undo log is later read
during rollback, this could fetch a wrong value and cause the assertion
reported in the bug.
FIX
===
Added a function, called when the virtual index is being freed, that
removes the index from the index list of each virtual column that was
a field of that index.
Reviewed By: Jimmy Yang<Jimmy.Yang@oracle.com>
RB: 18528
PROBLEM
-------
Whenever an FTS table is created, it registers itself in a queue which
is operated by a background thread whose job is to optimize the
FTS tables in the background. Additionally, we place these FTS tables in
the non-LRU list so that they cannot be evicted from the cache. But in the
scenario when a node that already has FTS tables is brought up, we first
try to load the FTS tables into the dictionary, but we skip the part
where they are added to the background queue and the non-LRU list, because
the background thread is not yet created. So these tables are loaded,
but they can be evicted from the cache. Now coming to the deadlock scenario:
1. A server background thread is trying to evict a table from the cache
because the cache is full, so it scans the LRU list for tables it can
evict. It finds that the FTS table (because of the reason explained above)
can be evicted, takes dict_sys->mutex (a system-wide mutex),
submits a request to the background thread to remove this table from the
queue, and waits for it to be completed.
2. In the meantime fts_optimize_thread() is processing another job
in the queue and needs dict_sys->mutex for a small amount of time,
but it cannot get it because it is held by the first background thread.
So Thread 1 is waiting for its job to be completed by Thread 2, whereas
Thread 2 is waiting for dict_sys->mutex held by Thread 1, causing the
deadlock.
FIX
Problem:
When an incorrect value is assigned to the innodb_data_file_path or
innodb_temp_data_file_path parameter, InnoDB returns an error and logs an
error message in the mysqld.err file, but the error message contains no
information about the parameter that caused InnoDB initialization to fail.
Fix:
Added an error message with the parameter name and value which caused InnoDB
initialization to fail.
Reviewed by: Jimmy <Jimmy.Yang@oracle.com>
RB: 18206
Problem:
During ALTER, when filling stored column info, a wrong column number was used.
This is because we ignored virtual columns when iterating over the columns in
the table, which led to a debug assertion.
Fix:
In the InnoDB table cache object, vcols are stored in one list, while stored
and normal columns are stored in another list.
When looking for a stored column, ignore the virtual columns to get the right
column number of the stored column.
Reviewed by: Thiru <thirunarayanan.balathandayuth@oracle.com>,
Satya <satya.bodapati@oracle.com>
RB: 17939
This is the MariaDB equivalent of fixing the MySQL 5.7 regression
Bug #26935001 ALTER TABLE AUTO_INCREMENT TRIES TO READ
INDEX FROM DISCARDED TABLESPACE
Oracle did not publish a test case, but it is easy to guess
based on the commit message. The MariaDB code is different
due to MDEV-6076 implementing persistent AUTO_INCREMENT.
commit_set_autoinc(): Report ER_TABLESPACE_DISCARDED if the
tablespace is missing.
prepare_inplace_alter_table_dict(): Avoid accessing a discarded
tablespace. (This avoids generating warnings in fil_space_acquire().)
Problem:
When an FTS index is added to a table which doesn't have an 'FTS_DOC_ID'
column, InnoDB rebuilds the table to add the column 'FTS_DOC_ID'. When this
FTS index is dropped from the table, InnoDB doesn't rebuild the table to
remove the 'FTS_DOC_ID' column; it deletes the FTS index auxiliary tables,
but it doesn't delete the FTS common auxiliary tables.
Later, when the database containing this table is renamed, the FTS auxiliary
tables are not renamed, because the table's flags2 (dict_table_t::flags2)
had the DICT_TF2_FTS flag reset during the FTS index drop operation.
Now when we drop the old database, it leads to an assert.
Fix:
During renaming of FTS auxiliary tables, OR-ed a condition to check whether
the table has the DICT_TF2_FTS_HAS_DOC_ID flag set.
RB: 18769
Reviewed by: Jimmy.Yang@oracle.com
Problem:
=======
Multiple insert statements into a table containing a FULLTEXT KEY and an
FTS_DOC_ID column abort the server if the FTS_DOC_ID exceeds
FTS_DOC_ID_MAX_STEP.
Solution:
========
Remove the exception for first committed insert statement.
Reviewed-by: Jimmy Yang<jimmy.yang@oracle.com>
RB: 18023
When Oracle fixed MDEV-13899 in their own way, they moved the
condition to the only caller of PageConverter::update_records().
Thus, the merge of 5.6.40 into MariaDB added a redundant condition.
PageConverter::update_records(): Move the page_is_leaf() condition
to the only caller, PageConverter::update_index_page().
The remote users need the SUPER privilege because by default Spider sends a
'SET SQL_LOG_OFF' statement to the data nodes. This is controlled by the
spider_internal_sql_log_off configuration setting on the Spider node, which
can only be set to 0 or 1, with a default value of 1.
I have fixed the problem by changing this configuration setting so that if it
is NOT SET, which is the most likely case, the Spider node DOES NOT SEND the
'SET SQL_LOG_OFF' statement to the data nodes. However if the
spider_internal_sql_log_off setting IS EXPLICITLY SET to either 0 or 1, then
the Spider node DOES SEND the 'SET SQL_LOG_OFF' statement, requiring a remote
user with the SUPER privilege. The Spider documentation will be updated to
reflect this change.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 72f0efa on branch bb-10.3-MDEV-15697
Problem:
=======
The InnoDB master thread encounters the shutdown state
SRV_SHUTDOWN_FLUSH_PHASE when innodb_force_recovery >= 2 and the master
thread is scheduled slowly during shutdown.
Fix:
====
There is no need for the master thread itself if innodb_force_recovery >= 2.
Don't create the master thread if innodb_force_recovery >= 2.
The problem is hard to repeat, and I failed to create a deterministic
test case. Online index creation creates stubs for to-be-created indexes.
If index creation fails, we could remove these stubs while locks exist
in the indexes. (This would require that the index creation was completed,
and a concurrent DML operation acquired a lock on a record in the
uncommitted index. If a duplicate key error occurs in an uncommitted
index, the error will be reported for the CREATE UNIQUE INDEX, not for
the DML operation that tried to insert the duplicate.)
dict_table_try_drop_aborted(), row_merge_drop_indexes(): If transactional
locks exist on the table, keep the table->indexes intact.
ibuf_restore_pos(): Do not issue any messages if the tablespace
is being dropped or truncated. In MariaDB 10.2, TRUNCATE TABLE
would not change the tablespace ID, and therefore the tablespace
would be found, even though TRUNCATE is in progress.
Furthermore, do not commit suicide if restoring the change buffer
cursor fails. The worst that could happen is that a secondary index
becomes corrupted due to incomplete change buffer merge.
The following variables are used in this project, but they are set to NOTFOUND.
LZ4_LIBS
The reason for the failure is that pkg_check_modules will not guarantee
<prefix>_LIBRARY_DIRS variable to be set, according to documentation.
When it's not set, we would force find_library to look in an empty path
and thus fail to correctly find LZ4_LIBS, although pck_check_modules
did previously discover that the library is installed.
To fix the problem and still keep the logic of first following
LIBLZ4_LIBRARY_DIRS and *then* look at other paths, we call find_library
twice. This is the recommended approach, according to CMake 3.11
documentation.
In reverse-ordered column families, if one wants to start reading at the
logical end of the index, they should Seek() to a key value that is not
covered by the index. This may (and typically does) prevent use of a bloom
filter.
The calls to setup_scan_iterator() that are made for index and table scan
didn't take this into account and passed eq_cond_len=INDEX_NUMBER_SIZE.
Fixed them to compute and pass correct eq_cond_len.
Also, removed an incorrect assert in ha_rocksdb::setup_iterator_bounds.
If creating a secondary index fails (typically, ADD UNIQUE INDEX fails
due to duplicate key), it is possible that concurrently running UPDATE
or DELETE will access the index stub and hit the debug assertion.
It does not make any sense to keep updating an uncommitted index whose
creation has failed.
dict_index_t::is_corrupted(): Replaces dict_index_is_corrupted().
Also take online_status into account.
Replace some calls to dict_index_is_clust() with calls to
dict_index_t::is_primary().
ha_innobase::commit_inplace_alter_table(): Defer the freeing of ctx->trx
until after the operation has been successfully committed. In this way,
rollback on a partitioned table will be possible.
rollback_inplace_alter_table(): Handle ctx->new_table == NULL when
ctx->trx != NULL.
If the tablespace is dropped or truncated after the
space->is_stopping() check in fil_crypt_get_page_throttle_func(),
we would proceed to request the page, and eventually report a fatal
error.
buf_page_get_gen(): Do not retry reading if mode==BUF_GET_POSSIBLY_FREED.
lock_rec_block_validate(): Be prepared for a NULL return value when
invoking buf_page_get_gen() with mode=BUF_GET_POSSIBLY_FREED.
Issue:
------
Prefixes for externally stored columns were being stored in the online_log
when a table is altered and the alter causes the table to be rebuilt. Space
in the online_log is limited, and if the length of the prefix of an
externally stored column is very big, it was written to the online log
without making sure that it fits. This leads to memory corruption.
Fix:
----
After the fix for Bug#16544143, there is no need to store prefixes of
externally stored columns in the online_log. Thus, remove the code which
stores column prefixes for externally stored columns. Also, before writing
anything to the online_log, make sure it fits into the available memory to
avoid memory corruption.
Read RB page for more details.
Reviewed-by: Annamalai Gurusami <annamalai.gurusami@oracle.com>
RB: 18239
When a comma separator is missing between COMMENT fields, Spider ignores the
parameter values that are beyond the last expected parameter value. There are
also some error messages that Spider fails to generate for COMMENT fields
that are incorrectly formed.
I have introduced additional infrastructure in Spider to fix these problems.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit c10da98 on branch bb-10.3-MDEV-15698
When an attempt to connect to the remote server fails, Spider retries to
connect to the remote server 1000 times or until the connection attempt
succeeds. This is perceived as a hang if the remote server remains
unavailable.
I have introduced changes in Spider's table status handler to fix this problem.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit 6ee6933 on branch bb-10.3-MDEV-15712
In the merge of commit e7f4e61f6e
the call fil_flush_file_spaces(FIL_TYPE_LOG) is necessary.
Tablespaces will be flushed as part of the redo log
checkpoint, but the redo log will not necessarily
be flushed, depending on innodb_flush_method.
In commit e7f4e61f6e
the call fil_flush_file_spaces(FIL_LOG) is necessary.
Tablespaces will be flushed as part of the redo log
checkpoint, but the redo log will not necessarily
be flushed, depending on innodb_flush_method.
InnoDB takes a lot of time to perform null updates. The reason is that
even though an empty update vector was created, InnoDB will go on to
write undo log records and update the system columns
DB_TRX_ID and DB_ROLL_PTR in the clustered index, and of course write
redo log for all this.
This could have been fixed properly in
commit 54a492ecac more than 10 years ago.
Remove the local variable srv_buf_pool_size_org, which was always 0.
In MySQL 5.7, InnoDB was made a mandatory storage engine, which would
force InnoDB to start up when executing
mysqld --verbose --help
which is what mysql-test-run.pl is doing as a first step. With a
large innodb_buffer_pool_size, this would take a long time.
So, MySQL 5.7 includes a hack that starts up InnoDB with a smaller
buffer pool when the option --verbose is present.
MariaDB uses HAVE_LZO, not HAVE_LZO1X (which was never defined).
Also, the variable srv_lzo_disabled was never defined or read
(only declared and assigned to, in unreachable code).
The InnoDB system table column SYS_TABLES.MIX_LEN was repurposed
in InnoDB Plugin for MySQL 5.1, in
commit 91111174ee (MySQL 5.1.46).
Until MySQL 5.6, it only contained a flag DICT_TF2_TEMPORARY.
MySQL 5.6 introduced a number of flags that were transient
in nature. One of these was introduced in 5.6.5, originally
called DICT_TF2_USE_TABLESPACE and later renamed to
DICT_TF2_USE_FILE_PER_TABLE. MySQL 5.7.6 introduced logic
that insists that the flag be set for any table that does not
reside in a shared tablespace, breaking upgrade from MySQL 5.5.
MariaDB does not support shared tablespaces other than the
InnoDB system tablespace. Also, some dependencies on
SYS_TABLES.MIX_LEN were removed in an earlier fix:
MDEV-13084 MariaDB 10.2 crashes on corrupted SYS_TABLES.MIX_LEN field
(commit e813fe8622).
dict_check_sys_tables(): Remove a bogus debug assertion, and add a
comment that explains how DICT_TF2_USE_FILE_PER_TABLE is used.
dict_table_is_file_per_table(): Remove a bogus debug assertion.
Pool::mem_free(): Poison the freed memory. Assert that it was
fully initialized, because the reuse of trx_t objects will
assume that the objects were previously initialized.
Pool::~Pool(), Pool::get(): Unpoison the allocated memory,
and mark it initialized.
trx_free(): After invoking Pool::mem_free(), unpoison
trx_t::mutex and trx_t::undo_mutex, because MutexMonitor
will access these even for freed trx_t objects.
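A minimal sketch of the poisoning scheme, assuming AddressSanitizer's
public client interface (MariaDB wraps such calls in its own macros):

    #include <sanitizer/asan_interface.h>
    #include <cstddef>

    // On free: any later access to the object is reported by ASan.
    void pool_poison(void *obj, size_t size) {
      ASAN_POISON_MEMORY_REGION(obj, size);
    }

    // On reuse: make the object accessible again before handing it out.
    void *pool_unpoison(void *obj, size_t size) {
      ASAN_UNPOISON_MEMORY_REGION(obj, size);
      return obj;
    }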
These test can sporadically show mutex deadlock warnings between LOCK_wsrep_thd
and LOCK_thd_data mutexes. This means that these mutexes can be locked in opposite
order by different threads, and thus result in deadlock situation.
To fix such an issue, the locking policy of these mutexes should be revised and
enforced to be uniform. However, a quick code review shows that the number of
lock/unlock operations for these mutexes combined is between 100-200, and all these
mutex invocations should be checked/fixed.
On the other hand, it turns out that LOCK_wsrep_thd is used for protecting access to
wsrep variables of THD (wsrep_conflict_state, wsrep_query_state), whereas LOCK_thd_data
protects query, db and mysys_var variables in THD. Extending LOCK_thd_data to protect
also wsrep variables looks like a viable solution, as there should not be a use case
where separate threads need simultaneous access to wsrep variables and THD data variables.
In this commit LOCK_wsrep_thd mutex is refactored to be replaced by LOCK_thd_data.
Bluntly replacing LOCK_wsrep_thd with LOCK_thd_data would result in double
locking of LOCK_thd_data, so some adjustments have been performed to fix
such situations.
Modern compilers (such as GCC 8) emit warnings that the
'register' keyword is deprecated and not valid C++17.
Let us remove most use of the 'register' keyword.
Code in 'extra/' is not touched.
dict_load_table_low(): When flagging an error, assign *table = NULL.
Failure to do so could cause a crash if an error was flagged when
accessing INFORMATION_SCHEMA.INNODB_SYS_TABLES.
While the test case crashes a MariaDB 10.2 debug build only,
let us apply the fix to the earliest applicable MariaDB series (10.0)
to avoid any data corruption on a table-rebuilding ALTER TABLE
using ALGORITHM=INPLACE.
innobase_create_key_defs(): Use altered_table->s->primary_key
when a new primary key is being created.
Problem:
=======
InnoDB cleans all temporary undo logs during commit. During rollback
of secondary index entry, InnoDB tries to build the previous version
of clustered index. It leads to access of freed undo page during
previous transaction commit and it leads to undo log corruption.
Solution:
=========
During rollback, temporary undo logs should not try to build
the previous version of the record.
Disable online ALTER TABLE ... ADD PRIMARY KEY for InnoDB, if the
table is opened/locked more than once in the current connection
(see the assert in ha_innobase::add_index()).
"The Store binlog position inside RocksDB" feature is only needed for
obtaining binlog position after having restored a MyRocks backup.
This is not yet supported in MariaDB, so properly disable it in both
places where it is done.
MyRocks internally will print non-critical messages to
sql_print_verbose_info() which will do what InnoDB does in similar cases:
check if (global_system_variables.log_warnings > 2).
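A hedged sketch of that helper (the stand-in declarations below are
hypothetical; only the log_warnings > 2 gate is taken from the text above):

    #include <cstdarg>
    #include <cstdio>

    // Hypothetical stand-ins for the server's global and log sink:
    struct system_variables_t { unsigned long log_warnings; };
    extern system_variables_t global_system_variables;

    void sql_print_verbose_info(const char *format, ...) {
      // Mirror InnoDB's behaviour: stay silent unless verbose logging
      // was requested with log_warnings > 2.
      if (global_system_variables.log_warnings > 2) {
        va_list args;
        va_start(args, format);
        vfprintf(stderr, format, args);  // stand-in for the error log
        va_end(args);
      }
    }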
The slave node killed itself.
Problem:- If we try to delete a table with a foreign key and the table it
refers to with wsrep_slave_threads > 1, then Galera tries to execute both
Delete_rows_log events in parallel, which should not happen.
Solution:- This is happening because we do not have foreign key info in the
write set. Up to version 10.2.7 it used to work fine; it is actually
happening because of an issue in commit 2f342c4. wsrep_must_process_fk has
been changed to make it similar to the original condition.
rocksdb_aux_lib.lib(rdb_sst_info.obj) : error LNK2001: unresolved
external symbol PSI_server
[D:\winx64-packages\build\storage\rocksdb\rocksdb.vcxproj]
rocksdb_aux_lib is a utility library used by standalone tools. These
won't have PSI_server.
Fixed by moving rdb_sst_info.* out of the library (as they do not seem
to be used by these standalone tools).
This bug happened due to a defect of the implementation of the handler
function ha_delete_all_rows() for the ARIA engine.
The function maria_delete_all_rows() truncated the table, but it didn't
touch the write cache, so the cache's write offset was not reset.
In a scenario like the one in the function st_select_lex_unit::exec_recursive,
where first all records were deleted from the table and then several new
records were added, some metadata became inconsistent with the state of
the cache. As a result the table scan function could not read records
at the end of the table.
The same defect could be found in the implementation of ha_delete_all_rows()
for the MYISAM engine, mi_delete_all_rows().
Additionally made late instantiation for the temporary table used to store
rows that were used for each new iteration when executing a recursive CTE.
fil_crypt_rotate_pages
If the tablespace is marked as stopping, also stop page rotation.
fil_crypt_flush_space
If the tablespace is marked as stopping, do not try to read
page 0 and write it back.
If a volume can't be opened due to permissions, or
IOCTL_STORAGE_QUERY_PROPERTY fails as not implemented, do not report it.
Those errors happen, and there is nothing the user can do.
This patch amends the fix for MDEV-12948.
Cherry-pick this fix from the upstream:
commit 6ddedd8f1e0ddcbc24e8f9a005636c5463799ab7
Author: Sergei Petrunia <psergey@askmonty.org>
Date: Tue Apr 10 11:43:01 2018 -0700
[mysql-5.6][PR] Issue #802: MyRocks: Statement rollback doesnt work correctly for nes…
Summary:
…ted statements
Variant #1: When the statement fails, we should roll back to the latest
savepoint taken at the top level.
Closes https://github.com/facebook/mysql-5.6/pull/804
Differential Revision: D7509380
Pulled By: hermanlee
fbshipit-source-id: 9a6f414
In the upstream, include/search_pattern_in_file.inc prints nothing
when it has found the searched string (if it hasn't, it produces an error).
In MariaDB, it prints "FOUND n ...".
The error occurs because of how the character set and collation are chosen for
stored procedure parameters that have a character data type. If the character
set and collation are not explicitly stated in the declaration, the server
chooses the database character set and collation in effect at routine creation
time.
To fix the problem, I added explicit character set and collation attributes
for the stored procedure parameters in the install_spider.sql script.
Author:
Jacob Mathew.
Reviewer:
Kentoku Shiba.
Cherry-Picked:
Commit ff0bf451db on bb-10.3-MDEV-15692
Commit being reverted is:
commit 689168be12
Author: Vladislav Vaintroub <wlad@mariadb.com>
Date: Thu Nov 16 18:57:18 2017 +0000
MDEV-13852 - redefine WinWriteableFile such as IsSyncThreadSafe()
is set to true, as it should.
Copy and modify original io_win.h header file to a different location
(as we cannot patch anything in submodule). Make sure modified header is
used.
Problem was that if the tablespace was encrypted, we tried to copy
also page 0 from the read buffer to the write buffer, which are in
that case the same memory area.
fil_iterate
When the tablespace is encrypted or compressed, its
first page (i.e. page 0) is not encrypted or
compressed, and there is no need to copy the buffer.
Problem was that if the tablespace was encrypted, we tried to copy
also page 0 from the read buffer to the write buffer, which are in
that case the same memory area.
fil_iterate
When the tablespace is encrypted or compressed, its
first page (i.e. page 0) is not encrypted or
compressed, and there is no need to copy the buffer.
If innodb_fast_shutdown<2, all transactions of active connections
will be rolled back on shutdown. This can take a long time, and
the systemd shutdown timeout should be extended during the wait.
logs_empty_and_mark_files_at_shutdown(): Extend the timeout when
waiting for other threads to complete.
This reverts commit 76ec37f522.
This behaviour change will be done separately in:
MDEV-15832 With innodb_fast_shutdown=3, skip the rollback
of connected transactions
In the async IO completion code, after reading a page, InnoDB can wait for
the completion of other buffer pool reads.
This is for example what happens if change buffering is active.
InnoDB on Windows could deadlock, as it did not have dedicated threads
for processing change buffer asynchronous reads.
The fix is that Windows now has the same background threads as other
platforms, including dedicated threads for ibuf and log AIO.
The ibuf/read completions are now dispatched to their threads with
PostQueuedCompletionStatus(), while the write and log completions are
processed in the thread where they arrive.
fil_crypt_read_crypt_data(): Do not attempt to read the tablespace
if the file is inaccessible due to a pending DDL operation, such as
renaming the file or DROP TABLE or TRUNCATE TABLE. This is only
reducing the probability of the race condition, not completely
preventing it.
When "FLUSH TABLE ... FOR EXPORT" fails, the SQL layer should rollback
the statement. Otherwise we hit an assert when we try to close the
tables while having a non-empty list of statement transaction participants.
Problem was that key rotation from encrypted to unencrypted was skipped
when encryption is disabled (i.e. SET GLOBAL innodb-encrypt-tables=OFF).
fil_crypt_needs_rotation
If encryption is disabled (i.e. innodb-encrypt-tables=off)
and there are tablespaces using default encryption (e.g. the
system tablespace) that are still in encrypted state, we need
to rotate them from encrypted state to unencrypted state.
Boost includes sys/param.h on FreeBSD, which in turn defines setbit()
macro. This macro is conflicting with open_query::judy_bitset::setbit().
Reordered includes such that oqgraph_judy.h never sees this macro.
Also removed duplicate includes of graphcore-config.h, which is included
by graphcore-graph.h/oqgraph_shim.h/oqgraph_thunk.h.
buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().
log_write_up_to(): Use correct format.
srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress
Move towards progress measures rather than purely time-based measures.
Progress reporting at numerous shutdown/startup locations, including:
* For innodb_fast_shutdown=0 trx_roll_must_shutdown() for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge
Thanks Marko for feedback and suggestions.
Suggested by Marko on github pr #576
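A minimal sketch of the systemd call involved (sd_notifyf() from
libsystemd; the phase string is illustrative):

    #include <systemd/sd-daemon.h>

    // Ask systemd for 'usec' more microseconds before it times out the
    // service; must be re-sent before the granted extension expires.
    static void report_shutdown_progress(const char *phase,
                                         unsigned long long usec) {
      sd_notifyf(0, "STATUS=%s\nEXTEND_TIMEOUT_USEC=%llu", phase, usec);
    }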
buf_all_freed only needs to be called once, not 3 times.
buf_all_freed will always return TRUE if it returns at all.
It will crash if any page was not flushed, so it's effectively
an assert anyway.
The following calls are likely redundant and could be removed:
fil_flush_file_spaces(FIL_TYPE_TABLESPACE);
fil_flush_file_spaces(FIL_TYPE_LOG);
row_undo_step(): If fast shutdown has been requested, abort the
rollback of any non-DDL transactions. Starting with MDEV-12323,
we aborted the rollback of recovered transactions. These
transactions would be rolled back on subsequent server startup.
trx_roll_report_progress(): Renamed from trx_roll_must_shutdown(),
now that the shutdown check has been moved to the only caller.
- compile_flags already include the flags from the top CMakeLists.txt
- MY_CHECK_CXX_COMPILER_FLAG() accepts only one parameter
- output variable of MY_CHECK_CXX_COMPILER_FLAG() is have_CXX__Wa__nH
- same check for mariabackup
Based on contribution by satanson (PR#466).
dict_foreign_qualify_index(): Avoid a redundant and harmful
computation of col_name of a virtual column. This fixes the
assertion failure.
dict_foreign_push_index_error(): Do not call dict_table_get_col_name()
on a virtual column. (It is unclear if this condition is actually
reachable.)
It does not hurt to delete non-existing records from SYS_TABLESPACES
and SYS_DATAFILES. Because MariaDB does not support CREATE TABLESPACE,
only the system tablespace (space_id=0) can contain multiple tables.
But, there are no entries for the system tablespace in these tables
(which actually are stored inside the system tablespace).
MariaDB differs from the upstream for "DDL-like" commands. For these,
it sets binlog_format=STATEMENT for the duration of the statement.
This doesn't play well with MyRocks, which tries to prevent DML
commands with binlog_format!=ROW.
Also, if Locked_tables_list::reopen_tables() returned an error, then
close_cached_tables should propagate the error condition and not silently
consume it (it's difficult to have test coverage for this because this
error condition is rare)
The purpose of the InnoDB buffer pool dump is to allow InnoDB to be
restarted with the same persistent data pages in the buffer pool.
The InnoDB temporary tablespace that was introduced in MariaDB 10.2.2
is always reinitialized on restart. Therefore, it does not make sense
to attempt to dump or restore any pages of the temporary tablespace.
ha_innobase::check_if_supported_inplace_alter(): Only check for
high_level_read_only. Do not unnecessarily refuse
ALTER TABLE...ALGORITHM=INPLACE if innodb_force_recovery was
specified as 1, 2, or 3.
innobase_start_or_create_for_mysql(): Block all writes from SQL
if the system tablespace was initialized with 'newraw'.
- Disallow loading of MyRocks (or any auxiliary) plugins after it has been
  unloaded.
- Do it carefully - the plugin's system variables may be accessed (e.g. a
  default value is set) after the first rocksdb_done_func() call but before
  the second rocksdb_init_func() call.