recv_reset_logs(): Initialize the redo log buffer, so that no data
from the old redo log can be written to the new redo log.
This bug has very little impact before MariaDB 10.2. The
innodb_log_encrypt option that was introduced in MariaDB 10.1
increases the impact. If the redo log used to be encrypted, and
it is being resized and encryption disabled, then previously
encrypted data could end up being written to the new redo log
in clear text. This resulted in encryption.innodb_encrypt_log
test failures in MariaDB 10.2.
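The idea behind the recv_reset_logs() change can be illustrated with a
minimal sketch. SketchLogBuffer and its members are hypothetical stand-ins,
not InnoDB's actual log buffer; the only point shown is that a reset wipes
the stale bytes instead of merely rewinding the write offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a redo log buffer (not InnoDB's log_sys).
struct SketchLogBuffer {
    std::vector<unsigned char> bytes;
    std::size_t used = 0;

    explicit SketchLogBuffer(std::size_t size) : bytes(size, 0) {}

    // Append a payload; returns false if it does not fit.
    bool append(const void* data, std::size_t len) {
        if (len > bytes.size() - used) return false;
        std::memcpy(bytes.data() + used, data, len);
        used += len;
        return true;
    }

    // Reset for a new log: clear the old contents, not only the offset,
    // so that nothing from the old log can be flushed to the new one.
    void reset() {
        if (!bytes.empty()) std::memset(bytes.data(), 0, bytes.size());
        used = 0;
    }

    // True if no byte of the buffer carries leftover data.
    bool is_all_zero() const {
        for (unsigned char b : bytes) {
            if (b != 0) return false;
        }
        return true;
    }
};
```

Rewinding only the offset would leave the old (possibly previously
encrypted) payload in memory, where a later block-sized write could pick
it up.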
row_merge_read_clustered_index(): The row->fields[] could point
to a record in the clustered index page of the source table, or
to an old version of the record that was constructed in row_heap.
If the row->fields[] points to the clustered index page, then
we were modifying buffer pool data without holding appropriate
block->lock and without appropriate redo logging. The intention
was to modify a copy of the data, not the source file page,
because concurrent readers would still very much need the original
values of the DB_TRX_ID,DB_ROLL_PTR for their multi-versioning.
Either way, it is simplest to not write anything at all, and to
make row->fields[] point to the constant reset_trx_id.
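A minimal sketch of the "point to a constant instead of writing" approach
follows. SketchField, sketch_reset_trx_id and sketch_point_to_reset() are
hypothetical names, and only the 6-byte DB_TRX_ID part is modelled; the
real reset_trx_id[] also covers DB_ROLL_PTR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical field descriptor: a pointer to the data plus its length.
struct SketchField {
    const unsigned char* data;
    std::size_t len;
};

// Constant "reset" value (assumption: 6 zero bytes for DB_TRX_ID only).
static const unsigned char sketch_reset_trx_id[6] = {0, 0, 0, 0, 0, 0};

// Redirect the field to the constant. The bytes that the field used to
// point to (possibly a buffer pool page) are left untouched.
void sketch_point_to_reset(SketchField& f) {
    f.data = sketch_reset_trx_id;
    f.len = sizeof sketch_reset_trx_id;
}
```

Because nothing is written through the old pointer, no page latch or redo
logging is needed, and concurrent readers keep seeing the original values.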
buf_page_print(): Remove the parameter 'flags',
and when a server abort is intended, perform that in the caller.
In this way, page corruption reports due to different reasons
can be distinguished better.
This is non-functional code refactoring that does not fix any
page corruption issues. The change is only made to avoid falsely
grouping together unrelated causes of page corruption.
This is a backport of the following:
MDEV-13009 10.1.24 does not compile on architectures without 64-bit atomics
Add a missing #include "sync0types.h" that was removed in MDEV-12674.
logs_empty_and_mark_files_at_shutdown(): Actually skip the debug assertion
when the buf_resize_thread is active. The previous fix skipped the
debug assertion failure when buf_dump_thread is active. Both these
threads are created also in innodb_read_only mode. Depending on how
fast these threads react to the shutdown signal, the debug assertion
could be triggered.
There is no impact on non-debug servers, and very little impact on
debug servers either, because in innodb_read_only shutdown, no InnoDB
files will need to be written.
innobase_rec_reset(): Remove. This function was introduced in the
InnoDB Plugin for MySQL 5.1, which later evolved into MySQL 5.5.
There used to be a bug that ADD UNIQUE INDEX would not always correctly
report the duplicate key value of the secondary index. This function
ensured that instead of reporting total garbage values, InnoDB
would report NULL.
It looks like the function was made unnecessary in MySQL 5.6.6 by
d143097eb1
The corresponding test was subsequently adjusted in
fde80cf49d
The ALTER TABLE tests were imported to MariaDB as part of MDEV-13625,
and these tests do pass with this change.
The unnecessary function did not do any harm before MDEV-11371 introduced
compressed columns.
One question remains: What if we needed to report a duplicate key value
for a compressed column? The simple answer is that the test
main.column_compression demonstrates that no indexes can be defined
on compressed columns.
After MDEV-12288 and MDEV-13536, the DB_TRX_ID of old clustered index
records for which no history is available should be reset to 0.
This caused crashes in online table-rebuilding ALTER, because the
row_log_table_apply() is built on the assumption that the PRIMARY KEY
together with DB_TRX_ID,DB_ROLL_PTR identifies the record.
Both when copying the old table and when writing log about changes to
the old table, we must map "old" DB_TRX_ID to 0. "old" here is simply
"older than the trx_id of the ALTER TABLE transaction", because
the MDL_EXCLUSIVE (and exclusive InnoDB table lock) in
ha_innobase::prepare_inplace_alter_table() forces any transactions
accessing the table to commit or rollback. So, we know that we can
safely reset any DB_TRX_ID in the table that is older than the
transaction ID of the ALTER TABLE, because the undo log history would be
lost in a table-rebuilding ALTER.
Note: After a table-rebuilding online ALTER TABLE, the rebuilt table
may end up containing some nonzero DB_TRX_ID columns. The apply logic
identifies the rows by the combination of PRIMARY KEY and DB_TRX_ID.
These nonzero DB_TRX_ID would necessarily refer to concurrent DML
operations that were started during ha_innobase::inplace_alter_table().
row_log_allocate(): Add a parameter for the ALTER TABLE transaction.
row_log_t::min_trx: The ALTER TABLE transaction ID.
trx_id_check(): A debug function to check that DB_TRX_ID makes sense
(is either 0 or bigger than the ALTER TABLE transaction ID).
reset_trx_id[]: The reset DB_TRX_ID,DB_ROLL_PTR columns.
row_log_table_delete(), row_log_table_get_pk(): Reset the
DB_TRX_ID,DB_ROLL_PTR when they precede the ALTER TABLE transaction.
row_log_table_apply_delete(), row_log_table_apply_update():
Assert trx_id_check().
row_merge_insert_index_tuples(): Remove the unused parameter trx_id.
row_merge_read_clustered_index(): In a table-rebuilding ALTER,
reset the DB_TRX_ID,DB_ROLL_PTR when they precede the ALTER TABLE
transaction. Assert trx_id_check() on clustered index records that
are being buffered.
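The mapping and the debug check described above can be sketched as
follows. The names prefixed with sketch_ are hypothetical and the
functions operate on plain integers rather than on record bytes; the case
where a record carries exactly the ALTER TABLE transaction ID is assumed
not to occur.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t sketch_trx_id_t;  // hypothetical alias

// Map a DB_TRX_ID that precedes the ALTER TABLE transaction to 0.
// Newer IDs belong to concurrent DML and are kept as they are.
sketch_trx_id_t sketch_reset_old_trx_id(sketch_trx_id_t rec_trx_id,
                                        sketch_trx_id_t alter_trx_id) {
    return rec_trx_id < alter_trx_id ? 0 : rec_trx_id;
}

// Debug-style check in the spirit of trx_id_check(): the ID must be
// either 0 or bigger than the ALTER TABLE transaction ID.
bool sketch_trx_id_check(sketch_trx_id_t rec_trx_id,
                         sketch_trx_id_t alter_trx_id) {
    return rec_trx_id == 0 || rec_trx_id > alter_trx_id;
}
```

Applying the same mapping both when copying rows and when logging changes
keeps the (PRIMARY KEY, DB_TRX_ID) pair consistent between the rebuilt
table and the log that is applied to it.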
Storage engine independent support for column compression.
TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT,
VARCHAR and VARBINARY columns can be compressed.
New COMPRESSED column attribute added:
COMPRESSED[=<compression_method>]
System variables added:
column_compression_threshold
column_compression_zlib_level
column_compression_zlib_strategy
column_compression_zlib_wrap
Status variables added:
Column_compressions
Column_decompressions
Limitations:
- the only supported method currently is zlib
- CSV storage engine stores data uncompressed on-disk even if COMPRESSED
attribute is present
- it is not possible to create indexes over compressed columns.
A background thread performs the ibuf merge in buf_read_ibuf_merge_pages()
(buf0rea.cc). It first tries to get the page_size, and if the space is not
found, it deletes the buffered changes. But as we do not hold any mutexes,
the space can be marked as stopped between that lookup and the call to
buf_read_page_low() for the same space. This naturally leads to the error
message seen in the log.
buf_read_page_low(): Add parameter ignore_missing_space = false that
is passed to fil_io()
buf_read_ibuf_merge_pages(): call buf_read_page_low with
ignore_missing_space = true, this function will handle missing
space error code after buf_read_page_low returns.
fil_io(): If ignore_missing_space = true, do not print an error
message about trying to do I/O for a missing space; just return the
correct error code, which is handled later.
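The parameter plumbing described above can be sketched as follows.
SketchIo, SketchErr and the message text are hypothetical; the sketch only
shows a defaulted flag that suppresses the log message while the error
code is still returned to the caller.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum SketchErr { SKETCH_SUCCESS, SKETCH_TABLESPACE_DELETED };

// Hypothetical stand-in for the I/O layer.
struct SketchIo {
    std::set<unsigned> live_spaces;      // tablespaces that still exist
    std::vector<std::string> error_log;  // messages that would be logged

    // The flag defaults to false, so existing callers keep the old,
    // noisy behaviour. The error code is returned in both cases.
    SketchErr io(unsigned space_id, bool ignore_missing_space = false) {
        if (live_spaces.count(space_id)) return SKETCH_SUCCESS;
        if (!ignore_missing_space) {
            error_log.push_back("I/O for missing tablespace " +
                                std::to_string(space_id));
        }
        return SKETCH_TABLESPACE_DELETED;
    }
};
```

A caller that expects the race, such as the merge loop, passes true and
handles SKETCH_TABLESPACE_DELETED itself.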
There is a race condition in InnoDB startup. A number of
fil_crypt_thread are created by fil_crypt_threads_init(). These threads
may call btr_scrub_complete_space() before btr_scrub_init() was called.
Those too early calls would be accessing an uninitialized scrub_stat_mutex.
innobase_start_or_create_for_mysql(): Invoke btr_scrub_init() before
fil_crypt_threads_init().
fil_crypt_complete_rotate_space(): Only invoke btr_scrub_complete_space()
if scrubbing is enabled. There is no need to update the statistics if
it is not enabled.
ATTRIBUTE_NORETURN is supported on all platforms (MSVS and GCC-like).
It declares that a function will not return; instead, the thread or
the whole process will terminate.
ATTRIBUTE_COLD is supported starting with GCC 4.3. It declares that
a function is supposed to be executed rarely. Rarely used error-handling
functions and functions that emit messages to the error log should be
tagged such.
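One way such attribute macros could be defined is sketched below. The
SKETCH_ macro names and the functions are hypothetical, and the actual
definitions in the source tree may differ; only the compiler keywords
themselves are standard GCC and MSVC extensions.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical macros: pick the compiler-specific spelling, or expand
// to nothing where no equivalent is known.
#if defined(_MSC_VER)
# define SKETCH_NORETURN __declspec(noreturn)
# define SKETCH_COLD
#elif defined(__GNUC__)
# define SKETCH_NORETURN __attribute__((noreturn))
# define SKETCH_COLD __attribute__((cold))
#else
# define SKETCH_NORETURN
# define SKETCH_COLD
#endif

// Rarely executed error handler that never returns to its caller.
SKETCH_NORETURN SKETCH_COLD void sketch_fatal(const char* msg) {
    std::fprintf(stderr, "fatal: %s\n", msg);
    std::abort();
}

// The compiler knows that the error branch does not fall through.
int sketch_checked_divide(int a, int b) {
    if (b == 0) sketch_fatal("division by zero");
    return a / b;
}
```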
For running the Galera tests, the variable my_disable_leak_check
was set to true in order to avoid assertions due to memory leaks
at shutdown.
Some adjustments due to MDEV-13625 (merge InnoDB tests from MySQL 5.6)
were performed. The most notable behaviour changes from 10.0 and 10.1
are the following:
* innodb.innodb-table-online: adjustments for the DROP COLUMN
behaviour change (MDEV-11114, MDEV-13613)
* innodb.innodb-index-online-fk: the removal of a (1,NULL) record
from the result; originally removed in MySQL 5.7 in the
Oracle Bug #16244691 fix
377774689b
* innodb.create-index-debug: disabled due to MDEV-13680
(the MySQL Bug #77497 fix was not merged from 5.6 to 5.7.10)
* innodb.innodb-alter-autoinc: MariaDB 10.2 behaves like MySQL 5.6/5.7,
while MariaDB 10.0 and 10.1 assign different values when
auto_increment_increment or auto_increment_offset are used.
Also MySQL 5.6/5.7 exhibit different behaviour between
ALGORITHM=INPLACE and ALGORITHM=COPY, so something needs to be tested
and fixed in both MariaDB 10.0 and 10.2.
* innodb.innodb-wl5980-alter: disabled because it would trigger an
InnoDB assertion failure (MDEV-13668 may need additional effort in 10.2)
The problem was an incorrect definition of the wsrep_recovery,
trx_sys_update_wsrep_checkpoint and
trx_sys_read_wsrep_checkpoint functions, which caused
innodb_plugin not to load because there were undefined symbols.
Fixes also MDEV-13488: InnoDB writes CRYPT_INFO even though
encryption is not enabled.
Fixes also MDEV-13093: Leak of Datafile::m_crypt_info on
shutdown after failed startup.
Problem was that we created encryption metadata (crypt_data) for
system tablespace even when no encryption was enabled and too early.
System tablespace can be encrypted only using key rotation.
Tests innodb-key-rotation-disable, innodb_encryption, innodb_lotoftables
require adjustment because INFORMATION_SCHEMA INNODB_TABLESPACES_ENCRYPTION
contains a row only if the tablespace really has encryption metadata.
xb_load_single_table_tablespace(): Do not call
fil_space_destroy_crypt_data() any more, because Datafile::m_crypt_info
has been removed.
fil_crypt_realloc_iops(): Avoid divide by zero.
fil_crypt_set_thread_cnt(): Set fil_crypt_threads_event if
encryption threads exist. This is required to find tablespaces
requiring key rotation if no other changes happen.
fil_crypt_find_space_to_rotate(): Decrease the amount of time waiting
when nothing happens to better enable key rotation on startup.
fil_ibd_open(), fil_ibd_load(): Load possible crypt_data from first
page.
class Datafile, class SysTablespace : remove m_crypt_info field.
Datafile::get_first_page(): Return a pointer to first page buffer.
fsp_header_init(): Write encryption metadata to page 0 only if
tablespace is encrypted or encryption is disabled by table option.
i_s_dict_fill_tablespaces_encryption(): Skip tablespaces that do not
contain encryption metadata. This is required to avoid triggering the
wait condition too early in the encrypted -> unencrypted state transition.
wsrep_drop_table_query(): Remove the definition of this unused function.
row_upd_sec_index_entry(), row_upd_clust_rec_by_insert():
Evaluate the simplest conditions first. The merge could have slightly
hurt performance by causing extra calls to wsrep_on().
recv_find_max_checkpoint(): Refer to MariaDB 10.2.2 instead of
MySQL 5.7.9. Do not hint that a binary downgrade might be possible,
because there are many changes in InnoDB 5.7 that could make
downgrade impossible: a column appended to SYS_INDEXES, added
SYS_* tables, undo log format changes, and so on.
Fixes also MDEV-13488: InnoDB writes CRYPT_INFO even though
encryption is not enabled.
Problem was that we created encryption metadata (crypt_data) for
system tablespace even when no encryption was enabled and too early.
System tablespace can be encrypted only using key rotation.
Tests innodb-key-rotation-disable, innodb_encryption, innodb_lotoftables
require adjustment because INFORMATION_SCHEMA INNODB_TABLESPACES_ENCRYPTION
contains a row only if the tablespace really has encryption metadata.
fil_crypt_set_thread_cnt: Send a message to the background encryption
threads, if they exist, when they are ready. This is required to find
tablespaces requiring key rotation if no other changes happen.
fil_crypt_find_space_to_rotate: Decrease the amount of time waiting
when nothing happens to better enable key rotation on startup.
fsp_header_init: Write encryption metadata to page 0 only if tablespace is
encrypted or encryption is disabled by table option.
i_s_dict_fill_tablespaces_encryption: Skip tablespaces that do not
contain encryption metadata. This is required to avoid triggering the
wait condition too early in the encrypted -> unencrypted state transition.
open_or_create_data_files: Do not create encryption metadata
by default to system tablespace.
Assertions failed due to incorrect handling of the --tc-heuristic-recover
option when InnoDB is in read-only mode either due to innodb_read_only=1
or innodb_force_recovery>3. InnoDB failed to refuse a XA COMMIT or
XA ROLLBACK operation, and there were errors in the error handling in
the upper layer.
This was fixed by making InnoDB XA operations respect the
high_level_read_only flag. The InnoDB part of the fix and
parts of the test main.tc_heuristic_recover were provided
by Marko Mäkelä.
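The shape of the fix can be sketched as follows. SketchEngine, the result
codes and sketch_commit_by_xid() are hypothetical names chosen for this
illustration; only the idea of checking a read-only flag before touching
any state is taken from the description above.

```cpp
#include <cassert>

// Hypothetical result codes for an XA operation.
enum SketchXaResult { SKETCH_XA_OK, SKETCH_XA_REFUSED };

// Hypothetical engine state.
struct SketchEngine {
    bool high_level_read_only;  // set for read-only or forced recovery
    int committed;              // number of transactions committed
};

// Refuse the operation up front in read-only mode, so that the upper
// layer gets a clean error instead of a later assertion failure.
SketchXaResult sketch_commit_by_xid(SketchEngine& engine) {
    if (engine.high_level_read_only) {
        return SKETCH_XA_REFUSED;
    }
    ++engine.committed;
    return SKETCH_XA_OK;
}
```

The same guard would apply to the rollback path; it is omitted here for
brevity.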
LOCK_log mutex lock/unlock had to be added to fix MDEV-13438.
The measure is confirmed by mysql sources as well.
For testing of the conflicting option combination, mysql-test-run is
made to export a new $MYSQLD_LAST_CMD. It holds the very last value
generated by mtr.mysqld_start(). Even though the options have
always been stored in $mysqld->{'started_opts'} as well, there was no
access to them beyond the automatic server restart by mtr through the
expect file interface.
Effectively therefore $MYSQLD_LAST_CMD represents a more general
interface to $mysqld->{'started_opts'} which can be used in wider
scopes including server launch with incompatible options.
Note that another existing method to restart the server with incompatible
options, relying on $MYSQLD_CMD, is not aware of $mysqld->{'started_opts'}
(the actual options that the server is launched with by mtr). In order to
use this method, they would have to be provided manually.
NOTE: When merging to 10.2, the file search_pattern_in_file++.inc
should be replaced with the pre-existing search_pattern_in_file.inc.
The problem is that page 0 and its possible encryption information
are not read for undo tablespaces.
fil_crypt_get_latest_key_version(): Do not send an event to the
encryption threads if the event does not yet exist. Seen
in regression testing.
fil_read_first_page: Add a new parameter that tells whether the page
belongs to an undo tablespace; if it does, we do not read FSP_HEADER.
srv_undo_tablespace_open: Read the first page of the tablespace
to get crypt_data if it exists, and pass it to fil_space_create.
Tested using innodb_encryption with combinations with
innodb-undo-tablespaces.
The function ibuf_remove_free_page() may be called while the caller
is holding several mutexes or rw-locks. Because of this, this
housekeeping loop may cause performance glitches for operations that
involve tables that are stored in the InnoDB system tablespace.
Also deadlocks might be possible.
The worst impact of all is that due to the mutexes being held, calls to
log_free_check() had to be skipped during this housekeeping.
This means that the cyclic InnoDB redo log may be overwritten.
If the system crashes during this, it would be unable to recover.
The entry point to the problematic code is ibuf_free_excess_pages().
It would make sense to call it before acquiring any mutexes or rw-locks,
in any 'pessimistic' operation that involves the system tablespace.
fseg_create_general(), fseg_alloc_free_page_general(): Do not call
ibuf_free_excess_pages() while potentially holding some latches.
ibuf_remove_free_page(): Do call log_free_check(), like every operation
that is about to generate redo log should do.
ibuf_free_excess_pages(): Remove some assertions that are replaced
by stricter assertions in the log_free_check() that is now called by
ibuf_remove_free_page().
row_mtr_start(): New function, to perform necessary preparations when
starting a mini-transaction for row operations. For pessimistic operations
on secondary indexes that are located in the system tablespace,
this includes calling ibuf_free_excess_pages().
row_undo_ins_remove_sec_low(), row_undo_mod_del_mark_or_remove_sec_low(),
row_undo_mod_del_unmark_sec_and_undo_update(): Call row_mtr_start().
row_ins_sec_index_entry(): Call ibuf_free_excess_pages() if the operation
may involve allocating pages and change buffering in the system tablespace.
row_upd_sec_index_entry(): Slightly refactor the code. The
delete-marking of the old entry is done in-place. It could be
change-buffered, but the old code should be unlikely to have
invoked ibuf_free_excess_pages() in this case.
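The ordering that row_mtr_start() is described as enforcing can be
sketched as follows. All sketch_ names are hypothetical, latches are
modelled by a single flag, and the system tablespace is identified by
ID 0; the point shown is only that the housekeeping runs before anything
that may hold a latch.

```cpp
#include <cassert>

// Hypothetical index descriptor.
struct SketchIndex {
    unsigned space_id;  // 0 is taken to mean the system tablespace
    bool is_secondary;
};

// Hypothetical per-operation state.
struct SketchState {
    int housekeeping_calls = 0;
    bool latch_held = false;
};

// Stand-in for the housekeeping; it must run without latches, so that
// a redo log space check can safely be performed inside it.
void sketch_free_excess_pages(SketchState& s) {
    assert(!s.latch_held);
    ++s.housekeeping_calls;
}

// Do the preparations first, then start the "mini-transaction".
void sketch_row_mtr_start(SketchState& s, const SketchIndex& index,
                          bool pessimistic) {
    if (pessimistic && index.is_secondary && index.space_id == 0) {
        sketch_free_excess_pages(s);
    }
    s.latch_held = true;  // from here on, latches may be acquired
}
```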
The function ibuf_remove_free_page() may be called while the caller
is holding several mutexes or rw-locks. Because of this, this
housekeeping loop may cause performance glitches for operations that
involve tables that are stored in the InnoDB system tablespace.
Also deadlocks might be possible.
The worst impact of all is that due to the mutexes being held, calls to
log_free_check() had to be skipped during this housekeeping.
This means that the cyclic InnoDB redo log may be overwritten.
If the system crashes during this, it would be unable to recover.
The entry point to the problematic code is ibuf_free_excess_pages().
It would make sense to call it before acquiring any mutexes or rw-locks,
in any 'pessimistic' operation that involves the system tablespace.
fseg_create_general(), fseg_alloc_free_page_general(): Do not call
ibuf_free_excess_pages() while potentially holding some latches.
ibuf_remove_free_page(): Do call log_free_check(), like every operation
that is about to generate redo log should do.
ibuf_free_excess_pages(): Remove some assertions that are replaced
by stricter assertions in the log_free_check() that is now called by
ibuf_remove_free_page().
row_ins_sec_index_entry(), row_undo_ins_remove_sec_low(),
row_undo_mod_del_mark_or_remove_sec_low(),
row_undo_mod_del_unmark_sec_and_undo_update(): Call
ibuf_free_excess_pages() if the operation may involve allocating pages
and change buffering in the system tablespace.
This bug was a regression caused by MDEV-12698.
On non-leaf pages, the delete-mark flag in the node pointer records is
basically garbage. (Delete-marking only makes sense at the leaf level
anyway. The purpose of the delete-mark is to tell MVCC, locking and purge
that a leaf-level record does not exist in the READ UNCOMMITTED view,
but it used to exist.)
Node pointer records and non-leaf pages are glue that attaches multiple
leaf pages to an index. This glue is supposed to be transparent to the
transactional layer.
When a page is split, InnoDB creates a node pointer record out of the
child page record that the cursor is positioned on. The node pointer record
for the parent page will be a copy of the child page record, amended with
the child page number. If the child page record happened to carry the
delete-mark flag, then the node pointer record would also carry this flag
(even though the flag makes no sense outside child pages).
(On a related note, for the first node pointer record in the first
node pointer page of each tree level, if the MIN_REC_FLAG is set,
the rest of the record contents (except the child page number)
is basically garbage. From this garbage you could deduce at which point
the child was originally split.)
page_scan_method_t: Replace with bool, as there are only 2 values.
dict_stats_scan_page(): Replace the parameter scan_method with is_leaf.
Ignore the bogus (garbage) delete-mark flag if !is_leaf.
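The effect of the is_leaf parameter can be sketched as follows. SketchRec
and sketch_count_live_recs() are hypothetical and far simpler than the
statistics code; the sketch only shows that the delete-mark flag is
honoured on leaf pages and ignored elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: a key plus the delete-mark flag.
struct SketchRec {
    int key;
    bool delete_marked;
};

// Count the records that matter for statistics. On non-leaf pages the
// flag is garbage inherited from a child record, so it is not consulted.
std::size_t sketch_count_live_recs(const std::vector<SketchRec>& page,
                                   bool is_leaf) {
    std::size_t n = 0;
    for (const SketchRec& rec : page) {
        if (is_leaf && rec.delete_marked) continue;
        ++n;
    }
    return n;
}
```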
If a non-DBUG binary is compiled with
-DDBUG_ASSERT_AS_PRINTF, asserts will be
changed to printf + stack trace (if stack
traces are enabled).
- Changed #ifndef DBUG_OFF to
#ifdef DBUG_ASSERT_EXISTS
for those DBUG_OFF checks that were only used to enable
asserts
- Assert checks that could greatly impact
performance were changed to DBUG_ASSERT_SLOW, which
is not affected by DBUG_ASSERT_AS_PRINTF
- Added one extra option to my_print_stacktrace() to
make it quieter when a stack trace is printed as
part of an assert.
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed many includes of my_global.h from include files
- Removed includes of some files that my_global.h automatically includes
- Removed duplicated includes of my_sys.h
- Replaced include my_config.h with my_global.h
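The assert-as-printf idea from the first items of the list above can be
sketched as follows. SKETCH_ASSERT_AS_PRINTF, the counter and the message
format are hypothetical and do not reproduce the real DBUG_ASSERT
definitions; no stack trace is printed in this sketch.

```cpp
#include <cassert>
#include <cstdio>

// Counts failed checks, so that the behaviour can be observed.
static int sketch_failed_asserts = 0;

// Hypothetical macro: report a failed condition and keep running,
// instead of aborting the process as a normal assert would.
#define SKETCH_ASSERT_AS_PRINTF(cond)                                   \
    do {                                                                \
        if (!(cond)) {                                                  \
            ++sketch_failed_asserts;                                    \
            std::fprintf(stderr, "Warning: assert failed: %s (%s:%d)\n", \
                         #cond, __FILE__, __LINE__);                    \
        }                                                               \
    } while (0)

// Example caller: the check is reported, and then the code copes with
// the unexpected value instead of crashing.
int sketch_clamp_percent(int value) {
    SKETCH_ASSERT_AS_PRINTF(value >= 0 && value <= 100);
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}
```

Because execution continues after a failed check, code guarded this way
still has to handle the bad value itself, which is what the clamp does
here.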