PROBLEM
-------
This warning message is printed when trx_sys->rseg_history_len is greater than some
arbitrary magic number (2000000). By seeing the reproducing scenario where we keep
a read view open and do a lot of transactions on table which increases the hitsory
length it is entirely possible that trx_sys->rseg_history_len can exceed 2000000.
So this is not a bug due to corruption of history length.The warning message was
just added to test some scenario and not removed.
FIX
---
1.Print this warning message only for debug versions.
2.Modified the warning message with more detailed information.
3.Don't crash even in debug versions.
[#rb 17929 Reviewed by jimmy and satya]
* created tests focusing in multi-master conflicts during cascading foreign key
processing
* in row0upd.cc, calling wsrep_row_ups_check_foreign_constraints only when
running in cluster
* in row0ins.cc fixed regression from MW-369, which caused crash with MW-402.test
With a big buffer pool that contains many data pages,
DISCARD TABLESPACE took a long time, because it would scan the
entire buffer pool to remove any pages that belong to the tablespace.
With a large buffer pool, this would take a lot of time, especially
when the table-to-discard is empty.
The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.
If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.
It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.
Import is bypassing the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.
buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.
buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.
buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.
PageConverter::m_mtr: A dummy mini-transaction buffer
PageConverter::PageConverter(): Complete the member initialization list.
PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages that exist in the being-imported file.
row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
ibuf_check_bitmap_on_import(): Only access the pages that
are below FSP_FREE_LIMIT. It is possible that especially with
ROW_FORMAT=COMPRESSED, the FSP_SIZE will be much bigger than
the FSP_FREE_LIMIT, and the bitmap pages (page_size*N, 1+page_size*N)
are filled with zero bytes.
buf_page_is_corrupted(), buf_page_io_complete(): Make the
fault injection compatible with MariaDB 10.2.
Backport the IMPORT tests from 10.2.
On some old GNU/Linux systems, invoking posix_fallocate() with
offset=0 would sometimes cause already allocated bytes in the
data file to be overwritten.
Fix a correctness regression that was introduced in
commit 420798a81a
by invoking posix_fallocate() in a safer way.
A similar change was made in MDEV-5746 earlier.
os_file_get_size(): Avoid changing the state of the file handle,
by invoking fstat() instead of lseek().
os_file_set_size(): Determine the current size of the file
by os_file_get_size(), and then extend the file from that point
onwards.
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.
MariaDB used to ignore any possible return value from posix_fallocate()
ever since innodb_use_fallocate was introduced in MDEV-4338. If EINVAL
was returned, the file would not be extended.
Starting with MDEV-11520, MariaDB would treat EINVAL as a hard error.
Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
When MariaDB 10.1.0 introduced table options for encryption and
compression, it unnecessarily changed
ha_innobase::check_if_supported_inplace_alter() so that ALGORITHM=COPY
is forced when these parameters differ.
A better solution is to move the check to innobase_need_rebuild().
In that way, the ALGORITHM=INPLACE interface (yes, the syntax is
very misleading) can be used for rebuilding the table much more
efficiently, with merge sort, with no undo logging, and allowing
concurrent DML operations.
If InnoDB or XtraDB recovered committed transactions at server
startup, but the processing of recovered transactions was
prevented by innodb_read_only or by innodb_force_recovery,
an assertion would fail at shutdown.
This bug was originally reproduced when Mariabackup executed
InnoDB shutdown after preparing (applying redo log into) a backup.
trx_free_prepared(): Allow TRX_STATE_COMMITTED_IN_MEMORY.
trx_undo_free_prepared(): Allow any undo log state. For transactions
that were resurrected in TRX_STATE_COMMITTED_IN_MEMORY
the undo log state would have been reset by trx_undo_set_state_at_finish().
Replace all references in InnoDB and XtraDB error log messages
to bugs.mysql.com with references to https://jira.mariadb.org/.
The original merge
commit 4274d0bf57
was accidentally reverted by the subsequent merge
commit 3b35d745c3
MariaDB did not import the Oracle Bug #23481444
(commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293) from MySQL.
Initially, the code changes were disabled, and later removed.
Let us remove the last remaining references.
InnoDB was writing unnecessary information to the
update undo log records. Most notably, if an indexed column is updated,
the old value of the column would be logged twice: first as part of
the update vector, and then another time because it is an indexed column.
Because the InnoDB undo log record must fit in a single page,
this would cause unnecessary failure of certain updates.
Even after this fix, InnoDB still seems to be unnecessarily logging
indexed column values for non-updated columns. It seems that non-updated
secondary index columns only need to be logged when a PRIMARY KEY
column is updated. To reduce risk, we are not fixing this remaining flaw
in GA versions.
trx_undo_page_report_modify(): Log updated indexed columns only once.
i_s_sys_tables_fill_table_stats(): Acquire dict_operation_lock
S-latch before acquiring dict_sys->mutex, to prevent the table
from being removed from the data dictionary cache and from
being freed while i_s_dict_fill_sys_tablestats() is accessing
the table handle.
The ownership of the field query->intersection usually transfers
to query->doc_ids. In some error scenario, it could be possible
that fts_query_free() would be invoked with query->intersection!=NULL.
Let us handle that case, instead of intentionally crashing the server.
When MySQL 5.6.10 introduced innodb_read_only mode, it skipped the
creation of the InnoDB buffer pool dump/restore subsystem in that mode.
Attempts to set the variable innodb_buf_pool_dump_now would have
no effect in innodb_read_only mode, but the corresponding condition
was forgotten in from the other two update functions.
MySQL 5.7.20 would fix the innodb_buffer_pool_load_now,
but not innodb_buffer_pool_load_abort. Let us fix both in MariaDB.
fts_get_next_doc_id(): Assign the first and subsequent FTS_DOC_ID
in the same way: by post-incrementing the cached value.
If there is a user-specified FTS_DOC_ID, do not touch the internal
sequence.
There are two bugs related to failed ADD INDEX and
the InnoDB table cache eviction.
dict_table_close(): Try dropping failed ADD INDEX when releasing
the last table handle, not when releasing the last-but-one.
dict_table_remove_from_cache_low(): Do not invoke
row_merge_drop_indexes() after freeing all index metadata.
Instead, directly invoke row_merge_drop_indexes_dict() to
remove the metadata from the persistent data dictionary
and to free the index pages.
Reverted incorrect changes done on MDEV-7367 and MDEV-9469. Fixes properly
also related bugs:
MDEV-13668: InnoDB unnecessarily rebuilds table when renaming a column and adding index
MDEV-9469: 'Incorrect key file' on ALTER TABLE
MDEV-9548: Alter table (renaming and adding index) fails with "Incorrect key file for table"
MDEV-10535: ALTER TABLE causes standalone/wsrep cluster crash
MDEV-13640: ALTER TABLE CHANGE and ADD INDEX on auto_increment column fails with "Incorrect key file for table..."
Root cause for all these bugs is the fact that MariaDB .frm file
can contain virtual columns but InnoDB dictionary does not and
previous fixes were incorrect or unnecessarily forced table
rebuilt. In index creation key_part->fieldnr can be bigger than
number of columns in InnoDB data dictionary. We need to skip not
stored fields when calculating correct column number for InnoDB
data dictionary.
dict_table_get_col_name_for_mysql
Remove
innobase_match_index_columns
Revert incorrect change done on MDEV-7367
innobase_need_rebuild
Remove unnecessary rebuild force when column is renamed.
innobase_create_index_field_def
Calculate InnoDB column number correctly and remove
unnecessary column name set.
innobase_create_index_def, innobase_create_key_defs
Remove unneeded fields parameter. Revert unneeded memset.
prepare_inplace_alter_table_dict
Remove unneeded col_names parameter
index_field_t
Remove unneeded col_name member.
row_merge_create_index
Remove unneeded col_names parameter and resolution.
Effected tests:
innodb-alter-table : Add test case for MDEV-13668
innodb-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
innodb-wl5980-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
fts_create_doc_id(): Remove.
row_mysql_convert_row_to_innobase(): Implement the logic of
fts_create_doc_id(). Reuse a buffer for the hidden FTS_DOC_ID.
row_get_prebuilt_insert_row(): Allocate a buffer for the hidden
FTS_DOC_ID at the end of prebuilt->ins_upd_rec_buff.
In MariaDB Server 10.1, this problem manifests itself only as
a debug assertion failure in page_zip_decompress() when an insert
requires a page to be decompressed.
In MariaDB 10.1, the encryption of InnoDB data files repurposes the
previously unused field FILE_FLUSH_LSN for an encryption key version.
This field was only used in the first page of each file of the system
tablespace. For ROW_FORMAT=COMPRESSED tables, the field was always
written as 0 until encryption was implemented.
There is no bug in the encryption, because the buffer pool blocks will
not be written to files. Instead, copies of the blocks will be encrypted.
In these encrypted copies, the key version field will be updated before
the buffer is written to the file. The field in the buffer pool is
basically garbage that does not really matter.
Already in MariaDB 10.0, the memset() calls to reset this unused field
in buf_flush_update_zip_checksum() and buf_flush_write_block_low()
are unnecessary, because fsp_init_file_page_low() would guarantee that
the field is always 0 in the buffer pool (unless 10.1 encryption is
used).
Removing the unnecessary memset() calls makes page_zip_decompress()
happy and will prevent a SPATIAL INDEX corruption bug in
MariaDB Server 10.2. In MySQL 5.7.5, as part of WL#6968, the same
field was repurposed for an R-tree split sequence number (SSN) and
these memset() were removed. (Because of the repurposing, MariaDB
encryption is not available for tables that contain SPATIAL INDEX.)
btr_cur_pessimistic_delete(): Discard a possible record lock also in
the case when the record was the only one in the page. Failure to
do this would corrupt the record lock data structures in a partial
rollback (ROLLBACK TO SAVEPOINT or rolling back a row operation due
to some error, such as a duplicate key in a unique secondary index).