Commit graph

244 commits

Author SHA1 Message Date
Marko Mäkelä
68d9d512e9 Merge 10.3 into 10.4 2020-06-05 18:05:22 +03:00
Marko Mäkelä
680463a8d9 Merge 10.2 into 10.3 2020-06-05 16:51:26 +03:00
Thirunarayanan Balathandayuthapani
ad2bf1129c MDEV-22646 Assertion `table2->cached' failed in dict_table_t::add_to_cache
Problem:
========
  During buffer pool resizing, InnoDB recreates the dictionary hash
tables. Dictionary hash table reuses the heap of AHI hash tables.
It leads to memory corruption.

Fix:
====
- While disabling AHI, free the heap and AHI hash tables. Recreate the
AHI hash tables and assign new heap when AHI is enabled.

- btr_blob_free() access invalid page if page was reallocated during
buffer poolresizing. So btr_blob_free() should get the page from
buf_pool instead of using existing block.

- btr_search_enabled and block->index should be checked after
acquiring the btr_search_sys latch

- Moved the buffer_pool_scan debug sync to earlier before accessing the
btr_search_sys latches to avoid the hang of truncate_purge_debug
test case

- srv_printf_innodb_monitor() should acquire btr_search_sys latches
before AHI hash tables.
2020-06-03 16:02:02 +05:30
Marko Mäkelä
8059148154 Merge 10.3 into 10.4 2020-06-03 07:32:09 +03:00
Marko Mäkelä
8300f639a1 Merge 10.2 into 10.3 2020-06-02 10:25:11 +03:00
Marko Mäkelä
83d0e72b34 Cleanup: Remove thr_is_recv(), trx_is_recv()
Compare to trx_roll_crash_recv_trx directly where needed.
2020-06-01 10:23:11 +03:00
Marko Mäkelä
9e6e43551f Merge 10.3 into 10.4
We will expose some more std::atomic internals in Atomic_counter,
so that dict_index_t::lock will support the default assignment operator.
2020-05-16 07:39:15 +03:00
Marko Mäkelä
6a6bcc53b8 Merge 10.2 into 10.3 2020-05-15 17:55:01 +03:00
Marko Mäkelä
ad6171b91c MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.

The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.

It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.

Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.

rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.

fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.

btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.

buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.

dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.

dict_index_t::set_freed(), dict_index_t::freed(): Note that
or check whether the index has been freed. We will use the
impossible page number 1 to denote this condition.

dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().

dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().

buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.

We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:23:08 +03:00
Marko Mäkelä
a12aed0398 Fix GCC 9.3.0 -Wunused-but-set-variable 2020-05-14 13:36:11 +03:00
Marko Mäkelä
38f6c47f8a Merge 10.3 into 10.4 2020-05-13 12:52:57 +03:00
Marko Mäkelä
15fa70b840 Merge 10.2 into 10.3 2020-05-13 11:45:05 +03:00
Marko Mäkelä
ba3d58ad4c MDEV-22523 index->rtr_ssn.mutex is wasting memory
As part of the SPATIAL INDEX implementation in InnoDB,
dict_index_t was expanded by a rtr_ssn_t field. There are only
3 operations for this field, all protected by rtr_ssn_t::mutex:

* btr_cur_search_to_nth_level() stores the least significant 32 bits
of the 64-bit value that is stored in the index root page.
(This would better be done when the table is opened for the
very first time.)
* rtr_get_new_ssn_id() increments the value by 1.
* rtr_get_current_ssn_id() reads the current value.

All these operations can be implemented equally safely by using
atomic memory access operations.
2020-05-11 14:23:37 +03:00
Marko Mäkelä
2c3c851d2c Merge 10.3 into 10.4 2020-05-05 20:33:10 +03:00
Oleksandr Byelkin
7fb73ed143 Merge branch '10.2' into 10.3 2020-05-04 16:47:11 +02:00
Daniel Black
ba2061da52 MDEV-21595: innodb offset_t rename to rec_offs
thanks to:

perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)
2020-04-29 12:02:47 +03:00
Marko Mäkelä
87a61355e8 Merge 10.3 into 10.4
The MDEV-17062 fix in commit c4195305b2
was omitted.
2020-01-20 15:49:48 +02:00
Marko Mäkelä
6373ec3ec7 Merge 10.2 into 10.3 2020-01-18 16:56:16 +02:00
Marko Mäkelä
457ce97ef2 MDEV-21512 InnoDB may hang due to SPATIAL INDEX
MySQL 5.7.29 includes the following fix:
Bug #30287668 INNODB: A LONG SEMAPHORE WAIT
mysql/mysql-server@5cdbb22b51

There is no test case. It seems that the problem could occur when
a spatial index is large and peculiar enough so that multiple R-tree
leaf pages will have the exactly same maximum bounding rectangle (MBR).

The commit message suggests that the hang can occur when R-tree
non-leaf pages are being merged, which should only be possible
during transaction rollback or the purge of transaction history,
when the R-tree index is at least 2 levels high and very many records
are being deleted. The message says that a comparison result that two
spatial index node pointer records are equal will cause an infinite loop
in rtr_page_copy_rec_list_end_no_locks(). Hence, we must include the
child page number in the comparison to be consistent with
mysql/mysql-server@2e11fe0e15.

We fix this bug in a simpler way, involving fewer code changes.

cmp_rec_rec(): Renamed from cmp_rec_rec_with_match().
Assert that rec2 always resides in an index page.
Treat non-leaf spatial index pages specially.
2020-01-17 14:27:29 +02:00
Marko Mäkelä
c3695b4058 MDEV-21511: Remove unnecessary code
Now that we will be invoking dtuple_get_n_ext() instead of
letting btr_push_update_extern_fields() update an already
calculated value, it is unnecessary to calculate the n_ext
upfront.

row_rec_to_index_entry(), row_rec_to_index_entry_low():
Remove the output parameter n_ext.
2020-01-17 14:27:29 +02:00
Marko Mäkelä
5838b52743 MDEV-21511 Wrong estimate of affected BLOB columns in update
During update, rollback, or MVCC read, we may miscalculate
the number of off-page columns, and thus the size of the
clustered index record. The function btr_push_update_extern_fields()
is mostly redundant, because the off-page columns would also be
moved by row_upd_index_replace_new_col_val(), which is invoked
via row_upd_index_replace_new_col_vals().

btr_push_update_extern_fields(): Remove.

This is based on
mysql/mysql-server@1fa475b85d
which refines a fix for a recovery bug fix
mysql/mysql-server@ce0a1e85e2
in MySQL 5.7.5.

No test case was provided by Oracle.
Some of the changed code is being covered by the existing test
innodb.blob-crash.
2020-01-17 14:27:28 +02:00
Marko Mäkelä
3e38d15585 MDEV-21509 Possible hang during purge of history, or rollback
WL#6326 in MariaDB 10.2.2 introduced a potential hang on purge or rollback
when an index tree is being shrunk by multiple levels.

This fix is based on
mysql/mysql-server@f2c5852630
with the main difference that our version of the test case uses
DEBUG_SYNC instrumentation on ROLLBACK, not on purge.

btr_cur_will_modify_tree(): Simplify the check further.
This is the actual bug fix.

row_undo_mod_remove_clust_low(), row_undo_mod_clust(): Add DEBUG_SYNC
instrumentation for the test case.
2020-01-17 14:27:28 +02:00
Oleksandr Byelkin
9d036f840a Merge branch '10.3' into 10.4 2020-01-03 15:05:50 +01:00
Oleksandr Byelkin
7753a29064 Merge branch '10.2' into 10.3 2020-01-03 13:44:16 +01:00
Marko Mäkelä
4a012ce2f4 Post-fix for MDEV-12253: Remove redundant log writes
The 8 bytes at FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
will be overwritten at page flush, in buf_page_encrypt_before_write(),
ever since commit 765a43605a
(MariaDB 10.1.23, 10.2.6, 10.3.1).

btr_store_big_rec_extern_fields(): Remove useless writes to the
page header (and to the redo log) for ROW_FORMAT=COMPRESSED BLOB pages.
2019-12-30 18:12:55 +02:00
Marko Mäkelä
02e3006957 MDEV-21405 Assertion failed on instant ADD COLUMN
btr_cur_pessimistic_insert(): Relax a too strict debug assertion that
would fail when the function is invoked by btr_cur_pessimistic_update()
during innobase_add_instant_try(), that is, when updating the hidden
metadata record during a subsequent ADD COLUMN operation involves
splitting the leftmost clustered index leaf page.

This is a partial backport of 301bd62b25
from 10.4.
2019-12-30 10:08:18 +02:00
Marko Mäkelä
8fa759a576 Merge 10.3 into 10.4
We disable the MDEV-21189 test galera.galera_partition
because it times out.
2019-12-13 17:30:37 +02:00
Marko Mäkelä
3466b47b0d Merge 10.2 into 10.3 2019-12-13 10:08:57 +02:00
Eugene Kosov
f0aa073f2b MDEV-20950 Reduce size of record offsets
offset_t: this is a type which represents one record offset.
It's unsigned short int.

a lot of functions: replace ulint with offset_t

btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
  allocate record offsets on the stack instead of waiting for rec_get_offsets()
  to allocate it from mem_heap_t. So, reducing  memory allocations.

RECORD_OFFSET, INDEX_OFFSET:
  now it's less convenient to store pointers in offset_t*
  array. One pointer occupies now several offset_t. And those constant are start
  indexes into array to places where to store pointer values

REC_OFFS_HEADER_SIZE: adjusted for the new reality

REC_OFFS_NORMAL_SIZE:
  increase size from 100 to 300 which means less heap allocations.
  And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
  is smaller than previous 800 bytes.

REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality

rem0rec.h, rem0rec.ic, rem0rec.cc:
  various arguments, return values and local variables types were changed to
  fix numerous integer conversions issues.

enum field_type_t:
  offset types concept was introduces which replaces old offset flags stuff.
  Like in earlier version, 2 upper bits are used to store offset type.
  And this enum represents those types.

REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed

get_type(), set_type(), get_value(), combine():
  these are convenience functions to work with offsets and it's types

rec_offs_base()[0]:
  still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL

rec_offs_base()[i]:
  these have type offset_t now. Two upper bits contains type.
2019-12-13 00:26:50 +07:00
Marko Mäkelä
3eda03d0fe MDEV-21148: Assertion index->n_core_fields + n_add >= index->n_fields
Revert part of commit 6cedb671e9
because it turns out to be theoretically impossible to parse a
ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC metadata record where
the variable-length fields in the PRIMARY KEY have been written
as nonempty strings.
2019-11-26 20:46:25 +02:00
Marko Mäkelä
6cedb671e9 MDEV-21088 Table cannot be loaded after instant ADD/DROP COLUMN
btr_cur_instant_init_low(): Accurately parse the metadata record
header for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT. CHAR columns
used to be unnecessarily written as nonempty strings of bytes.
2019-11-20 14:12:53 +08:00
Marko Mäkelä
89ae01fd00 Merge 10.3 into 10.4 2019-11-14 13:23:36 +02:00
Marko Mäkelä
3d4a801533 MDEV-12353 preparation: Replace mtr_x_lock() and friends
Apart from page latches (buf_block_t::lock), mini-transactions
are keeping track of at most one dict_index_t::lock and
fil_space_t::latch at a time, and in a rare case, purge_sys.latch.

Let us introduce interfaces for acquiring an index latch
or a tablespace latch.

In a later version, we may want to introduce mtr_t members
for holding a latched dict_index_t* and fil_space_t*,
and replace the remaining use of mtr_t::m_memo
with std::set<buf_block_t*> or with a map<buf_block_t*,byte*>
pointing to log records.
2019-11-14 11:40:33 +02:00
Marko Mäkelä
3da895a736 Merge 10.3 into 10.4 2019-11-11 15:03:46 +02:00
Marko Mäkelä
4fcfdb60e7 Merge 10.2 into 10.3 2019-11-11 14:56:51 +02:00
Marko Mäkelä
98e1d603bf MDEV-21024: Optimize writing BTR_EXTERN_LEN
btr_store_big_rec_extern_fields(): Remove the redundant initialization
of the most significant 32 bits of BTR_EXTERN_LEN. InnoDB never supported
BLOBs that are longer than 4GiB. In fact, dtuple_convert_big_rec()
would write emit an error message if a clustered index record tuple would
exceed 1,000,000,000 bytes in length.

The BTR_EXTERN_LEN in the BLOB pointers in clustered index leaf page
records is zero-initialized at least since
commit 41bb3537ba
2019-11-11 14:14:26 +02:00
Marko Mäkelä
29d67d051a Cleanup btr_page_get_prev(), btr_page_get_next()
Remove the redundant parameter mtr_t*.

Make use of page_has_prev(), page_has_next() whenever possible.
2019-11-11 13:36:21 +02:00
Marko Mäkelä
8a5eb4141b MDEV-17138 follow-up: Use MLOG_MEMSET for writing FIL_NULL
Always use the MLOG_MEMSET record for writing FIL_NULL,
because it is more compact.
2019-11-08 09:00:10 +02:00
Marko Mäkelä
ec40980ddd Merge 10.3 into 10.4 2019-11-01 15:23:18 +02:00
Marko Mäkelä
0b9cee2cbf Merge 10.2 into 10.3 2019-10-18 09:05:27 +03:00
Marko Mäkelä
fa32d28f2f MDEV-20852 BtrBulk is unnecessarily holding dict_index_t::lock
The BtrBulk class, which was introduced in MySQL 5.7, is by design
the exclusive writer to an index. It is therefore unnecessary to
acquire the dict_index_t::lock in that code.

Holding the dict_index_t::lock would unnecessarily block other threads
(SQL connections and the InnoDB purge threads) from buffering concurrent
modifications to being-created secondary indexes.

This fix is motivated by a change in MySQL 5.7.28:
Bug #29008298 MYSQLD CRASHES ITSELF WHEN CREATING INDEX
mysql/mysql-server@f9fb96c20f

PageBulk::init(), PageBulk::latch(): Never acquire m_index->lock.

PageBulk::storeExt(): Remove some pointer indirection, and improve
a debug assertion that seems to prove that some code is redundant.

BtrBulk::pageCommit(): Assert that m_index->lock is not being held.

btr_blob_log_check_t: Do not acquire m_index->lock if
m_op == BTR_STORE_INSERT_BULK. Add UNIV_UNLIKELY hints around
that condition.

btr_store_big_rec_extern_fields(): Allow index->lock not to be held
while op == BTR_STORE_INSERT_BULK. Add UNIV_UNLIKELY hints around
that condition.
2019-10-17 14:04:07 +03:00
Marko Mäkelä
09afd3da1a Merge 10.3 into 10.4 2019-10-10 21:30:40 +03:00
Marko Mäkelä
7f84e3ad75 Merge 10.2 into 10.3 2019-10-10 20:38:44 +03:00
Marko Mäkelä
6d7a826953 MDEV-20788: Bogus assertion failure for PAGE_FREE list
In MDEV-11369 (instant ADD COLUMN) in MariaDB Server 10.3,
we introduced the hidden metadata record that must be the
first record in the clustered index if and only if
index->is_instant() holds.

To catch MDEV-19783, in
commit ed0793e096 and
commit 99dc40d6ac
we added some assertions to find cases where
the metadata record is missing while it should not be, or a
record exists when it should not. Those assertions were invalid
when traversing the PAGE_FREE list. That list can contain anything;
we must only be able to determine the successor and the size of
each garbage record in it.

page_validate(), page_simple_validate_old(), page_simple_validate_new():
Do not invoke page_rec_get_next_const() for traversing the PAGE_FREE
list, but instead use a lower-level accessor that does not attempt to
validate the REC_INFO_MIN_REC_FLAG.

page_copy_rec_list_end_no_locks(),
page_copy_rec_list_start(), page_delete_rec_list_start():
Add assertions.

btr_page_get_split_rec_to_left(): Remove a redundant return value,
and make the output parameter the return value.

btr_page_get_split_rec_to_right(), btr_page_split_and_insert(): Clean up.
2019-10-10 20:29:30 +03:00
Marko Mäkelä
c11e5cdd12 Merge 10.3 into 10.4 2019-10-10 11:19:25 +03:00
Marko Mäkelä
892378fb9d Merge 10.2 into 10.3 2019-10-09 13:25:11 +03:00
Eugene Kosov
ed0793e096 MDEV-19783: Add more REC_INFO_MIN_REC_FLAG checks
btr_cur_pessimistic_delete(): code changed in a way that allows
to put more REC_INFO_MIN_REC_FLAG assertions inside btr_set_min_rec_mark().
Without that change tests innodb.innodb-table-online,
innodb.temp_table_savepoint and innodb_zip.prefix_index_liftedlimit fail.

Removed basically duplicated page_zip_validate() calls
which fails because of temporary(!) invariant violation.
That fixed innodb_zip.wl5522_debug_zip and
innodb_zip.prefix_index_liftedlimit
2019-10-09 08:29:26 +03:00
Marko Mäkelä
d480d28f4f Add page_has_prev(), page_has_next(), page_has_siblings()
Until now, InnoDB inefficiently compared the aligned fields
FIL_PAGE_PREV, FIL_PAGE_NEXT to the byte-order-agnostic value FIL_NULL.

This is a backport of 32170f8c6d
from MariaDB Server 10.3.
2019-10-09 08:29:26 +03:00
Marko Mäkelä
60c04be659 Merge 10.3 into 10.4 2019-09-12 12:16:40 +03:00
Marko Mäkelä
0fa5ad3acf Merge 10.2 into 10.3 2019-09-11 16:42:01 +03:00