Commit graph

61 commits

Author SHA1 Message Date
Marko Mäkelä
1bd681c8b3 MDEV-25506 (3 of 3): Do not delete .ibd files before commit
This is a complete rewrite of DROP TABLE, also as part of other DDL,
such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE.

The background DROP TABLE queue hack is removed.
If a transaction needs to drop and create a table by the same name
(like TRUNCATE TABLE does), it must first rename the table to an
internal #sql-ib name. No committed version of the data dictionary
will include any #sql-ib tables, because whenever a transaction
renames a table to a #sql-ib name, it will also drop that table.
Either the rename will be rolled back, or the drop will be committed.

Data files will be unlinked after the transaction has been committed
and a FILE_RENAME record has been durably written. The file will
actually be deleted when the detached file handle returned by
fil_delete_tablespace() will be closed, after the latches have been
released. It is possible that a purge of the delete of the SYS_INDEXES
record for the clustered index will execute fil_delete_tablespace()
concurrently with the DDL transaction. In that case, the thread that
arrives later will wait for the other thread to finish.

HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag.
ha_innobase::truncate() now requires that all other references to
the table be released in advance. This was implemented by Monty.

ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected,
we will "hijack" the current transaction, drop the table in
the current transaction and commit the current transaction.
This essentially fixes MDEV-21602. There is a FIXME comment about
making the check less failure-prone.

ha_innobase::truncate(), ha_innobase::delete_table():
Implement a fast path for temporary tables. We will no longer allow
temporary tables to use the adaptive hash index.

dict_table_t::mdl_name: The original table name for the purpose of
acquiring MDL in purge, to prevent a race condition between a
DDL transaction that is dropping a table, and purge processing
undo log records of DML that had executed before the DDL operation.
For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the
dict_table_t::mdl_name will differ from dict_table_t::name.

dict_table_t::parse_name(): Use mdl_name instead of name.

dict_table_rename_in_cache(): Update mdl_name.

For the internal FTS_ tables of FULLTEXT INDEX, purge would
acquire MDL on the FTS_ table name, but not on the main table,
and therefore it would be able to run concurrently with a
DDL transaction that is dropping the table. Previously, the
DROP TABLE queue hack prevented a race between purge and DDL.
For now, we introduce purge_sys.stop_FTS() to prevent purge from
opening any table, while a DDL transaction that may drop FTS_
tables is in progress. The function fts_lock_table(), which will
be invoked before the dictionary is locked, will wait for
purge to release any table handles.

trx_t::drop_table_statistics(): Drop statistics for the table.
This replaces dict_stats_drop_index(). We will drop or rename
persistent statistics atomically as part of DDL transactions.
On lock conflict for dropping statistics, we will fail instantly
with DB_LOCK_WAIT_TIMEOUT, because we will be holding the
exclusive data dictionary latch.

trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory().
Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT
in addition to DB_DUPLICATE_KEY. The call to fts_commit() is
entirely misplaced here and may obviously break the consistency
of transactions that affect FULLTEXT INDEX. It needs to be fixed
separately.

dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175).
The counter was a work-around for missing meta-data locking (MDL)
on the SQL layer, and not really needed in MariaDB.

ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28.

HA_ERR_TABLE_IN_FK_CHECK: Remove.

row_ins_check_foreign_constraints(): Do not acquire
dict_sys.latch either. The SQL-layer MDL will protect us.

This was reviewed by Thirunarayanan Balathandayuthapani
and tested by Matthias Leich.
2021-06-09 17:06:07 +03:00
Marko Mäkelä
1669c8890c MDEV-24167 fixup: Improve the PERFORMANCE_SCHEMA instrumentation
Let us try to avoid code bloat for the common case that
performance_schema is disabled at runtime, and use
ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition.

Also, let us distinguish lock waits from non-contended lock requests
by using write_lock,read_lock for the requests that lead to waits,
and try_write_lock,try_read_lock for the wait-free lock acquisitions.
Actual 'try' operations are not being instrumented at all.
2020-12-03 09:55:53 +02:00
Marko Mäkelä
581aebe29f MDEV-24167: Fix -DPLUGIN_PERFSCHEMA=NO and Windows debug builds 2020-11-24 21:01:22 +02:00
Marko Mäkelä
c561f9e6e8 MDEV-24167: Use lightweight srw_lock for btr_search_latch
Many InnoDB rw-locks unnecessarily depend on the complex
InnoDB rw_lock_t implementation that support the SX lock mode
as well as recursive acquisition of X or SX locks.
One of them is the bunch of adaptive hash index search latches,
instrumented as btr_search_latch in PERFORMANCE_SCHEMA.
Let us introduce a simpler lock for those in order to
reduce overhead.

srw_lock: A simple read-write lock that does not support recursion.
On Microsoft Windows, this wraps SRWLOCK, only adding
runtime overhead if PERFORMANCE_SCHEMA is enabled.
On Linux (all architectures), this is implemented with
std::atomic<uint32_t> and the futex system call.
On other platforms, we will wrap mysql_rwlock_t with
zero runtime overhead.

The PERFORMANCE_SCHEMA instrumentation differs
from InnoDB rw_lock_t in that we will only invoke
PSI_RWLOCK_CALL(start_rwlock_wrwait) or
PSI_RWLOCK_CALL(start_rwlock_rdwait)
if there is an actual conflict.
2020-11-24 15:41:03 +02:00
Marko Mäkelä
156cb94b4b MDEV-22871 fixup: Relax a debug assertion
In commit bf3c862faa we introduced
an assertion that may dereference a null pointer.

This regression was caught by running the following:
./mtr --parallel=auto --suite=innodb \
--mysqld=--loose-innodb-adaptive-hash-index

The adaptive hash index is disabled by default since
commit 88cdfc5c7d (MDEV-20487)
and hence the problem was not caught earlier.
2020-11-20 08:49:46 +02:00
Marko Mäkelä
898521e2dd Merge 10.4 into 10.5 2020-10-30 11:15:30 +02:00
Marko Mäkelä
2b6f804490 Merge 10.2 into 10.3 2020-10-28 10:44:40 +02:00
Marko Mäkelä
cc5f4428b8 MDEV-23693 fixup: Remove unused btr_search_t::withdraw_clock 2020-10-28 08:13:06 +02:00
Marko Mäkelä
e341fb0dae MDEV-22871 follow-up fix: AHI corruption & leak
Commit bf3c862faa
accidentally introduced two bugs.

btr_search_update_hash_ref(): Pass the correct parameter part->heap.

btr_search_sys_t::free(): Free all memory.

Thanks to Michael Widenius and Thirunarayanan Balathandayuthapani
for pointing out these bugs.
2020-06-19 12:34:21 +03:00
Marko Mäkelä
bf3c862faa MDEV-22871: Clean up btr_search_sys
btr_search_sys::parts[]: A single structure for the partitions of
the adaptive hash index. Replaces the 3 separate arrays:
btr_search_latches[], btr_search_sys->hash_tables,
btr_search_sys->hash_tables[i]->heap.

hash_table_t::heap, hash_table_t::adaptive: Remove.

ha0ha.cc: Remove. Move all code to btr0sea.cc.
2020-06-18 14:16:01 +03:00
Marko Mäkelä
6877ef9a7c Merge 10.4 into 10.5 2020-06-05 20:36:43 +03:00
Marko Mäkelä
680463a8d9 Merge 10.2 into 10.3 2020-06-05 16:51:26 +03:00
Thirunarayanan Balathandayuthapani
ad2bf1129c MDEV-22646 Assertion `table2->cached' failed in dict_table_t::add_to_cache
Problem:
========
  During buffer pool resizing, InnoDB recreates the dictionary hash
tables. Dictionary hash table reuses the heap of AHI hash tables.
It leads to memory corruption.

Fix:
====
- While disabling AHI, free the heap and AHI hash tables. Recreate the
AHI hash tables and assign new heap when AHI is enabled.

- btr_blob_free() access invalid page if page was reallocated during
buffer poolresizing. So btr_blob_free() should get the page from
buf_pool instead of using existing block.

- btr_search_enabled and block->index should be checked after
acquiring the btr_search_sys latch

- Moved the buffer_pool_scan debug sync to earlier before accessing the
btr_search_sys latches to avoid the hang of truncate_purge_debug
test case

- srv_printf_innodb_monitor() should acquire btr_search_sys latches
before AHI hash tables.
2020-06-03 16:02:02 +05:30
Marko Mäkelä
23047d3ed4 Merge 10.4 into 10.5 2020-05-18 17:30:02 +03:00
Marko Mäkelä
9e6e43551f Merge 10.3 into 10.4
We will expose some more std::atomic internals in Atomic_counter,
so that dict_index_t::lock will support the default assignment operator.
2020-05-16 07:39:15 +03:00
Marko Mäkelä
3d0bb2b7f1 Merge 10.2 into 10.3 2020-05-15 19:11:57 +03:00
Marko Mäkelä
ad6171b91c MDEV-22456 Dropping the adaptive hash index may cause DDL to lock up InnoDB
If the InnoDB buffer pool contains many pages for a table or index
that is being dropped or rebuilt, and if many of such pages are
pointed to by the adaptive hash index, dropping the adaptive hash index
may consume a lot of time.

The time-consuming operation of dropping the adaptive hash index entries
is being executed while the InnoDB data dictionary cache dict_sys is
exclusively locked.

It is not actually necessary to drop all adaptive hash index entries
at the time a table or index is being dropped or rebuilt. We can let
the LRU replacement policy of the buffer pool take care of this gradually.
For this to work, we must detach the dict_table_t and dict_index_t
objects from the main dict_sys cache, and once the last
adaptive hash index entry for the detached table is removed
(when the garbage page is evicted from the buffer pool) we can free
the dict_table_t and dict_index_t object.

Related to this, in MDEV-16283, we made ALTER TABLE...DISCARD TABLESPACE
skip both the buffer pool eviction and the drop of the adaptive hash index.
We shifted the burden to ALTER TABLE...IMPORT TABLESPACE or DROP TABLE.
We can remove the eviction from DROP TABLE. We must retain the eviction
in the ALTER TABLE...IMPORT TABLESPACE code path, so that in case the
discarded table is being re-imported with the same tablespace identifier,
the fresh data from the imported tablespace will replace any stale pages
in the buffer pool.

rpl.rpl_failed_drop_tbl_binlog: Remove the test. DROP TABLE can
no longer be interrupted inside InnoDB.

fseg_free_page(), fseg_free_step(), fseg_free_step_not_header(),
fseg_free_page_low(), fseg_free_extent(): Remove the parameter
that specifies whether the adaptive hash index should be dropped.

btr_search_lazy_free(): Lazily free an index when the last
reference to it is dropped from the adaptive hash index.

buf_pool_clear_hash_index(): Declare static, and move to the
same compilation unit with the bulk of the adaptive hash index
code.

dict_index_t::clone(), dict_index_t::clone_if_needed():
Clone an index that is being rebuilt while adaptive hash index
entries exist. The original index will be inserted into
dict_table_t::freed_indexes and dict_index_t::set_freed()
will be called.

dict_index_t::set_freed(), dict_index_t::freed(): Note that
or check whether the index has been freed. We will use the
impossible page number 1 to denote this condition.

dict_index_t::n_ahi_pages(): Replaces btr_search_info_get_ref_count().

dict_index_t::detach_columns(): Move the assignment n_fields=0
to ha_innobase_inplace_ctx::clear_added_indexes().
We must have access to the columns when freeing the
adaptive hash index. Note: dict_table_t::v_cols[] will remain
valid. If virtual columns are dropped or added, the table
definition will be reloaded in ha_innobase::commit_inplace_alter_table().

buf_page_mtr_lock(): Drop a stale adaptive hash index if needed.

We will also reduce the number of btr_get_search_latch() calls
and enclose some more code inside #ifdef BTR_CUR_HASH_ADAPT
in order to benefit cmake -DWITH_INNODB_AHI=OFF.
2020-05-15 17:23:08 +03:00
Marko Mäkelä
a786f50de5 MDEV-21962 Allocate buf_pool statically
Thanks to MDEV-15058, there is only one InnoDB buffer pool.
Allocating buf_pool statically removes one level of pointer indirection
and makes code more readable, and removes the awkward initialization of
some buf_pool members.

While doing this, we will also declare some buf_pool_t data members
private and replace some functions with member functions. This is
mostly affecting buffer pool resizing.

This is not aiming to be a complete rewrite of buf_pool_t to
a proper class. Most of the buffer pool interface, such as
buf_page_get_gen(), will remain in the C programming style
for now.

buf_pool_t::withdrawing: Replaces buf_pool_withdrawing.
buf_pool_t::withdraw_clock_: Replaces buf_withdraw_clock.

buf_pool_t::create(): Repalces buf_pool_init().
buf_pool_t::close(): Replaces buf_pool_free().

buf_bool_t::will_be_withdrawn(): Replaces buf_block_will_be_withdrawn(),
buf_frame_will_be_withdrawn().

buf_pool_t::clear_hash_index(): Replaces buf_pool_clear_hash_index().
buf_pool_t::get_n_pages(): Replaces buf_pool_get_n_pages().
buf_pool_t::validate(): Replaces buf_validate().
buf_pool_t::print(): Replaces buf_print().
buf_pool_t::block_from_ahi(): Replaces buf_block_from_ahi().
buf_pool_t::is_block_field(): Replaces buf_pointer_is_block_field().
buf_pool_t::is_block_mutex(): Replaces buf_pool_is_block_mutex().
buf_pool_t::is_block_lock(): Replaces buf_pool_is_block_lock().
buf_pool_t::is_obsolete(): Replaces buf_pool_is_obsolete().
buf_pool_t::io_buf: Make default-constructible.
buf_pool_t::io_buf::create(): Delayed 'constructor'
buf_pool_t::io_buf::close(): Early 'destructor'

HazardPointer: Make default-constructible. Define all member functions
inline, also for derived classes.
2020-03-18 22:32:40 +02:00
Marko Mäkelä
f224525204 MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.

Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.

GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
about x|=y even when x and y are compatible types that are narrower
than int.  Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.

In GCC 6 and later, the warning for assigning wider to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
cases.

The bitwise negation operator appears to promote short integers
to a wider type, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.

This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
2020-03-12 19:46:41 +02:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
5fd7502e77 MDEV-19513: Allocate dict_sys statically
dict_sys_t::create(): Renamed from dict_init().

dict_sys_t::close(): Renamed from dict_close().

dict_sys_t::add(): Sliced from dict_table_t::add_to_cache().

dict_sys_t::remove(): Renamed from dict_table_remove_from_cache().

dict_sys_t::prevent_eviction(): Renamed from
dict_table_move_from_lru_to_non_lru().

dict_sys_t::acquire(): Replaces dict_move_to_mru() and some more logic.

dict_sys_t::resize(): Renamed from dict_resize().

dict_sys_t::find(): Replaces dict_lru_find_table() and
dict_non_lru_find_table().
2019-05-17 14:32:53 +03:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
c0ac0b8860 Update FSF address 2019-05-11 19:25:02 +03:00
Marko Mäkelä
0abd2766b1 Merge 10.2 into 10.3
Also, related to MDEV-15522, MDEV-17304, MDEV-17835,
remove the Galera xtrabackup tests, because xtrabackup never worked
with MariaDB Server 10.3 due to InnoDB redo log format changes.
2018-11-30 09:38:56 +02:00
Marko Mäkelä
447e493179 Remove some unnecessary InnoDB #include 2018-11-29 12:53:44 +02:00
Marko Mäkelä
df563e0c03 Merge 10.2 into 10.3
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.

main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
2018-11-06 09:40:39 +02:00
Eugene Kosov
14be814380 MDEV-17491 micro optimize page_id_t
page_id_t: remove m_fold member

various places: pass page_id_t by value instead of by reference
2018-10-25 18:46:27 +03:00
Marko Mäkelä
f418661efa Merge 10.2 into 10.3 2018-07-23 18:56:52 +03:00
Marko Mäkelä
a3539bbb2a Merge 10.2 into 10.3 2018-05-29 17:34:49 +03:00
Marko Mäkelä
18934fb583 Merge 10.1 into 10.2 2018-05-29 16:52:12 +03:00
Marko Mäkelä
6aa50bad39 MDEV-16283 ALTER TABLE...DISCARD TABLESPACE still takes long on a large buffer pool
Also fixes MDEV-14727, MDEV-14491
InnoDB: Error: Waited for 5 secs for hash index ref_count (1) to drop to 0
by replacing the flawed wait logic in dict_index_remove_from_cache_low().

On DISCARD TABLESPACE, there is no need to drop the adaptive hash index.
We must drop it on IMPORT TABLESPACE, and eventually on DROP TABLE or
DROP INDEX. As long as the dict_index_t object remains in the cache
and the table remains inaccessible, the adaptive hash index entries
to orphaned pages would not do any harm. They would be dropped when
buffer pool pages are reused for something else.

btr_search_drop_page_hash_when_freed(), buf_LRU_drop_page_hash_batch():
Remove the parameter zip_size, and pass 0 to buf_page_get_gen().

buf_page_get_gen(): Ignore zip_size if mode==BUF_PEEK_IF_IN_POOL.

buf_LRU_drop_page_hash_for_tablespace(): Drop the adaptive hash index
even if the tablespace is inaccessible.

buf_LRU_drop_page_hash_for_tablespace(): New global function, to drop
the adaptive hash index.

buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Remove the parameter drop_ahi.

dict_index_remove_from_cache_low(): Actively drop the adaptive hash index
if entries exist. This should prevent InnoDB hangs on DROP TABLE or
DROP INDEX.

row_import_for_mysql(): Drop any adaptive hash index entries for the table.

row_drop_table_for_mysql(): Drop any adaptive hash index for the table,
except if the table resides in the system tablespace. (DISCARD TABLESPACE
does not apply to the system tablespace, and we do no want to drop the
adaptive hash index for other tables than the one that is being dropped.)

row_truncate_table_for_mysql(): Drop any adaptive hash index entries for
the table, except if the table resides in the system tablespace.
2018-05-29 14:00:20 +03:00
Marko Mäkelä
715e4f4320 MDEV-12218 Clean up InnoDB parameter validation
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".

innodb_change_buffering: Declare as ENUM, not STRING.

innodb_flush_method: Declare as ENUM, not STRING.

innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.

LOG_BUFFER_SIZE: Remove.

SysTablespace::normalize_size(): Renamed from normalize().

innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.

innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.

srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.

SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.

xb_normalize_init_values(): Merge to innodb_init_param().
2018-04-29 09:41:42 +03:00
Marko Mäkelä
5dd5b6cadc MDEV-12266: Clean up btr_search_drop_page_hash_when_freed()
Remove the parameter page_size, and pass a dummy page size
to buf_page_get_gen() along with BUF_PEEK_IF_IN_POOL.
2018-03-29 20:47:40 +03:00
Vladislav Vaintroub
d995dd2865 Windows : reenable warning C4805 (unsafe mix of types in bool operations) 2018-02-07 20:12:12 +00:00
Marko Mäkelä
0664d633e4 MDEV-14952 Avoid repeated calls to btr_get_search_latch()
btr_cur_search_to_nth_level(), btr_search_guess_on_hash(),
btr_pcur_open_with_no_init_func(), row_sel_open_pcur():
Replace the parameter has_search_latch with the ahi_latch
(passed as NULL if the caller does not hold the latch).

btr_search_update_hash_node_on_insert(),
btr_search_update_hash_on_insert(),
btr_search_build_page_hash_index(): Add the parameter ahi_latch.

btr_search_x_lock(), btr_search_x_unlock(),
btr_search_s_lock(), btr_search_s_unlock(): Remove.
2018-01-15 19:51:09 +02:00
Marko Mäkelä
a4948dafcd MDEV-11369 Instant ADD COLUMN for InnoDB
For InnoDB tables, adding, dropping and reordering columns has
required a rebuild of the table and all its indexes. Since MySQL 5.6
(and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing
concurrent modification of the tables.

This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT
and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously,
with only minor changes performed to the table structure. The counter
innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS
is incremented whenever a table rebuild operation is converted into
an instant ADD COLUMN operation.

ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.

Some usability limitations will be addressed in subsequent work:

MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY
and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE

The format of the clustered index (PRIMARY KEY) is changed as follows:

(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT,
and a new field PAGE_INSTANT will contain the original number of fields
in the clustered index ('core' fields).
If instant ADD COLUMN has not been used or the table becomes empty,
or the very first instant ADD COLUMN operation is rolled back,
the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset
to 0 and FIL_PAGE_INDEX.

(2) A special 'default row' record is inserted into the leftmost leaf,
between the page infimum and the first user record. This record is
distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the
same format as records that contain values for the instantly added
columns. This 'default row' always has the same number of fields as
the clustered index according to the table definition. The values of
'core' fields are to be ignored. For other fields, the 'default row'
will contain the default values as they were during the ALTER TABLE
statement. (If the column default values are changed later, those
values will only be stored in the .frm file. The 'default row' will
contain the original evaluated values, which must be the same for
every row.) The 'default row' must be completely hidden from
higher-level access routines. Assertions have been added to ensure
that no 'default row' is ever present in the adaptive hash index
or in locked records. The 'default row' is never delete-marked.

(3) In clustered index leaf page records, the number of fields must
reside between the number of 'core' fields (dict_index_t::n_core_fields
introduced in this work) and dict_index_t::n_fields. If the number
of fields is less than dict_index_t::n_fields, the missing fields
are replaced with the column value of the 'default row'.
Note: The number of fields in the record may shrink if some of the
last instantly added columns are updated to the value that is
in the 'default row'. The function btr_cur_trim() implements this
'compression' on update and rollback; dtuple::trim() implements it
on insert.

(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new
status value REC_STATUS_COLUMNS_ADDED will indicate the presence of
a new record header that will encode n_fields-n_core_fields-1 in
1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header
always explicitly encodes the number of fields.)

We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for
covering the insert of the 'default row' record when instant ADD COLUMN
is used for the first time. Subsequent instant ADD COLUMN can use
TRX_UNDO_UPD_EXIST_REC.

This is joint work with Vin Chen (陈福荣) from Tencent. The design
that was discussed in April 2017 would not have allowed import or
export of data files, because instead of the 'default row' it would
have introduced a data dictionary table. The test
rpl.rpl_alter_instant is exactly as contributed in pull request #408.
The test innodb.instant_alter is based on a contributed test.

The redo log record format changes for ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPACT are as contributed. (With this change present,
crash recovery from MariaDB 10.3.1 will fail in spectacular ways!)
Also the semantics of higher-level redo log records that modify the
PAGE_INSTANT field is changed. The redo log format version identifier
was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1.

Everything else has been rewritten by me. Thanks to Elena Stepanova,
the code has been tested extensively.

When rolling back an instant ADD COLUMN operation, we must empty the
PAGE_FREE list after deleting or shortening the 'default row' record,
by calling either btr_page_empty() or btr_page_reorganize(). We must
know the size of each entry in the PAGE_FREE list. If rollback left a
freed copy of the 'default row' in the PAGE_FREE list, we would be
unable to determine its size (if it is in ROW_FORMAT=COMPACT or
ROW_FORMAT=DYNAMIC) because it would contain more fields than the
rolled-back definition of the clustered index.

UNIV_SQL_DEFAULT: A new special constant that designates an instantly
added column that is not present in the clustered index record.

len_is_stored(): Check if a length is an actual length. There are
two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL.

dict_col_t::def_val: The 'default row' value of the column.  If the
column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.

dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(),
instant_value().

dict_col_t::remove_instant(): Remove the 'instant ADD' status of
a column.

dict_col_t::name(const dict_table_t& table): Replaces
dict_table_get_col_name().

dict_index_t::n_core_fields: The original number of fields.
For secondary indexes and if instant ADD COLUMN has not been used,
this will be equal to dict_index_t::n_fields.

dict_index_t::n_core_null_bytes: Number of bytes needed to
represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).

dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that
n_core_null_bytes was not initialized yet from the clustered index
root page.

dict_index_t: Add the accessors is_instant(), is_clust(),
get_n_nullable(), instant_field_value().

dict_index_t::instant_add_field(): Adjust clustered index metadata
for instant ADD COLUMN.

dict_index_t::remove_instant(): Remove the 'instant ADD' status
of a clustered index when the table becomes empty, or the very first
instant ADD COLUMN operation is rolled back.

dict_table_t: Add the accessors is_instant(), is_temporary(),
supports_instant().

dict_table_t::instant_add_column(): Adjust metadata for
instant ADD COLUMN.

dict_table_t::rollback_instant(): Adjust metadata on the rollback
of instant ADD COLUMN.

prepare_inplace_alter_table_dict(): First create the ctx->new_table,
and only then decide if the table really needs to be rebuilt.
We must split the creation of table or index metadata from the
creation of the dictionary table records and the creation of
the data. In this way, we can transform a table-rebuilding operation
into an instant ADD COLUMN operation. Dictionary objects will only
be added to cache when table rebuilding or index creation is needed.
The ctx->instant_table will never be added to cache.

dict_table_t::add_to_cache(): Modified and renamed from
dict_table_add_to_cache(). Do not modify the table metadata.
Let the callers invoke dict_table_add_system_columns() and if needed,
set can_be_evicted.

dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the
system columns (which will now exist in the dict_table_t object
already at this point).

dict_create_table_step(): Expect the callers to invoke
dict_table_add_system_columns().

pars_create_table(): Before creating the table creation execution
graph, invoke dict_table_add_system_columns().

row_create_table_for_mysql(): Expect all callers to invoke
dict_table_add_system_columns().

create_index_dict(): Replaces row_merge_create_index_graph().

innodb_update_n_cols(): Renamed from innobase_update_n_virtual().
Call my_error() if an error occurs.

btr_cur_instant_init(), btr_cur_instant_init_low(),
btr_cur_instant_root_init():
Load additional metadata from the clustered index and set
dict_index_t::n_core_null_bytes. This is invoked
when table metadata is first loaded into the data dictionary.

dict_boot(): Initialize n_core_null_bytes for the four hard-coded
dictionary tables.

dict_create_index_step(): Initialize n_core_null_bytes. This is
executed as part of CREATE TABLE.

dict_index_build_internal_clust(): Initialize n_core_null_bytes to
NO_CORE_NULL_BYTES if table->supports_instant().

row_create_index_for_mysql(): Initialize n_core_null_bytes for
CREATE TEMPORARY TABLE.

commit_cache_norebuild(): Call the code to rename or enlarge columns
in the cache only if instant ADD COLUMN is not being used.
(Instant ADD COLUMN would copy all column metadata from
instant_table to old_table, including the names and lengths.)

PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields.
This is repurposing the 16-bit field PAGE_DIRECTION, of which only the
least significant 3 bits were used. The original byte containing
PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B.

page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT.

page_ptr_get_direction(), page_get_direction(),
page_ptr_set_direction(): Accessors for PAGE_DIRECTION.

page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.

page_direction_increment(): Increment PAGE_N_DIRECTION
and set PAGE_DIRECTION.

rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes,
and assume that heap_no is always set.
Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records,
even if the record contains fewer fields.

rec_offs_make_valid(): Add the parameter 'leaf'.

rec_copy_prefix_to_dtuple(): Assert that the tuple is only built
on the core fields. Instant ADD COLUMN only applies to the
clustered index, and we should never build a search key that has
more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR.
All these columns are always present.

dict_index_build_data_tuple(): Remove assertions that would be
duplicated in rec_copy_prefix_to_dtuple().

rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose
number of fields is between n_core_fields and n_fields.

cmp_rec_rec_with_match(): Implement the comparison between two
MIN_REC_FLAG records.

trx_t::in_rollback: Make the field available in non-debug builds.

trx_start_for_ddl_low(): Remove dangerous error-tolerance.
A dictionary transaction must be flagged as such before it has generated
any undo log records. This is because trx_undo_assign_undo() will mark
the transaction as a dictionary transaction in the undo log header
right before the very first undo log record is being written.

btr_index_rec_validate(): Account for instant ADD COLUMN

row_undo_ins_remove_clust_rec(): On the rollback of an insert into
SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the
last column from the table and the clustered index.

row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(),
trx_undo_update_rec_get_update(): Handle the 'default row'
as a special case.

dtuple_t::trim(index): Omit a redundant suffix of an index tuple right
before insert or update. After instant ADD COLUMN, if the last fields
of a clustered index tuple match the 'default row', there is no
need to store them. While trimming the entry, we must hold a page latch,
so that the table cannot be emptied and the 'default row' be deleted.

btr_cur_optimistic_update(), btr_cur_pessimistic_update(),
row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low():
Invoke dtuple_t::trim() if needed.

row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling
row_ins_clust_index_entry_low().

rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number
of fields to be between n_core_fields and n_fields. Do not support
infimum,supremum. They are never supposed to be stored in dtuple_t,
because page creation nowadays uses a lower-level method for initializing
them.

rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the
number of fields.

btr_cur_trim(): In an update, trim the index entry as needed. For the
'default row', handle rollback specially. For user records, omit
fields that match the 'default row'.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
Skip locking and adaptive hash index for the 'default row'.

row_log_table_apply_convert_mrec(): Replace 'default row' values if needed.
In the temporary file that is applied by row_log_table_apply(),
we must identify whether the records contain the extra header for
instantly added columns. For now, we will allocate an additional byte
for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table
has been subject to instant ADD COLUMN. The ROW_T_DELETE records are
fine, as they will be converted and will only contain 'core' columns
(PRIMARY KEY and some system columns) that are converted from dtuple_t.

rec_get_converted_size_temp(), rec_init_offsets_temp(),
rec_convert_dtuple_to_temp(): Add the parameter 'status'.

REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED:
An info_bits constant for distinguishing the 'default row' record.

rec_comp_status_t: An enum of the status bit values.

rec_leaf_format: An enum that replaces the bool parameter of
rec_init_offsets_comp_ordinary().
2017-10-06 09:50:10 +03:00
Marko Mäkelä
849af74a48 MariaDB adjustments for Oracle Bug#23070734 fix
Split the test case so that a server restart is not needed.
Reduce the test cases and use a simpler mechanism for triggering
and waiting for purge.

fil_table_accessible(): Check if a table can be accessed without
enjoying MDL protection.
2017-04-26 23:03:32 +03:00
Marko Mäkelä
4e1116b2c6 MDEV-12271 Port MySQL 8.0 Bug#23150562 REMOVE UNIV_MUST_NOT_INLINE AND UNIV_NONINL
Also, remove empty .ic files that were not removed by my MySQL commit.

Problem:
InnoDB used to support a compilation mode that allowed to choose
whether the function definitions in .ic files are to be inlined or not.
This stopped making sense when InnoDB moved to C++ in MySQL 5.6
(and ha_innodb.cc started to #include .ic files), and more so in
MySQL 5.7 when inline methods and functions were introduced
in .h files.

Solution:
Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from
all files, assuming that the symbols are never defined.
Remove the files fut0fut.cc and ut0byte.cc which only mattered when
UNIV_NONINL was defined.
2017-03-17 12:42:07 +02:00
Marko Mäkelä
27b9989d31 MDEV-12121 Introduce build option WITH_INNODB_AHI to disable innodb_adaptive_hash_index
The InnoDB adaptive hash index is sometimes degrading the performance of
InnoDB, and it is sometimes disabled to get more consistent performance.
We should have a compile-time option to disable the adaptive hash index.

Let us introduce two options:

OPTION(WITH_INNODB_AHI "Include innodb_adaptive_hash_index" ON)
OPTION(WITH_INNODB_ROOT_GUESS "Cache index root block descriptors" ON)

where WITH_INNODB_AHI always implies WITH_INNODB_ROOT_GUESS.

As part of this change, the misleadingly named function
trx_search_latch_release_if_reserved(trx) will be replaced with the macro
trx_assert_no_search_latch(trx) that will be empty unless
BTR_CUR_HASH_ADAPT is defined (cmake -DWITH_INNODB_AHI=ON).

We will also remove the unused column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_TIMEOUT.
In MariaDB Server 10.1, it used to reflect the value of
trx_t::search_latch_timeout which could be adjusted during
row_search_for_mysql(). In 10.2, there is no such field.

Other than the removal of the unused column TRX_ADAPTIVE_HASH_TIMEOUT,
this is an almost non-functional change to the server when using the
default build options.

Some tests are adjusted so that they will work with both
-DWITH_INNODB_AHI=ON and -DWITH_INNODB_AHI=OFF. The test
innodb.innodb_monitor has been renamed to innodb.monitor
in order to track MySQL 5.7, and the duplicate tests
sys_vars.innodb_monitor_* are removed.
2017-03-03 16:55:50 +02:00
Jan Lindström
fec844aca8 Merge InnoDB 5.7 from mysql-5.7.14.
Contains also:
       MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan)
       Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB

       MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
       enable tests that were fixed in MDEV-10549

       MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan)
       fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables
2016-09-08 15:49:03 +03:00
Jan Lindström
2e814d4702 Merge InnoDB 5.7 from mysql-5.7.9.
Contains also

MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7

	The failure happened because 5.7 has changed the signature of
	the bool handler::primary_key_is_clustered() const
	virtual function ("const" was added). InnoDB was using the old
	signature which caused the function not to be used.

MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7

	Fixed mutexing problem on lock_trx_handle_wait. Note that
	rpl_parallel and rpl_optimistic_parallel tests still
	fail.

MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan)
  Reason: incorrect merge

MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan)
  Reason: incorrect merge
2016-09-02 13:22:28 +03:00
Sergei Golubchik
932646b1ff Merge branch '10.1' into 10.2 2016-06-30 16:38:05 +02:00
Sergei Golubchik
720e04ff67 5.6.31 2016-06-21 14:21:03 +02:00
Daniel Black
a0b11b9033 Remove btr_search_n_succ and btr_search_n_hash_fail counters 2016-06-02 17:03:31 +10:00
Sergei Golubchik
6d06fbbd1d move to storage/innobase 2015-05-04 19:17:21 +02:00
Sergei Golubchik
15ee97214d InnoDB 5.6.15 merge.
update test results
2014-02-26 19:36:33 +01:00
Michael Widenius
068c61978e Temporary commit of 10.0-merge 2013-03-26 00:03:13 +02:00
Michael Widenius
1d0f70c2f8 Temporary commit of merge of MariaDB 10.0-base and MySQL 5.6 2012-08-01 17:27:34 +03:00
Sergei Golubchik
4f435bddfd 5.3 merge 2012-01-13 15:50:02 +01:00