stats_deinit(): Replaces dict_stats_deinit().
Deinitialize the statistics for persistent tables,
so that they will be reloaded or recalculated
on a subsequent ha_innobase::open().
ha_innobase::rename_table(): Invoke stats_deinit() so that the
subsequent ha_innobase::open() will reload the InnoDB persistent
statistics. That is, it will remain possible to have the InnoDB
persistent statistics reloaded by executing the following:
RENAME TABLE t TO tmp, tmp TO t;
dict_table_close(table): Replaced with table->release().
There will no longer be any logic that would attempt to ensure
that the InnoDB persistent statistics will be reloaded after
FLUSH TABLES has been executed. This also fixes the problem that
dict_table_t::stat_modified_counter would be frequently reset to 0,
whenever ha_innobase::open() is invoked after the table reference
count had dropped to 0.
dict_table_close(table, thd, mdl): Remove the parameter "dict_locked".
Do not try to invalidate the statistics.
ha_innobase::statistics_init(): Replaces dict_stats_init(table).
Reviewed by: Thirunarayanan Balathandayuthapani
innodb_stats_transient_sample_pages, innodb_stats_persistent_sample_pages:
Change the type to UNSIGNED, because the number of pages in a table
is limited to 32 bits by the InnoDB file format.
btr_get_size_and_reserved(), fseg_get_n_frag_pages(),
fseg_n_reserved_pages_low(), fseg_n_reserved_pages(): Return uint32_t.
The file format limits page numbers to 32 bits.
dict_table_t::stat: An Atomic_relaxed<uint32_t> that combines a
number of metadata fields.
innodb_copy_stat_flags(): Copy the statistics flags from
TABLE_SHARE or HA_CREATE_INFO.
dict_table_t::stats_initialized(), dict_table_t::stats_is_persistent():
Accessors to dict_table_t::stat.
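For illustration, a minimal sketch of the consolidated flags word, using
std::atomic in place of InnoDB's Atomic_relaxed; the bit assignments and the
setter are assumptions for the example, only the two accessor names come from
the description above:

  #include <atomic>
  #include <cstdint>

  /* Hypothetical sketch of packing statistics flags into one 32-bit word,
     roughly mirroring dict_table_t::stat and its accessors. */
  class table_stat_flags
  {
    std::atomic<uint32_t> stat{0};
    static constexpr uint32_t STATS_INITIALIZED = 1U << 0;  /* assumed bit */
    static constexpr uint32_t STATS_PERSISTENT  = 1U << 1;  /* assumed bit */
  public:
    bool stats_initialized() const
    { return stat.load(std::memory_order_relaxed) & STATS_INITIALIZED; }
    bool stats_is_persistent() const
    { return stat.load(std::memory_order_relaxed) & STATS_PERSISTENT; }
    void set_initialized(bool persistent)
    {
      stat.fetch_or(STATS_INITIALIZED | (persistent ? STATS_PERSISTENT : 0),
                    std::memory_order_relaxed);
    }
  };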
Reviewed by: Thirunarayanan Balathandayuthapani
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
enum rename_fk: Replaces the "bool use_fk" parameter of
row_rename_table_for_mysql() and innobase_rename_table():
RENAME_IGNORE_FK: Replaces use_fk=false when the operation cannot
involve any FOREIGN KEY constraints, that is, it is a partitioned
table or an internal table for FULLTEXT INDEX.
RENAME_REBUILD: Replaces use_fk=false when the table may contain
FOREIGN KEY constraints, which must not be modified in the data
dictionary tables SYS_FOREIGN and SYS_FOREIGN_COLS.
RENAME_ALTER_COPY: Replaces use_fk=true. This is only specified
in ha_innobase::rename_table(), which may be invoked as part of
ALTER TABLE…ALGORITHM=COPY, but also during RENAME TABLE.
An alternative value RENAME_FK could be useful to specify in
ha_innobase::rename_table() when it is executed as part of
CREATE OR REPLACE TABLE, which currently is not an atomic operation.
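A sketch of the enumeration as described above (the exact declaration in the
source tree may differ):

  /* Replaces the former "bool use_fk" parameter. */
  enum rename_fk
  {
    /* The operation cannot involve any FOREIGN KEY constraints:
       a partitioned table or an internal table for FULLTEXT INDEX. */
    RENAME_IGNORE_FK,
    /* The table may contain FOREIGN KEY constraints, which must not be
       modified in SYS_FOREIGN or SYS_FOREIGN_COLS. */
    RENAME_REBUILD,
    /* ha_innobase::rename_table(), as part of ALTER TABLE...ALGORITHM=COPY
       or RENAME TABLE. */
    RENAME_ALTER_COPY
  };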
Reviewed by: Debarun Banerjee
innodb_convert_name(): Convert a schema or table name to
my_charset_filename compatible format.
dict_table_lookup(): Replaces dict_get_referenced_table().
Make the callers responsible for invoking innodb_convert_name().
innobase_casedn_str(): Remove. Let us invoke my_casedn_str() directly.
dict_table_rename_in_cache(): Do not duplicate a call to
dict_mem_foreign_table_name_lookup_set().
innobase_convert_to_filename_charset(): Defined static in the only
compilation unit that needs it.
dict_scan_id(): Remove the constant parameters
table_id=FALSE, accept_also_dot=TRUE. Invoke strconvert() directly.
innobase_convert_from_id(): Remove; only called from dict_scan_id().
innobase_convert_from_table_id(): Remove (dead code).
table_name_t::dblen(), table_name_t::basename(): In non-debug builds,
tolerate names that may miss a '/' separator.
Reviewed by: Debarun Banerjee
- MDEV-34392 (commit cc810e64d4) added
a check for the nullability of a foreign key column when the foreign key
relation is ON UPDATE CASCADE or ON UPDATE SET NULL. This check
makes DDL fail when it violates foreign key nullability.
This patch performs the nullability check for foreign key
columns only in strict SQL mode.

For the adaptive hash index, dtuple_fold() and rec_fold() were employing
a slow rolling hash algorithm, computing hash values ("fold") for one
field and one byte at a time, while depending on calls to
rec_get_offsets().
We already have optimized implementations of CRC-32C and have been
successfully using that function in some other InnoDB tables, but not
yet in the adaptive hash index.
Any linear function such as any CRC will fail the avalanche test that
any cryptographically secure hash function is expected to pass:
any single-bit change in the input key should affect on average half
the bits in the output.
But we have always been content with less than cryptographic security:
in fact, ut_fold_ulint_pair() or ut_fold_binary() are just about as
linear as any CRC, using a combination of multiplication and addition,
partly carry-less. It is worth noting that exclusive-or corresponds to
carry-less subtraction or addition in a binary Galois field, or GF(2).
We only need some way of reducing key prefixes into hash values.
The CRC-32C should be better than a Rabin–Karp rolling hash algorithm.
Compared to the old hash algorithm, it has the drawback that there will
be only 32 bits of entropy before we choose the hash table cell by a
modulus operation. The size of each adaptive hash index array is
(innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts.
With the maximum number of partitions (512), we would not exceed 1<<32
elements per array until the buffer pool size exceeds 1<<50 bytes (1 PiB).
We would hit other limits before that: the virtual address space on many
contemporary 64-bit processor implementations is only 48 bits (256 TiB).
So, we can simply go for the SIMD accelerated CRC-32C.
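As a quick sanity check of the arithmetic above (illustrative constants only):

  #include <cstdint>

  /* Cells per hash array, per the formula quoted above:
     (innodb_buffer_pool_size / 512) / innodb_adaptive_hash_index_parts */
  constexpr uint64_t buf_pool_size = uint64_t{1} << 50;  /* 1 PiB */
  constexpr uint64_t ahi_parts = 512;                    /* maximum partitions */
  constexpr uint64_t cells = (buf_pool_size / 512) / ahi_parts;
  static_assert(cells == (uint64_t{1} << 32),
                "32 bits of CRC-32C are exhausted only at a 1 PiB buffer pool");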
rec_fold(): Take a combined parameter n_bytes_fields. Determine the
length of each field on the fly, and compute CRC-32C over a single
contiguous range of bytes, from the start of the record payload area
to the end of the last full or partial field. For secondary index records
in ROW_FORMAT=REDUNDANT, the data area that is reserved for NULL values
(to facilitate in-place updates between NULL and NOT NULL values)
will also be included in the count. Luckily, InnoDB has always
zero-initialized such unused areas; refer to data_write_sql_null() in
rec_convert_dtuple_to_rec_old(). For other than ROW_FORMAT=REDUNDANT,
no space is allocated for NULL values, and therefore the CRC-32C will
only cover the actual payload of the key prefix.
dtuple_fold(): For ROW_FORMAT=REDUNDANT, include the dummy NULL values
in the CRC-32C, so that the values will be comparable with rec_fold().
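A simplified sketch of the idea: the real rec_fold() works on InnoDB record
formats; here a small software CRC-32C, a flat byte array, and an array of
cumulative field end offsets stand in for the actual implementation, and the
packing n_bytes << 16 | n_fields is assumed per btr_cur_t::n_bytes_fields
described further below:

  #include <cstdint>
  #include <cstddef>

  /* Minimal software CRC-32C (Castagnoli); a stand-in for a
     SIMD-accelerated implementation such as InnoDB's ut_crc32(). */
  static uint32_t crc32c(const uint8_t *p, size_t len)
  {
    uint32_t crc = 0xffffffff;
    while (len--)
    {
      crc ^= *p++;
      for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ ((crc & 1) ? 0x82f63b78 : 0);
    }
    return ~crc;
  }

  /* Sketch: fold a key prefix of n_fields complete fields plus n_bytes
     of the next field, as one contiguous CRC-32C over the payload,
     instead of a rolling hash over one field and one byte at a time. */
  static uint32_t fold_prefix(const uint8_t *payload, const size_t *field_end,
                              uint32_t n_bytes_fields)
  {
    const uint32_t n_bytes  = n_bytes_fields >> 16;
    const uint32_t n_fields = n_bytes_fields & 0xffff;
    size_t len = n_fields ? field_end[n_fields - 1] : 0;
    len += n_bytes;  /* partial prefix of the next field */
    return crc32c(payload, len);
  }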
innodb_ahi-t: A unit test for rec_fold() and dtuple_fold().
btr_search_build_page_hash_index(), btr_search_drop_page_hash_index():
Use a fixed-size stack buffer for computing the fold values, to avoid
dynamic memory allocation.
btr_search_drop_page_hash_index(): Do not release part.latch if we
need to invoke multiple batches of rec_fold().
dtuple_t: Allocate fewer bits for the fields. The maximum number of
data fields is about 1023, so uint16_t will be fine for them. The
info_bits is stored in less than 1 byte.
ut_pair_min(), ut_pair_cmp(): Remove. We can actually combine and compare
int(n_fields << 16 | n_bytes).
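The replacement for ut_pair_cmp() is then a plain integer comparison of the
combined value; a minimal illustration (identifier names are only for the
example):

  #include <cstdint>

  /* Compare (n_fields, n_bytes) pairs lexicographically by comparing the
     combined values n_fields << 16 | n_bytes. The maximum number of fields
     (about 1023) keeps the combined value within the positive int range. */
  static int cmp_combined(uint32_t a_fields, uint32_t a_bytes,
                          uint32_t b_fields, uint32_t b_bytes)
  {
    const int a = int(a_fields << 16 | a_bytes);
    const int b = int(b_fields << 16 | b_bytes);
    return a < b ? -1 : a > b;
  }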
PAGE_CUR_LE_OR_EXTENDS, PAGE_CUR_DBG: Remove. These were never defined,
because they would only work with latin1_swedish_ci if at all.
btr_cur_t::check_mismatch(): Replaces !btr_search_check_guess().
cmp_dtuple_rec_bytes(): Replaces cmp_dtuple_rec_with_match_bytes().
Determine the offsets of fields on the fly.
page_cur_try_search_shortcut_bytes(): This caller of
cmp_dtuple_rec_bytes() will not be invoked on the change buffer tree.
cmp_dtuple_rec_leaf(): Replaces cmp_dtuple_rec_with_match()
for comparing leaf-page records.
buf_block_t::ahi_left_bytes_fields: Consolidated Atomic_relaxed<uint32_t>
of curr_left_side << 31 | curr_n_bytes << 16 | curr_n_fields.
The other set of parameters (n_fields, n_bytes, left_side) was removed
as redundant.
btr_search_update_hash_node_on_insert(): Merged to
btr_search_update_hash_on_insert().
btr_search_build_page_hash_index(): Take combined left_bytes_fields
instead of n_fields, n_bytes, left_side.
btr_search_update_block_hash_info(), btr_search_update_hash_ref():
Merged to btr_search_info_update_hash().
btr_cur_t::n_bytes_fields: Replaces n_bytes << 16 | n_fields.
We also remove many redundant checks of btr_search.enabled.
If we are holding any btr_sea::partition::latch, then a nonnull pointer
in buf_block_t::index must imply that the adaptive hash index is enabled.
Reviewed by: Vladislav Lesin
Let us implement a simple fixed-size allocator for the adaptive hash
index, instead of complicating mem_heap_t or mem_block_info_t.
MEM_HEAP_BTR_SEARCH: Remove.
mem_block_info_t::free_block(), mem_heap_free_block_free(): Remove.
mem_heap_free_top(), mem_heap_get_top(): Remove.
btr_sea::partition::spare: Replaces mem_block_info_t::free_block.
This keeps one spare block per adaptive hash index partition, to
process an insert.
We must not wait for buf_pool.mutex while holding
any btr_sea::partition::latch. That is why we cache one block for
future allocations. This is protected by a new
btr_sea::partition::blocks_mutex in order to relieve pressure on
btr_sea::partition::latch.
btr_sea::partition::prepare_insert(): Replaces
btr_search_check_free_space_in_heap().
btr_sea::partition::erase(): Replaces ha_search_and_delete_if_found().
btr_sea::partition::cleanup_after_erase(): Replaces most of
ha_delete_hash_node(). Unlike the previous implementation, we will
retain a spare block for prepare_insert().
This should reduce some contention on buf_pool.mutex.
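A schematic sketch of the spare-block pattern described above; std::mutex and
a plain struct stand in for btr_sea::partition::blocks_mutex and the buffer
pool block type, and the helper names are illustrative only:

  #include <mutex>

  /* Stand-ins for the buffer pool block and its allocation, which in
     InnoDB may have to wait for buf_pool.mutex. */
  struct block_t { unsigned char frame[16384]; };
  static block_t *block_alloc() { return new block_t; }
  static void block_free(block_t *b) { delete b; }

  /* Sketch of one adaptive hash index partition caching a spare block,
     so that an insert never waits for the buffer pool while the
     partition latch is held. */
  struct partition_sketch
  {
    std::mutex blocks_mutex;   /* protects "spare"; relieves the main latch */
    block_t *spare = nullptr;

    /* Cf. prepare_insert(): called before acquiring the partition latch. */
    void prepare_insert()
    {
      {
        std::lock_guard<std::mutex> g(blocks_mutex);
        if (spare) return;           /* a cached block is already available */
      }
      block_t *b = block_alloc();    /* may block; latch is not held yet */
      std::lock_guard<std::mutex> g(blocks_mutex);
      if (!spare)
        spare = b;
      else
        block_free(b);               /* another thread raced us */
    }

    /* Consume the spare block while holding the partition latch. */
    block_t *get_spare()
    {
      std::lock_guard<std::mutex> g(blocks_mutex);
      block_t *b = spare;
      spare = nullptr;
      return b;                      /* may be nullptr; caller must handle */
    }
  };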
btr_search.n_parts: Replaces btr_ahi_parts.
btr_search.enabled: Replaces btr_search_enabled. This must hold
whenever buf_block_t::index is set while a thread is holding a
btr_sea::partition::latch.
dict_index_t::search_info: Remove pointer indirection, and use
Atomic_relaxed or Atomic_counter for most fields.
btr_search_guess_on_hash(): Let the caller ensure that latch_mode is
BTR_MODIFY_LEAF or BTR_SEARCH_LEAF. Release btr_sea::partition::latch
before buffer-fixing the block. The page latch that we already acquired
is preventing buffer pool eviction. We must validate both
block->index and block->page.state while holding part.latch
in order to avoid race conditions with buffer page relocation
or buf_pool_t::resize().
btr_search_check_guess(): Remove the constant parameter
can_only_compare_to_cursor_rec=false.
ahi_node: Replaces ha_node_t.
This has been tested by running the regression test suite
with the adaptive hash index enabled:
./mtr --mysqld=--loose-innodb-adaptive-hash-index=ON
Reviewed by: Vladislav Lesin
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)
Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.
The idea of this fix is from Michael 'Monty' Widenius.
HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.
trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.
trx_t::is_started(): Replaces trx_is_started().
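A rough sketch of the new state and the accessor; the real enumeration in
trx0trx.h contains further states, and whether an aborted transaction counts
as started is an assumption here:

  /* Sketch: transaction states including the new TRX_STATE_ABORTED. */
  enum trx_state_t
  {
    TRX_STATE_NOT_STARTED,
    /* Like TRX_STATE_NOT_STARTED, but the transaction was rolled back
       and aborted; the only allowed action is ROLLBACK. */
    TRX_STATE_ABORTED,
    TRX_STATE_ACTIVE,
    TRX_STATE_PREPARED,
    TRX_STATE_COMMITTED_IN_MEMORY
  };

  struct trx_sketch
  {
    trx_state_t state = TRX_STATE_NOT_STARTED;

    /* Replaces trx_is_started(); assumption: an aborted transaction
       is not considered started. */
    bool is_started() const
    { return state != TRX_STATE_NOT_STARTED && state != TRX_STATE_ABORTED; }
  };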
ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.
ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().
The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.
trx_savept_t: Replace with const undo_no_t*. When we roll back to
a savepoint, all we need to know is the number of undo log records
that must survive.
trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.
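The point of storing a plain undo log record count can be illustrated with a
sketch; the SQL layer reserves savepoint_offset bytes per savepoint and hands
the storage engine a pointer to that space, and the names and signatures here
are simplified stand-ins:

  #include <cstdint>
  #include <cstring>

  typedef uint64_t undo_no_t;      /* number of undo log records written */

  struct trx_sketch
  {
    undo_no_t undo_no = 0;         /* undo records written so far */
    void rollback_to(undo_no_t savept)
    {
      /* Undo everything after the savepoint: all we need to know is how
         many undo log records must survive. */
      while (undo_no > savept)
        --undo_no;                 /* stands in for applying one undo record */
    }
  };

  /* Setting a savepoint: store the current undo_no_t directly in the
     space the SQL layer allocated (innobase_hton->savepoint_offset). */
  static void savepoint_set(trx_sketch &trx, void *sv)
  { memcpy(sv, &trx.undo_no, sizeof(undo_no_t)); }

  static void savepoint_rollback(trx_sketch &trx, const void *sv)
  {
    undo_no_t savept;
    memcpy(&savept, sv, sizeof savept);
    trx.rollback_to(savept);
  }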
fts_trx_create(): Do not copy previous savepoints.
fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
The information about the index algorithm was stored in two places,
inconsistently split between them.
A BTREE index could have key->algorithm == HA_KEY_ALG_BTREE if the user
explicitly specified USING BTREE, or HA_KEY_ALG_UNDEF if not.
An RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL set.
A FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT set.
A HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF.
A long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH.
In this commit:
All indexes except BTREE and HASH always have key->algorithm
set; the HA_SPATIAL and HA_FULLTEXT flags are no longer used (except
in storage, to keep .frm files backward compatible).
As a side effect ALTER TABLE now detects FULLTEXT index renames correctly
- InnoDB fulltext rebuilds the FTS COMMON table while adding a
new fulltext index. This can be optimized by avoiding the rebuild
of the FTS COMMON table when it already exists.
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.
row_build_spatial_index_key(): Initialize the tmp_mbr array completely.
check_if_supported_inplace_alter(): Fix the spelling mistake of alter
Don't allow changing a referencing key column from NULL to NOT NULL
when
1) the foreign key constraint type is ON UPDATE SET NULL,
2) the foreign key constraint type is ON DELETE SET NULL, or
3) the foreign key constraint type is ON UPDATE CASCADE and the referenced
column is declared as NULL.
Don't allow changing a referenced key column from NOT NULL to NULL
when the foreign key constraint type is ON UPDATE CASCADE
and the referencing key columns do not allow NULL values.
(A condensed sketch of these two rules follows below.)
get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.
fk_check_column_changes(): Enforce the above rules for the COPY
algorithm.
innobase_check_foreign_drop_col(): Check whether the dropped
column exists in an existing foreign key relation.
innobase_check_foreign_low(): Enforce the above rules for the
INPLACE algorithm.
dict_foreign_t::check_fk_constraint_valid(): This is used
by the CREATE TABLE statement to check nullability for a foreign
key relation.
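A condensed sketch of the two nullability rules above, as a boolean model
only; the real checks operate on dict_foreign_t and the ALTER TABLE column
definitions, and the names here are illustrative:

  /* Referential actions relevant to the rules above. */
  enum fk_action { FK_NO_ACTION, FK_CASCADE, FK_SET_NULL };

  /* Rule 1: a referencing (child) column may not change from NULL to
     NOT NULL if a SET NULL action exists, or if ON UPDATE CASCADE could
     propagate a NULL from a nullable referenced (parent) column. */
  static bool allowed_child_null_to_not_null(fk_action on_update,
                                             fk_action on_delete,
                                             bool referenced_is_nullable)
  {
    if (on_update == FK_SET_NULL || on_delete == FK_SET_NULL)
      return false;
    if (on_update == FK_CASCADE && referenced_is_nullable)
      return false;
    return true;
  }

  /* Rule 2: a referenced (parent) column may not change from NOT NULL
     to NULL if ON UPDATE CASCADE would propagate NULL into a referencing
     (child) column that does not allow NULL values. */
  static bool allowed_parent_not_null_to_null(fk_action on_update,
                                              bool referencing_is_nullable)
  {
    return !(on_update == FK_CASCADE && !referencing_is_nullable);
  }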
Reason:
======
- InnoDB fails to load the instant ALTER TABLE metadata from the
clustered index while loading the table definition.
The reason is that the InnoDB metadata BLOB has a column length that
exceeds the maximum fixed-length column size.
Fix:
===
- InnoDB should treat a long fixed-length column as a variable-length
field that needs external storage while initializing
the field map for the instant ALTER operation.
- Commit 85db534731 (MDEV-33400)
retains the instantness in the table definition after DISCARD
TABLESPACE. So there is no need to assign n_core_null_bytes
during instant table preparation unless it is not already
initialized.
- During the copy algorithm, InnoDB should use a bulk insert operation
instead of row-by-row inserts. By doing this, the copy algorithm
can build indexes efficiently. This optimization is disabled
for temporary tables, versioned tables, and tables that have
foreign key relations.
Introduced the variable innodb_alter_copy_bulk to allow
the bulk insert operation for copy ALTER operations
inside InnoDB. This is enabled by default.
ha_innobase::extra(): The HA_EXTRA_END_ALTER_COPY mode tries to apply
the buffered bulk insert operation and updates the non-persistent
table statistics.
row_merge_bulk_t::write_to_index(): Update stat_n_rows after
applying the bulk insert operation.
row_ins_clust_index_entry_low(): In case of the copy algorithm,
switch to the bulk insert operation.
copy_data_error_ignore(): Handle errors while copying
the data from the source to the target file.