PROBLEM
By design stats estimation always reading uncommitted data. In this scenario
an uncommitted transaction has deleted all rows in the table. In Innodb
uncommitted delete records are marked as delete but not actually removed
from Btree until the transaction has committed or a read view for the rows
is present.While calculating persistent stats we were ignoring the delete
marked records,since all the records are delete marked we were estimating
the number of rows present in the table as zero which leads to bad plans
in other transaction operating on the table.
Fix
Introduced a system variable called innodb_stats_include_delete_marked
which when enabled includes delete marked records for stat
calculations .
Apparently, WL#8149 QA did not cover the code changes made to
online table rebuild (which was introduced in MySQL 5.6.8 by WL#6255)
for ROW_FORMAT=REDUNDANT tables.
row_log_table_low_redundant(): Log the new values of indexed virtual
columns (ventry) only once.
row_log_table_low(): Assert that if o_ventry is specified, the
logged operation must not be ROW_T_INSERT, and ventry must be specified
as well.
row_log_table_low(): When computing the size of old_pk, pass v_entry=NULL to
rec_get_converted_size_temp(), to be consistent with the subsequent call
to rec_convert_dtuple_to_temp() for logging old_pk. Assert that
old_pk never contains information on virtual columns, thus proving that this
change is a no-op.
RB: 13822
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Problem:
=======
Autoincrement value gives duplicate values because of the following reasons.
(1) In InnoDB handler function, current autoincrement value is not changed
based on newly set auto_increment_increment or auto_increment_offset variable.
(2) Handler function does the rounding logic and changes the current
autoincrement value and InnoDB doesn't aware of the change in current
autoincrement value.
Solution:
========
Fix the problem(1), InnoDB always respect the auto_increment_increment
and auto_increment_offset value in case of current autoincrement value.
By fixing the problem (2), handler layer won't change any current
autoincrement value.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
RB: 13748
Problem:
=======
Inplace alter algorithm determines the table to be rebuild if the table
undergoes row format change, key block size if handler flag contains only
change table create option. If alter with inplace ignore flag operations and change table create options then it leads to table rebuild operation.
Solution:
========
During the check for rebuild, ignore the inplace ignore flag and check for
table create options.
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Marko Makela <marko.makela@oracle.com>
RB: 13172
Alias the InnoDB ulint and lint data types to size_t and ssize_t,
which are the standard names for the machine-word-width data types.
Correspondingly, define ULINTPF as "%zu" and introduce ULINTPFx as "%zx".
In this way, better compiler warnings for type mismatch are possible.
Furthermore, use PRIu64 for that 64-bit format, and define
the feature macro __STDC_FORMAT_MACROS to enable it on Red Hat systems.
Fix some errors in error messages, and replace some error messages
with assertions.
Most notably, an IMPORT TABLESPACE error message in InnoDB was
displaying the number of columns instead of the mismatching flags.
When MDEV-6076 repurposed the field PAGE_MAX_TRX_ID, it was assumed
that the field always was 0 in the clustered index of old data files.
This was not the case in IMPORT TABLESPACE (introduced in MySQL 5.6
and MariaDB 10.0), which is writing the transaction ID to all index
pages, including clustered index pages.
This means that on a data file that was at some point of its life
IMPORTed to an InnoDB instance, MariaDB 10.2.4 or later could interpret
the transaction ID as a persistent AUTO_INCREMENT value.
This also means that future changes that repurpose PAGE_MAX_TRX_ID
in the clustered index may cause trouble with files that were imported
at some point of their life.
There is a separate minor issue that InnoDB is writing PAGE_MAX_TRX_ID
to every secondary index page, even though it is only needed on leaf
pages. From now on we will write PAGE_MAX_TRX_ID as 0 to non-leaf pages,
just to be able to keep stricter debug assertions.
btr_root_raise_and_insert(): Reset the PAGE_MAX_TRX_ID field on non-root
pages of the clustered index, and on the no-longer-leaf root page of
secondary indexes.
AbstractCallback::is_root_page(): Remove. Use page_is_root() instead.
PageConverter::update_index_page(): Reset the PAGE_MAX_TRX_ID to 0
on other pages than the clustered index root page or secondary index
leaf pages.
Fixed the bug by failing the statement with an error message that explains
that an auto-increment column may not be used in an expression for a
check constraint.
Added a test case in check_constraint.test.
Updated existing tests and results.
because on Windows it cannot properly append to the file,
doesn't use CreateFile(..., FILE_APPEND_DATA, ...)
this fixes main.shutdown failures on Windows
automatic shortening of a too-long non-unique key should
be not a warning, but a note. It's a normal optimization,
doesn't affect correctness, and should never be converted to
an error, no matter how strict sql_mode is.
ha_innobase::defragment_table(): Skip corrupted indexes and
FULLTEXT INDEX. In InnoDB, FULLTEXT INDEX is implemented with
auxiliary tables. We will not defragment them on OPTIMIZE TABLE.
vcols in the table definition are intentionaly bad and produce
warnings when a table is opened. In some cases (depending on the
TDC/TC state as left by earlier tests) a table might be opened in the
middle of the test, resulting in spurious warnings. Suppress them.
Also, don't run main.mdev-504 for ps-protocol at all (instead of
disabling ps-protocol inside the test, but still running the test)
1. Special mode to search in error logs: if SEARCH_RANGE is not set,
the file is considered an error log and the search is performed
since the last CURRENT_TEST: line
2. Number of matches is printed too. "FOUND 5 /foo/ in bar".
Use greedy .* at the end of the pattern if number of matches
isn't stable. If nothing is found it's still "NOT FOUND",
not "FOUND 0".
3. SEARCH_ABORT specifies the prefix of the output.
Can be "NOT FOUND" or "FOUND" as before,
but also "FOUND 5 " if needed.
namely, restart_mysqld_with_option.inc and kill_and_restart_mysqld.inc -
use restart_mysqld.inc instead.
Also remove innodb_wl6501_crash_stripped.inc that wasn't used anywhere.
InnoDB divides the allocation of undo logs into rollback segments.
The DB_ROLL_PTR system column of clustered indexes can address up to
128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only
created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin
for MySQL 5.1, all 128 rollback segments were created.
MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs.
On upgrade, unless a slow shutdown (innodb_fast_shutdown=0)
was performed on the old server instance, these rollback segments
could be in use by transactions that are in XA PREPARE state or
transactions that were left behind by a server kill followed by a
normal shutdown immediately after restart.
Persistent tables cannot refer to temporary undo logs or vice versa.
Therefore, we should keep two distinct sets of rollback segments:
one for persistent tables and another for temporary tables. In this way,
all 128 rollback segments will be available for both types of tables,
which could improve performance. Also, MariaDB 10.2 will remain more
compatible than MySQL 5.7 with data files from earlier versions of
MySQL or MariaDB.
trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary
rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will
be solely for persistent undo logs.
srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS.
srv_available_undo_logs: Change the type to ulong.
trx_rseg_get_on_id(): Remove. Instead, let the callers refer to
trx_sys directly.
trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters.
These functions only deal with persistent undo logs.
trx_temp_rseg_create(): New function, to create all temporary rollback
segments at server startup.
trx_rseg_t::is_persistent(): Determine if the rollback segment is for
persistent tables.
trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on
context (such as table handle) whether the DB_ROLL_PTR is referring to
a persistent undo log.
trx_sys_create_rsegs(): Remove all parameters, which were always passed
as global variables. Instead, modify the global variables directly.
enum trx_rseg_type_t: Remove.
trx_t::get_temp_rseg(): A method to ensure that a temporary
rollback segment has been assigned for the transaction.
trx_t::assign_temp_rseg(): Replaces trx_assign_rseg().
trx_purge_free_segment(), trx_purge_truncate_rseg_history():
Remove the redundant variable noredo=false.
Temporary undo logs are discarded immediately at transaction commit
or rollback, not lazily by purge.
trx_purge_mark_undo_for_truncate(): Remove references to the
temporary rollback segments.
trx_purge_mark_undo_for_truncate(): Remove a check for temporary
rollback segments. Only the dedicated persistent undo log tablespaces
can be truncated.
trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the
parameter is_temp.
trx_rseg_mem_restore(): Split from trx_rseg_mem_create().
Initialize the undo log and the rollback segment from the file
data structures.
trx_sysf_get_n_rseg_slots(): Renamed from
trx_sysf_used_slots_for_redo_rseg(). Count the persistent
rollback segment headers that have been initialized.
trx_sys_close(): Also free trx_sys->temp_rsegs[].
get_next_redo_rseg(): Merged to trx_assign_rseg_low().
trx_assign_rseg_low(): Remove the parameters and access the
global variables directly. Revert to simple round-robin, now that
the whole trx_sys->rseg_array[] is for persistent undo log again.
get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg().
srv_undo_tablespaces_init(): Remove some parameters and use the
global variables directly. Clarify some error messages.
Adjust the test innodb.log_file. Apparently, before these changes,
InnoDB somehow ignored missing dedicated undo tablespace files that
are pointed by the TRX_SYS header page, possibly losing part of
essential transaction system state.
MyISAM in compute_vcols() - which is used only in mi_check code -
was computing indexed vcols into an internally allocated buffer
(not record[0]) and the buffer was calculated to be long enough to fit
every keyseg (a keyseg knows where its value in a record buffer is
and the length of the value).
This logic didn' work for prefix keys, because the keyseg length is the
length of a prefix, but the record buffer needs to fit the complete
value of a vcol. In this bug MyISAM was writing a 2K varchar
into a buffer too short.
Also it didn't work for repair-with-keycache, because that code
recalculats all vcols, not only indexed ones.
So, the buffer size (MYISAM_SHARE::vreclength) should include all
vcols' full lengths. But it was calculated in mi_open and low-level
MyISAM code has no knowledge of vcols.
As a fix we now recalculate MYISAM_SHARE::vreclength in
ha_myisam::setup_vcols_for_repair() which is always called
before compute_vcols().
The test was unnecessarily depending on InnoDB purge, which can
sometimes fail to proceed.
Let us rewrite the test to use BEGIN;INSERT;ROLLBACK to cause the
immediate removal of the desired records.
The test was unnecessarily depending on InnoDB purge, which can
sometimes fail to proceed.
Let us rewrite the test to use BEGIN;INSERT;ROLLBACK to cause the
immediate removal of the desired records.
The test is not expected to crash. With a non-debug server,
Valgrind completes in reasonable time without any failure.
Also, it does not make sense to store and restore parameters
when the parameters are already being restored by a server restart.
This is a partial port of my patch in MySQL 8.0.
In MySQL 8.0, all InnoDB references to DBUG_OFF were replaced
with UNIV_DEBUG. We will not do that in MariaDB.
InnoDB used two independent compile-time flags that distinguish
debug and non-debug builds, which is confusing.
Also, make ut_ad() and alias of DBUG_ASSERT().
dict_create_or_check_foreign_constraint_tables(): Change the warning
about the foreign key metadata table creation to a note.
Remove messages after metadata table creation. If the creation fails,
startup will abort with a message. Normally the creation succeeds on
bootstrap, and the messages would only be noise.
Remove the related suppressions from the tests.
MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off
Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.
This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to special list for key rotation
and key rotation is based on sending a event to
background encryption threads instead of using periodic
checking (i.e. timeout).
fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release a acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables.
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
fil_get_next_space_safe()
fil_node_open_file(): After page 0 is read read also
crypt_info if it is not yet read.
btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
Use fil_space_acquire()/release() to access fil_space_t.
buf_page_decrypt_after_read():
Use fil_space_get_crypt_data() because at this point
we might not yet have read page 0.
fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().
fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.
fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.
check_table_options()
Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*
i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.