This is preparation for MDEV-12288, which would set DB_TRX_ID=0
when purging history. Also with that change in place, delete-marked
records must always refer to an undo log record via a nonzero
DB_TRX_ID column. (The DB_TRX_ID is only present in clustered index
leaf page records.)
btr_cur_parse_del_mark_set_clust_rec(), rec_get_trx_id():
Statically allocate the offsets
(should never use the heap). Add some debug assertions.
Replace some use of rec_get_trx_id() with row_get_rec_trx_id().
trx_undo_report_row_operation(): Add some sanity checks that are
common for all operations that produce undo log.
- Added variable tmp_disk_table_size
- Added variable tmp_memory_table_size as an alias for tmp_table_size
- Changed internal variable tmp_table_size to tmp_memory_table_size
- create_info.data_file_length is now set with tmp_disk_table_size
- Fixed that Aria doesn't reset max_data_file_length for internal tables
- Added status flag if table is full so that we can detect this on next insert.
This ensures that the table is always 'correct', but we get the error one
row after the row that grow the table too big.
- Removed some mutex lock for internal temporary tables
The field fts_token->position is not initialized in
row_merge_fts_doc_tokenize(). We cannot have that field
without changing the fulltext parser plugin ABI
(adding st_mysql_ftparser_boolean_info::position,
as it was done in MySQL 5.7 in WL#6943).
The InnoDB fulltext parser plugins "ngram" and "Mecab" that were
introduced in MySQL 5.7 do depend on that field. But the simple_parser
does not. Apparently, simple_parser is leaving the field as 0.
So, in our fix we will assume that the missing position field is 0.
In Mariabackup, we would want the backed-up redo log file size to be
a multiple of 512 bytes, or OS_FILE_LOG_BLOCK_SIZE. However, at startup,
InnoDB would be picky, requiring the file size to be a multiple of
innodb_page_size.
Furthermore, InnoDB would require the parameter to be a multiple of
one megabyte, while the minimum granularity is 512 bytes. Because
the data-file-oriented fil_io() API is being used for writing the
InnoDB redo log, writes will for now require innodb_log_file_size to
be a multiple of the maximum innodb_page_size (65536 bytes).
To complicate matters, InnoDB startup divided srv_log_file_size by
UNIV_PAGE_SIZE, so that initially, the unit was bytes, and later it
was innodb_page_size. We will simplify this and keep srv_log_file_size
in bytes at all times.
innobase_log_file_size: Remove. Remove some obsolete checks against
overflow on 32-bit systems. srv_log_file_size is always 64 bits, and
the maximum size 512GiB in multiples of innodb_page_size always fits
in ulint (which is 32 or 64 bits). 512GiB would be 8,388,608*64KiB or
134,217,728*4KiB.
log_init(): Remove the parameter file_size that was always passed as
srv_log_file_size.
log_set_capacity(): Add a parameter for passing the requested file size.
srv_log_file_size_requested: Declare static in srv0start.cc.
create_log_file(), create_log_files(),
innobase_start_or_create_for_mysql(): Invoke fil_node_create()
with srv_log_file_size expressed in multiples of innodb_page_size.
innobase_start_or_create_for_mysql(): Require the redo log file sizes
to be multiples of 512 bytes.
trx_sys_print_mysql_binlog_offset(): Use 64-bit arithmetics and ib::info().
TRX_SYS_MYSQL_LOG_OFFSET: Replaces TRX_SYS_MYSQL_LOG_OFFSET_HIGH,
TRX_SYS_MYSQL_LOG_OFFSET_LOW.
trx_sys_update_mysql_binlog_offset(): Remove the constant parameter
field=TRX_SYS_MYSQL_LOG_INFO. Use 64-bit arithmetics.
A similar change was contributed to Percona XtraBackup, but for some
reason, it is not present in Percona XtraDB. Since MDEV-9566
(MariaDB 10.1.23), that change is present in the MariaDB XtraDB.
recv_sys_init(): Remove the parameter.
recv_sys_create(): Merge to recv_sys_init().
recv_sys_mem_free(): Merge to recv_sys_close().
log_mem_free(): Merge to log_shutdown().
These self references were previously used to avoid having to check the
IO_CACHE's type. However, a benchmark shows that on x86 5930k stock,
the type comparison is marginally faster than the double pointer dereference.
For 40 billion my_b_tell calls, the difference is .1 seconds in favor of performing the
type check. (Basically there is no measurable difference)
To prevent bugs from copying the structure using the equals(=) operator,
and having to do the bookkeeping manually, remove these "convenience"
variables.
srv_log_files_created: A debug flag to ensure that InnoDB redo log
files can only be created once in the server lifetime, and that
after log files have been created, no crash recovery will take place.
recv_scan_log_recs(): Detect the special case where the log consists
of a sole MLOG_CHECKPOINT record, such as immediately after creating
the redo logs.
recv_recovery_from_checkpoint_start(): Skip the recovery message
if the redo log is logically empty.
A merge error caused InnoDB bootstrap to fail when
innodb_undo_tablespaces was set to more than 2.
This was because of a bug that was introduced to
srv_undo_tablespaces_init() by the merge.
Furthermore, some adjustments for Oracle Bug#25551311 aka
Bug#23517560 changes were forgotten. We must minimize direct
references to srv_undo_tablespaces_open and use predicates
instead.
srv_undo_tablespaces_init(): Increment srv_undo_tablespaces_open
once, not twice, per loop iteration.
is_system_or_undo_tablespace(): Remove (unused function).
is_predefined_tablespace(): Invoke srv_is_undo_tablespace().
When it comes to DEFAULT values of columns, InnoDB is imposing both
unnecessary and insufficient conditions on whether ALGORITHM=INPLACE
should be allowed for ALTER TABLE.
When changing an existing column to NOT NULL, any NULL values in the
columns only get a special treatment if the column is changed to an
AUTO_INCREMENT column (which is not supported by ALGORITHM=INPLACE)
or the column type is TIMESTAMP. In all other cases, an error
must be reported for the failure to convert a NULL value to NOT NULL.
InnoDB was unnecessarily interested in whether the DEFAULT value
is not constant when altering other than TIMESTAMP columns. Also,
when changing a TIMESTAMP column to NOT NULL, InnoDB was performing
an insufficient check, and it was incorrectly allowing a constant
DEFAULT value while not being able to replace NULL values with that
constant value.
Furthermore, in ADD COLUMN, InnoDB is unnecessarily rejecting certain
nondeterministic DEFAULT expressions (depending on the session
parameters or the current time).
buf_flush_page_cleaner_coordinator(): Signal the thread creator
that the error log output regarding setpriority() has been issued.
innobase_start_or_create_for_mysql(): Wait for
buf_flush_page_cleaner_coordinator() to completely start up.
This prevents sporadic failures of tests that search the server
error log for InnoDB redo log recovery messages.
While the primary purpose of innodb_force_recovery is to allow
data to be rescued from an InnoDB instance that would crash due
to some data corruption, the settings 1, 2, or 3 are relatively
safe to use and there is no need to prevent write transactions
in these modes.
The setting innodb_force_recovery=4 and above can cause database
corruption. For those modes, we already set the flag
high_level_read_only to disable modifications, except DROP TABLE.
MODIFICATIONS_NOT_ALLOWED_MSG_FORCE_RECOVERY: Remove. There is no
need to spam the error log for each refused DML operation. It suffices
to return an error to the client. There will be messages at startup
if innodb_read_only or innodb_force_recovery are preventing writes.
The original intention of the setting innodb_force_recovery=3 was to
disable background activity that could create trouble, most notably,
the rollback of incomplete transactions, and the purge of transaction
history.
MySQL 5.6 introduced more background threads, it is creating
dict_stats_thread and fts_optimize_thread even though these threads
are at least as non-essential as the rollback and purge. These
threads are in fact worse, because they can create new transactions
on their own.
innobase_start_or_create_for_mysql(): Do not create any internal
undo log sources unless innodb_force_recovery<3.
buf_flush_init_for_writing(): Reset the FIL_PAGE_TYPE
of the TRX_SYS page to the canonical value
FIL_PAGE_TYPE_TRX_SYS instead of FIL_PAGE_TYPE_UNKNOWN.
log_calc_max_ages(): Use the requested size in the check, instead of
the detected redo log size. The redo log will be resized at startup
if it differs from what has been requested.
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed, and
the field trx->has_search_latch would only be set during a short
time frame in the execution of row_search_mvcc(), which was
formerly called row_search_for_mysql().
This means that the column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_LATCHED will always
report 0. That column cannot be removed in MariaDB 10.2, but it
can be removed in future releases.
trx_t::has_search_latch: Remove.
trx_assert_no_search_latch(): Remove.
row_sel_try_search_shortcut_for_mysql(): Remove a redundant condition
on trx->has_search_latch (it was always true).
sync_check_iterate(): Make the parameter const.
sync_check_functor_t: Make the operator() const, and remove result()
and the virtual destructor. There is no need to have mutable state
in the functors.
sync_checker<bool>: Replaces dict_sync_check and btrsea_sync_check.
sync_check: Replaces btrsea_sync_check.
dict_sync_check: Instantiated from sync_checker.
sync_allowed_latches: Use std::find() directly on the array.
Remove the std::vector.
TrxInInnoDB::enter(), TrxInInnoDB::exit(): Remove obviously redundant
debug assertions on trx->in_depth, and use equality comparison against 0
because it could be more efficient on some architectures.
The sole purpose of handlerton::release_temporary_latches and its wrapper
function was to release the InnoDB adaptive hash index latch
(btr_search_latch).
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed. As part of that,
the function trx_search_latch_release_if_reserved() was changed to an
assertion and the function trx_reserve_search_latch_if_not_reserved()
was removed, and handlerton::release_temporary_latches() practically
became a no-op.
Note: MDEV-12121 replaced the function
trx_search_latch_release_if_reserved()
with the more appropriately named macro trx_assert_no_search_latch().
dict_sys_tables_type_to_tf(): Change the parameter n_cols to not_redundant.
dict_tf_is_valid_not_redundant(): Refactored from dict_tf_is_valid().
dict_sys_tables_type_valid(): Replaces dict_sys_tables_type_validate().
Use the common function dict_tf_is_valid_not_redundant(), which validates
PAGE_COMPRESSION_LEVEL more strictly.
DICT_TF_GET_UNUSED(flags): Remove.
innodb.table_flags: Adjust the test case. Due to the MDEV-12873 fix
in 10.2, the corrupted flags for table test.td would be converted,
and a tablespace flag mismatch will occur when trying to open the file.
Remove the SHARED_SPACE flag that was erroneously introduced in
MariaDB 10.2.2, and shift the SYS_TABLES.TYPE flags back to where
they were before MariaDB 10.2.2. While doing this, ensure that
tables created with affected MariaDB versions can be loaded,
and also ensure that tables created with MySQL 5.7 using the
TABLESPACE attribute cannot be loaded.
MariaDB 10.2.2 picked the SHARED_SPACE flag from MySQL 5.7,
shifting the MariaDB 10.1 flags PAGE_COMPRESSION, PAGE_COMPRESSION_LEVEL,
ATOMIC_WRITES by one bit. The SHARED_SPACE flag would always
be written as 0 by MariaDB, because MariaDB does not support
CREATE TABLESPACE or CREATE TABLE...TABLESPACE for InnoDB.
So, instead of the bits AALLLLCxxxxxxx we would have
AALLLLC0xxxxxxx if the table was created with MariaDB 10.2.2
to 10.2.6. (AA=ATOMIC_WRITES, LLLL=PAGE_COMPRESSION_LEVEL,
C=PAGE_COMPRESSED, xxxxxxx=7 bits that were not moved.)
PAGE_COMPRESSED=NO implies LLLLC=00000. That is not a problem.
If someone created a table in MariaDB 10.2.2 or 10.2.3 with
the attribute ATOMIC_WRITES=OFF (value 2; AA=10) and without
PAGE_COMPRESSED=YES or PAGE_COMPRESSION_LEVEL, the table should be
rejected. We ignore this problem, because it should be unlikely
for anyone to specify ATOMIC_WRITES=OFF, and because 10.2.2 and
10.2.2 were not mature releases. The value ATOMIC_WRITES=ON (1)
would be interpreted as ATOMIC_WRITES=OFF, but starting with
MariaDB 10.2.4 the ATOMIC_WRITES attribute is ignored.
PAGE_COMPRESSED=YES implies that PAGE_COMPRESSION_LEVEL be between
1 and 9 and that ROW_FORMAT be COMPACT or DYNAMIC. Thus, the affected
wrong bit pattern in SYS_TABLES.TYPE is of the form AALLLL10DB00001
where D signals the presence of a DATA DIRECTORY attribute and B is 1
for ROW_FORMAT=DYNAMIC and 0 for ROW_FORMAT=COMPACT. We must interpret
this bit pattern as AALLLL1DB00001 (discarding the extraneous 0 bit).
dict_sys_tables_rec_read(): Adjust the affected bit pattern when
reading the SYS_TABLES.TYPE column. In case of invalid flags,
report both SYS_TABLES.TYPE (after possible adjustment) and
SYS_TABLES.MIX_LEN.
dict_load_table_one(): Replace an unreachable condition on
!dict_tf2_is_valid() with a debug assertion. The flags will already
have been validated by dict_sys_tables_rec_read(); if that validation
fails, dict_load_table_low() will have failed.
fil_ibd_create(): Shorten an error message about a file pre-existing.
Datafile::validate_to_dd(): Clarify an error message about tablespace
flags mismatch.
ha_innobase::open(): Remove an unnecessary warning message.
dict_tf_is_valid(): Simplify and stricten the logic. Validate the
values of PAGE_COMPRESSION. Remove error log output; let the callers
handle that.
DICT_TF_BITS: Remove ATOMIC_WRITES, PAGE_ENCRYPTION, PAGE_ENCRYPTION_KEY.
The ATOMIC_WRITES is ignored once the SYS_TABLES.TYPE has been validated;
there is no need to store it in dict_table_t::flags. The PAGE_ENCRYPTION
and PAGE_ENCRYPTION_KEY are unused since MariaDB 10.1.4 (the GA release
was 10.1.8).
DICT_TF_BIT_MASK: Remove (unused).
FSP_FLAGS_MEM_ATOMIC_WRITES: Remove (the flags are never read).
row_import_read_v1(): Display an error if dict_tf_is_valid() fails.
dict_table_t::thd: Remove. This was only used by btr_root_block_get()
for reporting decryption failures, and it was only assigned by
ha_innobase::open(), and never cleared. This could mean that if a
connection is closed, the pointer would become stale, and the server
could crash while trying to report the error. It could also mean
that an error is being reported to the wrong client. It is better
to use current_thd in this case, even though it could mean that if
the code is invoked from an InnoDB background operation, there would
be no connection to which to send the error message.
Remove dict_table_t::crypt_data and dict_table_t::page_0_read.
These fields were never read.
fil_open_single_table_tablespace(): Remove the parameter "table".
innodb.row_format_redundant: Really corrupt the SYS_TABLES.MIX_LEN,
and do not use any debug instrumentation. For tables created in the
system tablespace, the contents of the column will be ignored.
Only the table t1 will refuse to load.
dict_load_table_one(): Remove the DBUG_EXECUTE_IF instrumentation.
Omit a redundant error message "incorrect flags in SYS_TABLES".
dict_sys_tables_rec_read(): Partially revert the Oracle Bug#21644827
fix, and always report errors by the return value.
fts_create_in_mem_aux_table(): Do not rely on dict_table_t::flags2,
but instead evaluate the tablespace ID.
DICT_TF2_BITS: Reduce to the correct value of 7. The two extra
high-order bits were specific to MySQL 5.7.
Cover innodb.table_flags with the new innodb_page_size.combinations
32k and 64k.
dict_sys_tables_type_validate(): Remove an assertion that made a
check in the function redundant. Remove the excessive output to
the error log, as the invalid SYS_TABLES.TYPE value is already being
output.
in innodb_read_only mode.
The reason for the hang is that there was no notification received about
completed read io. File handles are bound to completion_port, and there
were no background "write" threads that would be waiting on completion_port,
only 2 "read" threads waiting on read_completion_port were active.
The fix is to use a single IO completion port for all IOs, if
innodb_read_only is set.
log_crypt(): Remove the useless error log output that was
accidentally introduced in MDEV-11782. These messages could be emitted
to the server error log during crash recovery.
srv_start_state_t: Document the flags. Replace SRV_START_STATE_STAT
with SRV_START_STATE_REDO. The srv_bg_undo_sources replaces the
original use of SRV_START_STATE_STAT.
dict_stats_thread_started, buf_dump_thread_started,
buf_flush_page_cleaner_thread_started: Remove (unused).
srv_shutdown_all_bg_threads(): Always wait for the I/O threads
to exit, also in read-only mode.
os_thread_free(): Remove.
When the server is started in innodb_read_only mode, there cannot be
any writes to persistent InnoDB/XtraDB files. Just like the creation
of buf_flush_page_cleaner_thread is skipped in this case, also
the creation of the XtraDB-specific buf_flush_lru_manager_thread
should be skipped.
When a slow shutdown is performed soon after spawning some work for
background threads that can create or commit transactions, it is possible
that new transactions are started or committed after the purge has finished.
This is violating the specification of innodb_fast_shutdown=0, namely that
the purge must be completed. (None of the history of the recent transactions
would be purged.)
Also, it is possible that the purge threads would exit in slow shutdown
while there exist active transactions, such as recovered incomplete
transactions that are being rolled back. Thus, the slow shutdown could
fail to purge some undo log that becomes purgeable after the transaction
commit or rollback.
srv_undo_sources: A flag that indicates if undo log can be generated
or the persistent, whether by background threads or by user SQL.
Even when this flag is clear, active transactions that already exist
in the system may be committed or rolled back.
innodb_shutdown(): Renamed from innobase_shutdown_for_mysql().
Do not return an error code; the operation never fails.
Clear the srv_undo_sources flag, and also ensure that the background
DROP TABLE queue is empty.
srv_purge_should_exit(): Do not allow the purge to exit if
srv_undo_sources are active or the background DROP TABLE queue is not
empty, or in slow shutdown, if any active transactions exist
(and are being rolled back).
srv_purge_coordinator_thread(): Remove some previous workarounds
for this bug.
innobase_start_or_create_for_mysql(): Set buf_page_cleaner_is_active
and srv_dict_stats_thread_active directly. Set srv_undo_sources before
starting the purge subsystem, to prevent immediate shutdown of the purge.
Create dict_stats_thread and fts_optimize_thread immediately
after setting srv_undo_sources, so that shutdown can use this flag to
determine if these subsystems were started.
dict_stats_shutdown(): Shut down dict_stats_thread. Backported from 10.2.
srv_shutdown_table_bg_threads(): Remove (unused).
Problem appears to be that the function fsp_flags_try_adjust()
is being unconditionally invoked on every .ibd file on startup.
Based on performance investigation also the top function
fsp_header_get_crypt_offset() needs to addressed.
Ported implementation of fsp_header_get_encryption_offset()
function from 10.2 to fsp_header_get_crypt_offset().
Introduced a new function fil_crypt_read_crypt_data()
to read page 0 if it is not yet read.
fil_crypt_find_space_to_rotate(): Now that page 0 for every .ibd
file is not read on startup we need to check has page 0 read
from space that we investigate for key rotation, if it is not read
we read it.
fil_space_crypt_get_status(): Now that page 0 for every .ibd
file is not read on startup here also we need to read page 0
if it is not yet read it. This is needed
as tests use IS query to wait until background encryption
or decryption has finished and this function is used to
produce results.
fil_crypt_thread(): Add is_stopping condition for tablespace
so that we do not rotate pages if usage of tablespace should
be stopped. This was needed for failure seen on regression
testing.
fil_space_create: Remove page_0_crypt_read and extra
unnecessary info output.
fil_open_single_table_tablespace(): We call fsp_flags_try_adjust
only when when no errors has happened and server was not started
on read only mode and tablespace validation was requested or
flags contain other table options except low order bits to
FSP_FLAGS_POS_PAGE_SSIZE position.
fil_space_t::page_0_crypt_read removed.
Added test case innodb-first-page-read to test startup when
encryption is on and when encryption is off to check that not
for all tables page 0 is read on startup.