Unlike commit a54abf0175 claimed,
the caller of THD::awake() may actually hold the InnoDB lock_sys->mutex.
That commit introduced a deadlock of threads in the replication slave
when running the test rpl.rpl_parallel_optimistic_nobinlog.
lock_trx_handle_wait(): Expect the callers to acquire and release
lock_sys->mutex and trx->mutex.
innobase_kill_query(): Restore the logic for conditionally acquiring
and releasing the mutexes. THD::awake() can be called from inside
InnoDB while holding one or both mutexes, via thd_report_wait_for() and
via wsrep_innobase_kill_one_trx().
ha_innobase::unlock_row(): Use a relaxed version of the
trx_state_eq() debug assertion, because rr_unlock_row()
may be invoked after an error has been already reported
and the transaction has been rolled back.
InnoDB in Debian uses utf8mb4 as default character set since
version 10.0.20-2. This leads to major pain due to keys longer
than 767 bytes.
MariaDB 10.2 (and MySQL 5.7) introduced the setting
innodb_default_row_format that is DYNAMIC by default. These
versions also changed the default values of the parameters
innodb_large_prefix=ON and innodb_file_format=Barracuda.
This would allow longer column index prefixes to be created.
The original purpose of these parameters was to allow InnoDB
to be downgraded to MySQL 5.1, which is long out of support.
Every InnoDB version since MySQL 5.5 does support operation
with the relaxed limits.
We backport the parameter innodb_default_row_format to
MariaDB 10.1, but we will keep its default value at COMPACT.
This allows MariaDB 10.1 to be configured so that CREATE TABLE
is less likely to encounter a problem with the limitation:
loose_innodb_large_prefix=ON
loose_innodb_default_row_format=DYNAMIC
(Note that the setting innodb_large_prefix was deprecated in
MariaDB 10.2 and removed in MariaDB 10.3.)
The only observable difference in the behaviour with the default
settings should be that ROW_FORMAT=DYNAMIC tables can be created
both in the system tablespace and in .ibd files, no matter what
innodb_file_format has been assigned to. Unlike MariaDB 10.2,
we are not changing the default value of innodb_file_format,
so ROW_FORMAT=COMPRESSED tables cannot be created without
changing the parameter.
InnoDB limited the maximum number of bytes per character to 4.
But, the filename character set that was introduced in MySQL 5.1
uses up to 5 bytes per character.
To allow InnoDB tables to be created with wider characters, let
us split the mbminmaxlen fields into mbminlen, mbmaxlen, and increase
the limit to 7 bytes per character. This will increase the payload size
of dtype_t and dict_col_t by one bit. The storage size will be unchanged
(54 bits and 77 bits will use the same number of bytes as the
previous sizes 53 and 76 bits).
This is 10.1 version where no merge error exists.
wsrep_on_check
New check function. Galera can't be enabled
if innodb-lock-schedule-algorithm=VATS.
innobase_kill_query
In Galera async kill we could own lock mutex.
innobase_init
If Variance-Aware-Transaction-Sheduling Algorithm (VATS) is
used on Galera we refuse to start InnoDB.
Changed innodb-lock-schedule-algorithm as read-only parameter
as it was designed to be.
lock_rec_other_has_expl_req,
lock_rec_other_has_conflicting,
lock_rec_lock_slow
lock_table_other_has_incompatible
lock_rec_insert_check_and_lock
Change pointer to conflicting lock to normal pointer as this
pointer contents could be changed later.
When MariaDB 10.1.0 introduced table options for encryption and
compression, it unnecessarily changed
ha_innobase::check_if_supported_inplace_alter() so that ALGORITHM=COPY
is forced when these parameters differ.
A better solution is to move the check to innobase_need_rebuild().
In that way, the ALGORITHM=INPLACE interface (yes, the syntax is
very misleading) can be used for rebuilding the table much more
efficiently, with merge sort, with no undo logging, and allowing
concurrent DML operations.
i_s_sys_tables_fill_table_stats(): Acquire dict_operation_lock
S-latch before acquiring dict_sys->mutex, to prevent the table
from being removed from the data dictionary cache and from
being freed while i_s_dict_fill_sys_tablestats() is accessing
the table handle.
When MySQL 5.6.10 introduced innodb_read_only mode, it skipped the
creation of the InnoDB buffer pool dump/restore subsystem in that mode.
Attempts to set the variable innodb_buf_pool_dump_now would have
no effect in innodb_read_only mode, but the corresponding condition
was forgotten in from the other two update functions.
MySQL 5.7.20 would fix the innodb_buffer_pool_load_now,
but not innodb_buffer_pool_load_abort. Let us fix both in MariaDB.
Reverted incorrect changes done on MDEV-7367 and MDEV-9469. Fixes properly
also related bugs:
MDEV-13668: InnoDB unnecessarily rebuilds table when renaming a column and adding index
MDEV-9469: 'Incorrect key file' on ALTER TABLE
MDEV-9548: Alter table (renaming and adding index) fails with "Incorrect key file for table"
MDEV-10535: ALTER TABLE causes standalone/wsrep cluster crash
MDEV-13640: ALTER TABLE CHANGE and ADD INDEX on auto_increment column fails with "Incorrect key file for table..."
Root cause for all these bugs is the fact that MariaDB .frm file
can contain virtual columns but InnoDB dictionary does not and
previous fixes were incorrect or unnecessarily forced table
rebuilt. In index creation key_part->fieldnr can be bigger than
number of columns in InnoDB data dictionary. We need to skip not
stored fields when calculating correct column number for InnoDB
data dictionary.
dict_table_get_col_name_for_mysql
Remove
innobase_match_index_columns
Revert incorrect change done on MDEV-7367
innobase_need_rebuild
Remove unnecessary rebuild force when column is renamed.
innobase_create_index_field_def
Calculate InnoDB column number correctly and remove
unnecessary column name set.
innobase_create_index_def, innobase_create_key_defs
Remove unneeded fields parameter. Revert unneeded memset.
prepare_inplace_alter_table_dict
Remove unneeded col_names parameter
index_field_t
Remove unneeded col_name member.
row_merge_create_index
Remove unneeded col_names parameter and resolution.
Effected tests:
innodb-alter-table : Add test case for MDEV-13668
innodb-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
innodb-wl5980-alter : Remove MDEV-13668, MDEV-9469 FIXMEs
and restore original tests
Problem:- This crash happens because of thd = NULL , and while checking
for wsrep_on , we no longer check for thd != NULL (MDEV-7955). So this
problem is regression of MDEV-7955. However this patch not only solves
this regression , It solves all regression caused by MDEV-7955 patch.
To get all possible cases when thd can be null , assert(thd)/
assert(trx->mysql_thd) is place just before all wsrep_on and innodb test
suite is run. And the assert which caused failure are removed with a physical
check for thd != NULL. Rest assert are removed. Hopefully this method will
remove all current/potential regression of MDEV-7955.
…porary file
Fixed by removing writing key version to start of every block that
was encrypted. Instead we will use single key version from log_sys
crypt info.
After this MDEV also blocks writen to row log are encrypted and blocks
read from row log aren decrypted if encryption is configured for the
table.
innodb_status_variables[], struct srv_stats_t
Added status variables for merge block and row log block
encryption and decryption amounts.
Removed ROW_MERGE_RESERVE_SIZE define.
row_merge_fts_doc_tokenize
Remove ROW_MERGE_RESERVE_SIZE
row_log_t
Add index, crypt_tail, crypt_head to be used in case of
encryption.
row_log_online_op, row_log_table_close_func
Before writing a block encrypt it if encryption is enabled
row_log_table_apply_ops, row_log_apply_ops
After reading a block decrypt it if encryption is enabled
row_log_allocate
Allocate temporary buffers crypt_head and crypt_tail
if needed.
row_log_free
Free temporary buffers crypt_head and crypt_tail if they
exist.
row_merge_encrypt_buf, row_merge_decrypt_buf
Removed.
row_merge_buf_create, row_merge_buf_write
Remove ROW_MERGE_RESERVE_SIZE
row_merge_build_indexes
Allocate temporary buffer used in decryption and encryption
if needed.
log_tmp_blocks_crypt, log_tmp_block_encrypt, log_temp_block_decrypt
New functions used in block encryption and decryption
log_tmp_is_encrypted
New function to check is encryption enabled.
Added test case innodb-rowlog to force creating a row log and
verify that operations are done using introduced status
variables.
Fixes also MDEV-13488: InnoDB writes CRYPT_INFO even though
encryption is not enabled.
Problem was that we created encryption metadata (crypt_data) for
system tablespace even when no encryption was enabled and too early.
System tablespace can be encrypted only using key rotation.
Test innodb-key-rotation-disable, innodb_encryption, innodb_lotoftables
require adjustment because INFORMATION_SCHEMA INNODB_TABLESPACES_ENCRYPTION
contain row only if tablespace really has encryption metadata.
fil_crypt_set_thread_cnt: Send message to background encryption threads
if they exits when they are ready. This is required to find tablespaces
requiring key rotation if no other changes happen.
fil_crypt_find_space_to_rotate: Decrease the amount of time waiting
when nothing happens to better enable key rotation on startup.
fsp_header_init: Write encryption metadata to page 0 only if tablespace is
encrypted or encryption is disabled by table option.
i_s_dict_fill_tablespaces_encryption : Skip tablespaces that do not
contain encryption metadata. This is required to avoid too early
wait condition trigger in encrypted -> unencrypted state transfer.
open_or_create_data_files: Do not create encryption metadata
by default to system tablespace.
Assertions failed due to incorrect handling of the --tc-heuristic-recover
option when InnoDB is in read-only mode either due to innodb_read_only=1
or innodb_force_recovery>3. InnoDB failed to refuse a XA COMMIT or
XA ROLLBACK operation, and there were errors in the error handling in
the upper layer.
This was fixed by making InnoDB XA operations respect the
high_level_read_only flag. The InnoDB part of the fix and
parts of the test main.tc_heuristic_recover were provided
by Marko Mäkelä.
LOCK_log mutex lock/unlock had to be added to fix MDEV-13438.
The measure is confirmed by mysql sources as well.
For testing of the conflicting option combination, mysql-test-run is
made to export a new $MYSQLD_LAST_CMD. It holds the very last value
generated by mtr.mysqld_start(). Even though the options have been
also always stored in $mysqld->{'started_opts'} there were no access
to them beyond the automatic server restart by mtr through the expect
file interface.
Effectively therefore $MYSQLD_LAST_CMD represents a more general
interface to $mysqld->{'started_opts'} which can be used in wider
scopes including server launch with incompatible options.
Notice another existing method to restart the server with incompatible
options relying on $MYSQLD_CMD is is aware of $mysqld->{'started_opts'}
(the actual options that the server is launched by mtr). In order to use
this method they would have to be provided manually.
NOTE: When merging to 10.2, the file search_pattern_in_file++.inc
should be replaced with the pre-existing search_pattern_in_file.inc.
Following merge from 5.6.36, this merge also rejects changes that
collided with the rejection of 6ca4f693c1ce472e2b1bf7392607c2d1124b4293.
We initially rejected 6ca4f693c1ce472e2b1bf7392607c2d1124b4293 because
it was introducing a new storage engine API method.
In all InnoDB row formats, the pointers or lengths stored in the record
header can be at most 14 bits, that is, count up to 16383.
In ROW_FORMAT=REDUNDANT, this limits the maximum possible record length
to 16383 bytes. In other ROW_FORMAT, it could merely limit the maximum
length of variable-length fields.
When MySQL 5.7 introduced innodb_page_size=32k and 64k, the maximum
record length was limited to 16383 bytes (I hope 16383, not 16384,
to be able to distinguish from a record whose length is 0 bytes).
This change is present in MariaDB Server 10.2.
btr_cur_optimistic_update(): Restrict maximum record size to 16K-1
for REDUNDANT and 64K page size.
dict_index_too_big_for_tree(): The maximum allowed record size
is half a B-tree page or 16K(-1 for REDUNDANT) for 64K page size.
convert_error_code_to_mysql(): Fix error message to print
correct limits.
my_error_innodb(): Fix error message to print correct limits.
page_zip_rec_needs_ext() : record size was already restricted to 16K.
Restrict REDUNDANT to 16K-1.
rem0rec.h: Introduce REDUNDANT_REC_MAX_DATA_SIZE (16K-1)
and COMPRESSED_REC_MAX_DATA_SIZE (16K).
The option innodb_log_compressed_pages was contributed by
Facebook to MySQL 5.6. It was disabled in the 5.6.10 GA release
due to problems that were fixed in 5.6.11, which is when the
option was enabled.
The option was set to innodb_log_compressed_pages=ON by default
(disabling the feature), because safety was considered more
important than speed. The option innodb_log_compressed_pages=OFF
can *CORRUPT* ROW_FORMAT=COMPRESSED tables on crash recovery
if the zlib deflate function is behaving differently (producing
a different amount of compressed data) from how it behaved
when the redo log records were written (prior to the crash recovery).
In MDEV-6935, the default value was changed to
innodb_log_compressed_pages=OFF. This is inherently unsafe, because
there are very many different environments where MariaDB can be
running, using different zlib versions. While zlib can decompress
data just fine, there are no guarantees that different versions will
always compress the same data to the exactly same size. To avoid
problems related to zlib upgrades or version mismatch, we must
use a safe default setting.
This will reduce the write performance for users of
ROW_FORMAT=COMPRESSED tables. If you configure
innodb_log_compressed_pages=ON, please make sure that you will
always cleanly shut down InnoDB before upgrading the server
or zlib.
dict_table_t::thd: Remove. This was only used by btr_root_block_get()
for reporting decryption failures, and it was only assigned by
ha_innobase::open(), and never cleared. This could mean that if a
connection is closed, the pointer would become stale, and the server
could crash while trying to report the error. It could also mean
that an error is being reported to the wrong client. It is better
to use current_thd in this case, even though it could mean that if
the code is invoked from an InnoDB background operation, there would
be no connection to which to send the error message.
Remove dict_table_t::crypt_data and dict_table_t::page_0_read.
These fields were never read.
fil_open_single_table_tablespace(): Remove the parameter "table".
When a slow shutdown is performed soon after spawning some work for
background threads that can create or commit transactions, it is possible
that new transactions are started or committed after the purge has finished.
This is violating the specification of innodb_fast_shutdown=0, namely that
the purge must be completed. (None of the history of the recent transactions
would be purged.)
Also, it is possible that the purge threads would exit in slow shutdown
while there exist active transactions, such as recovered incomplete
transactions that are being rolled back. Thus, the slow shutdown could
fail to purge some undo log that becomes purgeable after the transaction
commit or rollback.
srv_undo_sources: A flag that indicates if undo log can be generated
or the persistent, whether by background threads or by user SQL.
Even when this flag is clear, active transactions that already exist
in the system may be committed or rolled back.
innodb_shutdown(): Renamed from innobase_shutdown_for_mysql().
Do not return an error code; the operation never fails.
Clear the srv_undo_sources flag, and also ensure that the background
DROP TABLE queue is empty.
srv_purge_should_exit(): Do not allow the purge to exit if
srv_undo_sources are active or the background DROP TABLE queue is not
empty, or in slow shutdown, if any active transactions exist
(and are being rolled back).
srv_purge_coordinator_thread(): Remove some previous workarounds
for this bug.
innobase_start_or_create_for_mysql(): Set buf_page_cleaner_is_active
and srv_dict_stats_thread_active directly. Set srv_undo_sources before
starting the purge subsystem, to prevent immediate shutdown of the purge.
Create dict_stats_thread and fts_optimize_thread immediately
after setting srv_undo_sources, so that shutdown can use this flag to
determine if these subsystems were started.
dict_stats_shutdown(): Shut down dict_stats_thread. Backported from 10.2.
srv_shutdown_table_bg_threads(): Remove (unused).
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high level and did not cover all cases.
innodb.log_data_file_size: Test all innodb_page_size combinations.
fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.
btr_create(): Clean up the logic. Remove the error log messages.
buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.
buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.
recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.
fil_report_invalid_page_access(): Simplify the message.
fseg_create_general(): Do not emit messages to the error log.
innobase_init(): Revert the changes.
trx_rseg_create(): Refactor (no functional change).
Problem was that all doublewrite buffer pages must fit to first
system datafile.
Ported commit 27a34df7882b1f8ed283f22bf83e8bfc523cbfde
Author: Shaohua Wang <shaohua.wang@oracle.com>
Date: Wed Aug 12 15:55:19 2015 +0800
BUG#21551464 - SEGFAULT WHILE INITIALIZING DATABASE WHEN
INNODB_DATA_FILE SIZE IS SMALL
To 10.1 (with extended error printout).
btr_create(): If ibuf header page allocation fails report error and
return FIL_NULL. Similarly if root page allocation fails return a error.
dict_build_table_def_step: If fsp_header_init fails return
error code.
fsp_header_init: returns true if header initialization succeeds
and false if not.
fseg_create_general: report error if segment or page allocation fails.
innobase_init: If first datafile is smaller than 3M and could not
contain all doublewrite buffer pages report error and fail to
initialize InnoDB plugin.
row_truncate_table_for_mysql: report error if fsp header init
fails.
srv_init_abort: New function to report database initialization errors.
srv_undo_tablespaces_init, innobase_start_or_create_for_mysql: If
database initialization fails report error and abort.
trx_rseg_create: If segment header creation fails return.
Problem was that FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field that for
encrypted pages even in system datafiles should contain key_version
except very first page (0:0) is after encryption overwritten with
flush lsn.
Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1
The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during
InnoDB startup.
At startup, InnoDB reads the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
from the first page of each file in the InnoDB system tablespace.
If there are multiple files, the minimum and maximum LSN can differ.
These numbers are passed to InnoDB startup.
Having the number in other files than the first file of the InnoDB
system tablespace is not providing much additional value. It is
conflicting with other use of the field, such as on InnoDB R-tree
index pages and encryption key_version.
This worklog will stop writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to
other files than the first file of the InnoDB system tablespace
(page number 0:0) when system tablespace is encrypted. If tablespace
is not encrypted we continue writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
to all first pages of system tablespace to avoid unnecessary
warnings on downgrade.
open_or_create_data_files(): pass only one flushed_lsn parameter
xb_load_tablespaces(): pass only one flushed_lsn parameter.
buf_page_create(): Improve comment about where
FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set.
fil_write_flushed_lsn(): A new function, merged from
fil_write_lsn_and_arch_no_to_file() and
fil_write_flushed_lsn_to_data_files().
Only write to the first page of the system tablespace (page 0:0)
if tablespace is encrypted, or write all first pages of system
tablespace and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE)
afterwards.
fil_read_first_page(): read flush_lsn and crypt_data only from
first datafile.
fil_open_single_table_tablespace(): Remove output of LSN, because it
was only valid for the system tablespace and the undo tablespaces, not
user tablespaces.
fil_validate_single_table_tablespace(): Remove output of LSN.
checkpoint_now_set(): Use fil_write_flushed_lsn and output
a error if operation fails.
Remove lsn variable from fsp_open_info.
recv_recovery_from_checkpoint_start(): Remove unnecessary second
flush_lsn parameter.
log_empty_and_mark_files_at_shutdown(): Use fil_writte_flushed_lsn
and output error if it fails.
open_or_create_data_files(): Pass only one flushed_lsn variable.