Disable memory leak check in debug server, if rocksdb is loaded.
There is some subtle bug somewhere in 3rd party code we cannot
do much about.
The bug manifests as follows:
RocksDB does not shut down its worker threads when the plugin is shut down.
Thus the OS does not unload the library, since there are still active threads
using this library's code. Thus global destructors in the library do not run,
and there is still some memory allocated when the server exits.
The workaround disables server's memory leak check, if rocksdb engine was
loaded.
The option innodb_log_compressed_pages was contributed by
Facebook to MySQL 5.6. It was disabled in the 5.6.10 GA release
due to problems that were fixed in 5.6.11, which is when the
option was enabled.
The option was set to innodb_log_compressed_pages=ON by default
(disabling the feature), because safety was considered more
important than speed. The option innodb_log_compressed_pages=OFF
can *CORRUPT* ROW_FORMAT=COMPRESSED tables on crash recovery
if the zlib deflate function is behaving differently (producing
a different amount of compressed data) from how it behaved
when the redo log records were written (prior to the crash recovery).
In MDEV-6935, the default value was changed to
innodb_log_compressed_pages=OFF. This is inherently unsafe, because
there are very many different environments where MariaDB can be
running, using different zlib versions. While zlib can decompress
data just fine, there are no guarantees that different versions will
always compress the same data to the exactly same size. To avoid
problems related to zlib upgrades or version mismatch, we must
use a safe default setting.
This will reduce the write performance for users of
ROW_FORMAT=COMPRESSED tables. If you configure
innodb_log_compressed_pages=OFF, please make sure that you will
always cleanly shut down InnoDB before upgrading the server
or zlib.
Since MariaDB 10.2.2, temporary table metadata is not written
to the InnoDB data dictionary tables. Therefore,
the DICT_TF2_TEMPORARY flag cannot be set in SYS_TABLES,
except if there exist orphan temporary tables that were created
before MariaDB 10.2.2.
trx_resurrect_table_locks(): Do not skip temporary tables.
If a resurrected transaction modified a temporary table that was
created before MariaDB 10.2.2, that table would be treated
internally as a persistent table. It is safer to resurrect
locks than to skip the table, because the table would be modified
on transaction rollback.
buf_flush_page_cleaner_coordinator: In the first loop, use an
appropriate termination condition, waiting for !recv_writer_thread_active.
logs_empty_and_mark_files_at_shutdown(): Signal recv_sys->flush_start
in case the recv_writer_thread was never started, or
buf_flush_page_cleaner_coordinator failed to notice its termination.
innobase_start_or_create_for_mysql(): Remove a redundant, unreachable
condition, and properly release resources when aborting startup due to
recv_sys->found_corrupt_log.
Don't write to a temporary file, use String.
Remove strange one-liner "helpers", use String methods.
Don't use current_thd, don't allocate memory for 1-byte strings, etc.
When opening a table from MariaDB 10.1 or earlier that has virtual columns:
1. Don't error out if it has vcols over autoinc columns;
just issue a warning.
2. Set the vcol type properly.
3. In InnoDB: use table->s->stored_fields instead of table->s->fields,
because that's what was stored in the InnoDB data dictionary.
don't use thd->query_id check in background purge threads
(it doesn't work, because thd->query_id is never incremented there)
instead use thd->open_tables directly, there can be only one table
there anyway, and this is the table opened by this purge thread.
When using innodb_page_size=16k, InnoDB tables
that were created in MariaDB 10.1.0 to 10.1.20 with
PAGE_COMPRESSED=1 and
PAGE_COMPRESSION_LEVEL=2 or PAGE_COMPRESSION_LEVEL=3
would fail to load.
fsp_flags_is_valid(): When using innodb_page_size=16k, use a
more strict check for .ibd files, with the assumption that
nobody would try to use different-page-size files.
This is a regression caused by
commit bb60a832ed
srv_shutdown_all_bg_threads(): If os_thread_count indicates that
no threads are running, do not bother checking thread status.
This avoids a crash when InnoDB startup is aborted before
os_aio_init() has been invoked. (os_aio_all_slots_free() would
dereference AIO::s_reads even though it is NULL.)
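The guard described above can be sketched as follows. This is only an illustration of the early-return pattern, not the InnoDB code: `AioArray`, `aio_reads`, `all_slots_free()` and `background_work_finished()` are hypothetical stand-ins for the structures named in the message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the asynchronous I/O slot array.
struct AioArray {
  std::size_t busy_slots;
};

// Stays null until the I/O subsystem has been initialized.
static AioArray* aio_reads = nullptr;

// Must only be called once the I/O subsystem exists.
bool all_slots_free()
{
  assert(aio_reads != nullptr);
  return aio_reads->busy_slots == 0;
}

// Returns true if shutdown has nothing left to wait for.
bool background_work_finished(std::size_t thread_count)
{
  if (thread_count == 0)
    return true;  // nothing was started, so aio_reads is never touched
  return all_slots_free();
}
```

With `thread_count == 0` the function returns before the pointer is read, which is the case of a startup that was aborted before the I/O arrays were created.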
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.
The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.
Some help was provided by Vladislav Vaintroub.
Parameters have been cleaned up and aligned with those of MariaDB 10.2.
The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.
By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, doublewrite buffering
can be enabled.
The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.
Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.
We disable any undo log processing in --prepare.
Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.
This means that the --export option must be disabled for now.
This also means that the following options are redundant
and have been removed:
xtrabackup --apply-log-only
innobackupex --redo-only
In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)
In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.
FIXME: Enable the --export option.
FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.
FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
The POINT data type is being treated just like any other
geometry data type in InnoDB. The fixed-length data type
DATA_POINT had been introduced in WL#6942 based on a
misunderstanding and without appropriate review.
Because of fundamental design problems (such as a
DEFAULT POINT(0 0) value secretly introduced by InnoDB),
the code was disabled in Oracle Bug#20415831 fix.
This patch removes the dead code and definitions that were
left behind by the Oracle Bug#20415831 patch.
This is preparation for MDEV-12288, which would set DB_TRX_ID=0
when purging history. Also with that change in place, delete-marked
records must always refer to an undo log record via a nonzero
DB_TRX_ID column. (The DB_TRX_ID is only present in clustered index
leaf page records.)
btr_cur_parse_del_mark_set_clust_rec(), rec_get_trx_id():
Statically allocate the offsets
(should never use the heap). Add some debug assertions.
Replace some use of rec_get_trx_id() with row_get_rec_trx_id().
trx_undo_report_row_operation(): Add some sanity checks that are
common for all operations that produce undo log.
- Added variable tmp_disk_table_size
- Added variable tmp_memory_table_size as an alias for tmp_table_size
- Changed internal variable tmp_table_size to tmp_memory_table_size
- create_info.data_file_length is now set with tmp_disk_table_size
- Fixed that Aria doesn't reset max_data_file_length for internal tables
- Added a status flag that is set when the table is full, so that we can detect
  this on the next insert. This ensures that the table is always 'correct', but
  we get the error one row after the row that made the table too big.
- Removed some mutex lock for internal temporary tables
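The "table is full" status flag from the list above can be illustrated with a small model. This is a sketch under assumptions: `TmpTable` and its members are hypothetical and do not correspond to the Aria or server data structures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical in-memory model of an internal temporary table.
class TmpTable {
public:
  explicit TmpTable(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Returns false (a "table is full" error) if an earlier insert
  // already pushed the table past its limit.
  bool insert(std::size_t row_bytes)
  {
    assert(row_bytes > 0);
    if (full_)
      return false;
    used_bytes_ += row_bytes;  // the row is always stored completely
    if (used_bytes_ > max_bytes_)
      full_ = true;            // remembered for the next insert
    return true;
  }

  std::size_t used_bytes() const { return used_bytes_; }

private:
  std::size_t max_bytes_;
  std::size_t used_bytes_ = 0;
  bool full_ = false;
};
```

With a limit of 100 bytes, two 60-byte inserts both succeed and the third insert fails: the error arrives one row late, but no row is ever stored partially.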
The field fts_token->position is not initialized in
row_merge_fts_doc_tokenize(). We cannot have that field
without changing the fulltext parser plugin ABI
(adding st_mysql_ftparser_boolean_info::position,
as it was done in MySQL 5.7 in WL#6943).
The InnoDB fulltext parser plugins "ngram" and "Mecab" that were
introduced in MySQL 5.7 do depend on that field. But the simple_parser
does not. Apparently, simple_parser is leaving the field as 0.
So, in our fix we will assume that the missing position field is 0.
In Mariabackup, we would want the backed-up redo log file size to be
a multiple of 512 bytes, or OS_FILE_LOG_BLOCK_SIZE. However, at startup,
InnoDB would be picky, requiring the file size to be a multiple of
innodb_page_size.
Furthermore, InnoDB would require innodb_log_file_size to be a multiple of
one megabyte, while the minimum granularity is 512 bytes. Because
the data-file-oriented fil_io() API is being used for writing the
InnoDB redo log, writes will for now require innodb_log_file_size to
be a multiple of the maximum innodb_page_size (65536 bytes).
To complicate matters, InnoDB startup divided srv_log_file_size by
UNIV_PAGE_SIZE, so that initially, the unit was bytes, and later it
was innodb_page_size. We will simplify this and keep srv_log_file_size
in bytes at all times.
innobase_log_file_size: Remove. Remove some obsolete checks against
overflow on 32-bit systems. srv_log_file_size is always 64 bits, and
the maximum size 512GiB in multiples of innodb_page_size always fits
in ulint (which is 32 or 64 bits). 512GiB would be 8,388,608*64KiB or
134,217,728*4KiB.
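The page counts quoted above can be checked at compile time. The helper `pages_in()` and the constant names are made up for this sketch.

```cpp
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t GiB = 1024 * 1024 * 1024;
constexpr std::uint64_t max_log_file_size = 512 * GiB;

// Number of whole pages of the given size in a file of the given size.
constexpr std::uint64_t pages_in(std::uint64_t file_size,
                                 std::uint64_t page_size)
{
  return file_size / page_size;
}

static_assert(pages_in(max_log_file_size, 64 * KiB) == 8388608,
              "512GiB in 64KiB pages");
static_assert(pages_in(max_log_file_size, 4 * KiB) == 134217728,
              "512GiB in 4KiB pages");
// The larger count is still below 2^32, so it fits in a 32-bit ulint.
static_assert(pages_in(max_log_file_size, 4 * KiB) <= UINT32_MAX,
              "page count fits in 32 bits");
```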
log_init(): Remove the parameter file_size that was always passed as
srv_log_file_size.
log_set_capacity(): Add a parameter for passing the requested file size.
srv_log_file_size_requested: Declare static in srv0start.cc.
create_log_file(), create_log_files(),
innobase_start_or_create_for_mysql(): Invoke fil_node_create()
with srv_log_file_size expressed in multiples of innodb_page_size.
innobase_start_or_create_for_mysql(): Require the redo log file sizes
to be multiples of 512 bytes.
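A minimal sketch of the two rules above: the size is kept in bytes, validated against the 512-byte log block size, and converted to page units only where an interface expects pages. The function names are hypothetical, and rounding down in the conversion is an assumption of this sketch; the text above does not say how a remainder is treated.

```cpp
#include <cstdint>

// 512 bytes, called OS_FILE_LOG_BLOCK_SIZE in the text.
constexpr std::uint64_t LOG_BLOCK_SIZE = 512;

// True if the requested redo log file size is a multiple of 512 bytes.
constexpr bool log_file_size_is_valid(std::uint64_t size_bytes)
{
  return size_bytes % LOG_BLOCK_SIZE == 0;
}

// Converts a byte count to page units at the call site.
// Assumption: any remainder is dropped (integer division).
constexpr std::uint64_t size_in_pages(std::uint64_t size_bytes,
                                      std::uint64_t page_size)
{
  return size_bytes / page_size;
}
```

For example, a 48MiB log file with a 16KiB page size would be passed on as 3072 pages, while the variable holding the size stays in bytes.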
trx_sys_print_mysql_binlog_offset(): Use 64-bit arithmetic and ib::info().
TRX_SYS_MYSQL_LOG_OFFSET: Replaces TRX_SYS_MYSQL_LOG_OFFSET_HIGH,
TRX_SYS_MYSQL_LOG_OFFSET_LOW.
trx_sys_update_mysql_binlog_offset(): Remove the constant parameter
field=TRX_SYS_MYSQL_LOG_INFO. Use 64-bit arithmetics.
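To illustrate why a single 64-bit field can replace a HIGH/LOW pair, here is a sketch that assumes the two 32-bit halves are stored big-endian with the high half first. Under that assumption, the same eight bytes read as one 64-bit value give the same offset. The function names are hypothetical and are not the InnoDB accessors.

```cpp
#include <cassert>
#include <cstdint>

// Old style: combine two 32-bit halves into one 64-bit offset.
constexpr std::uint64_t join_offset(std::uint32_t high, std::uint32_t low)
{
  return (static_cast<std::uint64_t>(high) << 32) | low;
}

// New style: read one big-endian 64-bit value from 8 bytes.
inline std::uint64_t read_be64(const unsigned char* p)
{
  assert(p != nullptr);
  std::uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}
```

Reading the bytes `00 00 00 01 00 00 00 02` with `read_be64()` yields the same value as `join_offset(1, 2)`.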
A similar change was contributed to Percona XtraBackup, but for some
reason, it is not present in Percona XtraDB. Since MDEV-9566
(MariaDB 10.1.23), that change is present in the MariaDB XtraDB.
recv_sys_init(): Remove the parameter.
recv_sys_create(): Merge to recv_sys_init().
recv_sys_mem_free(): Merge to recv_sys_close().
log_mem_free(): Merge to log_shutdown().
These self references were previously used to avoid having to check the
IO_CACHE's type. However, a benchmark on a stock x86 5930K shows that
the type comparison is marginally faster than the double pointer dereference.
For 40 billion my_b_tell calls, the difference is 0.1 seconds in favor of performing the
type check. (Basically, there is no measurable difference.)
To prevent bugs from copying the structure using the equals(=) operator,
and having to do the bookkeeping manually, remove these "convenience"
variables.
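The copy hazard mentioned above can be shown with a heavily simplified model. Both structures are hypothetical and do not reflect the real IO_CACHE layout: `SelfRefCache` keeps a pointer to one of its own members, while `PlainCache` checks the type on each call.

```cpp
#include <cassert>
#include <cstdint>

enum class CacheType { READ, WRITE };

// Variant with a "convenience" pointer to one of its own members.
struct SelfRefCache {
  CacheType type;
  std::uint64_t pos_in_file;
  std::uint64_t read_off;
  std::uint64_t write_off;
  const std::uint64_t* current_off;  // &read_off or &write_off of this object

  SelfRefCache(CacheType t, std::uint64_t pos)
    : type(t), pos_in_file(pos), read_off(0), write_off(0),
      current_off(t == CacheType::READ ? &read_off : &write_off) {}
};

inline std::uint64_t tell(const SelfRefCache& c)
{
  assert(c.current_off != nullptr);
  return c.pos_in_file + *c.current_off;  // extra indirection
}

// Variant without the self reference: check the type instead.
struct PlainCache {
  CacheType type;
  std::uint64_t pos_in_file;
  std::uint64_t read_off;
  std::uint64_t write_off;
};

inline std::uint64_t tell(const PlainCache& c)
{
  return c.pos_in_file
    + (c.type == CacheType::READ ? c.read_off : c.write_off);
}
```

Copying a `SelfRefCache` with `=` copies `current_off` verbatim, so the copy still points into the original object and `tell()` on the copy ignores the copy's own offsets. A `PlainCache` can be copied freely.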
srv_log_files_created: A debug flag to ensure that InnoDB redo log
files can only be created once in the server lifetime, and that
after log files have been created, no crash recovery will take place.
recv_scan_log_recs(): Detect the special case where the log consists
of a sole MLOG_CHECKPOINT record, such as immediately after creating
the redo logs.
recv_recovery_from_checkpoint_start(): Skip the recovery message
if the redo log is logically empty.
A merge error caused InnoDB bootstrap to fail when
innodb_undo_tablespaces was set to more than 2.
This was because of a bug that was introduced to
srv_undo_tablespaces_init() by the merge.
Furthermore, some adjustments for Oracle Bug#25551311 aka
Bug#23517560 changes were forgotten. We must minimize direct
references to srv_undo_tablespaces_open and use predicates
instead.
srv_undo_tablespaces_init(): Increment srv_undo_tablespaces_open
once, not twice, per loop iteration.
is_system_or_undo_tablespace(): Remove (unused function).
is_predefined_tablespace(): Invoke srv_is_undo_tablespace().
When it comes to DEFAULT values of columns, InnoDB is imposing both
unnecessary and insufficient conditions on whether ALGORITHM=INPLACE
should be allowed for ALTER TABLE.
When changing an existing column to NOT NULL, any NULL values in the
column only get special treatment if the column is changed to an
AUTO_INCREMENT column (which is not supported by ALGORITHM=INPLACE)
or the column type is TIMESTAMP. In all other cases, an error
must be reported for the failure to convert a NULL value to NOT NULL.
InnoDB was unnecessarily interested in whether the DEFAULT value
is not constant when altering other than TIMESTAMP columns. Also,
when changing a TIMESTAMP column to NOT NULL, InnoDB was performing
an insufficient check, and it was incorrectly allowing a constant
DEFAULT value while not being able to replace NULL values with that
constant value.
Furthermore, in ADD COLUMN, InnoDB is unnecessarily rejecting certain
nondeterministic DEFAULT expressions (depending on the session
parameters or the current time).
buf_flush_page_cleaner_coordinator(): Signal the thread creator
that the error log output regarding setpriority() has been issued.
innobase_start_or_create_for_mysql(): Wait for
buf_flush_page_cleaner_coordinator() to completely start up.
This prevents sporadic failures of tests that search the server
error log for InnoDB redo log recovery messages.
While the primary purpose of innodb_force_recovery is to allow
data to be rescued from an InnoDB instance that would crash due
to some data corruption, the settings 1, 2, or 3 are relatively
safe to use and there is no need to prevent write transactions
in these modes.
The setting innodb_force_recovery=4 and above can cause database
corruption. For those modes, we already set the flag
high_level_read_only to disable modifications, except DROP TABLE.
MODIFICATIONS_NOT_ALLOWED_MSG_FORCE_RECOVERY: Remove. There is no
need to spam the error log for each refused DML operation. It suffices
to return an error to the client. There will be messages at startup
if innodb_read_only or innodb_force_recovery are preventing writes.
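The resulting policy can be summarized in a small decision function. This is a sketch of the rules as stated above, not the server code: `Settings`, `Operation` and `is_allowed()` are hypothetical, and it is an assumption of this sketch that reads are always permitted and that innodb_read_only refuses every change, including DROP TABLE.

```cpp
#include <cassert>

enum class Operation { SELECT, DML, DROP_TABLE };

struct Settings {
  unsigned force_recovery;  // innodb_force_recovery, 0 to 6
  bool read_only;           // innodb_read_only
};

// Returns true if the operation may proceed. A refused operation only
// yields an error to the client; nothing is written to the error log.
inline bool is_allowed(const Settings& s, Operation op)
{
  assert(s.force_recovery <= 6);
  if (op == Operation::SELECT)
    return true;                       // assumption: reads always pass
  if (s.read_only)
    return false;                      // assumption: no changes at all
  if (s.force_recovery < 4)
    return true;                       // levels 0 to 3 permit writes
  return op == Operation::DROP_TABLE;  // level 4 and above
}
```

With `force_recovery = 3`, DML is allowed; with `force_recovery = 4`, DML is refused but DROP TABLE still goes through.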