InnoDB would refuse to start up if there is a mismatch on
the size of the system tablespace files. However, before this
check is conducted, the system tablespace may already have been
heavily modified.
InnoDB should perform the size check as early as possible.
recv_recovery_from_checkpoint_finish():
Move the recv_apply_hashed_log_recs() call to
innobase_start_or_create_for_mysql().
innobase_start_or_create_for_mysql(): Test the mutex functionality
before doing anything else. Use a compile_time_assert() for a
sizeof() constraint. Check the size of the system tablespace as
early as possible.
recv_scan_log_recs(): Remember if redo log apply is needed,
even if starting up in innodb_read_only mode.
recv_recovery_from_checkpoint_start_func(): Refuse
innodb_read_only startup if redo log apply is needed.
at the start 759654123 and the end 0 do not match."
For page compressed and encrypted tables log sequence
number at end is not stored, thus disable this message
for them.
Change default to zlib, this has effect only if user has
explicitly requested page compression and then user
naturally expects that pages are really compressed
if they can be compressed.
restarting server with encryption and read-only
buf0buf.cc: Temporary slots used in encryption was calculated
by read_threads * write_threads. However, in read-only mode
write_threads is zero. Correct way is to calculate
(read_threads + write_threads) * max pending IO requests.
MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K
The storage format of FSP_SPACE_FLAGS was accidentally broken
already in MariaDB 10.1.0. This fix is bringing the format in
line with other MySQL and MariaDB release series.
Please refer to the comments that were added to fsp0fsp.h
for details.
This is an INCOMPATIBLE CHANGE that affects users of
page_compression and non-default innodb_page_size. Upgrading
to this release will correct the flags in the data files.
If you want to downgrade to earlier MariaDB 10.1.x, please refer
to the test innodb.101_compatibility how to reset the
FSP_SPACE_FLAGS in the files.
NOTE: MariaDB 10.1.0 to 10.1.20 can misinterpret
uncompressed data files with innodb_page_size=4k or 64k as
compressed innodb_page_size=16k files, and then probably fail
when trying to access the pages. See the comments in the
function fsp_flags_convert_from_101() for detailed analysis.
Move PAGE_COMPRESSION to FSP_SPACE_FLAGS bit position 16.
In this way, compressed innodb_page_size=16k tablespaces will not
be mistaken for uncompressed ones by MariaDB 10.1.0 to 10.1.20.
Derive PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR from the
dict_table_t::flags when the table is available, in
fil_space_for_table_exists_in_mem() or fil_open_single_table_tablespace().
During crash recovery, fil_load_single_table_tablespace() will use
innodb_compression_level for the PAGE_COMPRESSION_LEVEL.
FSP_FLAGS_MEM_MASK: A bitmap of the memory-only fil_space_t::flags
that are not to be written to FSP_SPACE_FLAGS. Currently, these will
include PAGE_COMPRESSION_LEVEL, ATOMIC_WRITES and DATA_DIR.
Introduce the macro FSP_FLAGS_PAGE_SSIZE(). We only support
one innodb_page_size for the whole instance.
When creating a dummy tablespace for the redo log, use
fil_space_t::flags=0. The flags are never written to the redo log files.
Remove many FSP_FLAGS_SET_ macros.
dict_tf_verify_flags(): Remove. This is basically only duplicating
the logic of dict_tf_to_fsp_flags(), used in a debug assertion.
fil_space_t::mark: Remove. This flag was not used for anything.
fil_space_for_table_exists_in_mem(): Remove the unnecessary parameter
mark_space, and add a parameter for table flags. Check that
fil_space_t::flags match the table flags, and adjust the (memory-only)
flags based on the table flags.
fil_node_open_file(): Remove some redundant or unreachable conditions,
do not use stderr for output, and avoid unnecessary server aborts.
fil_user_tablespace_restore_page(): Convert the flags, so that the
correct page_size will be used when restoring a page from the
doublewrite buffer.
fil_space_get_page_compressed(), fsp_flags_is_page_compressed(): Remove.
It suffices to have fil_space_is_page_compressed().
FSP_FLAGS_WIDTH_DATA_DIR, FSP_FLAGS_WIDTH_PAGE_COMPRESSION_LEVEL,
FSP_FLAGS_WIDTH_ATOMIC_WRITES: Remove, because these flags do not
exist in the FSP_SPACE_FLAGS but only in memory.
fsp_flags_try_adjust(): New function, to adjust the FSP_SPACE_FLAGS
in page 0. Called by fil_open_single_table_tablespace(),
fil_space_for_table_exists_in_mem(), innobase_start_or_create_for_mysql()
except if --innodb-read-only is active.
fsp_flags_is_valid(ulint): Reimplement from the scratch, with
accurate comments. Do not display any details of detected
inconsistencies, because the output could be confusing when
dealing with MariaDB 10.1.x data files.
fsp_flags_convert_from_101(ulint): Convert flags from buggy
MariaDB 10.1.x format, or return ULINT_UNDEFINED if the flags
cannot be in MariaDB 10.1.x format.
fsp_flags_match(): Check the flags when probing files.
Implemented based on fsp_flags_is_valid()
and fsp_flags_convert_from_101().
dict_check_tablespaces_and_store_max_id(): Do not access the
page after committing the mini-transaction.
IMPORT TABLESPACE fixes:
AbstractCallback::init(): Convert the flags.
FetchIndexRootPages::operator(): Check that the tablespace flags match the
table flags. Do not attempt to convert tablespace flags to table flags,
because the conversion would necessarily be lossy.
PageConverter::update_header(): Write back the correct flags.
This takes care of the flags in IMPORT TABLESPACE.
contains a bad and a good copy
Clean up the InnoDB doublewrite buffer code.
buf_dblwr_init_or_load_pages(): Do not add empty pages to the buffer.
buf_dblwr_process(): Do consider changes to pages that are all zero.
Do not abort when finding a corrupted copy of a page in the doublewrite
buffer, because there could be multiple copies in the doublewrite buffer,
and only one of them needs to be good.
buf_flush_init_flush_rbt() was called too early in MariaDB server 10.0,
10.1, MySQL 5.5 and MySQL 5.6. The memory leak has been fixed in
the XtraDB storage engine and in MySQL 5.7.
As a result, when the server is started to initialize new data files,
the buf_pool->flush_rbt will be created unnecessarily and then leaked.
This memory leak was noticed in MariaDB server 10.1 when running the
test encryption.innodb_first_page.
* Update mysqld_safe script to remove duplicated parameter --crash-script
* Make --core-file-size accept underscores as well as dashes correctly.
* Add mysqld_safe_helper to Debian and Ubuntu files.
* Update innodb minor version to 35
Sometimes innodb_data_file_size_debug was reported as INT UNSIGNED
instead of BIGINT UNSIGNED. Make it uint instead of ulong to get
a more deterministic result.
Memory was leaked when ALTER TABLE is attempted on a table
that contains corrupted indexes.
The memory leak was reported by AddressSanitizer for the test
innodb.innodb_corrupt_bit. The leak was introduced into
MariaDB Server 10.0.26, 10.1.15, 10.2.1 by the following:
commit c081c978a2
Merge: 1d21b22155a482e76e65
Author: Sergei Golubchik <serg@mariadb.org>
Date: Tue Jun 21 14:11:02 2016 +0200
Merge branch '5.5' into bb-10.0
In the backport of Bug#24450908 UNDO LOG EXISTS AFTER SLOW SHUTDOWN
from MySQL 5.7 to the MySQL 5.6 based MariaDB Server 10.1, we must
use a mutex when HAVE_ATOMIC_BUILTINS is not defined.
Also, correct a function comment. In MySQL 5.6 and MariaDB Server 10.1,
also temporary InnoDB tables are redo-logged.
InnoDB shutdown failed to properly take fil_crypt_thread() into account.
The encryption threads were signalled to shut down together with other
non-critical tasks. This could be much too early in case of slow shutdown,
which could need minutes to complete the purge. Furthermore, InnoDB
failed to wait for the fil_crypt_thread() to actually exit before
proceeding to the final steps of shutdown, causing the race conditions.
Furthermore, the log_scrub_thread() was shut down way too early.
Also it should remain until the SRV_SHUTDOWN_FLUSH_PHASE.
fil_crypt_threads_end(): Remove. This would cause the threads to
be terminated way too early.
srv_buf_dump_thread_active, srv_dict_stats_thread_active,
lock_sys->timeout_thread_active, log_scrub_thread_active,
srv_monitor_active, srv_error_monitor_active: Remove a race condition
between startup and shutdown, by setting these in the startup thread
that creates threads, not in each created thread. In this way, once the
flag is cleared, it will remain cleared during shutdown.
srv_n_fil_crypt_threads_started, fil_crypt_threads_event: Declare in
global rather than static scope.
log_scrub_event, srv_log_scrub_thread_active, log_scrub_thread():
Declare in static rather than global scope. Let these be created by
log_init() and freed by log_shutdown().
rotate_thread_t::should_shutdown(): Do not shut down before the
SRV_SHUTDOWN_FLUSH_PHASE.
srv_any_background_threads_are_active(): Remove. These checks now
exist in logs_empty_and_mark_files_at_shutdown().
logs_empty_and_mark_files_at_shutdown(): Shut down the threads in
the proper order. Keep fil_crypt_thread() and log_scrub_thread() alive
until SRV_SHUTDOWN_FLUSH_PHASE, and check that they actually terminate.
Port a bug fix from MySQL 5.7, so that all undo log pages will be freed
during a slow shutdown. We cannot scrub pages that are left allocated.
commit 173e171c6fb55f064eea278c76fbb28e2b1c757b
Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date: Fri Sep 9 18:01:27 2016 +0530
Bug #24450908 UNDO LOG EXISTS AFTER SLOW SHUTDOWN
Problem:
========
1) cached undo segment is not removed from rollback segment history
(RSEG_HISTORY) during slow shutdown. In other words, If the segment is
not completely free, we are failing to remove an entry from the history
list. While starting the server, we traverse all rollback segment slots
history list and make it as list of undo logs to be purged in purge
queue.
In that case, purge queue will never be empty after slow shutdown.
2) Freeing of undo log segment is linked with removing undo log header
from history.
Fix:
====
1) Have separate logic of removing the undo log header from
history list from rollback segment slots and remove it from
rollback segment history even though it is not completely free.
Reviewed-by: Debarun Banerjee <debarun.banerjee@oracle.com>
Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
RB:13672
after aborted InnoDB startup
This bug was repeatable by starting MariaDB 10.2 with an
invalid option, such as --innodb-flush-method=foo.
It is not repeatable in MariaDB 10.1 in the same way, but the
problem exists already there.
This commit is for optimizing WSREP(thd) macro.
#define WSREP(thd) \
(WSREP_ON && wsrep && (thd && thd->variables.wsrep_on))
In this we can safely remove wsrep and thd. We are not removing WSREP_ON
because this will change WSREP(thd) behaviour.
Patch Credit:- Nirbhay Choubay, Sergey Vojtovich
fil_space_t::recv_size: New member: recovered tablespace size in pages;
0 if no size change was read from the redo log,
or if the size change was implemented.
fil_space_set_recv_size(): New function for setting space->recv_size.
innodb_data_file_size_debug: A debug parameter for setting the system
tablespace size in recovery even when the redo log does not contain
any size changes. It is hard to write a small test case that would
cause the system tablespace to be extended at the critical moment.
recv_parse_log_rec(): Note those tablespaces whose size is being changed
by the redo log, by invoking fil_space_set_recv_size().
innobase_init(): Correct an error message, and do not require a larger
innodb_buffer_pool_size when starting up with a smaller innodb_page_size.
innobase_start_or_create_for_mysql(): Allow startup with any initial
size of the ibdata1 file if the autoextend attribute is set. Require
the minimum size of fixed-size system tablespaces to be 640 pages,
not 10 megabytes. Implement innodb_data_file_size_debug.
open_or_create_data_files(): Round the system tablespace size down
to pages, not to full megabytes, (Our test truncates the system
tablespace to more than 800 pages with innodb_page_size=4k.
InnoDB should not imagine that it was truncated to 768 pages
and then overwrite good pages in the tablespace.)
fil_flush_low(): Refactored from fil_flush().
fil_space_extend_must_retry(): Refactored from
fil_extend_space_to_desired_size().
fil_mutex_enter_and_prepare_for_io(): Extend the tablespace if
fil_space_set_recv_size() was called.
The test case has been successfully run with all the
innodb_page_size values 4k, 8k, 16k, 32k, 64k.
Problem was that for encryption we use temporary scratch area for
reading and writing tablespace pages. But if page was not really
decrypted the correct updated page was not moved to scratch area
that was then written. This can happen e.g. for page 0 as it is
newer encrypted even if encryption is enabled and as we write
the contents of old page 0 to tablespace it contained naturally
incorrect space_id that is then later noted and error message
was written. Updated page with correct space_id was lost.
If tablespace is encrypted we use additional
temporary scratch area where pages are read
for decrypting readptr == crypt_io_buffer != io_buffer.
Destination for decryption is a buffer pool block
block->frame == dst == io_buffer that is updated.
Pages that did not require decryption even when
tablespace is marked as encrypted are not copied
instead block->frame is set to src == readptr.
If tablespace was encrypted we copy updated page to
writeptr != io_buffer. This fixes above bug.
For encryption we again use temporary scratch area
writeptr != io_buffer == dst
that is then written to the tablespace
(1) For normal tables src == dst == writeptr
ut_ad(!encrypted && !page_compressed ?
src == dst && dst == writeptr + (i * size):1);
(2) For page compressed tables src == dst == writeptr
ut_ad(page_compressed && !encrypted ?
src == dst && dst == writeptr + (i * size):1);
(3) For encrypted tables src != dst != writeptr
ut_ad(encrypted ?
src != dst && dst != writeptr + (i * size):1);
Replace all exit() calls in InnoDB with abort() [possibly via ut_a()].
Calling exit() in a multi-threaded program is problematic also for
the reason that other threads could see corrupted data structures
while some data structures are being cleaned up by atexit() handlers
or similar.
In the long term, all these calls should be replaced with something
that returns an error all the way up the call stack.
Make some global fil_crypt_ variables static.
fil_close(): Call mutex_free(&fil_system->mutex) also in InnoDB, not
only in XtraDB. In InnoDB, sync_close() was called before fil_close().
innobase_shutdown_for_mysql(): Call fil_close() before sync_close(),
similar to XtraDB shutdown.
fil_space_crypt_cleanup(): Call mutex_free() to pair with
fil_space_crypt_init().
fil_crypt_threads_cleanup(): Call mutex_free() to pair with
fil_crypt_threads_init().
Essentially revert MDEV-6759, which addressed a double free of memory
by removing the freeing altogether, introducing the memory leaks.
No double free was observed when running the test suite -DWITH_ASAN.
Replace some mem_heap_free(foreign->heap) with dict_foreign_free(foreign)
so that the calls can be located and instrumented more easily when needed.
Reduce the number of calls to encryption_get_key_get_latest_version
when doing key rotation with two different methods:
(1) We need to fetch key information when tablespace not yet
have a encryption information, invalid keys are handled now
differently (see below). There was extra call to detect
if key_id is not found on key rotation.
(2) If key_id is not found from encryption plugin, do not
try fetching new key_version for it as it will fail anyway.
We store return value from encryption_get_key_get_latest_version
call and if it returns ENCRYPTION_KEY_VERSION_INVALID there
is no need to call it again.
crashes server
This bug is the result of merging the Oracle MySQL follow-up fix
BUG#22963169 MYSQL CRASHES ON CREATE FULLTEXT INDEX
without merging the base bug fix:
Bug#79475 Insert a token of 84 4-bytes chars into fts index causes
server crash.
Unlike the above mentioned fixes in MySQL, our fix will not change
the storage format of fulltext indexes in InnoDB or XtraDB
when a character encoding with mbmaxlen=2 or mbmaxlen=3
and the length of a word is between 128 and 84*mbmaxlen bytes.
The Oracle fix would allocate 2 length bytes for these cases.
Compatibility with other MySQL and MariaDB releases is ensured by
persisting the used maximum length in the SYS_COLUMNS table in the
InnoDB data dictionary.
This fix also removes some unnecessary strcmp() calls when checking
for the legacy default collation my_charset_latin1
(my_charset_latin1.name=="latin1_swedish_ci").
fts_create_one_index_table(): Store the actual length in bytes.
This metadata will be written to the SYS_COLUMNS table.
fts_zip_initialize(): Initialize only the first byte of the buffer.
Actually the code should not even care about this first byte, because
the length is set as 0.
FTX_MAX_WORD_LEN: Define as HA_FT_MAXCHARLEN * 4 aka 336 bytes,
not as 254 bytes.
row_merge_create_fts_sort_index(): Set the actual maximum length of the
column in bytes, similar to fts_create_one_index_table().
row_merge_fts_doc_tokenize(): Remove the redundant parameter word_dtype.
Use the actual maximum length of the column. Calculate the extra_size
in the same way as row_merge_buf_encode() does.
trx_state_eq(): Add the parameter bool relaxed=false, to
allow trx->state==TRX_STATE_NOT_STARTED where a different
state is expected, if an error has been reported.
trx_release_savepoint_for_mysql(): Pass relaxed=true to
trx_state_eq(). That is, allow the transaction to be idle
when ROLLBACK TO SAVEPOINT is attempted after an error
has been reported to the client.
buf_block_init(): Initialize buf_page_t::flush_type.
For some reason, Valgrind 3.12.0 would seem to flag some
bits in adjacent bitfields as uninitialized, even though only
the two bits of flush_type were left uninitialized. Initialize
the field to get rid of many warnings.
buf_page_init_low(): Initialize buf_page_t::old.
For some reason, Valgrind 3.12.0 would seem to flag all 32
bits uninitialized when buf_page_init_for_read() invokes
buf_LRU_add_block(bpage, TRUE). This would trigger bogus warnings
for buf_page_t::freed_page_clock being uninitialized.
(The V-bits would later claim that only "old" is initialized
in the 32-bit word.) Perhaps recent compilers
(GCC 6.2.1 and clang 4.0.0) generate more optimized x86_64 code
for bitfield operations, confusing Valgrind?
mach_write_to_1(), mach_write_to_2(), mach_write_to_3():
Rewrite the assertions that ensure that the most significant
bits are zero. Apparently, clang 4.0.0 would optimize expressions
of the form ((n | 0xFF) <= 0x100) to (n <= 0x100). The redundant
0xFF was added in the first place in order to suppress a
Valgrind warning. (Valgrind would warn about comparing uninitialized
values even in the case when the uninitialized bits do not affect
the result of the comparison.)
In InnoDB and XtraDB functions that declare pointer parameters as nonnull,
remove nullness checks, because GCC would optimize them away anyway.
Use #ifdef instead of #if when checking for a configuration flag.
Clang says that left shifts of negative values are undefined.
So, use ~0U instead of ~0 in a number of macros.
Some functions that were defined as UNIV_INLINE were declared as
UNIV_INTERN. Consistently use the same type of linkage.
ibuf_merge_or_delete_for_page() could pass bitmap_page=NULL to
buf_page_print(), conflicting with the __attribute__((nonnull)).
This is not a fix, this is instrumentation to find out is MySQL frm dictionary
and InnoDB data dictionary really out-of-sync when this assertion is fired,
or is there some other reason (bug).
Analysis: Problem is that page is encrypted but encryption information
on page 0 has already being changed.
Fix: If page header contains key_version != 0 and even if based on
current encryption information tablespace is not encrypted we
need to check is page corrupted. If it is not, then we know that
page is not encrypted. If page is corrupted, we need to try to
decrypt it and then compare the stored and calculated checksums
to see is page corrupted or not.