Move the function ut_crc32_swap_byteorder() to a non-x86 #ifdef area.
As it is only used in BIGENDIAN builds, use #ifdefs around
ut_crc32_swap_byteorder().
Travis CI and Debian both include s390x, which is big-endian, in their builds and tests.
Fixes commit: 1312b4ebb6
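For illustration, here is a minimal sketch of the guarded byte swap. It assumes that a build macro named WORDS_BIGENDIAN marks big-endian targets such as s390x. swap_byteorder64() and to_little_endian64() are hypothetical names, not the actual ut0crc32 code. The swap helper is left unconditional so that it can be exercised on any host. In the change described, the definition itself also sits inside the #ifdef, so that little-endian builds do not carry an unused function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ut_crc32_swap_byteorder(): reverse the byte
// order of a 64-bit word.
inline uint64_t swap_byteorder64(uint64_t i)
{
  return i << 56 |
         (i & 0x000000000000ff00ULL) << 40 |
         (i & 0x0000000000ff0000ULL) << 24 |
         (i & 0x00000000ff000000ULL) << 8 |
         (i & 0x000000ff00000000ULL) >> 8 |
         (i & 0x0000ff0000000000ULL) >> 24 |
         (i & 0x00ff000000000000ULL) >> 40 |
         i >> 56;
}

// Only big-endian builds reference the swap; little-endian builds
// (including x86) compile the identity path.
inline uint64_t to_little_endian64(uint64_t i)
{
#ifdef WORDS_BIGENDIAN // assumed name of the big-endian build macro
  return swap_byteorder64(i);
#else
  return i;
#endif
}
```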
log_write_buf(): Remove the unused variable write_header.
The dependent code was removed in
commit 0c2365c4e3.
That was dead or unnecessary code at least ever since
commit 9ef2d29ff4
removed the support for innodb_log_files_in_group>1.
log_t::has_encryption_key_rotation(): Check whether
key rotation is supported.
In a subsequent redo log format version, this key rotation
may be broken again.
Write the log header just once, when the log file is created, instead of
writing it on every log file wrap-around.
log_t::file::write_header_durable(): Write the log header.
log_write_buf(): Stop writing the log header.
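The following toy model shows how writing the header once can work. The class CircularLog, the 512-byte header area, and the member names are all hypothetical, and the model is much simpler than log_t::file. The header is written in the constructor only. A wrap-around in write_buf() continues right after the header area instead of rewriting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical circular log file: the first HEADER bytes hold the
// header, and the rest is a ring buffer for log records.
class CircularLog
{
public:
  static constexpr size_t HEADER = 512;

  explicit CircularLog(size_t capacity)
    : file_(HEADER + capacity), pos_(HEADER)
  {
    write_header_durable(); // written once, when the file is created
  }

  // Append a record; on wrap-around, continue after the header.
  void write_buf(const uint8_t *buf, size_t len)
  {
    for (size_t i = 0; i < len; i++)
    {
      file_[pos_++] = buf[i];
      if (pos_ == file_.size())
      {
        pos_ = HEADER; // skip the header area instead of rewriting it
        wraps_++;
      }
    }
  }

  unsigned header_writes() const { return header_writes_; }
  unsigned wraps() const { return wraps_; }

private:
  void write_header_durable()
  {
    file_[0] = 1; // e.g. a format identifier (illustrative only)
    header_writes_++;
  }

  std::vector<uint8_t> file_;
  size_t pos_;
  unsigned header_writes_ = 0;
  unsigned wraps_ = 0;
};
```

After any number of wrap-arounds, header_writes() stays at 1.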
By default, when the redo log is written for a modification of a persistent
data page, the data page must actually be changed. If the write can
sometimes be optimized away, then the template parameter w=mtr_t::OPT
should be passed in order to silence the debug assertion failure.
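Here is a minimal sketch of the idea, using a toy Mtr type instead of the real mtr_t. Only the names NORMAL and OPT mirror the text. The write() signature is hypothetical. With the default template argument, a write that does not change the page trips the debug assertion. With OPT, such a write is silently skipped and nothing is logged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model of a mini-transaction; not the real mtr_t.
struct Mtr
{
  enum write_type { NORMAL, OPT };
  size_t log_records = 0;

  // Copy len bytes into page memory and log the change. With w=NORMAL,
  // the caller promises that the bytes actually change; with w=OPT,
  // an unchanged write is skipped.
  template<write_type w = NORMAL>
  bool write(uint8_t *ptr, const uint8_t *val, size_t len)
  {
    const bool changed = std::memcmp(ptr, val, len) != 0;
    if (w == NORMAL)
      assert(changed); // debug check: a redundant write is a bug
    if (!changed)
      return false;    // optimized away: nothing to log
    std::memcpy(ptr, val, len);
    log_records++;
    return true;
  }
};
```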
InnoDB undo log pages can be reused without properly freeing or
initializing them in between. In particular, the undo log header
page field TRX_UNDO_TRX_NO could have been part of an undo log
record page, and those bytes could accidentally have the desired
value when the page is reused as an undo log header page of
another transaction.
Because the function trx_undo_set_state_at_finish() always changes
the TRX_UNDO_STATE of the page, and because recovery is only reading
TRX_UNDO_TRX_NO for pages that either have the correct TRX_UNDO_STATE
or, in trx_rseg_array_init(), are attached to the TRX_SYS page, the
garbage values in TRX_UNDO_TRX_NO do not seem to cause a problem.
This assertion failure affects debug builds only.
Fix spelling mistakes, e.g.:
- dont -> don't
- occurence -> occurrence
- succesfully -> successfully
- easyly -> easily
Also remove trailing space in selected files.
These changes span:
- server core
- Connect and Innobase storage engine code
- OQgraph, Sphinx and TokuDB storage engines
Related to MDEV-21769.
Instrument the new synchronization primitive with thd_wait_begin/end
to inform the threadpool about waits.
This considerably improves performance on write benchmarks
(e.g. sysbench update_index) with the generic threadpool, at the cost of
possibly creating many new threads.
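Here is a sketch of the instrumentation pattern. The hooks pool_wait_begin() and pool_wait_end() are hypothetical stand-ins for thd_wait_begin()/thd_wait_end(), whose real signatures are not reproduced here. An RAII guard reports the wait only when the thread is about to block, so the pool can start another worker in the meantime. That extra worker is where the thread-creation cost comes from.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical thread pool bookkeeping.
std::atomic<int> waiting_threads{0};
std::atomic<int> wait_notifications{0};

inline void pool_wait_begin() { waiting_threads++; wait_notifications++; }
inline void pool_wait_end() { waiting_threads--; }

// RAII guard: tells the pool that this thread may block.
struct PoolWaitGuard
{
  PoolWaitGuard() { pool_wait_begin(); }
  ~PoolWaitGuard() { pool_wait_end(); }
  PoolWaitGuard(const PoolWaitGuard &) = delete;
  PoolWaitGuard &operator=(const PoolWaitGuard &) = delete;
};

// Wait until ready is true (it must be modified while holding m).
// The pool is notified only if the thread would actually block.
inline void wait_until_ready(std::mutex &m, std::condition_variable &cv,
                             const bool &ready)
{
  std::unique_lock<std::mutex> lk(m);
  if (ready)
    return; // fast path: no wait, no notification
  PoolWaitGuard guard;
  cv.wait(lk, [&] { return ready; });
}
```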
1. Refactored innobase_close_connection(). The transaction must already have
been rolled back by this time. We should expect only transactions in the
PREPARED state when MDEV-742 is done.
2. Added the missing put_pins() to trx_disconnect_prepared(). The missing
put_pins() was not a problem because trx_disconnect_prepared() is dead
code, but it will be revived in the main MDEV-742 patch.
3. Fixed a missing reset of trx->mysql_log_file_name when a RW transaction
did not emit any log records (zero-modification RW). The problem was
detected by ASAN when a disconnected XA transaction was trying to make
use of an inherited mysql_log_file_name pointing into binlog data of a
detached THD.
This missing reset also had a user-visible side effect:
trx_sys_print_mysql_binlog_offset() could report a binlog position
other than that of the most recently committed transaction.
One possible scenario that is expected to misbehave is as follows:
thr1> CREATE TABLE t1(a INT) ENGINE=InnoDB;
thr1> INSERT INTO t1 VALUES(1);
thr1> BEGIN;
thr1> UPDATE t1 SET a=1;
thr1> COMMIT; -- zero-modification, fails to reset mysql_log_file_name
thr2> BEGIN;
thr2> INSERT INTO t1 VALUES(2);
thr2> COMMIT;
thr1> BEGIN;
thr1> do-some-real-changes;
thr1> ROLLBACK; -- will store binlog pos from previous COMMIT in thr1?
In this case, if the binlog is replayed from the position reported by
trx_sys_print_mysql_binlog_offset(), t1 will end up with two records
containing '2'.
Part of
MDEV-742 - XA PREPAREd transaction survive disconnect/server restart
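Here is a toy model of fix 3. The Trx and BinlogPos types are simplified, all field names other than mysql_log_file_name are hypothetical, and the real commit path is far more involved. The point is that the per-transaction binlog fields are reset on every commit, including a zero-modification one. That way, a later transaction on the same connection cannot inherit a stale position.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not the real trx_t: only the fields involved in the bug.
struct Trx
{
  const char *mysql_log_file_name = nullptr; // binlog file of the commit
  uint64_t mysql_log_offset = 0;             // binlog offset of the commit
  bool has_logged = false;                   // did it write any redo log?
};

// Binlog position that a status report would show.
struct BinlogPos
{
  const char *file = nullptr;
  uint64_t offset = 0;
};

// Publish the binlog position only if the transaction wrote something,
// but reset the per-transaction fields in every case.
inline void trx_commit(Trx &trx, BinlogPos &last)
{
  if (trx.has_logged && trx.mysql_log_file_name)
  {
    last.file = trx.mysql_log_file_name;
    last.offset = trx.mysql_log_offset;
  }
  trx.mysql_log_file_name = nullptr; // the reset that was missing
  trx.mysql_log_offset = 0;
  trx.has_logged = false;
}
```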
- This issue was caused by 5e62b6a5e0.
fts_optimize_callback() should free fts_optimize_wq and set it to NULL
when it receives the FTS_MSG_STOP message, so that a subsequent
fts_optimize_callback() call does not fail with a segmentation fault.
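Here is a minimal sketch of the shutdown pattern. The std::deque-based queue and the Msg enum are hypothetical stand-ins for fts_optimize_wq and the FTS message types. On STOP, the queue is freed and the pointer is set to nullptr. Every invocation checks the pointer first.

```cpp
#include <cassert>
#include <deque>

enum class Msg { OPTIMIZE, STOP };

// Hypothetical work queue standing in for fts_optimize_wq.
static std::deque<Msg> *optimize_wq = new std::deque<Msg>;

// Process queued messages. On STOP, free the queue and reset the pointer,
// so that a later invocation returns early instead of touching freed memory.
inline void optimize_callback()
{
  if (!optimize_wq)
    return;
  while (!optimize_wq->empty())
  {
    const Msg m = optimize_wq->front();
    optimize_wq->pop_front();
    if (m == Msg::STOP)
    {
      delete optimize_wq;
      optimize_wq = nullptr;
      return;
    }
    // Msg::OPTIMIZE would be handled here.
  }
}
```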
Starting with commit 1a6f708ec5
the function buf_pool_get_dirty_pages_count() is only used
in a debug check. It was dead code for non-debug builds.
buf_flush_dirty_pages(): Perform the debug check inline,
and replace the assertion
ut_ad(first || buf_pool_get_dirty_pages_count(id) == 0);
with another one that is executed while holding the mutexes:
ut_ad(id != bpage->id.space());
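The following sketch shows why running the check under the mutex matters. The types Page, flush_list, and flush_list_mutex are hypothetical and replace buf_page_t and the buffer pool's flush list. The debug loop runs while holding the same mutex that protects the removal, so no other thread can insert a page of the tablespace in between. With NDEBUG defined, the loop compiles away, much as ut_ad() does in non-debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>

struct Page { uint32_t space; uint32_t page_no; };

std::mutex flush_list_mutex;
std::list<Page> flush_list;

// Remove all dirty pages of tablespace id and return how many there were.
inline size_t flush_dirty_pages(uint32_t id)
{
  std::lock_guard<std::mutex> g(flush_list_mutex);
  size_t removed = 0;
  for (auto it = flush_list.begin(); it != flush_list.end(); )
  {
    if (it->space == id) { it = flush_list.erase(it); removed++; }
    else ++it;
  }
#ifndef NDEBUG
  // Inline debug check, executed while the mutex is still held.
  for (const Page &bpage : flush_list)
    assert(id != bpage.space);
#endif
  return removed;
}
```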
All tablespace metadata is buffered in fil_system. There is an LRU
mechanism, but that only controls the opening and closing of
fil_node_t::handle.
It is much more efficient and less error-prone to access data file names
by looking up the fil_space_t object rather than by essentially joining
each row with an access to SYS_DATAFILES via the InnoDB internal SQL parser.
dict_get_first_path(): Declare static. The function may only be needed
when loading or updating the data dictionary. Also, change a condition
in order to avoid a bogus GCC 10 -Wstringop-overflow warning for
mem_strdupl() about len==ULINT_UNDEFINED.
i_s_sys_tablespaces_fill_table(): Do not access other InnoDB internal
dictionary tables than SYS_TABLESPACES.
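Here is a minimal sketch of the lookup. A std::map keyed by tablespace id serves as a hypothetical stand-in for fil_system and fil_space_t. Fetching a file name then takes a single in-memory search instead of one internal SQL query per row.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for fil_space_t and fil_system.
struct Space { uint32_t id; std::string path; };

struct FilSystem
{
  std::map<uint32_t, Space> spaces;

  // Return the data file name of a tablespace, or nullptr if unknown.
  const std::string *file_name(uint32_t id) const
  {
    auto it = spaces.find(id);
    return it == spaces.end() ? nullptr : &it->second.path;
  }
};
```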
The -Wconversion in GCC seems to be stricter than in clang.
GCC at least since version 4.4.7 issues truncation warnings for
assignments to bitfields, while clang 10 appears to only issue
warnings when the sizes in bytes rounded to the nearest integer
powers of 2 are different.
Before GCC 10.0.0, -Wconversion required more casts and would not
allow some operations, such as x<<=1 or x+=1 on a data type that
is narrower than int.
GCC 5 (but not GCC 4, GCC 6, or any later version) complains
about x|=y even when x and y are compatible types that are narrower
than int. Hence, we must rewrite some x|=y as
x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.
In GCC 6 and later, the warning for assigning wider values to bitfields
that are narrower than 8, 16, or 32 bits can be suppressed by
applying a bitwise & with the exact bitmask of the bitfield.
With GCC 4 or 5, we must disable -Wconversion in such
cases.
The bitwise negation operator promotes short integers
to int, and hence we must add explicit truncation casts
around them. Microsoft Visual C does not allow a static_cast to
truncate a constant, such as static_cast<byte>(~1) truncating int.
Hence, we will use the constructor-style cast byte(~1) for such cases.
This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
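The three idioms can be sketched as follows. The sketch assumes that byte is a typedef for uint8_t, as in InnoDB, and uses a hypothetical 3-bit bitfield. Whether a given compiler actually warns depends on its version and flags. The sketch only shows the warning-free forms.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t byte;

struct Flags
{
  unsigned kind : 3; // hypothetical 3-bit bitfield
};

// x|=y on narrow types: both operands are promoted to int, so the
// result is explicitly truncated back to byte.
inline byte set_bits(byte x, byte y)
{
  x = static_cast<byte>(x | y);
  return x;
}

// Masking with the exact bitmask of the bitfield shows that no bits
// are lost in the assignment.
inline void set_kind(Flags &f, unsigned v)
{
  f.kind = v & 7;
}

// ~1 is the int constant -2; the constructor-style cast byte(~1)
// truncates it to 0xFE.
inline byte clear_low_bit(byte x)
{
  return static_cast<byte>(x & byte(~1));
}
```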
The extension of the record comparison functions for SPATIAL INDEX in
mysql/mysql-server@b66ad511b6
was suboptimal for multiple reasons:
Some functions used unnecessary temporary variables of the int type,
instead of the more appropriate size_t, causing type mismatches.
Many functions unnecessarily required rec_get_offsets() to be
computed, or a parameter for length, although the size of the
minimum bounding rectangle (MBR) is hard-coded as
SPDIMS * 2 * sizeof(double), or 32 bytes.
In InnoDB SPATIAL INDEX records, there always is a 32-byte key
followed by either a 4-byte child page number or the PRIMARY KEY value.
The length parameters were not properly validated.
The function cmp_geometry_field() was making an incorrect attempt
at checking that the lengths are at least sizeof(double) (8 bytes),
even though the function is accessing up to 32 bytes in both MBRs.
Functions that are called from only one compilation unit are defined
in another compilation unit, making the code harder to follow and
potentially slower to execute.
cmp_dtuple_rec_with_gis(): FIXME: Correct the debug assertion
and possibly the function TABLE_SHARE::init_from_binary_frm_image()
or related code, which causes an unexpected length of
DATA_MBR_LEN + 2 bytes to be passed to this function.
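Here is a sketch of comparing fixed-size MBRs without a length parameter. It assumes that the coordinates are stored as doubles in the order (xmin, xmax, ymin, ymax). mbr_coord() and mbr_contains() are hypothetical helpers, not InnoDB functions. Each coordinate is read with memcpy, which avoids misaligned reads from the record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr size_t SPDIMS = 2;
constexpr size_t DATA_MBR_LEN = SPDIMS * 2 * sizeof(double); // 32 bytes

// Read the i-th coordinate of an MBR stored as raw bytes.
inline double mbr_coord(const unsigned char *mbr, size_t i)
{
  double d;
  std::memcpy(&d, mbr + i * sizeof d, sizeof d);
  return d;
}

// Does MBR a contain MBR b? Both are exactly DATA_MBR_LEN bytes,
// so no length parameter is needed.
inline bool mbr_contains(const unsigned char *a, const unsigned char *b)
{
  for (size_t d = 0; d < SPDIMS; d++)
  {
    if (mbr_coord(a, 2 * d) > mbr_coord(b, 2 * d) ||
        mbr_coord(a, 2 * d + 1) < mbr_coord(b, 2 * d + 1))
      return false;
  }
  return true;
}
```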
Actually, page_zip_verify_checksum() generally allows all-zeroes
checksums because our CRC32 checksum is something like
crc_1 ^ crc_2 ^ crc_3.
Also, an all-zeroes page is considered correct.
As a side effect, fix a nasty reinterpret_cast<> undefined behavior (UB).
Also, since commit c0f47a4a58, innodb_checksum_algorithm=full_crc32
exists, which computes CRC32 in one go (without bitwise arithmetic).
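For illustration, here is a slow bitwise CRC-32C together with a big-endian read that avoids reinterpret_cast. The function names and the split points in split_checksum() are hypothetical. The XOR combination only mirrors the shape crc_1 ^ crc_2 ^ crc_3, not InnoDB's actual page layout. The CRC of an empty range is 0, so empty parts do not affect the XOR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Slow but simple; production code uses tables or CPU instructions.
inline uint32_t crc32c(const unsigned char *p, size_t len)
{
  uint32_t crc = 0xFFFFFFFF;
  while (len--)
  {
    crc ^= *p++;
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0x82F63B78 & (0U - (crc & 1)));
  }
  return ~crc;
}

// Read a big-endian 32-bit value byte by byte instead of through
// reinterpret_cast, which avoids misaligned and aliasing accesses.
inline uint32_t read_be32(const unsigned char *p)
{
  return uint32_t{p[0]} << 24 | uint32_t{p[1]} << 16 |
         uint32_t{p[2]} << 8 | p[3];
}

// XOR of the CRCs of three consecutive ranges [0,a), [a,b), [b,len).
inline uint32_t split_checksum(const unsigned char *page, size_t a, size_t b,
                               size_t len)
{
  return crc32c(page, a) ^ crc32c(page + a, b - a) ^
         crc32c(page + b, len - b);
}
```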
PageBulk::insertPage(): Check the array bounds before comparing.
We used to read one byte beyond the end of the 'rec' payload.
The incorrect logic was originally introduced in
commit 7ae21b18a6.
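Here is a minimal sketch of the ordering issue. It uses a hypothetical common_prefix() helper rather than the actual PageBulk code. Because && short-circuits, putting the bounds test first guarantees that a[i] and b[i] are never read once i reaches the shorter length.

```cpp
#include <cassert>
#include <cstddef>

// Length of the common prefix of two byte strings. The bound is checked
// before each comparison, so neither buffer is read past its end.
inline size_t common_prefix(const unsigned char *a, size_t a_len,
                            const unsigned char *b, size_t b_len)
{
  const size_t n = a_len < b_len ? a_len : b_len;
  size_t i = 0;
  while (i < n && a[i] == b[i])
    i++;
  return i;
}
```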
buf_flush_freed_page(): Reformat in the common style, and
simplify some code. Prefer to request all information from
smaller data structures (buf_page_t) than from fil_space_t
or the global variable srv_immediate_scrub_data_uncompressed.
SysTablespace::open_or_create(): Assert that the temporary
tablespace will not be created in page_compressed format, so that
buf_flush_freed_page() can avoid checking that on every call.
IORequest: Remove duplicated constructors, and do not explicitly
declare a default constructor.
The following parameters are deprecated:
innodb-background-scrub-data-uncompressed
innodb-background-scrub-data-compressed
innodb-background-scrub-data-interval
innodb-background-scrub-data-check-interval
Removed the scrubbing code completely (btr0scrub.h, btr0scrub.cc)
Removed the information_schema.innodb_tablespaces_scrubbing table
Removed the scrubbing logic from fil_crypt_thread()
When an InnoDB data file page is freed, its contents become garbage,
and any storage allocated in the data file is wasted. During flushing,
InnoDB initializes the page with zeros if scrubbing is enabled. If the
tablespace is compressed, InnoDB should punch a hole; otherwise, it should
skip flushing the freed page.
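The decision can be sketched as a small function. The enum and function names are hypothetical. The precedence is also an assumption: this sketch lets hole punching win when a tablespace is compressed and scrubbing is also enabled.

```cpp
#include <cassert>

enum class freed_page_action { SKIP, WRITE_ZEROS, PUNCH_HOLE };

// Decide what flushing should do with a freed page.
// Assumption: hole punching takes precedence when both apply.
inline freed_page_action on_freed_page(bool scrub_enabled, bool compressed)
{
  if (compressed)
    return freed_page_action::PUNCH_HOLE;  // release the storage
  if (scrub_enabled)
    return freed_page_action::WRITE_ZEROS; // overwrite stale contents
  return freed_page_action::SKIP;          // garbage may stay on disk
}
```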
buf_page_t:
- Replaced the variables file_page_was_freed and init_on_flush in buf_page_t
with a status enum variable.
- Changed all debug assertions on file_page_was_freed to DBUG_ASSERT
on buf_page_t::status
Removed buf_page_set_file_page_was_freed(),
buf_page_reset_file_page_was_freed().
buf_page_free(): Newly added function which takes an X-lock on the page
before marking its status as FREED, so that the InnoDB flush handler can
avoid a concurrent flush of the freed page. Also, while flushing the page,
InnoDB makes sure that the redo log record that frees the page has also
been written to disk. Currently, this function only marks the page as FREED if
it is in the buffer pool.
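Here is a sketch of the status enum and the latching rule. std::mutex stands in for the page X-lock, and all names (page_status, Page, freed_lsn, may_flush()) are hypothetical. A freed page may be flushed only after the redo log up to the LSN of the freeing record is durable. This is the write-ahead-logging condition the text describes.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// One status field instead of the two booleans file_page_was_freed
// and init_on_flush (enumerator names are illustrative).
enum class page_status { NORMAL, INIT_ON_FLUSH, FREED };

struct Page
{
  std::mutex latch;       // stands in for the page X-lock
  page_status status = page_status::NORMAL;
  uint64_t freed_lsn = 0; // LSN of the redo record that freed the page
};

// Mark the page freed while holding its latch, so that no flush of the
// page can be in progress at the same time.
inline void page_free(Page &p, uint64_t lsn)
{
  std::lock_guard<std::mutex> g(p.latch);
  p.status = page_status::FREED;
  p.freed_lsn = lsn;
}

// May the page be flushed, given the LSN up to which the redo log
// is durable?
inline bool may_flush(Page &p, uint64_t durable_lsn)
{
  std::lock_guard<std::mutex> g(p.latch);
  return p.status != page_status::FREED || p.freed_lsn <= durable_lsn;
}
```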
buf_flush_freed_page(): Newly added function which writes zeros
asynchronously if innodb_immediate_scrub_data_uncompressed is enabled,
and punches a hole in the file synchronously if page_compressed is enabled.
Reset the io_fix to NORMAL. Release the block from the flush list and the
associated mutex before writing zeros or punching a hole in the file.
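On Linux, the two actions can be sketched with pwrite(2) and with fallocate(2) using FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. The helper names are hypothetical, and this is not how fil_io() is structured. Hole punching fails with EOPNOTSUPP on file systems that do not support it.

```cpp
#ifndef _GNU_SOURCE
# define _GNU_SOURCE // for fallocate() and FALLOC_FL_* in <fcntl.h>
#endif
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Scrub a freed page by overwriting it with zeros.
// Returns 0 on success, -1 on failure.
inline int write_zeros(int fd, off_t offset, size_t len)
{
  std::vector<unsigned char> zeros(len, 0);
  return pwrite(fd, zeros.data(), len, offset) ==
         static_cast<ssize_t>(len) ? 0 : -1;
}

// Deallocate the storage of a freed page without changing the file size.
// Afterwards, reads of the range return zeros.
inline int punch_hole(int fd, off_t offset, off_t len)
{
  return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                   offset, len);
}
```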
buf_flush_page(): Removed the unnecessary usage of temporary
variable "flush"
fil_io(): Introduce a new parameter called punch_hole. It allows fil_io()
to punch a hole in the file at the given offset.
buf_page_create(): Let the callers assign buf_page_t::status.
Every caller should eventually invoke mtr_t::init().
fsp_page_create(): Remove the unused mtr_t parameter.
In all other callers of buf_page_create() except fsp_page_create(),
before invoking mtr_t::init(), invoke
mtr_t::sx_latch_at_savepoint() or mtr_t::x_latch_at_savepoint().
mtr_t::init(): Initialize buf_page_t::status also for the temporary
tablespace (when redo logging is disabled), to avoid assertion failures.
recv_log_recover_10_4(): Add a missing bit pattern negation that
was forgotten when commit f8a9f90667
(MDEV-12353) removed the support for crash-upgrading.
btr_cur_upd_rec_in_place(): Invoke page_zip_rec_set_deleted()
for ROW_FORMAT=COMPRESSED pages, so that the change will be
written to the redo log.
This part of crash recovery was broken in
commit 08ba388713 (MDEV-12353).
create_log_file(): Delete all old redo log files where they used to be
deleted, after the crash injection point innodb_log_abort_6,
before commit 9ef2d29ff4
deprecated and ignored the setting innodb_log_files_in_group.
log_crypt_101_read_checkpoint(), log_crypt_101_read_block():
Declare as ATTRIBUTE_COLD. These are only used when
checking that a MariaDB 10.1 encrypted redo log is clean.
log_block_calc_checksum_format_0(): Define in the only
compilation unit where it is needed. This is only used
when reading the checkpoint information from redo logs
before MariaDB 10.2.2.
crypt_info_t: Declare the byte arrays directly with alignas().
log_crypt(): Use memcpy_aligned instead of reinterpret_cast
on integers.
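The following sketch illustrates both ideas. CryptInfo is a hypothetical struct loosely modeled on crypt_info_t, and make_counter() is a hypothetical helper. std::memcpy stands in for MariaDB's memcpy_aligned. Copying bytes this way avoids the strict-aliasing violation of accessing the buffer through reinterpret_cast<uint64_t*>.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte arrays declared directly with alignas(), so that word-sized
// copies into them are aligned.
struct CryptInfo
{
  uint32_t checkpoint_no;
  alignas(8) unsigned char crypt_msg[16];
  alignas(8) unsigned char crypt_key[16];
};

// Fill a 16-byte buffer with a 64-bit LSN followed by a 32-bit number,
// zero-padded, without reinterpret_cast on integers.
inline void make_counter(unsigned char (&buf)[16], uint64_t lsn, uint32_t n)
{
  std::memset(buf, 0, sizeof buf);
  std::memcpy(buf, &lsn, sizeof lsn);
  std::memcpy(buf + 8, &n, sizeof n);
}
```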