InnoDB tablespace identifiers and page numbers are 32-bit numbers.
Let us use a 32-bit type for them in innochecksum.
The changes in commit 1918bdf32c
broke the build on 32-bit Windows.
Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
In the InnoDB data files, we allocate 32 bits for tablespace identifiers
and page numbers as well as tablespace flags. But, in main memory
data structures we allocate 32 or 64 bits, depending on the register
width of the processor. Let us always use 32-bit fields to eliminate
a mismatch and reduce the memory footprint on 64-bit systems.
It is implementation-defined whether alignment requirements
that are larger than std::max_align_t (typically 8 or 16 bytes)
will be honored by the compiler and linker.
It turns out that on IBM AIX, both alignas() and MY_ALIGNED()
only guarantees alignment up to 16 bytes.
For some data structures, specifying alignment to the CPU
cache line size (typically 64 or 128 bytes) is a mere performance
optimization, and we do not really care whether the requested
alignment is guaranteed.
But, for the correct operation of direct I/O, we do require that
the buffers be aligned at a block size boundary.
field_ref_zero: Define as a pointer, not an array.
For innochecksum, we can make this point to unaligned memory;
for anything else, we will allocate an aligned buffer from the heap.
This buffer will be used for overwriting freed data pages when
innodb_immediate_scrub_data_uncompressed=ON. And exactly that code
hit an assertion failure on AIX, in the test innodb.innodb_scrub.
log_sys.checkpoint_buf: Define as a pointer to aligned memory
that is allocated from heap.
log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf
instead of trying to allocate an aligned buffer from the stack.
Store and maintain xdes pages always. And doesn't verify checksums for
freed pages.
innochecksum can work only with the first space file of multiple ones.
Tell about it and abort in case of not the first file.
Let us simply refuse an upgrade from earlier versions if the
upgrade procedure was not followed. This simplifies the purge,
commit, and rollback of transactions.
Before upgrading to MariaDB 10.3 or later, a clean shutdown
of the server (with innodb_fast_shutdown=1 or 0) is necessary,
to ensure that any incomplete transactions are rolled back.
The undo log format was changed in MDEV-12288. There is only
one persistent undo log for each transaction.
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)
Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
Historically, InnoDB supported a buggy page checksum algorithm that did not
compute a checksum over the full page. Later, well before MySQL 4.1
introduced .ibd files and the innodb_file_per_table option, the algorithm
was corrected and the first 4 bytes of each page were redefined to be
a checksum.
The original checksum was so slow that an option to disable page checksum
was introduced for benchmarketing purposes.
The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
extension, which includes instructions for faster computation of CRC-32C.
In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
implemented to make of that. As that option was changed to be the default
in MySQL 5.7, a bug was found on big-endian platforms and some work-around
code was added to weaken that checksum further. MariaDB disables that
work-around by default since MDEV-17958.
Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
and ARM and also for IA-32/AMD64, making use of carry-less multiplication
where available.
Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
It should have removed any need to use innodb_checksum_algorithm=none.
The setting innodb_checksum_algorithm=crc32 is the default in
MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
It is even faster and more secure.
The default settings in MariaDB do allow old data files to be read,
no matter if a worse checksum algorithm had been used.
(Unfortunately, before innodb_checksum_algorithm=full_crc32,
the data files did not identify which checksum algorithm is being used.)
The non-default settings innodb_checksum_algorithm=strict_crc32 or
innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
checksums. The incompatibility with old data files is why they are
not the default.
The newest server not to support innodb_checksum_algorithm=crc32
were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
A valid reason for using innodb_checksum_algorithm=innodb could have
been the ability to downgrade. If it is really needed, data files
can be converted with an older version of the innochecksum utility.
Because there is no good reason to allow data files to be written
with insecure checksums, we will reject those option values:
innodb_checksum_algorithm=none
innodb_checksum_algorithm=innodb
innodb_checksum_algorithm=strict_none
innodb_checksum_algorithm=strict_innodb
Furthermore, the following innochecksum options will be removed,
because only strict crc32 will be supported:
innochecksum --strict-check=crc32
innochecksum -C crc32
innochecksum --write=crc32
innochecksum -w crc32
If a user wishes to convert a data file to use a different checksum
(so that it might be used with the no-longer-supported
MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
nor system tablespace format changes that were made in MariaDB 10.3),
then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
MySQL 5.7 can be used.
Reviewed by: Thirunarayanan Balathandayuthapani
One should not change the program arguments!
This change also reduces warnings from the icc compiler.
Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).
Other changes:
- Added a few cast of 'argument' from 'const char*' to 'char *'. This
was mainly in calls to 'external' functions we don't have control of.
- Ensure that all reset of 'password command line argument' are similar.
(In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there was a few cases that changed
the command line argument. These places where changed to instead allocate
the option in a MEM_ROOT to avoid changing the argument. Some of this
code was changed to ensure that different programs did parsing the
same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
from 'char *' to 'const char *'.
Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction
after Intel whitepaper, and is ported from rocksdb code.
Optimized ARM and POWER CRC32 were already present in mysys.
Before commit 90c52e5291 introduced
aligned_malloc(), InnoDB always used a pattern of over-allocating
memory and invoking ut_align() to guarantee the desired alignment.
It is cleaner to invoke aligned_malloc() and aligned_free() directly.
ut_align(): Remove. In assertions, ut_align_down() can be used instead.
The page allocation bitmaps in the extent descriptor pages
contain two bits per page: XDES_FREE_BIT and XDES_CLEAN_BIT,
which is unused. Simplify read access.
xdes_is_free(descr,mtr): Remove. Use !xdes_get_n_used(descr) instead.
xdes_is_free(): Replaces xdes_get_bit(), xdes_mtr_get_bit().
xdes_find_free(): Replaces xdes_find_bit().
fsp_seg_inode_page_get_nth_inode(): Remove the redundant parameters
physical_size, mtr.
fsp_seg_inode_page_find_used(), fsp_seg_inode_page_find_free():
Remove the redundant parameter mtr.
Added the condition in innochecksum tool to check page id mismatch.
This could catch the write corruption caused by InnoDB.
Added the debug insert inside fil_io() to check whether it writes
the page to wrong offset.
This is a follow-up task to MDEV-12026, which introduced
innodb_checksum_algorithm=full_crc32 and a simpler page format.
MDEV-12026 did not enable full_crc32 for page_compressed tables,
which we will be doing now.
This is joint work with Thirunarayanan Balathandayuthapani.
For innodb_checksum_algorithm=full_crc32 we change the
page_compressed format as follows:
FIL_PAGE_TYPE: The most significant bit will be set to indicate
page_compressed format. The least significant bits will contain
the compressed page size, rounded up to a multiple of 256 bytes.
The checksum will be stored in the last 4 bytes of the page
(whether it is the full page or a page_compressed page whose
size is determined by FIL_PAGE_TYPE), covering all preceding
bytes of the page. If encryption is used, then the page will
be encrypted between compression and computing the checksum.
For page_compressed, FIL_PAGE_LSN will not be repeated at
the end of the page.
FSP_SPACE_FLAGS (already implemented as part of MDEV-12026):
We will store the innodb_compression_algorithm that may be used
to compress pages. Previously, the choice of algorithm was written
to each compressed data page separately, and one would be unable
to know in advance which compression algorithm(s) are used.
fil_space_t::full_crc32_page_compressed_len(): Determine if the
page_compressed algorithm of the tablespace needs to know the
exact length of the compressed data. If yes, we will reserve and
write an extra byte for this right before the checksum.
buf_page_is_compressed(): Determine if a page uses page_compressed
(in any innodb_checksum_algorithm).
fil_page_decompress(): Pass also fil_space_t::flags so that the
format can be determined.
buf_page_is_zeroes(): Check if a page is full of zero bytes.
buf_page_full_crc32_is_corrupted(): Renamed from
buf_encrypted_full_crc32_page_is_corrupted(). For full_crc32,
we always simply validate the checksum to the page contents,
while the physical page size is explicitly specified by an
unencrypted part of the page header.
buf_page_full_crc32_size(): Determine the size of a full_crc32 page.
buf_dblwr_check_page_lsn(): Make this a debug-only function, because
it involves potentially costly lookups of fil_space_t.
create_table_info_t::check_table_options(),
ha_innobase::check_if_supported_inplace_alter(): Do allow the creation
of SPATIAL INDEX with full_crc32 also when page_compressed is used.
commit_cache_norebuild(): Preserve the compression algorithm when
updating the page_compression_level.
dict_tf_to_fsp_flags(): Set the flags for page compression algorithm.
FIXME: Maybe there should be a table option page_compression_algorithm
and a session variable to back it?
MariaDB data-at-rest encryption (innodb_encrypt_tables)
had repurposed the same unused data field that was repurposed
in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN)
field of SPATIAL INDEX. Because of this, MariaDB was unable to
support encryption on SPATIAL INDEX pages.
Furthermore, InnoDB page checksums skipped some bytes, and there
are multiple variations and checksum algorithms. By default,
InnoDB accepts all variations of all algorithms that ever existed.
This unnecessarily weakens the page checksums.
We hereby introduce two more innodb_checksum_algorithm variants
(full_crc32, strict_full_crc32) that are special in a way:
When either setting is active, newly created data files will
carry a flag (fil_space_t::full_crc32()) that indicates that
all pages of the file will use a full CRC-32C checksum over the
entire page contents (excluding the bytes where the checksum
is stored, at the very end of the page). Such files will always
use that checksum, no matter what the parameter
innodb_checksum_algorithm is assigned to.
For old files, the old checksum algorithms will continue to be
used. The value strict_full_crc32 will be equivalent to strict_crc32
and the value full_crc32 will be equivalent to crc32.
ROW_FORMAT=COMPRESSED tables will only use the old format.
These tables do not support new features, such as larger
innodb_page_size or instant ADD/DROP COLUMN. They may be
deprecated in the future. We do not want an unnecessary
file format change for them.
The new full_crc32() format also cleans up the MariaDB tablespace
flags. We will reserve flags to store the page_compressed
compression algorithm, and to store the compressed payload length,
so that checksum can be computed over the compressed (and
possibly encrypted) stream and can be validated without
decrypting or decompressing the page.
In the full_crc32 format, there no longer are separate before-encryption
and after-encryption checksums for pages. The single checksum is
computed on the page contents that is written to the file.
We do not make the new algorithm the default for two reasons.
First, MariaDB 10.4.2 was a beta release, and the default values
of parameters should not change after beta. Second, we did not
yet implement the full_crc32 format for page_compressed pages.
This will be fixed in MDEV-18644.
This is joint work with Marko Mäkelä.