Commit graph

114 commits

Author SHA1 Message Date
Thirunarayanan Balathandayuthapani
9ba18d1aa0 MDEV-35394 Innochecksum misinterprets freed pages
- Innochecksum misinterprets the freed pages as active one.
This leads the user to think there are too many valid
pages exist.

- To avoid this confusion, innochecksum introduced one
more option --skip-freed-pages and -r to avoid the freed
pages while dumping or printing the summary of the tablespace.

- Innochecksum can safely assume the page is freed if
the respective extent doesn't belong to a segment and marked as
freed in XDES_BITMAP in extent descriptor page.

- Innochecksum shouldn't assume that zero-filled page as extent
descriptor page.

Reviewed-by: Marko Mäkelä
2024-11-27 13:00:51 +05:30
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
7564be1352 Merge branch '10.4' into 10.5 2023-07-26 16:02:57 +02:00
Marko Mäkelä
12a5fb4b36 MDEV-31641 innochecksum dies with Floating point exception
print_summary(): Skip index_ids for which index.pages is 0.
Tablespaces may contain some freed pages that used to refer to indexes
or tables that were dropped.
2023-07-10 13:46:34 +03:00
Oleksandr Byelkin
c3a5cf2b5b Merge branch '10.5' into 10.6 2023-01-31 09:31:42 +01:00
Oleksandr Byelkin
a977054ee0 Merge branch '10.3' into 10.4 2023-01-28 18:22:55 +01:00
Oleksandr Byelkin
7fa02f5c0b Merge branch '10.4' into 10.5 2023-01-27 13:54:14 +01:00
Oleksandr Byelkin
dd24fa3063 Merge branch '10.3' into 10.4 2023-01-26 10:34:26 +01:00
Mikhail Chalov
567b681299 Minimize unsafe C functions usage - replace strcat() and strcpy() (and strncat() and strncpy()) with custom safe_strcat() and safe_strcpy() functions
The MariaDB code base uses strcat() and strcpy() in several
places. These are known to have memory safety issues and their usage is
discouraged. Common security scanners like Flawfinder flags them. In MariaDB we
should start using modern and safer variants on these functions.

This is similar to memory issues fixes in 19af1890b5
and 9de9f105b5 but now replace use of strcat()
and strcpy() with safer options strncat() and strncpy().

However, add '\0' forcefully to make sure the result string is correct since
for these two functions it is not guaranteed what new string will be null-terminated.

Example:

    size_t dest_len = sizeof(g->Message);
    strncpy(g->Message, "Null json tree", dest_len); strncat(g->Message, ":",
    sizeof(g->Message) - strlen(g->Message)); size_t wrote_sz = strlen(g->Message);
    size_t cur_len = wrote_sz >= dest_len ? dest_len - 1 : wrote_sz;
    g->Message[cur_len] = '\0';

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services

-- Reviewer and co-author Vicențiu Ciorbaru <vicentiu@mariadb.org>
-- Reviewer additions:
* The initial function implementation was flawed. Replaced with a simpler
  and also correct version.
* Simplified code by making use of snprintf instead of chaining strcat.
* Simplified code by removing dynamic string construction in the first
  place and using static strings if possible. See connect storage engine
  changes.
2023-01-20 15:18:52 +02:00
Marko Mäkelä
ca3bbf4c0c Merge 10.5 into 10.6 2022-04-12 09:26:02 +03:00
Daniel Black
4ee00a29e3 MDEV-28250 aix test case failure innodb_zip.innochecksum_3,4k,crc32,innodb
As discovered by tracing, but also presenting in AIX fseeko
documentation, seeking beyond the EOF is acceptable, as you can write
there.

To display the same error in AIX to other implementations that return
errors on seek, we take the EFBIG error code on reading and error the
same way.

An AIX truss of an aspect of the test:

truss extra/innochecksum --page=18446744073709551615 $MYSQLD_DATADIR/test/tab1.ibd

  statx("./mysql-test/var/log/innodb_zip.innochecksum_3-4k,crc32,innodb/mysqld.1/data//test/tab1.ibd", 0x0FFFFFFFFFFFF610, 176, 010) = 0
  kopen("./mysql-test/var/log/innodb_zip.innochecksum_3-4k,crc32,innodb/mysqld.1/data//test/tab1.ibd", O_RDONLY|O_LARGEFILE) = 3
  kfcntl(3, 12, 0x00000001100006C8)               = 0
  kfcntl(3, F_GETFL, 0x00000001100A6CF8)          = 67108864
  kioctl(3, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
  klseek(3, 0, 1, 0x0FFFFFFFFFFFF3F0)             = 0
  kioctl(3, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
  kread(3, "DEADBEEF\0\0\0\0FFFFFFFF".., 4096)    = 4096
  klseek(3, 0, 1, 0x0FFFFFFFFFFFF450)             = 0
  klseek(3, 17592186040320, 0, 0x0FFFFFFFFFFFF450) = 0
  klseek(3, 0, 1, 0x0FFFFFFFFFFFF3F0)             = 0
  kread(3, "DEADBEEF\0\0\0\0FFFFFFFF".., 4096)    Err#27 EFBIG

An equivalent Linux trace:

ltrace extra/innochecksum --page=18446744073709551615 $MYSQLD_DATADIR/test/tab1.ibd

  stat64(0x7fff10ea2dc3, 0x7fff10ea0670, 88, 0x8026be41)         = 0
  open64("./mysql-test/var/log/innodb_zip."..., 0, 02072403160)  = 3
  fcntl64(3, 6, 0x139f180, 1)                                    = 0
  fgetpos64(0x615000000080, 0x7fff10ea0760, 1, 0)                = 0
  fseeko64(0x615000000080, 0xffffffff000, 0, 5 <unfinished ...>
  pthread_getspecific(0, 0x4d0eb8, 0x7fff10ea0490, 0)            = 0x7f7b2806d000
  <... fseeko64 resumed> )                                       = 0
  fgetpos64(0x615000000080, 0x7fff10ea0760, 1, 1)                = 0
  feof(0x615000000080)                                           = 0
  feof(0x615000000080)                                           = 1
  Error: Unable to seek to necessary offset
2022-04-07 15:17:52 +10:00
Marko Mäkelä
9d94c60f2b Merge 10.5 into 10.6 2022-04-06 12:08:30 +03:00
Marko Mäkelä
cacb61b6be Merge 10.4 into 10.5 2022-04-06 10:06:39 +03:00
Marko Mäkelä
d6d66c6e90 Merge 10.3 into 10.4 2022-04-06 08:59:09 +03:00
Marko Mäkelä
7c584d8270 Merge 10.2 into 10.3 2022-04-06 08:06:35 +03:00
Vlad Lesin
33ff18627e MDEV-27835 innochecksum -S crashes for encrypted .ibd tablespace
As main() invokes parse_page() when -S or -D are set, it can be a case
when parse_page() is invoked when -D filename is not set, that is why
any attempt to write to page dump file must be done only if the file
name is set with -D.

The bug is caused by 2ef7a5a13a
(MDEV-13443).
2022-03-29 13:47:37 +03:00
Marko Mäkelä
b2fa874e46 MDEV-28181 The innochecksum -w option was inadvertently removed
In commit 7a4fbb55b0 (MDEV-25105)
the innochecksum option --write (-w) was removed altogether.
It should have been made a Boolean option, so that old data files
may be converted to a format that is compatible with
innodb_checksum_algorithm=strict_crc32 by executing the following:

innochecksum -n -w ibdata* */*.ibd

It would be better to use an older-version innochecksum
for such a conversion, so that page checksums will be validated
before updating the checksum.

It never was possible for innochecksum to convert files to the
innodb_checksum_algorithm=full_crc32 format that is the default
for new InnoDB data files.
2022-03-28 11:35:10 +03:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Vladislav Vaintroub
47e18af906 MDEV-27494 Rename .ic files to .inl 2022-01-17 16:41:51 +01:00
Oleksandr Byelkin
6efb5e9f5e Merge branch '10.5' into 10.6 2021-08-02 10:11:41 +02:00
Oleksandr Byelkin
ae6bdc6769 Merge branch '10.4' into 10.5 2021-07-31 23:19:51 +02:00
Oleksandr Byelkin
7841a7eb09 Merge branch '10.3' into 10.4 2021-07-31 22:59:58 +02:00
Marko Mäkelä
b50ea90063 Merge 10.2 into 10.3 2021-07-22 18:57:54 +03:00
Marko Mäkelä
124dc0d85b MDEV-25361 fixup: Fix integer type mismatch
InnoDB tablespace identifiers and page numbers are 32-bit numbers.
Let us use a 32-bit type for them in innochecksum.

The changes in commit 1918bdf32c
broke the build on 32-bit Windows.

Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
2021-07-22 17:53:43 +03:00
Marko Mäkelä
641f09398f Merge 10.5 into 10.6 2021-07-22 10:11:08 +03:00
Marko Mäkelä
82d5994520 MDEV-26110: Do not rely on alignment on static allocation
It is implementation-defined whether alignment requirements
that are larger than std::max_align_t (typically 8 or 16 bytes)
will be honored by the compiler and linker.

It turns out that on IBM AIX, both alignas() and MY_ALIGNED()
only guarantees alignment up to 16 bytes.

For some data structures, specifying alignment to the CPU
cache line size (typically 64 or 128 bytes) is a mere performance
optimization, and we do not really care whether the requested
alignment is guaranteed.

But, for the correct operation of direct I/O, we do require that
the buffers be aligned at a block size boundary.

field_ref_zero: Define as a pointer, not an array.
For innochecksum, we can make this point to unaligned memory;
for anything else, we will allocate an aligned buffer from the heap.
This buffer will be used for overwriting freed data pages when
innodb_immediate_scrub_data_uncompressed=ON. And exactly that code
hit an assertion failure on AIX, in the test innodb.innodb_scrub.

log_sys.checkpoint_buf: Define as a pointer to aligned memory
that is allocated from heap.

log_t::file::write_header_durable(): Reuse log_sys.checkpoint_buf
instead of trying to allocate an aligned buffer from the stack.
2021-07-22 10:05:13 +03:00
Sergei Golubchik
6190a02f35 Merge branch '10.2' into 10.3 2021-07-21 20:11:07 +02:00
Eugene Kosov
7da1cfb07a avoid searching std::map twice in innochecksum 2021-07-20 19:32:33 +03:00
Eugene Kosov
1918bdf32c MDEV-25361 innochecksum must not report errors for freed pages
Store and maintain xdes pages always. And doesn't verify checksums for
freed pages.

innochecksum can work only with the first space file of multiple ones.
Tell about it and abort in case of not the first file.
2021-07-20 19:32:33 +03:00
Marko Mäkelä
4dfec8b230 Merge 10.5 into 10.6 2021-06-21 17:49:33 +03:00
Marko Mäkelä
a42c80bd48 Merge 10.4 into 10.5 2021-06-21 14:22:22 +03:00
Marko Mäkelä
d3e4fae797 Merge 10.3 into 10.4 2021-06-21 12:38:25 +03:00
Marko Mäkelä
e46f76c974 MDEV-15912: Remove traces of insert_undo
Let us simply refuse an upgrade from earlier versions if the
upgrade procedure was not followed. This simplifies the purge,
commit, and rollback of transactions.

Before upgrading to MariaDB 10.3 or later, a clean shutdown
of the server (with innodb_fast_shutdown=1 or 0) is necessary,
to ensure that any incomplete transactions are rolled back.
The undo log format was changed in MDEV-12288. There is only
one persistent undo log for each transaction.
2021-06-21 12:34:07 +03:00
Vladislav Vaintroub
3d6eb7afcf MDEV-25602 get rid of __WIN__ in favor of standard _WIN32
This fixed the MySQL bug# 20338 about misuse of double underscore
prefix __WIN__, which was old MySQL's idea of identifying Windows
Replace it by _WIN32 standard symbol for targeting Windows OS
(both 32 and 64 bit)

Not that connect storage engine is not fixed in this patch (must be
fixed in "upstream" branch)
2021-06-06 13:21:03 +02:00
Marko Mäkelä
7a4fbb55b0 MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,...
Historically, InnoDB supported a buggy page checksum algorithm that did not
compute a checksum over the full page. Later, well before MySQL 4.1
introduced .ibd files and the innodb_file_per_table option, the algorithm
was corrected and the first 4 bytes of each page were redefined to be
a checksum.

The original checksum was so slow that an option to disable page checksum
was introduced for benchmarketing purposes.

The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
extension, which includes instructions for faster computation of CRC-32C.
In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
implemented to make of that. As that option was changed to be the default
in MySQL 5.7, a bug was found on big-endian platforms and some work-around
code was added to weaken that checksum further. MariaDB disables that
work-around by default since MDEV-17958.

Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
and ARM and also for IA-32/AMD64, making use of carry-less multiplication
where available.

Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
It should have removed any need to use innodb_checksum_algorithm=none.

The setting innodb_checksum_algorithm=crc32 is the default in
MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
It is even faster and more secure.

The default settings in MariaDB do allow old data files to be read,
no matter if a worse checksum algorithm had been used.
(Unfortunately, before innodb_checksum_algorithm=full_crc32,
the data files did not identify which checksum algorithm is being used.)

The non-default settings innodb_checksum_algorithm=strict_crc32 or
innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
checksums. The incompatibility with old data files is why they are
not the default.

The newest server not to support innodb_checksum_algorithm=crc32
were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
A valid reason for using innodb_checksum_algorithm=innodb could have
been the ability to downgrade. If it is really needed, data files
can be converted with an older version of the innochecksum utility.

Because there is no good reason to allow data files to be written
with insecure checksums, we will reject those option values:

    innodb_checksum_algorithm=none
    innodb_checksum_algorithm=innodb
    innodb_checksum_algorithm=strict_none
    innodb_checksum_algorithm=strict_innodb

Furthermore, the following innochecksum options will be removed,
because only strict crc32 will be supported:

    innochecksum --strict-check=crc32
    innochecksum -C crc32
    innochecksum --write=crc32
    innochecksum -w crc32

If a user wishes to convert a data file to use a different checksum
(so that it might be used with the no-longer-supported
MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
nor system tablespace format changes that were made in MariaDB 10.3),
then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
MySQL 5.7 can be used.

Reviewed by: Thirunarayanan Balathandayuthapani
2021-03-11 12:46:18 +02:00
Monty
5d6ad2ad66 Added 'const' to arguments in get_one_option and find_typeset()
One should not change the program arguments!
This change also reduces warnings from the icc compiler.

Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).

Other changes:
- Added a few cast of 'argument' from 'const char*' to 'char *'. This
  was mainly in calls to 'external' functions we don't have control of.
- Ensure that all reset of 'password command line argument' are similar.
  (In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there was a few cases that changed
  the command line argument. These places where changed to instead allocate
  the option in a MEM_ROOT to avoid changing the argument. Some of this
  code was changed to ensure that different programs did parsing the
  same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
  from 'char *' to 'const char *'.
2021-02-08 12:16:29 +02:00
Vladislav Vaintroub
ccbe6bb6fc MDEV-19935 Create unified CRC-32 interface
Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction
after Intel whitepaper, and is ported from rocksdb code.

Optimized ARM and POWER CRC32 were already present in mysys.
2020-09-17 16:07:37 +02:00
Marko Mäkelä
18a62eb76d MDEV-21133 follow-up: Use fil_page_get_type()
Let us use the common accessor function fil_page_get_type()
instead of accessing the page header field FIL_PAGE_TYPE directly.
2020-05-07 17:15:34 +03:00
Marko Mäkelä
ba573c4736 MDEV-21133 follow-up: More my_assume_aligned hints
fsp0pagecompress.h: Remove.
Invoke fil_page_get_type() and FSP_FLAGS_GET_PAGE_COMPRESSION_LEVEL
directly.

log_block_get_flush_bit(), log_block_set_flush_bit():
Access the byte directly.

dict_sys_read_row_id(): Remove (unused function).
2020-05-07 12:25:00 +03:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Oleksandr Byelkin
a15234bf4b Merge branch '10.3' into 10.4 2019-12-09 15:09:41 +01:00
Marko Mäkelä
42a4ae54c2 MDEV-21225 Remove ut_align() and use aligned_malloc()
Before commit 90c52e5291 introduced
aligned_malloc(), InnoDB always used a pattern of over-allocating
memory and invoking ut_align() to guarantee the desired alignment.

It is cleaner to invoke aligned_malloc() and aligned_free() directly.

ut_align(): Remove. In assertions, ut_align_down() can be used instead.
2019-12-05 06:42:31 +02:00
Faustin Lammler
2df2238cb8 Lintian complains on spelling error
The lintian check complains on spelling error:
https://salsa.debian.org/mariadb-team/mariadb-10.3/-/jobs/95739
2019-12-02 12:41:13 +02:00
Marko Mäkelä
5ed54e78ac Cleanup: Remove redundant XDES_FREE_BIT parameters
The page allocation bitmaps in the extent descriptor pages
contain two bits per page: XDES_FREE_BIT and XDES_CLEAN_BIT,
which is unused. Simplify read access.

xdes_is_free(descr,mtr): Remove. Use !xdes_get_n_used(descr) instead.

xdes_is_free(): Replaces xdes_get_bit(), xdes_mtr_get_bit().

xdes_find_free(): Replaces xdes_find_bit().

fsp_seg_inode_page_get_nth_inode(): Remove the redundant parameters
physical_size, mtr.

fsp_seg_inode_page_find_used(), fsp_seg_inode_page_find_free():
Remove the redundant parameter mtr.
2019-11-08 13:45:02 +02:00
Sergei Golubchik
f217612fad MDEV-12684 Show what config file a sysvar got a value from
change get_one_option() prototype to pass the filename and
not to pass the redundant optid.
2019-10-14 10:29:30 +02:00
Marko Mäkelä
c221bcdce7 Merge 10.3 into 10.4 2019-08-16 10:51:20 +03:00