This task is to ensure we have a clear definition of, and rules for, how to
repair or optimize a table.
The rules are:
- REPAIR should be used with tables that are crashed and are
  unreadable (hardware issues with unreadable blocks, blocks with
  'unexpected data', etc).
- OPTIMIZE TABLE should be used to optimize the storage layout for the
  table (recover space from deleted rows and optimize the index
  structure).
- ALTER TABLE table_name FORCE should be used to rebuild the .frm file
  (the table definition) and the table (with the original table row
  format). If the table is from an older MariaDB/MySQL release with a
  different storage format, it will convert the data to the new
  format. ALTER TABLE ... FORCE is used as part of mariadb-upgrade.
Some more background follows:
The 3 ways to repair a table are:
1) "ALTER TABLE table_name FORCE" (no other options).
As an alias we allow: "ALTER TABLE table_name ENGINE=original_engine"
2) "REPAIR TABLE" (without FORCE)
3) "OPTIMIZE TABLE"
All of the above commands will optimize row space usage (which means that
space will be needed to hold a temporary copy of the table) and
re-generate all indexes. They will also try to replicate the original
table definition as exactly as possible.
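For concreteness, a minimal sketch of the three forms on a hypothetical
table t1 (the ENGINE= alias assumes t1 already uses InnoDB):

    ALTER TABLE t1 FORCE;          -- 1) full rebuild of definition and data
    ALTER TABLE t1 ENGINE=InnoDB;  -- allowed alias for 1), same engine as before
    REPAIR TABLE t1;               -- 2) engine-internal repair where supported
    OPTIMIZE TABLE t1;             -- 3) reclaim space and optimize indexes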
For ALTER TABLE and "REPAIR TABLE" (without FORCE), the following holds:
If the table is from an older MariaDB version and data conversion is
needed (for example for old-type HASH columns, the MySQL JSON type or the
new TIMESTAMP format), "ALTER TABLE table_name FORCE, ALGORITHM=COPY"
will be used.
The differences between the methods are:
1) Will use the fastest algorithm the engine supports to do a full repair
   of the table (except if data conversions are needed).
2) Will use the storage engine's internal REPAIR facility (MyISAM, Aria).
   If the engine does not support REPAIR, then
   "ALTER TABLE FORCE, ALGORITHM=COPY" will be used.
   If there were data incompatibilities (which means that FORCE was used),
   then there will be a warning after REPAIR that ALTER TABLE FORCE is
   still needed.
   The reason for this is that REPAIR may be able to work around data
   errors (wrong incompatible data, crashed or unreadable sectors) that
   ALTER TABLE cannot.
3) Will use the storage engine's internal OPTIMIZE. If the engine does not
   support OPTIMIZE, then "ALTER TABLE FORCE" is used.
The above ensures that ALTER TABLE FORCE is able to
correct almost any errors in the row or index data. In the case of
corrupted blocks, REPAIR, possibly followed by ALTER TABLE, is needed.
This is important as mariadb-upgrade executes ALTER TABLE table_name
FORCE for any table that must be re-created.
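As a hedged illustration of the corrupted-block sequence above, on a
hypothetical corrupted table t1:

    REPAIR TABLE t1;       -- engine repair can work around bad blocks
    ALTER TABLE t1 FORCE;  -- then rebuild the table cleanly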
Bugs fixed with InnoDB tables when using ALTER TABLE FORCE:
- No error for INNODB_DEFAULT_ROW_FORMAT=COMPACT even if the row length
  would be too wide (independent of innodb_strict_mode).
- Tables using symlinks will be symlinked after any of the above commands
  (independent of the setting of --symbolic-links).
If one specifies an algorithm together with ALTER TABLE FORCE, things
will work as before (except if data conversion is required, in which case
the COPY algorithm is enforced).
ALTER TABLE .. OPTIMIZE ALL PARTITIONS will work as before.
Other things:
- FORCE argument added to REPAIR to allow one to first run the internal
  repair to fix damaged blocks and then follow it with ALTER TABLE
  (see the sketch after this list).
- REPAIR will not update frm_version if ha_check_for_upgrade() finds
  that the table is still incompatible with the current version. In this
  case the REPAIR will end with an error.
- REPAIR for storage engines that do not have native repair, like InnoDB,
  now uses ALTER TABLE FORCE.
- REPAIR csv-table USE_FRM now works.
  - It did not work before as CSV tables had their extension list in the
    wrong order.
- Default error message length for %M increased from 128 to 256 so that
  information from REPAIR is not cut off.
- Documented HA_ADMIN_XX variables related to repair.
- Added HA_ADMIN_NEEDS_DATA_CONVERSION to signal that we have to
  do data conversions when rebuilding the table (and thus the ALTER TABLE
  COPY algorithm is needed).
- Fixed typo in error message (caused test changes).
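A minimal sketch of the REPAIR ... FORCE form referenced in the first item
above, assuming FORCE is given like the other REPAIR options, on a
hypothetical table t1:

    REPAIR TABLE t1 FORCE;  -- internal repair of damaged blocks, then the ALTER TABLE step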
Remove alter_algorithm but keep the variable as a no-op (with a warning).
The reasons for removing alter_algorithm are:
- alter_algorithm was introduced as a replacement for
  old_alter_table, which was used to force the usage of the original
  ALTER TABLE algorithm (copy) in the cases where the new alter
  algorithm did not work. The new option was added as a way to force
  the usage of a specific algorithm when it should instead have made
  it possible to disable algorithms that would not work for some
  reason.
- alter_algorithm introduced some cases where ALTER TABLE would not
work without specifying the ALGORITHM=XXX option together with
ALTER TABLE.
- Having different values of alter_algorithm on master and slave could
  cause the slave to stop unexpectedly.
- ALTER TABLE FORCE, as used by mariadb-upgrade, would not always work
if alter_algorithm was set for the server.
- As part of MDEV-33449 "improving repair of tables" it became
  clear that alter_algorithm made it harder to provide a better and
  more consistent ALTER TABLE FORCE and REPAIR TABLE, and it would be
  better to remove it.
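With the variable removed, the per-statement ALGORITHM clause remains the
way to request a specific algorithm; a short sketch on a hypothetical
table t1:

    ALTER TABLE t1 ADD COLUMN c INT, ALGORITHM=INPLACE;  -- fails with an error if the engine cannot do it in place
    ALTER TABLE t1 FORCE, ALGORITHM=COPY;                -- explicit algorithm still works as before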
MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
This was done by changing DTVAL to use longlong to internally store
the date/timestamp instead of an int.
I had to preserve the old binary representation used for dates in
TABLE_TYPE=BIN. One consequence is that TABLE_TYPE=BIN cannot support
the full date bit range.
MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
- Changed usage of timeval to my_timeval as the timeval parts on Windows
  are 32 bits long, which causes some compiler issues on Windows.
This patch extends the timestamp from
2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999
for 64 bit hardware and OS where 'long' is 64 bits.
This is true for 64 bit Linux but not for Windows.
This is done by treating the 32 bit stored int as unsigned instead of
signed. This is safe as MariaDB has never accepted dates before the epoch
(1970).
The benefit of this approach is that for normal timestamps the storage
is compatible with earlier versions.
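A hedged illustration of the extended range on platforms where it applies
(hypothetical table t1):

    CREATE TABLE t1 (ts TIMESTAMP);
    INSERT INTO t1 VALUES ('2105-12-31 23:59:59');  -- beyond the old 2038-01-19 03:14:07 limit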
However, for tables using system versioning we previously stored a
timestamp with the year 2038 as the 'max timestamp', which is used to
detect current values. This patch stores the new year-2106 max value
as the max timestamp. This means that old tables using system
versioning need to be updated with mariadb-upgrade when moving them
to 11.4. That will be done in a separate commit.
Various help message improvements:
* MySQL->MariaDB, mysqld->mariadbd, "mysqld daemon" -> "mariadbd process"
* typos
* don't specify defaults directly in the help message
* don't say that an option is deprecated, mark it as such
* missing spaces in the middle of the text
etc
* use new deprecated printer for all deprecated server options
* restore alphabetic option sorting order
* move deprecated printer from mysqld.cc to my_getopt.c
* in --help print deprecation message at the end of the option help
* move 'ALL' help text where it belongs - to other SET options, and
with a correct indentation.
* consistently end all or no command-line option help strings
  with a dot - my_print_help() needs that.
  It's about 50/50 now, so let's do none; fewer line wraps in --help
* remove trailing spaces from command-line option help strings
Currently there are mechanisms to mark a system variable as
deprecated, but they are only used to print warning messages
when a deprecated variable is set.
Leverage the existing mechanisms in order to make the
deprecation information available in the --help output of mysqld by:
* Moving the deprecation information (i.e. the `deprecation_substitute`
  attribute) from the `sys_var` class into the `my_option` struct.
  As every `sys_var` contains its own `my_option` struct, the access
  to the deprecation information remains available to `sys_var`
  objects. `my_getopt` functions, which work directly with
  `my_option` structs, gain access to this information while building
  the --help output.
* For plugin variables, leverage the `PLUGIN_VAR_DEPRECATED` flag
  and set the `deprecation_substitute` attribute accordingly when
  building the `my_option` objects.
* Change the `option_cmp` function to use the `deprecation_substitute`
  attribute instead of the name when sorting the options. This way
  deprecated options and their substitutes will be grouped together.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
trx_undo_mem_create_at_db_start(): Invoke recv_sys_t::recover()
instead of buf_page_get_gen(), so that all undo log pages will be
recovered correctly. Failure to do this could prevent InnoDB from
starting up due to "Data structure corruption", or it could
potentially lead to a situation where InnoDB starts up but some
transactions were recovered incorrectly.
recv_sys_t::recover(): Only acquire a buffer-fix on the pages,
not a shared latch. This is adequate protection, because this function
is only being invoked during early startup when no "users" are modifying
buffer pool pages. The only writes are due to server bootstrap
(the data files being created) or crash recovery (changes from
ib_logfile0 being applied).
buf_page_get_gen(): Assert that the function is not invoked while crash
recovery is in progress, and that the special mode BUF_GET_RECOVER is
only invoked during crash recovery or server bootstrap.
All this should really have been part of
commit 850d61736d (MDEV-32042).
InnoDB fails to set the TRX_SYS_DOUBLEWRITE_SPACE_ID_STORED flag
in the transaction system header page while recreating the undo log
tablespaces in buf_dblwr_t::init_or_load_pages().
buf_dblwr_t::init_or_load_pages(): Tries to reset the
space id and tries to write into the doublewrite buffer even
when read_only mode is enabled.
In srv_all_undo_tablespaces_open(), InnoDB should try to
open the extra unused undo tablespaces instead of trying to
create them.
This bug was found during work related to MDEV-34212.
Some InnoDB tests, most notably innodb.table_flags,64k, would occasionally
fail. I am able to reproduce this locally, sporadically, on a
MemorySanitizer build.
The following is a minimal .test file for reproducing this:

    --source include/have_innodb.inc
    SELECT @@innodb_page_size;

and the .opt file:

    --innodb-undo-tablespaces=0 --innodb-page-size=64k
    --innodb-buffer-pool-size=20m
This bug was revealed due to the recent
commit 466ae1cf81
which set innodb_fast_shutdown=0 during server bootstrap
in our regression test driver.
Due to the bug, a write of undo page 50 in the system tablespace
was discarded in buf_page_t::flush(). A subsequent InnoDB startup
failed because an old version of that page would point to a
freed undo log page 300.
mtr_t::commit_shrink(): Only invoke fil_space_t::set_create_lsn()
on undo tablespaces, which will be fully reinitialized or created
from scratch. On the system tablespace, we must only adjust
the file size, to avoid writing pages that are beyond the end
of the tablespace. Thanks to Thirunarayanan Balathandayuthapani
for providing this fix.
innobase_end(): Do not attempt to shrink the system tablespace if
innodb_read_only=ON or innodb_force_recovery>4. This fixes a regression
due to commit 2d6c2f22a4 (MDEV-32452).
This bug was caught when testing a fix of MDEV-34200, which adds
SET GLOBAL innodb_fast_shutdown=0 to the test innodb.undo_upgrade.
In the merge commit f9807aadef
there were some omissions or errors.
ibuf_remove_free_page(): Return an error if the free list is corrupted
when removing the change buffer on an upgrade. A special 11.0 version of
commit 263932d505 would have been useful.
buf_page_get_gen(): Correctly handle the case that a page was being
concurrently read into the buffer pool and found out to be corrupted.
This was part of commit a4cda66e2d
but had been discarded in the merge.
Because MariaDB Server 11.0 has reached its end of life as of
commit 466ae1cf81, this fix is being applied
to the 11.1 branch.
(Based on original patch by Oleksandr Byelkin)
Multi-table DELETE can execute via "buffered" mode: at phase #1 it collects
rowids of rows to be deleted, then at phase #2 in multi_delete::do_deletes()
it calls handler->rnd_pos() to read rows to be deleted and deletes them.
The problem occurred when phase #1 used Rowid Filter on the table that
phase #2 would be deleting from.
In InnoDB, h->rnd_init(scan=false) followed by h->rnd_pos() is an index
scan over the PK under the hood. So, at phase #2, ha_innobase::rnd_init()
would try to use the Rowid Filter and hit an assertion inside
ha_innobase::rnd_init().
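A sketch of the affected statement shape (hypothetical tables, assuming
the optimizer builds a Rowid Filter on t1 for phase #1):

    DELETE t1 FROM t1 JOIN t2 ON t1.a = t2.a WHERE t1.b < 10;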
Note that multi-table UPDATE works similarly but was not affected, because
the patch for MDEV-7487 added code to disable the rowid filter for phase #2
in multi_update::do_updates().
This patch changes the approach:
- It makes InnoDB not use Rowid Filter in rnd_pos() scans: it is disabled in
ha_innobase::rnd_init() and enabled back in ha_innobase::rnd_end().
- multi_update::do_updates() no longer disables Rowid Filter for phase#2 as
it is no longer necessary.
The problem was twofold:
- REPAIR TABLE t1 USE_FRM did not work for transactional
  Aria tables (the table was thought to be repaired, which it was not),
  which caused issues in later usage of the table.
- When swapping the tmp_data file to the data file, the sort_info files
  were not updated. This caused problems if there were several unique
  keys and there was a duplicate for the second key.
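The now-working form, on a hypothetical transactional Aria table t1:

    REPAIR TABLE t1 USE_FRM;  -- rebuild the table based on its .frm definition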
In the absence of insight into the cause of the spider.spider_fixes_part
failure described in MDEV-30929, this is a workaround, which could
help narrow the possibility down to whether the slave SQL thread attempts
to read from a file that may not yet be on disk. It does not otherwise
affect the coverage of the test.
I have pushed this commit 4 times but have yet to encounter the
failure described in MDEV-30929, so it could also fix the test and
stop the CI pollution.
Also replaced START SLAVE; with --source include/start_slave.inc
inside the slave_test_init.inc files.
Safety first: tell the mariadb client not to execute dangerous
CLI commands; they cannot be present in the dump anyway.
Wrapping the command in /*!999999 ..... */ guarantees that
if a non-mariadb-cli client loads the dump and sends it to the
server, the server will ignore the command it doesn't understand.
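The underlying mechanism is the standard executable comment, whose body
only runs on servers of at least the stated version; an illustration with
an ordinary statement (999999 exceeds any real server version, so every
server skips the body):

    SELECT 1 /*!999999 + 1 */;  -- returns 1 on every server: the comment body is ignored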
On disable_indexes(HA_KEY_SWITCH_NONUNIQ_SAVE) the engine does
not know that the long unique is logically unique, because on the
engine level it is not, and so the engine disables it.
Change the disable_indexes/enable_indexes API: instead of the enum
mode, send a key_map of indexes that should be enabled. This way the
server decides what is unique, not the engine.
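For context, a long unique is what the server creates for a UNIQUE
constraint the engine cannot index directly, e.g. on a hypothetical
MyISAM table:

    CREATE TABLE t1 (b BLOB, UNIQUE KEY (b)) ENGINE=MyISAM;
    -- uniqueness is enforced at the server level via a hash;
    -- at the engine level the index is non-unique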
A UDF isn't supposed to use my_error(); it should return
the error message in the provided error message buffer.
Fixes valgrind:
==93993== Conditional jump or move depends on uninitialised value(s)
==93993== at 0x484ECCD: strnlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==93993== by 0x1AD2B2C: process_str_arg (my_vsnprintf.c:259)
==93993== by 0x1AD47E0: my_vsnprintf_ex (my_vsnprintf.c:696)
==93993== by 0x1A3B91E: my_error (my_error.c:120)
==93993== by 0xF87BE8: udf_handler::fix_fields(THD*, Item_func_or_sum*, unsigned int, Item**) (item_func.cc:3638)
Follow-up for 267dd5a993:
- ZLIB_LIBRARIES, not ZLIB_LIBRARY
- ZLIB_INCLUDE_DIRS, not ZLIB_INCLUDE_DIR
For building libmariadb, ZLIB_LIBRARY/ZLIB_INCLUDE_DIR are still defined.
This workaround will be removed later.
The two I_S plugins SPIDER_ALLOC_MEM and SPIDER_WRAPPER_PROTOCOL
only make sense if the main SPIDER plugin is installed. Further,
SPIDER_ALLOC_MEM requires a mutex that requires SPIDER init to fill
the table.
We also update the spider init query to override
--transaction_read_only=on so that it does not affect the spider init.
Also fixed error handling in spider_db_init() so that failure in
spider table init does not result in a memory leak.
Issue: When getting a page (buf_page_get_gen) with no latch option
(RW_NO_LATCH), the caller is not expected to follow the B-tree latching
order. However, in buf_page_get_low we try to acquire a shared page latch
unconditionally to wait for a page that is being loaded by another
thread concurrently. In general this could lead to a latch order violation
and a deadlock.
Currently it affects the change buffer insert path btr_latch_prev(),
which tries to load the previous page out of order with RW_NO_LATCH, and
two concurrent inserts into the IBUF tree cause a deadlock. This problem
was introduced in 10.6 by commit 9436c778c3 (MDEV-27058).
Fix: While trying to latch a page with RW_NO_LATCH, always use the
"*lock_try" interface and retry the operation on failure after unfixing
the page.
Problem:
=======
During an InnoDB non-rebuild online alter operation, InnoDB sets a
dummy log as the clustered index online log. This can be used by
concurrent DML to identify whether the table is undergoing online DDL.
InnoDB fails to reset the dummy log of the clustered index in case
an error happened during the prepare phase.
Solution:
========
Reset the InnoDB clustered index online log in case of error during the
prepare phase.
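A sketch of a non-rebuild online operation of the affected kind
(hypothetical table t1):

    ALTER TABLE t1 ADD INDEX i1 (a), ALGORITHM=NOCOPY, LOCK=NONE;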
Problem:
========
- Currently mariabackup has to reread pages in case they are
  modified by the server concurrently. But while reading the undo
  tablespace, mariabackup failed to reread the page in case of
  error.
Fix:
===
Mariabackup --backup should have retry logic
while reading the undo tablespaces.
Problem:
========
- InnoDB wrongly calculates the record size in
  btr_node_ptr_max_size() when a prefix index of
  the column has to be stored externally.
Fix:
====
- InnoDB should add the maximum field size to the
  record size when the field is a fixed-length one.
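A hedged example of the affected shape, a prefix index on a column whose
values can be stored externally (hypothetical table):

    CREATE TABLE t1 (b BLOB, INDEX (b(1000))) ENGINE=InnoDB;
    -- long values of b go to external (overflow) pages;
    -- only the 1000-byte prefix is indexed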