Columns added to TABLE_STATISTICS
- ROWS_INSERTED, ROWS_DELETED, ROWS_UPDATED, KEY_READ_HITS and
KEY_READ_MISSES.
Columns added to CLIENT_STATISTICS and USER_STATISTICS:
- KEY_READ_HITS and KEY_READ_MISSES.
User visible changes (except new columns):
- CLIENT_STATISTICS and USER_STATISTICS have the columns KEY_READ_HITS and
  KEY_READ_MISSES added after column ROWS_UPDATED and before SELECT_COMMANDS.
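As a quick, hedged example of reading the new counters (assumes the
userstat statistics are enabled with @@userstat=1):

  SELECT TABLE_SCHEMA, TABLE_NAME, ROWS_INSERTED, ROWS_DELETED,
         ROWS_UPDATED, KEY_READ_HITS, KEY_READ_MISSES
  FROM information_schema.TABLE_STATISTICS;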
Other changes:
- Do not collect table statistics for system tables like index_stats,
  table_stats, performance_schema, information_schema etc. as the user
  has no control over these and they only generate noise in the statistics.
- All row variables that are part of user_stats are moved to
'struct rows_stats' to make it easy to clear all of them at once.
- ha_read_key_misses added to STATUS_VAR
Notes:
- userstat.result has a change in the number of rows for handler_read_key.
  This is because use-stat-tables is now disabled for the test.
EE_ numbers occupy the same range as OS Exxx errors, and my_errno is
generally used for OS errors. The same holds for HA_ERR_ handler errors,
which occupy a different range and can therefore be freely mixed with
OS errors.
Changes:
- Fixed that MyISAM and Aria parallel repair works with the tmp file limit.
  This required adding current_thd to all parallel workers and adding
  protection in my_malloc_size_cb_func() and temp_file_size_cb_func() to
  be able to handle shared THDs. I removed the old code in MyISAM to
  set current_thd() as it only worked when used with virtual indexed
  columns and I wanted to keep the Aria and MyISAM code identical.
Other things:
- Improved error messages from Aria parallel repair and
create_internal_tmp_table_from_heap().
Two new variables added:
- max_tmp_space_usage : Limits the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
all users.
New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.PROCESSLIST: TMP_SPACE_USED
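A hedged sketch of how the new limits and counters can be used (units are
bytes; the session scope for the per-user limit is an assumption):

  SET GLOBAL max_total_tmp_space_usage = 16*1024*1024*1024;
  SET SESSION max_tmp_space_usage = 1024*1024*1024;
  SHOW GLOBAL STATUS LIKE '%tmp_space_used';
  SELECT ID, USER, TMP_SPACE_USED FROM information_schema.PROCESSLIST;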
The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
transaction temporary space, analyze, binlog_stmt_cache etc.
It does not include engine internal temporary files used for repair,
alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
SELECT, multi-source update etc.
Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
  will not cause an error even if the temporary space limit is exceeded.
  This is to avoid giving errors on commit. This means that a user
  can temporarily go over the limit by up to binlog_stmt_cache_size.
Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
  together with binary logging and non-transactional tables.
  If the binary log entry for the query is bigger than
  binlog_stmt_cache_size and the limit of max_tmp_space_limit is hit
  when flushing the entry to disk, the query will abort and the
  binary log will not contain the last changes to the table.
  This will also stop the slave!
  This is also true for all Aria tables, as Aria cannot do rollback
  (except in the case of crashes)!
  One way to avoid the problem is to use @@binlog_format=statement for
  queries that update a lot of rows.
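For example (an illustrative sketch; t1 is a hypothetical
non-transactional table with many rows):

  SET SESSION binlog_format=STATEMENT;
  UPDATE t1 SET counter=counter+1;  -- binlog gets one statement, not row events
  SET SESSION binlog_format=DEFAULT;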
Implementation:
- All writes to temporary files or internal temporary tables that
  increase the file size are routed through temp_file_size_cb_func(),
  which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE;
  monitoring of Aria internal temporary files is done inside the Aria
  engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for init_io_cache().
  MY_TRACK means that we track the file usage. MY_TRACK_WITH_LIMIT means
  that we track the file usage and give an error if the limit is
  breached. The distinction is used to not give an error on commit when
  the binlog_stmt_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
  This is needed to quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
  This is needed until we move general errors to their own error space
  so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
so that -1 is returned in the case my_chsize() could not decrease
the file size (very unlikely and will not happen on modern systems).
All calls to _chsize() are updated to check for > 0 as the error
condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
  max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
  the tmp space quota errors (see the example after this list).
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
increments to avoid some calls to update_tmp_file_size().
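A minimal sketch of the precaution mentioned above; setting both limits
to 0 disables the quota checks:

  SET GLOBAL max_tmp_space_usage = 0;
  SET GLOBAL max_total_tmp_space_usage = 0;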
Other things:
- Ensure that all handler errors are registered. Before, some engine
errors could be printed as "Unknown error".
- Fixed a bug in filesort() that caused an assert if there was an error
  when writing to the temporary file.
- Fixed compute_window_func() to take write errors into account.
- In case of parallel replication, rpl_group_info::cleanup_context()
could call trans_rollback() with thd->error set, which would cause
an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed a bug in subselect3.inc which caused the following test to use
  heap tables with a low value for max_heap_table_size.
- Fixed a bug in sql_expression_cache where it did not overflow the
  heap table to an Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test
- FLUSH GLOBAL STATUS now resets most global_status_vars.
At this stage, this is mainly to be used for testing.
- FLUSH SESSION STATUS added as an alias for FLUSH STATUS.
- FLUSH STATUS does not require any privilege (before required RELOAD).
- FLUSH GLOBAL STATUS requires RELOAD privilege.
- All global status reset moved to FLUSH GLOBAL STATUS.
- Replication semisync status variables are now reset by
FLUSH GLOBAL STATUS.
- In test cases, the only changes are:
- Replace FLUSH STATUS with FLUSH GLOBAL STATUS
- Replace FLUSH STATUS with FLUSH STATUS; FLUSH GLOBAL STATUS.
This was only done in a few tests where the test was using SHOW STATUS
for both local and global variables.
- Uptime_since_flush_status is now always provided, independent of whether
  ENABLED_PROFILING is enabled when compiling MariaDB.
- @@global.Uptime_since_flush_status is reset on FLUSH GLOBAL STATUS
and @@session.Uptime_since_flush_status is reset on FLUSH SESSION STATUS.
- When connected, @@session.Uptime_since_flush_status is set to 0.
Added SHOW_LONGLONG_NOFLUSH to mark the variables that should not be
flushed.
New variables cleared as part of FLUSH GLOBAL STATUS:
- Rpl_semi_sync_master_request_ack
- Rpl_semi_sync_master_get_ack
- Rpl_semi_sync_slave_send_ack
- Slave_skipped_errors
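A short illustration of the new scope and privilege split:

  FLUSH SESSION STATUS;  -- alias for FLUSH STATUS; no privilege required
  FLUSH GLOBAL STATUS;   -- resets most global status variables; needs RELOAD
  SHOW GLOBAL STATUS LIKE 'Uptime_since_flush_status';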
This task is to ensure we have a clear definition and rules of how to
repair or optimize a table.
The rules are:
- REPAIR should be used with tables that are crashed and are
  unreadable (hardware issues with unreadable blocks, blocks with
  'unexpected data' etc.)
- OPTIMIZE TABLE should be used to optimize the storage layout of the
  table (recover space from deleted rows and optimize the index
  structure).
- ALTER TABLE table_name FORCE should be used to rebuild the .frm file
  (the table definition) and the table (with the original table row
  format). If the table is from an older MariaDB/MySQL release with a
  different storage format, it will convert the data to the new
  format. ALTER TABLE ... FORCE is used as part of mariadb-upgrade.
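In short, for a hypothetical table t1:

  REPAIR TABLE t1;       -- fix a crashed or unreadable table
  OPTIMIZE TABLE t1;     -- reclaim row space and optimize the indexes
  ALTER TABLE t1 FORCE;  -- rebuild the .frm file and the table data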
Here follows some more background:
The 3 ways to repair a table are:
1) "ALTER TABLE table_name FORCE" (no other options).
   As an alias we allow: "ALTER TABLE table_name ENGINE=original_engine"
2) "REPAIR TABLE" (without FORCE)
3) "OPTIMIZE TABLE"
All of the above commands will optimize row space usage (which means that
space will be needed to hold a temporary copy of the table) and
re-generate all indexes. They will also try to replicate the original
table definition as exactly as possible.
For ALTER TABLE and "REPAIR TABLE without FORCE", the following will hold:
If the table is from an older MariaDB version and data conversion is
needed (for example for old type HASH columns, MySQL JSON type or new
TIMESTAMP format) "ALTER TABLE table_name FORCE, algorithm=COPY" will be
used.
The differences between the three commands are:
1) Will use the fastest algorithm the engine supports to do a full repair
   of the table (except if data conversion is needed).
2) Will use the storage engine internal REPAIR facility (MyISAM, Aria).
If the engine does not support REPAIR then
"ALTER TABLE FORCE, ALGORITHM=COPY" will be used.
   If there were data incompatibilities (which means that FORCE was used)
   then there will be a warning after REPAIR that ALTER TABLE FORCE is
   still needed.
   The reason for this is that REPAIR may be able to work around data
   errors (wrong incompatible data, crashed or unreadable sectors) that
   ALTER TABLE cannot handle.
3) Will use the storage engine internal OPTIMIZE. If the engine does not
   support OPTIMIZE, then "ALTER TABLE FORCE" is used.
The above will ensure that ALTER TABLE FORCE is able to
correct almost any errors in the row or index data. In the case of
corrupted blocks, REPAIR, possibly followed by ALTER TABLE, is needed.
This is important as mariadb-upgrade executes ALTER TABLE table_name
FORCE for any table that must be re-created.
Bugs fixed with InnoDB tables when using ALTER TABLE FORCE:
- No error for INNODB_DEFAULT_ROW_FORMAT=COMPACT even if row length
would be too wide. (Independent of innodb_strict_mode).
- Tables using symlinks will be symlinked after any of the above commands
(Independent of the setting of --symbolic-links)
If one specifies an algorithm together with ALTER TABLE FORCE, things
will work as before (except if data conversion is required, in which case
the COPY algorithm is enforced).
ALTER TABLE .. OPTIMIZE ALL PARTITIONS will work as before.
Other things:
- A FORCE argument was added to REPAIR to allow one to first run the
  internal repair to fix damaged blocks and then follow it with
  ALTER TABLE (see the examples after this list).
- REPAIR will not update frm_version if ha_check_for_upgrade() finds
  that the table is still incompatible with the current version. In this
  case REPAIR will end with an error.
- REPAIR for storage engines that do not have native repair, like InnoDB,
  now uses ALTER TABLE FORCE.
- REPAIR csv-table USE_FRM now works.
  - It did not work before as CSV tables had the extension list in the
    wrong order.
- The default error message length for %M was increased from 128 to 256
  to not cut information from REPAIR.
- Documented HA_ADMIN_XX variables related to repair.
- Added HA_ADMIN_NEEDS_DATA_CONVERSION to signal that we have to
do data conversions when converting the table (and thus ALTER TABLE
copy algorithm is needed).
- Fixed typo in error message (caused test changes).
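Hedged examples of the REPAIR forms mentioned in the list above (t1 and
csv_t1 are hypothetical MyISAM/Aria and CSV tables):

  REPAIR TABLE t1 FORCE;        -- internal repair first, then ALTER TABLE
  REPAIR TABLE csv_t1 USE_FRM;  -- now works for CSV tables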
Remove alter_algorithm but keep the variable as a no-op (with a warning).
The reasons for removing alter_algorithm are:
- alter_algorithm was introduced as a replacement for the
old_alter_table that was used to force the usage of the original
alter table algorithm (copy) in the cases where the new alter
algorithm did not work. The new option was added as a way to force
the usage of a specific algorithm when it should instead have made
it possible to disable algorithms that would not work for some
reason.
- alter_algorithm introduced some cases where ALTER TABLE would not
work without specifying the ALGORITHM=XXX option together with
ALTER TABLE.
- Having different values of alter_algorithm on master and slave could
  cause the slave to stop unexpectedly.
- ALTER TABLE FORCE, as used by mariadb-upgrade, would not always work
if alter_algorithm was set for the server.
- As part of MDEV-33449 "improving repair of tables" it became
  clear that alter_algorithm made it harder to provide a better and
  more consistent ALTER TABLE FORCE and REPAIR TABLE, and it would be
  better to remove it.
MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
This was done by changing DTVAL to use longlong to internally store
the date/timestamp instead of an int.
I had to preserve the old binary representation used for dates in
TABLE_TYPE=BIN. One consequence is that TABLE_TYPE=BIN cannot support
the full date bit range.
MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
- Changed usage of timeval to my_timeval as the timeval parts on Windows
  are 32-bit long, which causes some compiler issues on Windows.
This patch extends the timestamp from
2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999
for 64 bit hardware and OS where 'long' is 64 bits.
This is true for 64 bit Linux but not for Windows.
This is done by treating the 32 bit stored int as unsigned instead of
signed. This is safe as MariaDB has never accepted dates before the epoch
(1970).
The benefit of this approach is that for normal timestamps the storage is
compatible with earlier versions.
However, for tables using system versioning we previously stored a
timestamp with the year 2038 as the 'max timestamp', which is used to
detect current values. This patch stores the new 2106 year max value
as the max timestamp. This means that old tables using system
versioning need to be updated with mariadb-upgrade when moving them
to 11.4. That will be done in a separate commit.
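An illustrative check of the extended range (hedged; the 2106 upper bound
applies where 'long' is 64 bits, e.g. 64-bit Linux, but not Windows):

  CREATE TABLE t1 (ts TIMESTAMP);
  INSERT INTO t1 VALUES ('2105-12-31 23:59:59');  -- beyond the old 2038 limit
  SELECT ts FROM t1;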
Various help message improvements:
* MySQL->MariaDB, mysqld->mariadbd, "mysqld daemon" -> "mariadbd process"
* typos
* don't specify defaults directly in the help message
* don't say that an option is deprecated, mark it as such
* missing spaces in the middle of the text
etc
* use new deprecated printer for all deprecated server options
* restore alphabetic option sorting order
* move deprecated printer from mysqld.cc to my_getopt.c
* in --help print deprecation message at the end of the option help
* move 'ALL' help text where it belongs - to other SET options, and
with a correct indentation.
* consistently end either all or none of the command-line option help
  strings with a dot - my_print_help() needs that.
  It's about 50/50 now, so let's do none; that also means fewer line
  wraps in --help
* remove trailing spaces from command-line option help strings
Currently there are mechanisms to mark a system variable as
deprecated, but they are only used to print warning messages
when a deprecated variable is set.
Leverage the existing mechanisms in order to make the
deprecation information available at the --help output of mysqld by:
* Moving the deprecation information (i.e. the `deprecation_substitute`
  attribute) from the `sys_var` class into the `my_option` struct.
  As every `sys_var` contains its own `my_option` struct, the access
  to the deprecation information remains available to `sys_var`
  objects. `my_getopt` functions, which work directly with
  `my_option` structs, gain access to this information while building
  the --help output.
* For plugin variables, leverage the `PLUGIN_VAR_DEPRECATED` flag
  and set the `deprecation_substitute` attribute accordingly when
  building the `my_option` objects.
* Change the `option_cmp` function to use the `deprecation_substitute`
attribute instead of the name when sorting the options. This way
deprecated options and the substitutes will be grouped together.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
trx_undo_mem_create_at_db_start(): Invoke recv_sys_t::recover()
instead of buf_page_get_gen(), so that all undo log pages will be
recovered correctly. Failure to do this could prevent InnoDB from
starting up due to "Data structure corruption", or it could
potentially lead to a situation where InnoDB starts up but some
transactions were recovered incorrectly.
recv_sys_t::recover(): Only acquire a buffer-fix on the pages,
not a shared latch. This is adequate protection, because this function
is only being invoked during early startup when no "users" are modifying
buffer pool pages. The only writes are due to server bootstrap
(the data files being created) or crash recovery (changes from
ib_logfile0 being applied).
buf_page_get_gen(): Assert that the function is not invoked while crash
recovery is in progress, and that the special mode BUF_GET_RECOVER is
only invoked during crash recovery or server bootstrap.
All this should really have been part of
commit 850d61736d (MDEV-32042).
in buf_dblwr_t::init_or_load_pages()
- InnoDB fails to set the TRX_SYS_DOUBLEWRITE_SPACE_ID_STORED
flag in transaction system header page while recreating
the undo log tablespaces
buf_dblwr_t::init_or_load_pages(): Tried to reset the
space id and to write into the doublewrite buffer even
when read_only mode is enabled.
In srv_all_undo_tablespaces_open(), InnoDB should try to
open the extra unused undo tablespaces instead of trying to
create them.
This bug was found related to MDEV-34212.
Some InnoDB tests, most notably innodb.table_flags,64k would occasionally
fail. I am able to reproduce this locally on a MemorySanitizer build,
sporadically.
The following is a minimal .test file for reproducing this:
--source include/have_innodb.inc
SELECT @@innodb_page_size;
and the .opt file:
--innodb-undo-tablespaces=0 --innodb-page-size=64k
--innodb-buffer-pool-size=20m
This bug was revealed due to the recent
commit 466ae1cf81
which set innodb_fast_shutdown=0 during server bootstrap
in our regression test driver.
Due to the bug, a write of undo page 50 in the system tablespace
was discarded in buf_page_t::flush(). A subsequent InnoDB startup
failed because an old version of that page would point to a
freed undo log page 300.
mtr_t::commit_shrink(): Only invoke fil_space_t::set_create_lsn()
on undo tablespaces, which will be fully reinitialized or created
from scratch. On the system tablespace, we must only adjust
the file size, to avoid writing pages that are beyond the end
of the tablespace. Thanks to Thirunarayanan Balathandayuthapani
for providing this fix.
innobase_end(): Do not attempt to shrink the system tablespace if
innodb_read_only=ON or innodb_force_recovery>4. This fixes a regression
due to commit 2d6c2f22a4 (MDEV-32452).
This bug was caught when testing a fix of MDEV-34200, which adds
SET GLOBAL innodb_fast_shutdown=0 to the test innodb.undo_upgrade.
In the merge commit f9807aadef
there were some omissions or errors.
ibuf_remove_free_page(): Return an error if the free list is corrupted
when removing the change buffer on an upgrade. A special 11.0 version of
commit 263932d505 would have been useful.
buf_page_get_gen(): Correctly handle the case that a page was being
concurrently read into the buffer pool and found out to be corrupted.
This was part of commit a4cda66e2d
but had been discarded in the merge.
Because MariaDB Server 11.0 has reached its end of life as of
commit 466ae1cf81 this fix is being applied
to the 11.1 branch.
(Based on original patch by Oleksandr Byelkin)
Multi-table DELETE can execute via "buffered" mode: at phase #1 it collects
rowids of rows to be deleted, then at phase #2 in multi_delete::do_deletes()
it calls handler->rnd_pos() to read rows to be deleted and deletes them.
The problem occurred when phase #1 used Rowid Filter on the table that
phase #2 would be deleting from.
In InnoDB, h->rnd_init(scan=false) and h->rnd_pos() is an index scan over PK
under the hood. So, at phase #2 ha_innobase::rnd_init() would try to use the
Rowid Filter and hit an assertion inside ha_innobase::rnd_init().
Note that multi-table UPDATE works similarly but was not affected, because
patch for MDEV-7487 added code to disable rowid filter for phase #2 in
multi_update::do_updates().
This patch changes the approach:
- It makes InnoDB not use Rowid Filter in rnd_pos() scans: it is disabled in
ha_innobase::rnd_init() and enabled back in ha_innobase::rnd_end().
- multi_update::do_updates() no longer disables Rowid Filter for phase#2 as
it is no longer necessary.
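An example of the statement shape that was affected (hedged; t1 and t2
are hypothetical InnoDB tables where phase #1 may use a Rowid Filter):

  DELETE t1 FROM t1, t2 WHERE t1.a = t2.a AND t2.b < 10;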
The problem was twofold:
- REPAIR TABLE t1 USE_FRM did not work for transactional
Aria tables (Table was thought to be repaired, which it was not) which
caused issues in later usage of the table.
- When swapping the tmp_data file to the data file, the sort_info files
  were not updated. This caused problems if there were several unique keys
  and there was a duplicate for the second key.
In the absence of insight into the cause of the spider.spider_fixes_part
failure as described in MDEV-30929, this is a workaround, which could
help narrow the possibility down to whether the slave SQL thread attempts
to read from a file that may not yet be on disk. It does not otherwise
affect the coverage of the test.
I have pushed this commit 4 times, but have yet to encounter the
failure as described in MDEV-30929, so it could also fix the test and
stop the CI pollution.
Also replaced START SLAVE; with --source include/start_slave.inc
inside the slave_test_init.inc files.
Safety first - tell the mariadb client not to execute dangerous
CLI commands; they cannot be present in the dump anyway.
Wrapping the command in /*!999999 ..... */ guarantees that
if a non-mariadb-cli client loads the dump and sends it to the
server, the server will ignore the command it doesn't understand.
On disable_indexes(HA_KEY_SWITCH_NONUNIQ_SAVE) the engine does
not know that the long unique is logically unique, because on the
engine level it is not. So the engine disables it.
Change the disable_indexes/enable_indexes API. Instead of the enum
mode, send a key_map of indexes that should be enabled. This way the
server will decide what is unique, not the engine.
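A hedged sketch of the scenario: the UNIQUE on the BLOB below is a 'long
unique' enforced above the engine level, so the backing engine index is
non-unique (t1 is hypothetical):

  CREATE TABLE t1 (b BLOB, UNIQUE KEY (b)) ENGINE=MyISAM;
  ALTER TABLE t1 DISABLE KEYS;  -- must not disable enforcement of the
                                -- long unique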
A UDF isn't supposed to use my_error(); it should return
the error message in the provided error message buffer.
Fixes valgrind:
==93993== Conditional jump or move depends on uninitialised value(s)
==93993== at 0x484ECCD: strnlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==93993== by 0x1AD2B2C: process_str_arg (my_vsnprintf.c:259)
==93993== by 0x1AD47E0: my_vsnprintf_ex (my_vsnprintf.c:696)
==93993== by 0x1A3B91E: my_error (my_error.c:120)
==93993== by 0xF87BE8: udf_handler::fix_fields(THD*, Item_func_or_sum*, unsigned int, Item**) (item_func.cc:3638)
followup for 267dd5a993