EE_ numbers occupy the same range as OS Exxx errors and
my_errno is generally for OS errors. And for HA_ERR_ handler errors
which occupy a different range, so can be freely mixed with OS errors.
Two new variables added:
- max_tmp_space_usage : Limits the the temporary space allowance per user
- max_total_tmp_space_usage: Limits the temporary space allowance for
all users.
New status variables: tmp_space_used & max_tmp_space_used
New field in information_schema.process_list: TMP_SPACE_USED
The temporary space is counted for:
- All SQL level temporary files. This includes files for filesort,
transaction temporary space, analyze, binlog_stmt_cache etc.
It does not include engine internal temporary files used for repair,
alter table, index pre sorting etc.
- All internal on disk temporary tables created as part of resolving a
SELECT, multi-source update etc.
Special cases:
- When doing a commit, the last flush of the binlog_stmt_cache
will not cause an error even if the temporary space limit is exceeded.
This is to avoid giving errors on commit. This means that a user
can temporary go over the limit with up to binlog_stmt_cache_size.
Noteworthy issue:
- One has to be careful when using small values for max_tmp_space_limit
together with binary logging and with non transactional tables.
If a the binary log entry for the query is bigger than
binlog_stmt_cache_size and one hits the limit of max_tmp_space_limit
when flushing the entry to disk, the query will abort and the
binary log will not contain the last changes to the table.
This will also stop the slave!
This is also true for all Aria tables as Aria cannot do rollback
(except in case of crashes)!
One way to avoid it is to use @@binlog_format=statement for
queries that updates a lot of rows.
Implementation:
- All writes to temporary files or internal temporary tables, that
increases the file size, are routed through temp_file_size_cb_func()
which updates and checks the temp space usage.
- Most of the temporary file monitoring is done inside IO_CACHE.
Temporary file monitoring is done inside the Aria engine.
- MY_TRACK and MY_TRACK_WITH_LIMIT are new flags for ini_io_cache().
MY_TRACK means that we track the file usage. TRACK_WITH_LIMIT means
that we track the file usage and we give an error if the limit is
breached. This is used to not give an error on commit when
binlog_stmp_cache is flushed.
- global_tmp_space_used contains the total tmp space used so far.
This is needed quickly check against max_total_tmp_space_usage.
- Temporary space errors are using EE_LOCAL_TMP_SPACE_FULL and
handler errors are using HA_ERR_LOCAL_TMP_SPACE_FULL.
This is needed until we move general errors to it's own error space
so that they cannot conflict with system error numbers.
- Return value of my_chsize() and mysql_file_chsize() has changed
so that -1 is returned in the case my_chsize() could not decrease
the file size (very unlikely and will not happen on modern systems).
All calls to _chsize() are updated to check for > 0 as the error
condition.
- At the destruction of THD we check that THD::tmp_file_space == 0
- At server end we check that global_tmp_space_used == 0
- As a precaution against errors in the tmp_space_used code, one can set
max_tmp_space_usage and max_total_tmp_space_usage to 0 to disable
the tmp space quota errors.
- truncate_io_cache() function added.
- Aria tables using static or dynamic row length are registered in 8K
increments to avoid some calls to update_tmp_file_size().
Other things:
- Ensure that all handler errors are registered. Before, some engine
errors could be printed as "Unknown error".
- Fixed bug in filesort() that causes a assert if there was an error
when writing to the temporay file.
- Fixed that compute_window_func() now takes into account write errors.
- In case of parallel replication, rpl_group_info::cleanup_context()
could call trans_rollback() with thd->error set, which would cause
an assert. Fixed by resetting the error before calling trans_rollback().
- Fixed bug in subselect3.inc which caused following test to use
heap tables with low value for max_heap_table_size
- Fixed bug in sql_expression_cache where it did not overflow
heap table to Aria table.
- Added Max_tmp_disk_space_used to slow query log.
- Fixed some bugs in log_slow_innodb.test
This change removed 68 explict strlen() calls from the code.
The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There where compiler issues with ensuring that all character set names
points to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers works fine).
To get around this, I declared defines for each character set name
length.
It seems some overly tolerant compilers (gcc) allow the structure
of IO_CACHE that is defined differently in libmaria to have
members equalivance to the iocache in mysys.
More strict Solaris compilers recognise that rc_pos really
isn't a structure member and won't compile.
There are 2 issues here:
Issue #1: memory allocation.
An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data,
decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).
Issue #2: IO_CACHE::seek_not_done
When IO_CACHE objects are cloned, they still share the file descriptor.
This means, operation on one IO_CACHE may change the file read position
which will confuse other IO_CACHEs using it.
The fix of these issues would be:
Allocate the buffer to also include the extra size needed for encryption.
Perform seek again after one IO_CACHE reads the file.
Passing a null pointer to a nonnull argument is not only undefined
behaviour, but it also grants the compiler the permission to optimize
away further checks whether the pointer is null. GCC -O2 at least
starting with version 8 may do that, potentially causing SIGSEGV.
These problems were caught in a WITH_UBSAN=ON build with the
Bug#7024 test in main.view.
The issue here is that end_of_file for encrypted temporary IO_CACHE (used by filesort) is updated
using lseek.
Encryption adds storage overhead and hides it from the caller by recalculating offsets and lengths.
Two different IO_CACHE cannot possibly modify the same file
because the encryption key is randomly generated and stored in the IO_CACHE.
So when the tempfiles are encrypted DO NOT use lseek to change end_of_file.
Further observations about updating end_of_file using lseek
1) The end_of_file update is only used for binlog index files
2) The whole point is to update file length when the file was modified via a different file descriptor.
3) The temporary IO_CACHE files can never be modified via a different file descriptor.
4) For encrypted temporary IO_CACHE, end_of_file should not be updated with lseek
fix MDEV-18750: failed to flashback large-size binlog file
fix mysqlbinlog flashback failure caused by reading io_cache without MY_FULL_IO flag
fix MDEV-18750: mysqlbinlog flashback failure on large binlog
On some systems with 10,000+ binlogs, show binary logs could block
log rotation for more than 10 seconds.
This patch fixes this by first caching all binary log names and
releases all mutexes while calculating the sizes of the binary logs.
Other things:
- Ensure that reinit_io_cache() sets end_of_file when moving to read_cache.
This ensures that external changes of the underlying file is known to
the cache.
- get_binlog_list() is made more efficent and show_binlogs() is changed
to call get_binlog_list()
Reviewed by Andrei Elkin
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.
main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
Users expect window functions to produce a certain ordering of rows in
the final result set. Although the standard does not require this, we
already have the filesort result done for when we computed the window
function. If there is no ORDER BY attached to the query, just keep it
till the SELECT is completely evaluated and use that to print the
result.
Update test cases as many did not take care to guarantee a stable
result.
According to logs analysis the Dump thread attempted to read again data which
was already sent. The reason of regressed read turns out in an _my_b_cache_read()
early exit branch which missed to distinguish between total zero size read (e.g
ineffective read when Count argument is zero) from a case when the
requested amount of data is fully read out by sole accessing the cache's
file. In the latter case such then *effective* reading was not
reflected in the cache's state to screw the cache's state.
Fixed with a check introduced of whether the file reading was effective prior to
early exit. When this is the case conduct standard cache state change to
account the actual read size.
Notice the bug can show up also as an error to read binlog event e.g
through BINLOG_GTID_POS() (of MDEV-16886).