buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().
log_write_up_to(): Use correct format.
srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress
Move towards progress measures rather than pure time based measures.
Progress reporting at numberious shutdown/startup locations incuding:
* For innodb_fast_shutdown=0 trx_roll_must_shutdown() for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge
Thanks Marko for feedback and suggestions.
The purpose of the InnoDB buffer pool dump is to allow InnoDB to be
restarted with the same persistent data pages in the buffer pool.
The InnoDB temporary tablespace that was introduced in MariaDB 10.2.2
is always reinitialized on restart. Therefore, it does not make sense
to attempt to dump or restore any pages of the temporary tablespace.
As this is the only moderately critical fopened for writing file,
create an alternate path to use open and fdopen for non-glibc platforms
that support O_CLOEXEC (BSDs).
Tested on Linux (by modifing the GLIBC defination) to take this
alternate path:
$ cd /proc/23874
$ more fdinfo/71
pos: 0
flags: 02100001
mnt_id: 24
$ ls -la fd/71
l-wx------. 1 dan dan 64 Mar 14 13:30 fd/71 -> /dev/shm/var_auto_i7rl/mysqld.1/data/ib_buffer_pool.incomplete
buf_flush_page_cleaner_coordinator(): Signal the worker threads
to exit while waiting for them to exit. Apparently, signals are
sometimes lost, causing shutdown to occasionally hang when
multiple page cleaners (and buffer pool instances) are used,
that is, when innodb_buffer_pool_size is at least 1 GiB.
buf_flush_page_cleaner_close(): Merge with the only caller.
fil_space_t::atomic_write_supported: Always set this flag for
TEMPORARY TABLESPACE and during IMPORT TABLESPACE. The page
writes during these operations are by definition not crash-safe
because they are not written to the redo log.
fil_space_t::use_doublewrite(): Determine if doublewrite should
be used.
buf_dblwr_update(): Add assertions, and let the caller check whether
doublewrite buffering is desired.
buf_flush_write_block_low(): Disable the doublewrite buffer for
the temporary tablespace and for IMPORT TABLESPACE.
fil_space_set_imported(), fil_node_open_file(), fil_space_create():
Initialize or revise the space->atomic_write_supported flag.
buf_page_io_complete(), buf_flush_write_complete(): Add the parameter
dblwr, to indicate whether doublewrite was used for writes.
buf_dblwr_sync_datafiles(): Remove an unnecessary flush of
persistent tablespaces when flushing temporary tablespaces.
(Move the call to buf_dblwr_flush_buffered_writes().)
Before that line there is call to buf_page_get_gen that could
return block = NULL when decrypting a page fails. However,
we should set error to be != DB_SUCCESS also. In error log
there was error about decompression but in that code there
is one case where error is not set correctly.
Import and adjust the innodb.innodb_buffer_pool_resize tests,
except innodb.innodb_buffer_pool_resize_debug, which would time out.
buf_pool_clear_hash_index(): Adjust assertions.
While the bug was reported as a regression of
MDEV-11025 Make number of page cleaner threads variable dynamic
in MariaDB Server 10.3, the code that MariaDB Server 10.2
inherited from MySQL 5.7.4 (WL#6642) looks prone to similar errors.
pc_flush_slot(): If there is no work to do, reset the is_requested
signal, to avoid potential busy-waiting in
buf_flush_page_cleaner_worker(). If the coordinator thread has shut
down, avoid resetting the is_requested event, to avoid a potential
hang at shutdown if there are multiple worker threads.
mem_heap_free_heap_top(): Remove UNIV_MEM_ASSERT_W() and unpoison
the memory region first, because part of it may have been poisoned
by an earlier mem_heap_free_top() call.
Poison the address range at the end.
mem_heap_block_free(): Poison the address range at the end.
UNIV_MEM_ASSERT_AND_ALLOC(): Replace with UNIV_MEM_ALLOC().
We want to keep the address ranges poisoned (unaccessible) as
long as possible.
UNIV_MEM_ASSERT_AND_FREE(): Replace with UNIV_MEM_FREE().
innodb/buf_LRU_get_free_block
Add debug instrumentation to produce error message about
no free pages. Print error message only once and do not
enable innodb monitor.
xtradb/buf_LRU_get_free_block
Add debug instrumentation to produce error message about
no free pages. Print error message only once and do not
enable innodb monitor. Remove code that does not seem to
be used.
innodb-lru-force-no-free-page.test
New test case to force produce desired error message.
Silence the error log output that was introduced in MySQL 5.7
(MariaDB 10.2.2) if log_warnings=2 or less.
We should still figure out what these messages really indicate
and how to solve the problems.
pc_sleep_if_needed(): Add a parameter for the current time,
so that there will be fewer successive calls to ut_time_ms()
with no I/O between them.
buf_flush_page_cleaner_coordinator(): Exit the first loop
whenever shutdown has been requested. At the start of the loop,
call ut_time_ms() only once. Do not display the message if
log_warnings=2 or less.
Also, MDEV-14317 When ALTER TABLE is aborted, do not write garbage pages to data files
As pointed out by Shaohua Wang, the merge of MDEV-13328 from
MariaDB 10.1 (based on MySQL 5.6) to 10.2 (based on 5.7)
was performed incorrectly.
Let us always pass a non-NULL FlushObserver* when writing
to data files is desired.
FlushObserver::is_partial_flush(): Check if this is a bulk-load
(partial flush of the tablespace).
FlushObserver::is_interrupted(): Check for interrupt status.
buf_LRU_flush_or_remove_pages(): Instead of trx_t*, take
FlushObserver* as a parameter.
buf_flush_or_remove_pages(): Remove the parameters flush, trx.
If observer!=NULL, write out the data pages. Use the new predicate
observer->is_partial() to distinguish a partial tablespace flush
(after bulk-loading) from a full tablespace flush (export).
Return a bool (whether all pages were removed from the flush_list).
buf_flush_dirty_pages(): Remove the parameter trx.
FlushObserver::flush(): Never discard unwritten changes.
We do not want to risk corrupting the system tablespace
or .ibd files that are not part of a table-rebuilding ALTER.
Ever since MDEV-10813 cleaned up InnoDB use of atomic memory operations
and made buf_block_fix() an atomic operation, some conditions around
buf_block_fix() have been unnecessary.
With a big buffer pool that contains many data pages,
DISCARD TABLESPACE took a long time, because it would scan the
entire buffer pool to remove any pages that belong to the tablespace.
With a large buffer pool, this would take a lot of time, especially
when the table-to-discard is empty.
The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.
If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.
It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.
Import is bypassing the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.
buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.
buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.
buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.
PageConverter::m_mtr: A dummy mini-transaction buffer
PageConverter::PageConverter(): Complete the member initialization list.
PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages that exist in the being-imported file.
row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
ibuf_check_bitmap_on_import(): Only access the pages that
are below FSP_FREE_LIMIT. It is possible that especially with
ROW_FORMAT=COMPRESSED, the FSP_SIZE will be much bigger than
the FSP_FREE_LIMIT, and the bitmap pages (page_size*N, 1+page_size*N)
are filled with zero bytes.
buf_page_is_corrupted(), buf_page_io_complete(): Make the
fault injection compatible with MariaDB 10.2.
Backport the IMPORT tests from 10.2.
Issue
=====
The original issue was that the size of a fil_per_table tablespace was calculated
incorrectly during truncate in the presence of an fts index. This incorrect calculation
was fixed as part of BUG#25053705 along with a testcase to reproduce the bug. The
assert that was added as part of it to reproduce the bug was wrong and resulted in
this bug.
Fix
===
Although the assert was removed earlier in a seperate commit as it was blocking the
ntest, this patch replaces the other parts of the code that were added to reproduce
the bug and replaces it with code that tries to reproduce the bug in a different way.
The new code basically tries to tweak conditions so as to simulate the random read
where a page that doesn't exist is tried to be read.
RB: 15890
Reviewed-by: Jimmy Yang <Jimmy.Yang@oracle.com>
Reviewed-by: Satya Bodapati <satya.bodapati@oracle.com>
Some innobase/xtrabackup changes around from 10.1 are null merged
, in partucular using os_set_file_size to extend tablespaces in server
or mariabackup.
They require non-trivial amount of additional work in 10.2, due to
innobase differences between 10.1 and 10.2
In MariaDB Server 10.1, this problem manifests itself only as
a debug assertion failure in page_zip_decompress() when an insert
requires a page to be decompressed.
In MariaDB 10.1, the encryption of InnoDB data files repurposes the
previously unused field FILE_FLUSH_LSN for an encryption key version.
This field was only used in the first page of each file of the system
tablespace. For ROW_FORMAT=COMPRESSED tables, the field was always
written as 0 until encryption was implemented.
There is no bug in the encryption, because the buffer pool blocks will
not be written to files. Instead, copies of the blocks will be encrypted.
In these encrypted copies, the key version field will be updated before
the buffer is written to the file. The field in the buffer pool is
basically garbage that does not really matter.
Already in MariaDB 10.0, the memset() calls to reset this unused field
in buf_flush_update_zip_checksum() and buf_flush_write_block_low()
are unnecessary, because fsp_init_file_page_low() would guarantee that
the field is always 0 in the buffer pool (unless 10.1 encryption is
used).
Removing the unnecessary memset() calls makes page_zip_decompress()
happy and will prevent a SPATIAL INDEX corruption bug in
MariaDB Server 10.2. In MySQL 5.7.5, as part of WL#6968, the same
field was repurposed for an R-tree split sequence number (SSN) and
these memset() were removed. (Because of the repurposing, MariaDB
encryption is not available for tables that contain SPATIAL INDEX.)
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.