mtr.commit_file() call in fil_space_t::drop() removes space from
fil_system.named_spaces, but then the space can be inserted in the
container again by some another thread while fil_space_t::drop() is
waiting for pending operations finishing.
The fix is to check and remove a space from fil_system.named_spaces
after all pengind operations on the space are finished. Also the ut_d()
macro is removed for space->max_lsn=0 assignments to avoid repeated
space removing from fil_system.named_spaces.
There is error in ilist::pop_back(). ilist::end() returns sentinel,
and the pop_back() removes sentinel from the list instead of the last
element. The error is fixed in this commit.
Reviewed by Marko Mäkelä
fil_ibd_create(): Add a DEBUG_SYNC point for the test case.
fil_node_open_file_low(): If node->deferred is set, set the
OS_FILE_ON_ERROR_SILENT flag on OS_FILE_OPEN and attempt
OS_FILE_CREATE if needed. If this fails, then InnoDB will
refuse to start up, giving the operator a chance to resolve
the situation, for example by freeing up some space in the
file system.
recv_validate_tablespace(): Invoke deferred_spaces.add() on
any missing tablespace for which we know the LSN of the FILE_CREATE
record. In this way, fil_node_open_file_low() will end up being invoked
on files that are supposed to be created.
fil_name_process(): For FILE_CREATE, remember the create_lsn.
recv_sys_t::parse(): Pass FILE_CREATE to fil_name_process().
Some existing tests have been adjusted for the improved recovery of
file creation.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
It was possible that a log checkpoint was completed and the server killed
between the time that fil_ibd_create() durably wrote a FILE_CREATE record,
and the initialization of the tablespace. This could lead to a failure
to start up after the server was killed during TRUNCATE TABLE or any
table-rebuilding operation such as OPTIMIZE TABLE.
In the case of TRUNCATE TABLE, an attempt to rename a file #sql-ibNNN.ibd
(the contents of the table before truncation) to tablename.ibd would fail,
because both files existed and the file tablename.ibd would have been
filled with NUL bytes. It was possible to resume from this error by
deleting the file tablename.ibd and restarting the server.
We will prevent this class of errors by ensuring that both the FILE_CREATE
record and the records written by fsp_header_init() will be part of the
same atomic transaction, which must be durably written before any file
is created or allocated.
NOTE: There is another possible crash recovery problem, which we are not
attempting to solve here and which will be covered by the subsequent
change (MDEV-38026). If fil_ibd_create() fails to create the file and
the server were killed, recovery would not even attempt to create the file
at all.
fil_space_t::create(): Remove the DBUG_EXECUTE_IF fault injection that
was the only cause of return nullptr. This allows us to simplify
several callers.
fil_space_t::set_stopped(), fil_space_t::clear_stopped(): Accessor
functions for fil_ibd_create() for preventing any concurrent access
to an incompletely created tablespace.
fil_ibd_create(): In a single atomic mini-transaction, write the
FILE_CREATE record as well as the log for initializing the tablespace.
After durably writing the log, create the file in the file system
Only after the file has been successfully created and allocated,
open the tablespace for business. Finally, release the exclusive page
latches so that the header pages may be written to the file.
fil_ibd_open(): Move some fault injection from fil_space_t::create()
to a higher level, to the place where an existing file is being opened.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
* remove the file to be --repeat friendly
* specify the correct defaults-group-suffix. Spider doesn't use the
conventional .1 group, so MYSQLD_CMD skips a lot of config options,
in particular it doesn't read --tmpdir and creates files in /tmp
Problem:
=======
In multi-batch recovery, pages with LSN > recv_sys.scanned_lsn
are discarded from doublewrite buffer. Later when
reading a corrupted page from disk, corresponding dblwr page
is no longer available. This leads to failure of InnoDB recovery.
Solution:
=========
In buf_dblwr_t::recover(): Replace recv_sys.scanned_lsn
with max_lsn when checking if a page's LSN is within
the recovery range.
When a page is being requested from buf_page_get_gen() with
mode=BUF_GET_POSSIBLY_FREED, it is very well possible that the page
had been marked as freed in the tablespace and had never been written
back to the tablespace.
buf_page_t::read_complete(): Validate the page identifier before
invoking buf_zip_decompress(), to avoid noisy messages or debug
assertion failures on an obviously invalid page.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
@@new_mode is a set of flags to control introduced features.
Flags are by default off. Setting a flag in @@new_mode will introduce
a new different server behaviour and/or set of features.
We also introduce a new option 'hidden_values' into some system variable
types to hide options that we do not wish to show as options.
- Don't print hidden values in mysqld --help output.
- Make get_options() use the same logic as check_new_mode_value() does.
- Setting @@new_mode=ALL shouldn't give warnings.
(cherry picked from commit f7387cb13d)
Any InnoDB write workload is marking data pages in the buffer pool dirty.
To make the changes durable, it suffices to write to the write-ahead log
to facilitate crash recovery. The longer we keep pages dirty in the
buffer pool, the better, because the same pages could be modified again
and a single write to the file system could cover several changes.
Eventually, we must write out dirty pages, so that we can advance the
log checkpoint, that is, allow the start of the recovery log to be
discarded. (On a clean shutdown, a full checkpoint to the end of the log
will be made.)
A write workload can be bound by innodb_buffer_pool_size
(we must write out changes and evict data pages to make room for others)
or by innodb_log_file_size
(we must advance the log checkpoint before the tail of the circular
ib_logfile0 would overwrite the previous checkpoint).
In innodb_log_file_size bound workloads, we failed to set an optimal
target for the next checkpoint LSN. If we write out too few pages, then
all writer threads may occasionally be blocked in log_free_check() while
the buf_flush_page_cleaner() thread is resolving the situation. If we
write out too many pages, then the I/O subsystem will be loaded
unnecessarily and there will be some write amplification in case some of
the unnecessarily written pages would be modified soon afterwards.
log_close(): Return the target LSN for buf_flush_ahead(lsn),
bitwise-ORed with the "furious" flag, or the special values 0
indicating that no flushing is needed, which is the usual case.
log_checkpoint_margin(): Use a similar page checkpoint target as
log_close() for the !not_furious case.
mtr_flush_ahead(): A wrapper for buf_flush_ahead().
mtr_t::commit_log_release(): Make some common code non-inline in order
to reduce code duplication.
buf_flush_ahead(lsn, furious=false): Avoid an unnecessary wake-up of the
page cleaner if it is scheduled to wake up once per second.
Co-developed-by: Alessandro Vetere
Reviewed by: Vladislav Lesin
row_raw_format_str(): Treat the invalid value charset_coll==0 as binary.
This could be invoked on FTS_%_CONFIG.key or SYS_FOREIGN.ID
or possible other key columns.
dtype_is_utf8(): Merge to its only caller row_raw_format_str().
row_raw_format(): Add debug assertions and comments to document
when dtype_get_charset_coll(prtype) may be 0.
rtree_mbr_from_wkb::n_node: Change the data type from int to uint16_t
and pack it together with the added field key_len. In this way, an
increase of sizeof(rtree_mbr_from_wkb) will be avoided.
Problem:
========
- When an R-tree root page becomes full and requires splitting,
InnoDB follows a specific root-raising procedure to maintain
tree integrity. The process involves allocating a new page
(Page X) to hold the current root's content, preserving the
original root page number as the tree's entry point, and
migrating all existing records to Page X.
The root page is then cleared and reconstructed as an
internal node containing a single node pointer with an
MBR that encompasses all spatial objects on Page X.
Subsequently, InnoDB should split the records on Page X
into two spatially optimized groups using the
pick_seeds() and pick_next() algorithms,
creating a second page (Page Y) for Group B records
while retaining Group A records on Page X.
After records are redistributed between Page X and Page Y,
the recalculated MBR for Page X must remain within
or be smaller than the original MBR stored in the
root page's node pointer.
Bug scenario:
============
- When root page 4 becomes full, it triggers a split operation
where the content is copied to page 7 and root page 4 is cleared
to become an internal node.
- During the first split attempt on page 7, Group 1 overflows
and remaining entries are reassigned to Group 2.
- A new page 8 is created and the remaining entry record
is inserted, but the combined size of the remaining entry
record and new record exceeds the page size limit.
- This triggers a second split operation on page 7, where
Group 2 overflows again and entries are moved back to Group 1.
- When the new record is finally inserted into page 7,
it causes the MBR (Minimum Bounding Rectangle) for page 7
to expand beyond its original boundaries.
- Subsequently, when InnoDB attempts to update the parent
page 4 with the new MBR information, it fails to locate
the corresponding internal node, leading to spatial
index corruption and the reported failure.
Problem:
========
- Second split operation should happen on page 8, not on page 7.
- split_rtree_node() considers key_size to estimate
record sizes during the splitting algorithm, which fails to
account for variable-length fields in spatial records.
- In rtr_page_split_and_insert(), when reorganization
succeeds, InnoDB doesn't attempt the insert the entry
Solution:
========
rtr_page_split_and_insert(): InnoDB should do insert the
tuple when btr_page_reorganize() is successful.
rtr_page_split_and_insert(): Use the overflow page
for consecutive split operation.
split_rtree_node(): Store the record length for each
record in r-tree node. This should give proper
estimation while determining the group entries and
also helpful in overflow validation
This is a fixup of 41725b4cee where the
symlinks were incorrectly created using absolute paths.
Also fix ps protocol for spider/bugfix.mdev_29502,usual_handler
buf_read_page_low(): Implement separate code paths for synchronous
and asynchronous read, similar to
commit b2c1ba820b (MDEV-37860),
and always invoke thd_wait_begin() and thd_wait_end()
the same number of times.
This fixes a regression that affects the non-default setting
thread_handling=pool-of-threads and had been introduced in
commit b1ab211dee (MDEV-15053).
Reviewed by: Vladislav Vaintroub
This prevents segv on NULL result_list->current->result on handler
calls that are not the first. Reasons supporting this change:
- spider_db_result::limit_mode is not called anywhere else. The only
calls to limit_mode method are on spider_db_conn
- it is very unlikely (impossible?) for the connection to change from
one backend to another between execution of sql statements and
storing result:
/* in spider_bg_conn_action: */
if (dbton_handler->execute_sql(
sql_type,
conn,
result_list->quick_mode,
&spider->need_mons[conn->link_idx])
// [... 9 lines elided]
if (!(result_list->bgs_error =
spider_db_store_result(spider, conn->link_idx,
result_list->table)))
this also means it is very unlikely (impossible?) for the backend
type (dbton_id) to differ between conn->db_conn and
result_list->current->result, also considering that
spider_db_result::dbton_id comes from spider_db_conn::dbton_id:
spider_db_result::spider_db_result(
SPIDER_DB_CONN *in_db_conn
) : db_conn(in_db_conn), dbton_id(in_db_conn->dbton_id)
Since this was the only call to spider_db_result::limit_mode, we also
remove the method altogether.
Ideally spider should throw an error when local and remote tables have
conflict definitions.
This fixes the test for --view-protocol and
--mysqld=--loose-spider-disable-group-by-handler
Multi-table UPDATE and DELETE statements employ mysql_select() calls
during their processing, so the server may try to instantiate
`select_handler` in an attempt to push down the statement to
a foreign engine.
However, the current implementation of `select_handler` for FederatedX
pushes down the whole query and not only its select part
(`thd->query()`, see `int ha_federatedx_select_handler::init_scan()`).
FederatedX engine does not support execution of DML statements
on the remote side, that is why the error occured.
Solution:
- Add an extra check to only allow SELECT statements pushdown
to FederatedX
The debug parameter innodb_trx_rseg_n_slots_debug is not compatible
with the bug fix that was implemented in MDEV-30671 and last revised
in MDEV-33213, which leads to bogus assertion failures in
trx_assign_rseg_low(), regarding look_for_rollover. When needed,
this parameter can be better simulated by adding a DBUG_EXECUTE_IF to
trx_undo_seg_create() or one of its callers.
btr_search_lazy_free(): Check for !table->id. It is possible that
dict_sys_t::remove(table, ...) has started executing but has not
reached the final part yet. If that is the case, let
dict_sys_t::remove() observe that table->freed_indexes is empty
and let it free the table object.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
btr_search_update_hash_ref(): Check the consistency of block->index
and block->n_pointers only while holding the partition latch.
Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Saahil Alam
The ha_partition::direct_delete_rows() only calls file->ha_rnd_init() if
the rnd_seq is on. (that is when it's the first call to
the direct_update_rows() with RND scan). So don't call the ::ha_rnd_end() if
rnd_seq isn't set.
Problem:
=======
By keeping innodb_ft_min_token_size = 0, InnoDB FTS does tokenization
on empty string and tries to add the nullptr as token word. This
leads to crash when we try to convert the token into lower case
characters.
Solution:
=========
The minimum value of innodb_ft_min_token_size should be 1.
It should be consistent with MYISAM and ARIA engine.
The ha_maria::repair() call leaves the MARIA_HA::cur_row::rowid filled
after index-recreating scans.
Should be emptied so consequent operations dont fail.
buf_read_page(): Return a pointer to a buffer-fixed, non-read-fixed page,
or nullptr in case of an error.
IORequest::read_complete(): Assert that the page is both read-fixed
and buffer-fixed. Sample recv_sys.recovery_on only once.
Buffer-unfix the page when the asynchronous read completes.
buf_page_t::read_complete(): Assert that the page is both
read-fixed and buffer-fixed. Remove a hard-to-test condition
on innodb_force_recovery that made a difference when the page number
in the read frame did not match. It makes no sense to ignore such
corruption.
buf_page_t::read_wait(): Wait for a read-fixed and buffer-fixed page
to be only buffer-fixed, by acquiring a shared latch.
buf_page_init_for_read(): Return a pointer to a buffer-fixed block
descriptor pointer, bitwise-ORed with 1 in case the block already
exists in the buffer pool.
buf_page_create_low(): Before invoking buf_relocate() on a
compressed-only ROW_FORMAT=COMPRESSED page, wait for any concurrent
IORequest::read_complete() to release its buffer-fix. This is similar
to what buf_pool_t::unzip() already did.
buf_read_release_count(): Common code for freeing a spare block if needed
as well as updating some statistics counters.
buf_read_page_low(): Replace the parameter sync with err, which will
return an error code to a synchronous caller.
Return the pointer to the block, or the special values nullptr
(read failure) or -1 or -2 for asynchronous reads.
Increment the statistics when a synchronous read was requested.
In a synchronous read, if the page has already been allocated in
the buffer pool but it is read-fixed, wait for the read to complete.
buf_page_get_zip(): Get a buffer-fixed page from buf_read_page(),
and unfix() it. Our caller is relying solely on a page latch.
buf_read_page_background(): Update the statistics if supplied.
While waiting for for I/O completion, let us skip spin loops.
Even on fast storage, reading a page into the buffer pool takes
so long that a spin loop would only end up wasting CPU time.
block_lock::s_lock_nospin(): A no-spin variant of acquiring a shared
buffer page latch. Regular s_lock() always involveis a spin loop.
ssux_lock_impl::rd_lock_spin(), ssux_lock_impl::rd_lock_nospin():
Split from rd_wait().
ssux_lock_impl::rd_lock(): Invoke either rd_lock_nospin() or
rd_lock_try() and rd_lock_spin().
buf_page_get_low(): After acquiring a page latch on an io-fixed block,
try to optimize operations on the page latch.
Reviewed by: Thirunarayanan Balathandayuthapani
In commit 13076351f1 (MDEV-37152)
some code was refactored that would break the server-side
connection pool logic. Calls to thd_wait_begin() and thd_wait_end()
were not balanced.
buf_read_page_low(): Only invoke thd_wait_begin() and thd_wait_end()
for synchronous reads, and do that also when the read operation fails.
Reviewed by: Vladislav Lesin
Clean up spider tests by removing excessive $VARIABLE abstractions.
Indentations are preserved to minimise diff. After this change only
one disabled test remains with the following regexp:
$ (cd ../src && git grep -l "eval .*\$\(MASTER\|CHILD\|SLAVE\)[A-Z0-9_]*_\(COMMENT\|TABLES\)") | grep spider.*\.test$
storage/spider/mysql-test/spider/bugfix/t/wait_timeout.test
The patch is created with the following steps:
Step 1. hack mysqltest.cc to log statements
modified client/mysqltest.cc
@@ -1745,17 +1745,23 @@ void verbose_msg(const char *fmt, ...)
DBUG_ENTER("verbose_msg");
DBUG_PRINT("enter", ("format: %s", fmt));
- if (!verbose)
- DBUG_VOID_RETURN;
+ /*
+ if (!verbose)
+ DBUG_VOID_RETURN;
+ */
fflush(stdout);
va_start(args, fmt);
- fprintf(stderr, "mysqltest: ");
+ /*
+ fprintf(stderr, "mysqltest: ");
+ */
if (cur_file && cur_file != file_stack)
fprintf(stderr, "In included file \"%s\": ",
cur_file->file_name);
- if (start_lineno != 0)
- fprintf(stderr, "At line %u: ", start_lineno);
+ /*
+ if (start_lineno != 0)
+ fprintf(stderr, "At line %u: ", start_lineno);
+ */
vfprintf(stderr, fmt, args);
fprintf(stderr, "\n");
va_end(args);
@@ -10412,6 +10418,22 @@ int main(int argc, char **argv)
if (ok_to_do)
{
+ /* only log current file / do not log included files */
+ if (!cur_file || cur_file == file_stack)
+ {
+ if (command->type == Q_EMPTY_LINE)
+ verbose_msg("%s", "");
+ /* ignore if; eval will be handled after var subst */
+ else if (command->type != Q_IF && command->type != Q_END_BLOCK &&
+ command->type != Q_EVAL)
+ /*
+ read_command_buf instead of command->query because only
+ the former retains leading "--"
+ */
+ verbose_msg("%s%s", read_command_buf,
+ strncmp("--", read_command_buf, 2) &&
+ strncmp("#", read_command_buf, 1) ? ";" : "");
+ }
command->last_argument= command->first_argument;
processed = 1;
/* Need to remember this for handle_error() */
@@ -10595,6 +10617,14 @@ int main(int argc, char **argv)
/* Restore settings */
display_result_vertically= old_display_result_vertically;
+ /* only log current file / do not log included files */
+ if (!cur_file || cur_file == file_stack)
+ {
+ /* print evaled query */
+ if (command->type == Q_EVAL)
+ verbose_msg("%s;", command->eval_query.str);
+ }
+
break;
}
case Q_SEND:
Statements are printed in mysql-test/var/log/mysqltest.log between two
lines:
"Start processing test commands from '$test_file_name' ..."
and
"... Done processing test commands."
Step 2. Overwrite the test files with logged statements and run the
tests again to check they pass (not all pass, see below)
(cd ../src && git grep -l "eval \$CHILD") | grep \.test$ | sed "s/^.*spider\//spider\//g" | sed "s/\/t\//./g" | sed "s/\.test$//g" | xargs -i sh -c './mysql-test/mtr {} && emacs --batch --load /home/ycp/source/mariadb-tools/mdev-37193.el --eval "(my-mdev-37193)" && ./mysql-test/mtr {}'
where my-mdev-37193 is as follows:
(defun my-mdev-37193 ()
(let ((log-file
(file-name-concat default-directory
"mysql-test/var/log/mysqltest.log"))
test-file-name beg end)
(if (file-exists-p log-file)
(with-temp-buffer
(insert-file-contents log-file)
(goto-char (point-min))
(re-search-forward
"^Start processing test commands from '\\(.*\\)' ...$")
(setq test-file-name (match-string 1))
(beginning-of-line 2)
(setq beg (point))
(re-search-forward "^... Done processing test commands.$")
(beginning-of-line 1)
(setq end (point))
(write-region beg end test-file-name)
(message "Wrote %s." test-file-name))
(message "Log file does not exist (test disabled?): %s" log-file))))
In my run, out of the 79 cleaned tests, about 8 failed:
21606:spider/bg.spider_fixes [ fail ]
22463:spider/bugfix.mdev_20100 [ fail ]
22734:spider/bugfix.mdev_21884 [ fail ]
23876:spider/regression/e1121.direct_join_by_pkey_key [ fail ]
24077:spider/regression/e1121.direct_join_by_pkey_pkey [ fail ]
25653:spider.partition_mrr [ fail ]
26388:spider.spider_fixes [ fail ]
26970:spider.udf_pushdown [ fail ]
Step 3. Remove all blank line removals in the diff
The diff could contain removal of blank lines which could affect
readability of tests arranged in logical blocks. this can be fixed
with the following:
git diff > backup.diff
git diff --ignore-blank-lines -U1 > changes.diff
git reset --hard
git apply -- changes.diff
Step 4. Manually Fix up the 8 failing tests above, which was relatively
straightforward.
Step 5. Fix sockets. Sockets need to be changed back to vars because
their paths depends on those of the vardir.
Do this
(cd ../src && git grep -l "mysql-test/var/tmp/.*\\.sock") | grep \.test$ | xargs -i sh -c 'emacs --batch --load /home/ycp/source/mariadb-tools/mdev-37193.el --eval "(my-mdev-37193-fix-sockets \"../src/{}\")"'
where my-mdev-37193-fix-sockets is defined as follows:
(defun my-mdev-37193--translate-sock (m n)
(pcase m
("1" "$MASTER_1_MYSOCK")
("2" (format "$CHILD2_%s_MYSOCK" n))
("3" (format "$CHILD3_%s_MYSOCK" n))
("4" (format "$SLAVE1_%s_MYSOCK" n))))
(defun my-mdev-37193--fix-sockets ()
"Fix sockets in the current buffer"
(goto-char (point-min))
(while (re-search-forward
"socket \".*mysqld\\.\\(.\\)\\.\\(.\\)\\.sock\"" nil t)
(replace-match (format "socket \"%s\""
(my-mdev-37193--translate-sock
(match-string 1)
(match-string 2))))
(re-search-backward ";$")
(beginning-of-line 2)
(insert "eval\n")))
(defun my-mdev-37193-fix-sockets (file)
"Fix sockets in FILE"
(with-temp-buffer
(insert-file-contents file)
(my-mdev-37193--fix-sockets)
(write-file file)))
Subsequent small fixup of spider.ha and spider/bugfix.ha are needed
because they involve strings of two sockets instead of one.
Step 6. Clean up the "main" inc files by removing the obsolete vars.
This can be done manually with the help of grep as there are not many
such files. Note that there are still about 100 remaining .inc files
matching relevant regexp (using the command below), but they are all
"specialised", i.e. $VARs used in foo.test are declared in foo.inc,
instead of some generic files several --source's away. So the
readability is not as bad, and they are minor nuisances that can be
easily fixed on demand and on-the-fly when needed.
$ (cd ../src && git grep -l "let \$\(MASTER\|CHILD\|SLAVE\)[A-Z0-9_]*_\(COMMENT\|TABLES\)") | grep spider.*\.inc | sed "s/^/..\/src\//" | wc -l
99
Step 7. Further clean up white spaces to reduce diff and ease merging:
git checkout -b tmp
git diff -U0 -w --no-color | git apply --cached --ignore-whitespace --unidiff-zero -
git checkout .
git commit
This patch removes unnecessary and potentially performance affecting
reads of my_disable_sync global variable, which exists for single purpose
of speeding up mtr tests, by making fsync a no-op.
The read is still left in its proper location in os_file_flush_func().
Background
According to Marko Mäkelä, the removed code originated from the porting
of MDEV-30054 — debug-no-sync doesn’t fully disable sync calls,
introduced in merge commit be24e75229.
At that point, the access to my_disable_sync variable in higher-level
Innodb code was already unnecessary because MDEV-30136 had already
implemented proper handling of flush skipping (in os0file.cc) about nine
months earlier.