mark old keys in the ALTER TABLE with the `old` flag, not with
the `key_create_info.check_for_duplicate_indexes`.
This allows to mark old foreign keys too.
HA_UNIQUE_CHECK was
* only used internally by MyISAM/Aria
* only used for internal temporary tables (for DISTINCT)
* never saved in frm
* saved in MYI/MAD but only for temporary tables
* only set, never checked
it's safe to remove it and free the bit (there are only 16 of them)
Do not attempt to produce "r_engine_stats" on the temporary (=work) tables.
These tables may be
- re-created during the query execution
- freed during the query execution (This is done e.g. in JOIN::cleanup(),
before we produce ANALYZE FORMAT=JSON output).
- (Also, make save_explain_data() functions not set handler_for_stats
to point to handler objects that do not have handler->handler_stats set.
If the storage engine is not collecting handler_stats, it will not have
them when we're producing ANALYZE FORMAT=JSON output, either).
buf_page_t::write_complete(), buf_page_write_complete(),
IORequest::write_complete(): Add a parameter for passing
an error code. If an error occurred, we will release the
io-fix, buffer-fix and page latch but not reset the
oldest_modification field. The block would remain in
buf_pool.LRU and possibly buf_pool.flush_list, to be written
again later, by buf_flush_page_cleaner(). If all page writes
start consistently failing, all write threads should eventually
hang in log_free_check() because the log checkpoint cannot
be advanced to make room in the circular write-ahead-log ib_logfile0.
IORequest::read_complete(): Add a parameter for passing
an error code. If a read operation fails, we report the error
and discard the page, just like we would do if the page checksum
was not validated or the page could not be decrypted.
This only affects asynchronous reads, due to linear or random read-ahead
or crash recovery. When buf_page_get_low() invokes buf_read_page(),
that will be a synchronous read, not involving this code.
This was tested by randomly injecting errors in
write_io_callback() and read_io_callback(), like this:
if (!ut_rnd_interval(100))
cb->m_err= 42;
buf_LRU_free_page(): When we are discarding the uncompressed copy of a
ROW_FORMAT=COMPRESSED page, buf_page_t::can_relocate() must have ensured
that the block descriptor state is one of FREED, UNFIXED, REINIT.
Do not overwrite the state with UNFIXED. We do not want to write back
pages that were actually freed, and we want to avoid doublewrite for
pages that were (re)initialized by log records written since the latest
checkpoint. Last but not least, we do not want crashes like those that
commit dc1bd1802a (MDEV-31386)
was supposed to fix.
The test innodb_zip.wl5522_zip should typically cover all 3 states.
This bug is a regression due to
commit aaef2e1d8c (MDEV-27058).
don't construct open ranges from prefix blob keys for < (less than)
just as it's already done for > (greater than)
because prefix KEY_PART doesn't create prefix Field for blobs
(see open_table_from_share() near "Create a new field for the key part"),
so stored_field_cmp_to_item() will compare the original field to the
value not taking the prefix length into account.
Problem:
========
- InnoDB scans the complete redo log to ensure that there is
no corruption and to find the end of the log. During this scan,
InnoDB saves all the freed ranges, but it doesn't save
recovered size. Later, InnoDB recovery applies partial
redo logs and IO thread tries to flush the all freed
ranges which was noted during previous complete scan of redo logs.
Fix:
====
InnoDB should store the freed pages only when InnoDB stores
the redo log records.
A simple "SET SESSION gtid_seq_no= DEFAULT" did not work, it would straight
up crash the server! Also, explicitly setting gtid_seq_no to 0 gave an error
in --gtid-strict-mode=1.
Setting to DEFAULT or 0 should disable any prior setting of
gtid_seq_no, so that the next transaction is allocated the next GTID
in sequence, as normal.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
log_sort_flush_list(): Wait for any pending page writes to cease
before sorting the buf_pool.flush_list. Starting with
commit 22b62edaed (MDEV-25113),
it is possible that some buf_page_t::oldest_modification_
that we will be comparing in std::sort() will be updated from
some value >2 to 1 while we are holding buf_pool.flush_list_mutex.
To catch this type of trouble better in the future, we will clean
garbage (pages that have been written out) from buf_pool.flush_list
while constructing the array for sorting, and check with debug
assertions that all blocks that we are copying from the array to the
list will be dirty (requiring a writeback) while we sort and copy
the array back to buf_pool.flush_list.
This failure was observed by chance exactly once when running the test
innodb.recovery_memory. It was never reproduced in the same form
afterwards. Unrelated to this change, the test does occasionally
reproduce a failure to start up InnoDB due to a corrupted page
being read in recovery. The ticket MDEV-31791 was filed for that.
Tested by: Matthias Leich
prep_alter_part_table upon re-partitioning by system time
memcmp() tries to compare beyond the last member of interval because
sizeof(Vers_part_info::interval) is 80. It is sizeof of variable,
sizeof of type is 76.
Now we compare interval_t struct C++ way.
The assertion was to make sure we don't do vers_set_hist_part() for
SELECT (or any non-DML). But actually we must do it if SELECT calls
some function that does DML. Patch moves the assertion to non-routines
only.
differently react to SQL_MODE => unusable SHOW CREATE
Use abort_on_warning dependent on strict mode over create new table
like it is done for copy data and inplace alter.
- InnoDB fails to update the autoinc persistently after
bulk insert operation.
row_merge_bulk_t::write_to_index(): Update the autoinc value
persistently
This patch adds for "--ps-protocol" second execution
of queries "SELECT".
Also in this patch it is added ability to disable/enable
(--disable_ps2_protocol/--enable_ps2_protocol) second
execution for "--ps-prototocol" in testcases.
MDEV-31749 sporadic assert in MDEV-30619 new test
If the workers of a parallel replica are busy (potentially with long
queues), but the SQL thread has no events left to distribute (so it
goes idle), then the next event that comes from the primary will
update mi->last_master_timestamp with its timestamp, even if the
workers have not yet finished.
This patch changes the parallel replica logic which updates
last_master_timestamp after idling from using solely sql_thread_caught_up
(added in MDEV-29639) to using the latter with rli queued/dequeued
event counters.
That is, if the queued count is equal to the dequeued count, it
means all events have been processed and the replica is considered
idle when the driver thread has also distributed all events.
Low level details of the commit include
- to make a more generalized test for Seconds_Behind_Master on
the parallel replica, rpl_delayed_parallel_slave_sbm.test
is renamed to rpl_parallel_sbm.test for this purpose.
- pause_sql_thread_on_next_event usage was removed
with the MDEV-30619 fixes. Rather than remove it, we adapt it
to the needs of this test case
- added test case to cover SBM spike of relay log read and LMT
update that was fixed by MDEV-29639
- rpl_seconds_behind_master_spike.test is made to use
the negate_clock_diff_with_master debug eval.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
The main problem is that at ever since
commit aaef2e1d8c removed the
function buf_wait_for_read(), it is not safe to invoke
buf_page_get_low() with RW_NO_LATCH, that is, only buffer-fixing
the page. If a page read (or decryption or decompression) is in
progress, there would be a race condition when executing consistency
checks, and a page would wrongly be flagged as corrupted.
Furthermore, if the page is actually corrupted and the initial
access to it was with RW_NO_LATCH (only buffer-fixing), the
page read handler would likely end up in an infinite loop in
buf_pool_t::corrupted_evict(). It is not safe to invoke
mtr_t::upgrade_buffer_fix() on a block on which a page latch
was not initially acquired in buf_page_get_low().
btr_block_reget(): Remove the constant parameter rw_latch=RW_X_LATCH.
btr_block_get(): Assert that RW_NO_LATCH is not being used,
and change the parameter type of rw_latch.
btr_pcur_move_to_next_page(), innobase_table_is_empty(): Adjust for the
parameter type change of btr_block_get().
btr_root_block_get(): If mode==RW_NO_LATCH, do not check the integrity of
the page, because it is not safe to do so.
btr_page_alloc_low(), btr_page_free(): If the root page latch is not
previously held by the mini-transaction, invoke btr_root_block_get()
again with the proper latching mode.
btr_latch_prev(): Helper function to safely acquire a latch on a
preceding sibling page while holding a latch on a B-tree page.
To avoid deadlocks, we must not wait for the latch while holding
a latch on the current page, because another thread may be waiting
for our page latch when moving to the next page from our preceding
sibling page. If s_lock_try() or x_lock_try() on the preceding page fails,
we must release the current page latch, and wait for the latch on the
preceding page as well as the current page, in that order.
Page splits or merges will be prevented by the parent page latch
that we are holding.
btr_cur_t::search_leaf(): Make use of btr_latch_prev().
btr_cur_t::open_leaf(): Make use of btr_latch_prev(). Do not invoke
mtr_t::upgrade_buffer_fix() (when latch_mode == BTR_MODIFY_TREE),
because we will already have acquired all page latches upfront.
btr_cur_t::pessimistic_search_leaf(): Do acquire an exclusive index latch
before accessing the page. Make use of btr_latch_prev().
We introduce simple plugin dependency. A plugin init function may
return HA_ERR_RETRY_INIT. If this happens during server startup when
the server is trying to initialise all plugins, the failed plugins
will be retried, until no more plugins succeed in initialisation or
want to be retried.
This will fix spider init bugs which is caused in part by its
dependency on Aria for initialisation.
The reason we need a new return code, instead of treating every
failure as a request for retry, is that it may be impossible to clean
up after a failed plugin initialisation. Take InnoDB for example, it
has a global variable `buf_page_cleaner_is_active`, which may not
satisfy an assertion during a second initialisation try, probably
because InnoDB does not expect the initialisation to be called
twice.
row_ins_sec_index_entry_low(): Correct a condition that was
inadvertently inverted
in commit 89ec4b53ac (MDEV-29603).
We are not supposed to buffer INSERT operations into unique indexes,
because duplicate key values would not be checked for. It is only
allowed when using unique_checks=0, and in that case the user is
supposed to guarantee that there are no duplicates.
Since TLS server certificate verification is a client
only option, this flag is removed in both client (C/C)
and MariaDB server capability flags.
This patch reverts commit 89d759b93e
(MySQL Bug #21543) and stores the server certificate validation
option in mysql->options.extensions.
Since TLS server certificate verification is a client
only option, this flag is removed in both client (C/C)
and MariaDB server capability flags.
This patch reverts commit 89d759b93e
(MySQL Bug #21543) and stores the server certificate validation
option in mysql->options.extensions.
ANALYZE FORMAT=JSON output now includes table.r_engine_stats which
has the engine statistics. Only non-zero members are printed.
Internally: EXPLAIN data structures Explain_table_acccess and
Explain_update now have handler* handler_for_stats pointer.
It is used to read statistics from handler_for_stats->handler_stats.
The following applies only to 10.9+, backport doesn't use it:
Explain data structures exist after the tables are closed. We avoid
walking invalid pointers using this:
- SQL layer calls Explain_query::notify_tables_are_closed() before
closing tables.
- After that call, printing of JSON output is disabled. Non-JSON output
can be printed but we don't access handler_for_stats when doing that.
noinline attribute was being ignored by clang-16 and reporting
32 stack size on Gentoo, 16 locally on Fedora 38.
Based on https://stackoverflow.com/questions/54481855/clang-ignoring-attribute-noinline
appended noopt in addition to the gcc recognised attributes.
After that the -pcre_exec(NULL, NULL, NULL, -999, -999, 0, NULL, 0)
returned 1056, simlar to gcc.
From https://bugs.gentoo.org/910188.
Thanks Zhixu Liu for the great bug report.