Commit graph

17796 commits

Author SHA1 Message Date
Sergei Golubchik
f50694c52b remove pointless test 2024-03-27 16:14:55 +01:00
Sergei Golubchik
dc681953cf events in perfschema tests: use ON COMPLETION NOT PRESERVE
when the execution is very slow, under valgrind,
the event might manage to fire more than once, making the
test to fail
2024-03-27 16:14:55 +01:00
Sergei Golubchik
8bb8820df2 MDEV-22949 perfschema.memory_aggregate_no_a_no_u fails sporadically in buildbot with wrong result
perfschema aggregation, like SHOW STATUS, is only statistically correct.
It doesn't use atomics for performance reasons and might miss individual
increments, particularly when two connections are disconnecting at the
same time.

To have stable results tests should avoid doing it.
2024-03-27 16:14:55 +01:00
Marko Mäkelä
ccb7a1e9a1 Merge 10.5 into 10.6 2024-03-27 15:00:56 +02:00
Dave Gosselin
58df20974b MDEV-33460 select '123' 'x'; unexpected result
Queries that select concatenated constant strings now have
colname and value that match.  For example,
  SELECT '123' 'x';
will return a result where the column name and value both
are '123x'.

Review: Daniel Black
2024-03-27 15:51:26 +11:00
Daniele Sciascia
c71dc39529 MDEV-26499 Fix error "mysql_shutdown failed" during MTR tests
- Fix to avoid mysqltest client getting killed abruptly during
  mysql_shutdown(). When Galera replication is shutdown, wait for
  THDs with `thd->stmt_da()->is_eof()` to disconnect (these are about
  to disconnect anyway).
- Extract duplicate code from `wsrep_stop_replication()` and
  `wsrep_shutdown_replication()` in a new function.
- No need to use a custom `shutdown_mysqld.inc` in galera
  suite. Delete it, so that the one in `mysql-test/include/` is used.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-03-27 04:31:45 +01:00
Denis Protivensky
7bf3c3124a MDEV-33136: Properly BF-abort user transactions with explicit locks
User transactions may acquire explicit MDL locks from InnoDB level
when persistent statistics is re-read for a table.
If such a transaction would be subject to BF-abort, it was improperly
detected as a system transaction and wouldn't get aborted.

The fix: Check if a transaction holding explicit MDL locks is a user
transaction in the MDL conflict handling code.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-03-27 01:25:22 +01:00
Daniele Sciascia
e0c8165487 MDEV-33509 Failed to apply write set with flags=(rollback|pa_unsafe)
Fix function `remove_fragment()` in wsrep_schema so that no error is
raised if the fragment to be removed is not found in the
wsrep_streaming_log table. This is necessary to handle the case where
streaming transaction in idle state is BF aborted. This may result in
the case where the rollbacker thread successfully removes the
transaction's fragments, followed by the applier's attempt to remove
the same fragments. Causing the node to leave the cluster after
reporting a "Failed to apply write set" error.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-03-26 05:56:37 +01:00
Marko Mäkelä
fa8a46eb68 MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool
By design, InnoDB has always hung when permanently running out of
buffer pool, for example when several threads are waiting to allocate
a block, and all of the buffer pool is buffer-fixed by the active threads.

The hang that we are fixing here occurs when the buffer pool is only
temporarily running out and the situation could be rescued by writing out
some dirty pages or evicting some clean pages.

buf_LRU_get_free_block(): Simplify the way how we wait for
the buf_flush_page_cleaner thread. This fixes occasional hangs
of the test encryption.innochecksum that were introduced by
commit a55b951e60 (MDEV-26827).
To play it safe, we use a timed wait when waiting for the
buf_flush_page_cleaner() thread to perform its job. Should that
thread get stuck, we will invoke buf_pool.LRU_warn() in order to
display a message that pages could not be freed, and keep trying
to wake up the buf_flush_page_cleaner() thread.

The INFORMATION_SCHEMA.INNODB_METRICS counters
buffer_LRU_single_flush_failure_count and
buffer_LRU_get_free_waits will be removed.
The latter is represented by buffer_pool_wait_free.

Also removed will be the message
"InnoDB: Difficult to find free blocks in the buffer pool"
because in d34479dc66 we
introduced a more precise message
"InnoDB: Could not free any blocks in the buffer pool"
in the buf_flush_page_cleaner thread.

buf_pool_t::LRU_warn(): Issue the warning message that we could
not free any blocks in the buffer pool. This may also be invoked
by buf_LRU_get_free_block() if buf_flush_page_cleaner() appears
to be stuck.

buf_pool_t::n_flush_dec(): Remove.

buf_pool_t::n_flush_dec_holding_mutex(): Rename to n_flush_dec().

buf_flush_LRU_list_batch(): Increment the eviction counter for blocks
of temporary, discarded or dropped tablespaces.

buf_flush_LRU(): Make static, and remove the constant parameter
evict=false. The only caller will be the buf_flush_page_cleaner()
thread.

IORequest::is_LRU(): Remove. The only case of evicting pages on
write completion will be when we are writing out pages of the
temporary tablespace. Those pages are not in buf_pool.flush_list,
only in buf_pool.LRU.

buf_page_t::flush(): Remove the parameter evict.

buf_page_t::write_complete(): Change the parameter "bool temporary"
to "bool persistent" and add a parameter for an already read state().

Reviewed by: Debarun Banerjee
2024-03-22 14:17:39 +02:00
Marko Mäkelä
bf0b82d24b MDEV-33515 log_sys.lsn_lock causes excessive context switching
The log_sys.lsn_lock is a very contended resource with a small
critical section in log_sys.append_prepare(). On many processor
microarchitectures, replacing the system call based log_sys.lsn_lock
with a pure spin lock would fare worse during high concurrency workloads,
wasting a significant amount of CPU cycles in the spin loop.

On other microarchitectures, we would see a significant amount of time
being spent in native_queued_spin_lock_slowpath() in the Linux kernel,
plus context switching between user and kernel address space. This was
pointed out by Steve Shaw from Intel Corporation.

Depending on the workload and the hardware implementation, it may be
useful to use a pure spin lock in log_sys.append_prepare().
We will introduce a parameter. The statement

	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=50;

would enable a spin lock that will execute that many MY_RELAX_CPU()
operations (such as the x86 PAUSE instruction) between successive
attempts of acquiring the spin lock. The use of a system call based
log_sys.lsn_lock (which is the default setting) can be enabled by

	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=0;

This patch will also introduce #ifdef LOG_LATCH_DEBUG
(part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON) for more accurate
tracking of log_sys.latch ownership and reorganize the fields of
log_sys to improve the locality of reference and to reduce the
chances of false sharing.

When a spin lock is being used, it will be maintained in the
most significant bit of log_sys.buf_free. This is useful, because that is
one of the fields that is covered by the lock. For IA-32 or AMD64, we
implement the spin lock specially via log_t::lsn_lock_bts(), employing the
i386 LOCK BTS instruction. A straightforward std::atomic::fetch_or() would
translate into an inefficient loop around LOCK CMPXCHG.

mtr_t::spin_wait_delay: The value of innodb_log_spin_wait_delay.

mtr_t::finisher: Pointer to the currently used mtr_t::finish_write()
implementation. This allows to avoid introducing conditional branches.
We no longer invoke log_sys.is_pmem() at the mini-transaction level,
but we would do that in log_write_up_to().

mtr_t::finisher_update(): Update finisher when spin_wait_delay is
changed from or to 0 (the spin lock is changed to log_sys.lsn_lock or
vice versa).
2024-03-22 12:29:01 +02:00
Brandon Nesterenko
75c7c6dc39 MDEV-33551: Semi-sync Wait Point AFTER_COMMIT Slow on Workloads with Heavy Concurrency
When using semi-sync replication with
rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the
primary can significantly reduce compared to AFTER_SYNC's
performance for workloads with many concurrent users executing
transactions. This is because all connections on the primary share
the same cond_wait variable/mutex pair, so any time an ACK is
received from a replica, all waiting connections are awoken to check
if the ACK was for itself, which is done in mutual exclusion.

This patch changes this such that the waiting THD will use its own
local condition variable, and the ACK receiver thread only signals
connections which have been ACKed for wakeup. That is, the
THD::LOCK_wakeup_ready condition variable is re-used for this
purpose, and the Active_tranx queue nodes are extended to hold the
waiting thread, so it can be signalled once ACKed.

Additionally:

 1)  Removed part of MDEV-11853 additions, which allowed suspended
connection threads awaiting their semi-sync ACKs to live until their
ACKs had been received. This part, however, wasn't needed.  That is,
all that was needed was for the Ack_thread to survive.  So now the
connection threads are killed during phase 1. Thereby
THD::is_awaiting_semisync_ack, and all its related code was removed.

 2) COND_binlog_send is repurposed to signal on the condition when
Active_tranx is emptied during clear_active_tranx_nodes.

 3) At master shutdown (when waiting for slaves), instead of the
main loop individually waiting for each ACK, await_slave_reply()
(renamed await_all_slave_replies()) just waits once for the
repurposed COND_binlog_send to signal it is empty.

 4) Test rpl_semi_sync_shutdown_await_ack is updates as following:
   4.1) Added test case (adapted from Kristian Nielsen) to ensure
that if a thread awaiting its ACK is killed while SHUTDOWN WAIT FOR
ALL SLAVES is issued, the primary will still wait for the ACK from
the killed thread.
   4.2) As connections which by-passed phase 1 of thread killing no
longer are delayed for kill until phase 2, we can no longer query
yes/no tx after receiving an ACK/timeout. The check for these
variables is removed.
   4.3) Comment descriptions are updated which mention that the
connection is alive; and adjusted to be the Ack_thread.

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-21 08:42:18 -06:00
Vladislav Vaintroub
a2dd4c14a3 post-fix 1c55b845e0
remove bunch of  *.rej and *.orig files, checked in by mistake.
2024-03-20 16:03:15 +01:00
Marko Mäkelä
b8a6719889 MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation
https://jepsen.io/analyses/mysql-8.0.34 highlights that the
transaction isolation levels in the InnoDB storage engine do not
correspond to any widely accepted definitions, such as
"Generalized Isolation Level Definitions"
https://pmg.csail.mit.edu/papers/icde00.pdf
(PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ,
PL-3 = SERIALIZABLE).
Only READ UNCOMMITTED in InnoDB seems to match the above definition.

The issue is that InnoDB does not detect write/write conflicts
(Section 4.4.3, Definition 6) in the above.

It appears that as soon as we implement write/write conflict detection
(SET SESSION innodb_snapshot_isolation=ON), the default isolation level
(SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become
Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of
"A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

Locking reads inside InnoDB used to read the latest committed version,
ignoring what should actually be visible to the transaction.
The added test innodb.lock_isolation illustrates this. The statement
	UPDATE t SET a=3 WHERE b=2;
is executed in a transaction that was started before a read view or
a snapshot of the current transaction was created, and committed before
the current transaction attempts to execute
	UPDATE t SET b=3;
If SET innodb_snapshot_isolation=ON is in effect when the second
transaction was started, the second transaction will be aborted with
the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF),
the second transaction would execute inconsistently, displaying an
incorrect SELECT COUNT(*) FROM t in its read view.

If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a
record that does not exist in the current read view is made, an error
DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will
be raised. This error will be treated in the same way as a deadlock:
the transaction will be rolled back.

lock_clust_rec_read_check_and_lock(): If the current transaction has
a read view where the record is not visible and
innodb_snapshot_isolation=ON, fail before trying to acquire the lock.

row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON,
disable the "semi-consistent read" logic that had been implemented by
myself on the directions of Heikki Tuuri in order to address
https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer
wanting UPDATE to skip locked rows that do not match the WHERE condition.
It looks like my changes were included in the MySQL 5.1.5
commit ad126d90e019f223470e73e1b2b528f9007c4532; at that time, employees
of Innobase Oy (a recent acquisition of Oracle) had lost write access to
the repository.

The only reason why we set innodb_snapshot_isolation=OFF by default is
backward compatibility with applications, such as the one that motivated
the implementation of "semi-consistent read" back in 2005. In a later
major release, we can default to innodb_snapshot_isolation=ON.

Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work
on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations
and some testing of this fix.

Thanks to Vladislav Lesin for the initial test for MDEV-26643,
as well as reviewing these changes.
2024-03-20 09:48:03 +02:00
Brandon Nesterenko
ca07f62992 MDEV-33716: rpl.rpl_semi_sync_slave_enabled_consistent Fails with Error Condition Reached
Though the test itself doesn't create any transactions
directly, the added test suppressions are replicated,
and when the SQL thread is stopped mid-execution,
it is set into an error state because these are
non-transactional events being aborted.

This patch fixes the test by ensuring that the test
suppressions are fully replicated before continuing
2024-03-19 14:16:07 -06:00
Thirunarayanan Balathandayuthapani
c3a6248bba MDEV-33542 Inplace algorithm occupies more disk space compared to copy algorithm
Problem:
=======
- In case of large file size, InnoDB eagerly adds the new extent
even though there are many existing unused pages of the segment.
Reason is that in case of larger file size, threshold
(1/8 of reserved pages) for adding new extent has been
reached frequently.

Solution:
=========
- Try to utilise the unused pages in the segment before adding
the new extent in the file segment.

need_for_new_extent(): In case of larger file size, try to use
the 4 * FSP_EXTENT_SIZE as threshold to allocate the new extent.

fseg_alloc_free_page_low(): Rewrote the function to allocate
the page in the following order.
1) Try to get the page from existing segment extent.
2) Check whether the segment needs new extent
(need_for_new_extent()) and allocate the new extent,
find the page.
3) Take individual page from the unused page from
segment or tablespace.
4) Allocate a new extent and take first page from it.

Removed FSEG_FILLFACTOR, FSEG_FRAG_LIMIT variable.
2024-03-19 18:42:45 +05:30
Marko Mäkelä
50715bd2ed Merge 10.5 into 10.6 2024-03-18 17:07:32 +02:00
Kristian Nielsen
86a0b57689 MDEV-32976: Un-deprecate MASTER_USE_GTID=Current_Pos
Remove incorrect deprecation.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-15 18:18:42 +01:00
Kristian Nielsen
51abae5e46 MDEV-25923: Aria parallel repair MY_THREAD_SPECIFIC mismatch in realloc
maria_repair_parallel() clears the MY_THREAD_SPECIFIC flag for allocations
since it uses different threads. But it still did one _ma_alloc_buffer()
call as thread-specific which would later assert if another thread needed
to extend the buffer with realloc.

This patch, due to Monty, removes the MY_THREAD_SPECIFIC flag for
allocations that need to realloc in different threads, and preserves
it for those that are allocated/freed in the user's thread.

Also fixes MDEV-33562.

Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-15 15:55:07 +01:00
Kristian Nielsen
1fb00f37ae MDEV-33303: slave_parallel_mode=optimistic should not report the mode's specific temporary errors
An earlier patch for MDEV-13577 fixed the most common instances of this, but
missed one case for tables without primary key when the scan reaches the end
of the table. This patch adds similar code to handle this case, converting
the error to HA_ERR_RECORD_CHANGED when doing optimistic parallel apply.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-15 15:55:06 +01:00
Kristian Nielsen
fb774eb1eb Fix occasional test failure of rpl.rpl_parallel_stop_slave
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-15 15:04:09 +01:00
Thirunarayanan Balathandayuthapani
f5df4482e0 MDEV-33214 Table is getting rebuild with ALTER TABLE ADD COLUMN
Problem:
======
- InnoDB fail to do instant operation while adding the variable
length column. Problem is that InnoDB wrongly assumes that
variable character length can never part of externally stored
page.

Solution:
========
instant_alter_column_possible(): Variable length
character field can be stored as externally stored page.
2024-03-15 14:04:59 +05:30
Sergei Golubchik
55cea0c2a6 Merge branch '10.5' into 10.6 2024-03-14 19:52:08 +01:00
Sergei Golubchik
bb451d2cad fix galera tests after 9a132d423a 2024-03-14 19:51:09 +01:00
Sergei Golubchik
7eb6d5aa21 update s3.partition result after 57ffcd686f 2024-03-14 11:43:13 +01:00
Thirunarayanan Balathandayuthapani
967a148966 MDEV-33635 innodb.innodb-64k-crash - Found warnings/errors in server log file
- Suppress the "Difficult to find free blocks" warning
globally to avoid many different test case failing.

- Demote the error information in validate_first_page() to note.
So first page can recovered from doublewrite buffer and can throw
error in case the page wasn't found in doublewrite buffer.
2024-03-14 08:34:56 +05:30
Sergei Golubchik
f71d7f2f0f Merge branch '10.5' into 10.6 2024-03-13 21:02:34 +01:00
Sergei Golubchik
bc46f1a7d9 cleanup: remove SEARCH_TYPE from search_pattern_in_file.inc 2024-03-13 18:27:18 +01:00
Kristian Nielsen
0a6f46965a MDEV-33475: --gtid-ignore-duplicate can double-apply event in case of parallel replication retry
When rolling back and retrying a transaction in parallel replication, don't
release the domain ownership (for --gtid-ignore-duplicates) as part of the
rollback. Otherwise another master connection could grab the ownership and
double-apply the transaction in parallel with the retry.

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-13 16:59:10 +01:00
Marko Mäkelä
c3a00dfa53 Merge 10.5 into 10.6 2024-03-12 09:19:57 +02:00
mariadb-DebarunBanerjee
67abdb9f33 MDEV-33593 Auto increment deadlock error causes ASSERT in subsequent save point
innodb.autoinc_debug: Correct the test case for predictable deadlock.
2024-03-11 15:59:56 +05:30
Marko Mäkelä
f703e72bd8 Merge 10.4 into 10.5 2024-03-11 10:08:20 +02:00
Monty
b3d507ff13 Suppressed new warning for rpl_get_lock on amd-freebsd and aarch64-macos 2024-03-09 12:20:58 +02:00
Kristian Nielsen
23c48474f7 MDEV-33212: mysqldump uses MASTER_LOG_POS with dump-slave
The patch for MDEV-15530 incorrectly added a column in the middle of SHOW
SLAVE STATUS output. This is wrong, as it breaks backwards compatibility
with existing applications and scripts. In this case, it even broke
mariadb-dump, which is included in the server source tree!

Revert the incorrect change, putting the new Replicate_Rewrite_DB at the end
of SHOW SLAVE STATUS output.

Add a testcase for the mariadb-dump --dump-slave wrong output problem. Also
add a testcase rpl.rpl_show_slave_status to hopefully prevent any future
incorrect additions to SHOW SLAVE STATUS.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-03-08 15:23:42 +01:00
Monty
9a132d423a MDEV-33620 Improve times and states in show processlist for replication
This will makes it easier to find out what replication workers are
doing and what they are waiting for.

Things changed in processlist:
- Slave_SQL time was not consistent. Now time for state "Slave has
  read all relay log; waiting for more updates" shows how long it has
  waited for getting the next event.
- Slave_worker threads did often show "Closing tables" for a long
  time.  Now the state is reverted to the previous state after
  "Closing tables" is done.
- Commit and Rollback states where not shown for replication (and some
  other threads). Now Commit and Rollback states are always shown and
  the state is reverted to previous state when the Commit/Rollback
  have finished.

Code changes:
- Added thd->set_time_for_next_stage() for parallel replication when
  when starting to wait for prior transactions to commit, group commit,
  and FTWRL and for free space in thread pool.
  Before we reset the time only after the above events.
- Moved THD_STAGE_INFO(stage_rollback) and THD_STAGE_INFO(stage_commit)
  from sql_parse.cc to transaction.cc to ensure this is done for
  all commits and not only 'normal connection queries'.

Test case changes:
- close_thread_tables() reverting stage to previous stage caused the
  counter in performance_schema to be increased. In many case it is
  the 'sql/starting' stage that was effected.
- We only change to "Commit" stage if there is a need for a commit.
  This caused some "Commit" stages to disapper from perfschema reports.

TODO in 11.#:
- Slave_IO always showes "Waiting for master to send event" and the time is
  from SLAVE START. We should in 11.# change this to be the time since
  reading the last event.
2024-03-08 15:23:17 +02:00
mariadb-DebarunBanerjee
afe9632913 MDEV-33593 Auto increment deadlock error causes ASSERT in subsequent save point
The issue here is ha_innobase::get_auto_increment() could cause a
deadlock involving auto-increment lock and rollback the transaction
implicitly. For such cases, storage engines usually call
thd_mark_transaction_to_rollback() to inform SQL engine about it which
in turn takes appropriate actions and close the transaction. In innodb,
we call it while converting Innodb error code to MySQL.

However, since ::innobase_get_autoinc() returns void, we skip the call
for error code conversion and also miss marking the transaction for
rollback for deadlock error. We assert eventually while releasing a
savepoint as the transaction state is not active.

Since convert_error_code_to_mysql() is handling some generic error
handling part, like invoking the callback when needed, we should call
that function in ha_innobase::get_auto_increment() even if we don't
return the resulting mysql error code back.
2024-03-07 21:54:06 +05:30
Monty
0df4651c13 Fixed some mtr results found in Jenins after MDEV-333582 push
MDEV-33582 Add more warnings to be able to better diagnose network issues

- Disabled "Semisync ack receiver got hangup" warning
  - One could get this warning from semisync if running
    mtr --mysqld=log-warnings=3 rpl.rpl_semi_sync_shutdown_await_ack
- Fixed result file for engines/funcs/rpl_get_lock.test
2024-03-06 15:16:03 +02:00
Thirunarayanan Balathandayuthapani
6e5333fc8c MDEV-32445 InnoDB may corrupt its log before upgrading it on startup
Problem:
========
 During upgrade, InnoDB does write the redo log for adjusting
the tablespace size or tablespace flags even before the log
has upgraded to configured format. This could lead to data
inconsistent if any crash happened during upgrade process.

Fix:
===
srv_start(): Write the tablespace flags adjustment, increased
tablespace size redo log only after redo log upgradation.

log_write_low(), log_reserve_and_write_fast(): Check whether
the redo log is in physical format.
2024-03-06 15:01:26 +05:30
Thirunarayanan Balathandayuthapani
738da4918d MDEV-32346 Assertion failure sym_node->table != NULL in pars_retrieve_table_def on UPDATE
- During update operation, InnoDB should avoid the initializing
the FTS_DOC_ID of foreign table if the foreign table is discarded
2024-03-06 14:04:49 +05:30
Thirunarayanan Balathandayuthapani
8532dd82f1 MDEV-13765 encryption.encrypt_and_grep failed in buildbot with wrong result
- Adjust the test case to check whether all tablespaces
are encrypted by comparing it with existing table count.
2024-03-06 11:57:09 +05:30
Monty
567c097359 MDEV-33582 Add more warnings to be able to better diagnose network issues
Warnings are added to net_server.cc when
global_system_variables.log_warnings >= 4.

When the above condition holds then:
- All communication errors from net_serv.cc is also written to the
  error log.
- In case of a of not being able to read or write a packet, a more
  detailed error is given.

Other things:
- Added detection of slaves that has hangup to Ack_receiver::run()
- vio_close() is now first marking the socket closed before closing it.
  The reason for this is to ensure that the connection that gets a read
  error can check if the reason was that the socket was closed.
- Add a new state to vio to be able to detect if vio is acive, shutdown or
  closed. This is used to detect if socket is closed by another thread.
- Testing of the new warnings is done in rpl_get_lock.test
- Suppress some of the new warnings in mtr to allow one to run some of
  the tests with -mysqld=--log-warnings=4. All test in the 'rpl' suite
  can now be run with this option.
 - Ensure that global.log_warnings are restored at test end in a way
   that allows one to use mtr --mysqld=--log-warnings=4.

Reviewed-by: <serg@mariadb.org>,<brandon.nesterenko@mariadb.com>
2024-03-05 20:19:49 +02:00
Alexey Botchkov
b93252a303 MDEV-32454 JSON test has problem in view protocol.
Few Item_func_json_xxx::fix_length_and_dec() functions fixed.
2024-03-02 14:58:57 +04:00
Monty
ee27bf749b Disable mariabackup.aria_backup with msan because of timeouts 2024-03-01 12:44:32 +02:00
Brandon Nesterenko
bd604add76 MDEV-33546: Rpl_semi_sync_slave_status is ON When Replication Is Not Configured
If a server has a default configuration (e.g. in a my.cnf file) with
rpl_semi_sync_slave_enabled set, on server start, the corresponding
rpl_semi_sync_slave_status variable will also be ON initially, even
if the slave was never configured/started. This is because the
Repl_semi_sync_slave initialization logic (function init_object())
sets the running status to the enabled value during
init_server_components().

This patch fixes this by removing the statement which sets the
semi-sync slave running status from the initialization logic. An
additional change needed from this is to semi-sync recovery: this
status variable was used as a condition to determine binlog
truncation during server recovery. This patch also switches this
condition to reference the global rpl_semi_sync_slave_enabled
variable. Though note, the semi-sync recovery condition is to be
changed entirely with the MDEV-33424 agenda.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2024-02-29 07:38:55 -07:00
Jan Lindström
41b435fea9 MDEV-33211 : Galera SST on maria-backup causes donor node to be unresponsive
If mariabackup with backup locks is used on SST we do not
pause and desync galera provider at all. If WSREP_MODE_BF_MARIABACKUP
case provider is paused and desync at BLOCK_COMMIT phase. In
other cases provider is paused and desync at BLOCK_DDL phase.
2024-02-27 20:55:54 +02:00
Monty
1c55b845e0 MDEV-32932 Port backup features from ES
Added support to BACKUP STAGE to maria-backup

This is a port of the code from ES 10.6
See MDEV-5336 for backup stages description.

The following old options are not supported by the new code:
--rsync             ; This is because rsync will not work on tables
                      that are in used.
--no-backup-locks   ; This is disabled as mariadb-backup will always
                      use backup locks for better performance.
2024-02-27 20:55:54 +02:00
Monty
0c079f4f76 Updated test cases result for s3.parition
MDEV-21472 ALTER TABLE ... ANALYZE PARTITION ... with EITS reads and locks all rows
2024-02-27 14:55:47 +02:00
mariadb-DebarunBanerjee
969669767b MDEV-33011 mariabackup --backup: FATAL ERROR: ... Can't open datafile cool_down/t3
The root cause is the WAL logging of file operation when the actual
operation fails afterwards. It creates a situation with a log entry for
a operation that would always fail. I could simulate both the backup
scenario error and Innodb recovery failure exploiting the weakness.

We are following WAL for file rename operation and once logged the
operation must eventually complete successfully, or it is a major
catastrophe. Right now, we fail for rename and handle it as normal error
and it is the problem.

I created a patch to address RENAME operation to a non existing schema
where the destination schema directory is missing. The patch checks for
the missing schema before logging in an attempt to avoid the failure
after WAL log is written/flushed. I also checked that the schema cannot
be dropped or there cannot be any race with other rename to the same
file. This is protected by the MDL lock in SQL today.

The patch should this be a good improvement over the current situation
and solves the issue at hand.
2024-02-27 17:59:20 +05:30
Thirunarayanan Balathandayuthapani
57cc8605eb MDEV-19044 Alter table corrupts while applying the modification log
Problem:
========
- InnoDB reads the length of the variable length field wrongly
while applying the modification log of instant table.

Solution:
========
rec_init_offsets_comp_ordinary(): For the temporary instant
file record, InnoDB should read the length of the variable length
field from the record itself.
2024-02-27 12:59:46 +05:30
Thirunarayanan Balathandayuthapani
2c5f3bbe78 MDEV-14193 innodb.log_file_name failed in buildbot with exception
Problem:
=======
- innodb.log_file_name fails if it executes after
innodb.lock_insert_into_empty in few cases.
innodb.lock_insert_into_empty test case failed to
cleanup the table t2. Rollback of create..select fails
to remove the table when it fails to acquire the
innodb statistics table. This leads to rename table
in log_file_name test case fails.

Solution:
========
- Cleanup the table t2 explictly after resetting
innodb_lock_wait_timeout variable in
innodb.lock_insert_into_empty test case.
2024-02-26 19:04:14 +05:30
Alexander Barkov
e63311c2cf MDEV-33496 Out of range error in AVG(YEAR(datetime)) due to a wrong data type
Functions extracting non-negative datetime components:

- YEAR(dt),        EXTRACT(YEAR FROM dt)
- QUARTER(td),     EXTRACT(QUARTER FROM dt)
- MONTH(dt),       EXTRACT(MONTH FROM dt)
- WEEK(dt),        EXTRACT(WEEK FROM dt)
- HOUR(dt),
- MINUTE(dt),
- SECOND(dt),
- MICROSECOND(dt),
- DAYOFYEAR(dt)
- EXTRACT(YEAR_MONTH FROM dt)

did not set their max_length properly, so in the DECIMAL
context they created a too small DECIMAL column, which
led to the 'Out of range value' error.

The problem is that most of these functions historically
returned the signed INT data type.

There were two simple ways to fix these functions:
1. Add +1 to max_length.
   But this would also change their size in the string context
   and create too long VARCHAR columns, with +1 excessive size.

2. Preserve max_length, but change the data type from INT to INT UNSIGNED.
   But this would break backward compatibility.
   Also, using UNSIGNED is generally not desirable,
   it's better to stay with signed when possible.

This fix implements another solution, which it makes all these functions
work well in all contexts: int, decimal, string.

Fix details:

- Adding a new special class Type_handler_long_ge0 - the data type
  handler for expressions which:
  * should look like normal signed INT
  * but which known not to return negative values
  Expressions handled by Type_handler_long_ge0 store in Item::max_length
  only the number of digits, without adding +1 for the sign.

- Fixing Item_extract to use Type_handler_long_ge0
  for non-negative datetime components:
   YEAR, YEAR_MONTH, QUARTER, MONTH, WEEK

- Adding a new abstract class Item_long_ge0_func, for functions
  returning non-negative datetime components.
  Item_long_ge0_func uses Type_handler_long_ge0 as the type handler.
  The class hierarchy now looks as follows:

Item_long_ge0_func
  Item_long_func_date_field
    Item_func_to_days
    Item_func_dayofmonth
    Item_func_dayofyear
    Item_func_quarter
    Item_func_year
  Item_long_func_time_field
    Item_func_hour
    Item_func_minute
    Item_func_second
    Item_func_microsecond

- Cleanup: EXTRACT(QUARTER FROM dt) created an excessive VARCHAR column
  in string context. Changing its length from 2 to 1.
2024-02-23 18:30:06 +04:00