Commit graph

2852 commits

Author SHA1 Message Date
ParadoxV5
8d6ff0db25 MDEV-27669: Add skip-slave-start info message
When a slave does not start up the slave threads on restart,
but not reporting anything to the error log about startup failures
either, this can be due to `skip-slave-start` being set in the
config file(s) or on the command line (and most likely is).

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2025-03-18 18:31:28 +01:00
Sergey Vojtovich
feb1cf9086 Corrections to parent "fix typos" commmit 2025-03-14 12:08:56 +04:00
Vasilii Lakhin
717c12de0e Fix typos in C comments inside sql/ 2025-03-14 12:08:56 +04:00
ParadoxV5
5091986cea misc. sql/slave.cc & co. refactor
* `get_master_version_and_clock()` de-duplicate label using fall-through
* `io_slave_killed()` & `check_io_slave_killed()`:
  * reüse the result from the level lower
  * add distinguishing docs
* `try_to_reconnect()`: extract `'` from `if`-`else`
* `handle_slave_io()`: Both `while`s have the same condition;
  looks like the outer `while` can simply be an `if`.
* `connect_to_master()`:
  * assume `mysql_errno()` is not 0 on connection error
  * utilize 0’s falsiness in the loop
  * extend docs
* `sql/sql_show.cc`: refactor SHOW ALL REPLICAS filter’s condition
* `sql/mysqld.cc`: init `master-retry-count` with `master_retry_count`

Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-02-26 20:37:53 -07:00
ParadoxV5
e2dbd9b6ac MDEV-35304: Add Connects_Tried and Master_Retry_Count to SSS
When the IO thread (re)connect to a primary,
no updates are available besides unique errors that cause the failure.
These new `Master_info` numbers supplement SHOW SLAVE STATUS’s (most-
recent) ‘Connecting’ state with statistics on (re)connect attempts:

* `Connects_Tried`: how many retries have been attempted so far
  This was previously a local variable that only counted re-attempts;
  it’s now meaningful even after the “Connecting” state concludes.
* `Master_Retry_Count` (from MDEV-25674): out of how many configured

Side-note: Some of the tests updated by this commit dump the entire
SHOW SLAVE STATUS, which might include non-deterministic entries.

Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
2025-02-26 20:37:53 -07:00
ParadoxV5
7094a75596 MDEV-25674: Add CHANGE MASTER TO master_retry_count
This new CHANGE MASTER TO field specifies the `--master-retry-count`
(global option: the number of Primary connection attempts)
for each multi-source replica; i.e, per-channel `performance_schema.`
`replication_connection_configuration.CONNECTION_RETRY_COUNT`.

`--master-retry-count` remains the default for new `CHANGE MASTER TO`s.

This new keyword and `master-info` entry
matches those of pre-‘REPLICATION SOURCE’ MySQL.
2025-02-26 20:37:53 -07:00
ParadoxV5
66f52ba630 slave.cc try_to_reconnect remove retry_counter
`try_to_reconnect()` wraps `safe_reconnect()` with logging, but the
latter already loops reconnection attempts up to `master_retry_count`
times with `mi->connect_retry`-msec sleeps inbetween.
This means `try_to_reconnect()` has been counting disconnects of the
replication session (since it doesn’t loop) while `safe_reconnect()`
was counting actual tries (which may be multiple per disconnect).
In practice, this outer counter’s only benefit was to override the edge
case `--master-retry-count=0` that the inner loop doesn’t cover with 1.
2025-02-26 20:37:53 -07:00
Sergei Golubchik
ba01c2aaf0 Merge branch '11.4' into 11.7
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
  (checks are done in a different order)
2025-02-06 16:46:36 +01:00
ParadoxV5
697b88bf75 SHOW REPLICA STATUS: mark columns as unsigned
Update all integer columns of SHOW REPLICA STATUS (technically
INFORMATION_SCHEMA.SLAVE_STATUS) to unsigned because, well, they are (:.

Some `uint32` ones were accidentally using the `Field::store(double nr)`
overload because they forgot the `, true` for
`Field::store(longlong nr, bool unsigned_val)`.
The mistake’s harmless, fortunately, as `double` supports over 15
significant decimal digits, well over `uint32`’s 9-and-a-half.
2025-01-31 20:56:41 -07:00
Sergei Golubchik
7d657fda64 Merge branch '10.11 into 11.4 2025-01-30 12:01:11 +01:00
Sergei Golubchik
e69f8cae1a Merge branch '10.6' into 10.11 2025-01-30 11:55:13 +01:00
Kristian Nielsen
72e1cc8f52 MDEV-35806: Error in read_log_event() corrupts relay log writer, crashes server
In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's
IO_CACHE to signal an error back to the caller. When reading the active
relay log, this flag is also being used by the IO thread, and setting it can
randomly cause the IO thread to wrongly detect IO error on writing and
permanently disable the relay log.

This was seen sporadically in test case rpl.rpl_from_mysql80. The read
error set by the SQL thread in the IO_CACHE would be interpreted as a
write error by the IO thread, which would cause it to throw a fatal
error and close the relay log. And this would later cause CHANGE
MASTER to try to purge a closed relay log, resulting in nullptr crash.

SQL thread is not able to parse an event read from the relay log. This
can happen like here when replicating unknown events from a MySQL master,
potentially also for other reasons.

Also fix a mistake in my_b_flush_io_cache() introduced back in 2001
(fa09f2cd7e) where my_b_flush_io_cache() could wrongly return an error set
in IO_CACHE::error, even if the flush operation itself succeeded.

Also fix another sporadic failure in rpl.rpl_from_mysql80 where the outout
of MASTER_POS_WAIT() depended on timing of SQL and IO thread.

Reviewed-by: Monty <monty@mariadb.org>
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-01-24 09:15:20 +00:00
Marko Mäkelä
33907f9ec6 Merge 11.4 into 11.7 2024-12-02 17:51:17 +02:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
7d4077cc11 Merge 10.5 into 10.6 2024-11-29 12:37:46 +02:00
Brandon Nesterenko
dbfee9fc2b MDEV-34348: Consolidate cmp function declarations
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangable
and unsafely cast to one another.

This patch consolidates the functions all into the
qsort_cmp2 interface.

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:22 -07:00
ParadoxV5
cf2d49ddcf Extract some of fixes to 10.5.x
That PR uncovered countless issues on `my_snprintf` uses.
This commit backports a squashed subset of their fixes.
2024-11-21 22:43:56 +11:00
Thirunarayanan Balathandayuthapani
074831ec61 Merge branch 10.5 into 10.6 2024-11-08 18:17:15 +05:30
Oleksandr Byelkin
9e1fb104a3 MariaDB 11.4.4 release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEF39AEP5WyjM2MAMF8WVvJMdM0dgFAmck77AACgkQ8WVvJMdM
 0dgccQ/+Lls8fWt4D+gMPP7x+drJSO/IE/gZFt3ugbWF+/p3B2xXAs5AAE83wxEh
 QSbp4DCkb/9PnuakhLmzg0lFbxMUlh4rsJ1YyiuLB2J+YgKbAc36eQQf+rtYSipd
 DT5uRk36c9wOcOXo/mMv4APEvpPXBIBdIL4VvpKFbIOE7xT24Sp767zWXdXqrB1f
 JgOQdM2ct+bvSPC55oZ5p1kqyxwvd6K6+3RB3CIpwW9zrVSLg7enT3maLjj/761s
 jvlRae+Cv+r+Hit9XpmEH6n2FYVgIJ3o3WhdAHwN0kxKabXYTg7OCB7QxDZiUHI9
 C/5goKmKaPB1PCQyuTQyLSyyK9a8nPfgn6tqw/p/ZKDQhKT9sWJv/5bSWecrVndx
 LLYifSTrFC/eXLzgPvCnNv/U8SjsZaAdMIKS681+qDJ0P5abghUIlGnMYTjYXuX1
 1B6Vrr0bdrQ3V1CLB3tpkRjpUvicrsabtuAUAP65QnEG2G9UJXklOer+DE291Gsl
 f1I0o6C1zVGAOkUUD3QEYaHD8w7hlvyfKme5oXKUm3DOjaAar5UUKLdr6prxRZL4
 ebhmGEy42Mf8fBYoeohIxmxgvv6h2Xd9xCukgPp8hFpqJGw8abg7JNZTTKH4h2IY
 J51RpD10h4eoi6WRn3opEcjexTGvZ+xNR7yYO5WxWw6VIre9IUA=
 =s+WW
 -----END PGP SIGNATURE-----

Merge tag '11.4' into 11.6

MariaDB 11.4.4 release
2024-11-08 07:17:00 +01:00
Denis Protivensky
6d5fe9ed0d MDEV-28378: Don't hang trying to peek log event past the end of log
While applying CTAS log event, we peek the relay log to see if CTAS
contains inserted rows or if it's empty.
The peek function didn't check for end-of-file condition when tried to
get the next event from the log, and thus it hanged.

The fix includes checking for end-of-file while peeking for log events
and considering returned XID_EVENT value as a sign of an empty CTAS.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-11-06 04:59:09 +01:00
Brandon Nesterenko
5290fa043b MDEV-35109 PREP: simulate_delay_semisync_slave_reply use debug_sync
This is a preparatory commit for MDEV-35109 to make its
testing code cleaner (and harden other tests too).

The DEBUG_DBUG point simulate_delay_semisync_slave_reply
up to this patch used my_sleep() to delay an ACK response,
but sleeps are prone to test failures on machines that
run tests when already having a heavy load (e.g. on
buildbot).

This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC
to coordinate exactly when a slave should send its reply,
which is safer and faster.

As DEBUG_SYNC can't be used while a server is shutting
down, to synchronize threads with SHUTDOWN WAIT FOR SLAVES
logic, we use and extend wait_for_pattern_in_file.inc to
wait for an informational error message in the logic to
indicate that the shutdown process has reached the
intended state (i.e. indicating that the shutdown has
been delayed to await semi-sync ACKs). Specifically, the
extensions are as follows:

 1. wait_for_pattern_in_file.inc is extended with parameter
    wait_for_pattern_count as a number that indicates the
    number of times a pattern should occur in the file before
    return control back to the calling script.

 2. search_for_pattern_in_file.inc is extended with parameter
    SEARCH_ABORT_IS_SUCCESS to inverse the error/success
    logic, so the SEARCH_ABORT condition can be used to
    indicate success, rather than error.
2024-11-04 10:45:58 -07:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Monty
bddbef3573 MDEV-34533 asan error about stack overflow when writing record in Aria
The problem was that when using clang + asan, we do not get a correct value
for the thread stack as some local variables are not allocated at the
normal stack.

It looks like that for example clang 18.1.3, when compiling with
-O2 -fsanitize=addressan it puts local variables and things allocated by
alloca() in other areas than on the stack.

The following code shows the issue

Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
    (connect=0x5080000027b8,
    put_in_cache=<optimized out>) at sql/sql_connect.cc:1399

THD *thd;
1399      thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0

The address of thd is 24M away from the stack pointer

(gdb) info reg
...
rsp            0x7fffef4e7bc0      0x7fffef4e7bc0
...
r13            0x7fffedee7060      140737185214560

r13 is pointing to the address of the thd. Probably some kind of
"local stack" used by the sanitizer

I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects was stored in a local heap,
not on the stack.

To solve this issue in a portable way, I have added two functions:

my_get_stack_pointer() returns the address of the current stack pointer.
The code is using asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we are using _AddressOfReturnAddress()

As a fallback for other compilers/arch we use the address of a local
variable.

my_get_stack_bounds() that will return the address of the base stack
and stack size using pthread_attr_getstack() or NtCurrentTed() with
fallback to using the address of a local variable and user provided
stack size.

Server changes are:

- Moving setting of thread_stack to THD::store_globals() using
  my_get_stack_bounds().
- Removing setting of thd->thread_stack, except in functions that
  allocates a lot on the stack before calling store_globals().  When
  using estimates for stack start, we reduce stack_size with
  MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
  before calling store_globals().

I also added a unittest, stack_allocation-t, to verify the new code.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2024-10-16 17:24:46 +03:00
Marko Mäkelä
43465352b9 Merge 11.4 into 11.6 2024-10-03 16:09:56 +03:00
Marko Mäkelä
b53b81e937 Merge 11.2 into 11.4 2024-10-03 14:32:14 +03:00
Marko Mäkelä
12a91b57e2 Merge 10.11 into 11.2 2024-10-03 13:24:43 +03:00
Marko Mäkelä
63913ce5af Merge 10.6 into 10.11 2024-10-03 10:55:08 +03:00
Marko Mäkelä
7e0afb1c73 Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
Brandon Nesterenko
68938d2b42 MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail
The failing test case validates Seconds_Behind_Master for a delayed
slave, while STOP SLAVE is executed during a delay. The test fixes
initially added to the test (commit b04c857596) added a table lock
to ensure a transaction could not finish before validating the
Seconds_Behind_Master field after SLAVE START, but did not address a
possibility that the transaction could finish before running the
STOP SLAVE command, which invalidates the validations for the rest
of the test case. Specifically, this would result in 1) a timeout in
“Waiting for table metadata lock” on the replica, which expects the
transaction to retry after slave restart and hit a lock conflict on
the locked tables (added in b04c857596), and 2) that
Seconds_Behind_Master should have increased, but did not.

The failure can be reproduced by synchronizing the slave to the master
before the MDEV-32265 echo statement (i.e. before the SLAVE STOP).

This patch fixes the test by adding a mechanism to use DEBUG_SYNC to
synchronize a MASTER_DELAY, rather than continually increase the
duration of the delay each time the test fails on buildbot. This is
to ensure that on slow machines, a delay does not pass before the
test gets a chance to validate results. Additionally, it decreases
overall test time because the test can continue immediately after
validation, thereby bypassing the remainder of a full delay for each
transaction.
2024-09-17 06:29:20 -06:00
Oleksandr Byelkin
d6444022ca Merge branch 'bb-11.5-release' into bb-11.6-release 2024-08-06 17:28:38 +02:00
Oleksandr Byelkin
ea75a0b600 Merge branch '11.4' into 11.5 2024-08-05 17:50:18 +02:00
Oleksandr Byelkin
1640c9b06e Merge branch '11.2' into 11.4 2024-08-04 17:27:48 +02:00
Oleksandr Byelkin
80abd847da Merge branch '10.11' into 11.1 2024-08-03 09:32:42 +02:00
Monty
25b5c63905 MDEV-33856: Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps
This commit adds 3 new status variables to 'show all slaves status':

- Master_last_event_time ; timestamp of the last event read from the
  master by the IO thread.
- Slave_last_event_time ; Master timestamp of the last event committed
  on the slave.
- Master_Slave_time_diff: The difference of the above two timestamps.

All the above variables are NULL until the slave has started and the
slave has read one query event from the master that changes data.

- Added information_schema.slave_status, which allows us to remove:
   - show_master_info(), show_master_info_get_fields(),
     send_show_master_info_data(), show_all_master_info()
   - class Sql_cmd_show_slave_status.
   - Protocol::store(I_List<i_string_pair>* str_list) as it is not
     used anymore.
- Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to
  use the SELECT code path, as all other SHOW ... STATUS commands.

Other things:
- Xid_log_time is set to time of commit to allow slave that reads the
  binary log to calculate Master_last_event_time and
  Slave_last_event_time.
  This is needed as there is not 'exec_time' for row events.
- Fixed that Load_log_event calculates exec_time identically to
  Query_event.
- Updated RESET SLAVE to reset Master/Slave_last_event_time
- Updated SQL thread's update on first transaction read-in to
  only update Slave_last_event_time on group events.
- Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions
  if allocation of 'field' would fail.

Reviewed By:
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-07-25 08:57:27 -06:00
Oleksandr Byelkin
0fe39d368a Merge branch '10.6' into 10.11 2024-07-22 15:14:50 +02:00
Monty
e0cff1e72b Fixed failure in rpl.rpl_change_master_demote : "IO thread should not be running..."
The issue was that the test did not take into account that the IO thread
could have been in COMMAND=Connecting state, which happens before the
COMMANMD=Slave_IO state.

The test is a bit fragile as it depends on the COMMAND state to be
syncronised with the Slave_IO_State, which is not the case.

I added a new proc state and some more information to the error
output to be able to diagnose future failures more easily.
2024-07-11 11:15:47 +03:00
Vladislav Vaintroub
584fc85e21 MDEV-32537 Name threads to improve debugging experience and diagnostics.
Use SetThreadDescription/pthread_setname_np to give threads a name.
2024-07-09 13:17:20 +02:00
Julius Goryavsky
4026f04425 Merge branch 10.5 into 10.6 2024-07-09 11:56:47 +02:00
Brandon Nesterenko
744580d5a7 MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary
The IO thread can report error code 2013 into the error log when it
is stopped during the initial connection process to the primary, as
well as when trying to read an event. However, because the IO thread
is being stopped, its connection to the primary is force-killed by
the signaling thread (see THD::awake_no_mutex()), and thereby these
connection errors should be ignored.

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-07-08 10:39:17 -06:00
Vladislav Vaintroub
7c5fdc9b6a MDEV-33748 get rid of pthread_(get_/set_)specific, use thread_local
Apart from better performance when accessing thread local variables,
we'll get rid of things that depend on initialization/cleanup of
pthread_key_t variables.

Where appropriate, use compiler-dependent pre-C++11 thread-local
equivalents, where it makes sense, to avoid initialization check overhead
that non-static thread_local can suffer from.
2024-06-21 13:46:41 +02:00
Monty
24c57165d5 ALTER TABLE and replication should convert old row_end timestamps to new timestamp range
MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range

- Added --update-history option to mariadb-dump to change 2038
  row_end timestamp to 2106.
- Updated ALTER TABLE ... to convert old row_end timestamps to
  2106 timestamp for tables created before MariaDB 11.4.0.
- Fixed bug in CHECK TABLE where we wrongly suggested to USE REPAIR
  TABLE when ALTER TABLE...FORCE is needed.
- mariadb-check printed table names that where used with REPAIR TABLE but
  did not print table names used with ALTER TABLE or with name repair.
  Fixed by always printing a table that is fixed if --silent is not
  used.
- Added TABLE::vers_fix_old_timestamp() that will change max-timestamp
  for versioned tables when replication from a pre-11.4.0 server.

A few test cases changed. This is caused by:
- CHECK TABLE now prints 'Please do ALTER TABLE... instead of
  'Please do REPAIR TABLE' when there is a problem with the information
  in the .frm file (for example a very old frm file).
- mariadb-check now prints repaired table names.
- mariadb-check also now prints nicer error message in case ALTER TABLE
  is needed to repair a table.
2024-05-27 12:39:03 +02:00
Monty
2464ee758a MDEV-33655 Remove alter_algorithm
Remove alter_algorithm but keep the variable as no-op (with a warning).

The reasons for removing alter_algorithm are:
- alter_algorithm was introduced as a replacement for the
  old_alter_table that was used to force the usage of the original
  alter table algorithm (copy) in the cases where the new alter
  algorithm did not work. The new option was added as a way to force
  the usage of a specific algorithm when it should instead have made
  it possible to disable algorithms that would not work for some
  reason.
- alter_algorithm introduced some cases where ALTER TABLE would not
  work without specifying the ALGORITHM=XXX option together with
  ALTER TABLE.
- Having different values of alter_algorithm on master and slave could
  cause slave to stop unexpectedly.
- ALTER TABLE FORCE, as used by mariadb-upgrade, would not always work
  if alter_algorithm was set for the server.
- As part of the MDEV-33449 "improving repair of tables" it become
  clear that alter- algorithm made it harder to provide a better and
  more consistent ALTER TABLE FORCE and REPAIR TABLE and it would be
  better to remove it.
2024-05-27 12:39:03 +02:00
Monty
dfdedd46e4 MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range
This patch extends the timestamp from
2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999
for 64 bit hardware and OS where 'long' is 64 bits.
This is true for 64 bit Linux but not for Windows.

This is done by treating the 32 bit stored int as unsigned instead of
signed.  This is safe as MariaDB has never accepted dates before the epoch
(1970).
The benefit of this approach that for normal timestamp the storage is
compatible with earlier version.

However for tables using system versioning we before stored a
timestamp with the year 2038 as the 'max timestamp', which is used to
detect current values.  This patch stores the new 2106 year max value
as the max timestamp. This means that old tables using system
versioning needs to be updated with mariadb-upgrade when moving them
to 11.4. That will be done in a separate commit.
2024-05-27 12:39:02 +02:00
Monty
ce6cce85d4 MDEV-33620 Improve times and states in show processlist for replication
- Slave_IO thread time is now reset between reading events. Before
  this commit Slave_IO always showed "Waiting for master to send
  event" and the time was from SLAVE START. Now it shows time since
  reading last event.
2024-05-27 12:39:02 +02:00
Oleksandr Byelkin
dd7d9d7fb1 Merge branch '11.4' into 11.5 2024-05-23 17:01:43 +02:00
Oleksandr Byelkin
99b370e023 Merge branch '11.2' into 11.4 2024-05-21 19:38:51 +02:00