MDEV-5262: Missing retry after temp error in parallel replication

Handle retry of event groups that span multiple relay log files.
 - If retry reaches the end of one relay log file, move on to the next.
 - Handle refcounting of relay log files, and avoid purging relay log files
   until all event groups have completed that might have needed them for
   transaction retry.
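The first bullet can be illustrated with a minimal sketch, assuming a simplified in-memory model in which each relay log file is just an ordered list of events. `RelayLogFile`, `GroupStart` and `reread_group` are hypothetical names for illustration only; this is not the server's retry code, which works on real file positions. The sketch re-reads an event group starting at a recorded position and, when it hits the end of one relay log file before the group is complete, continues with the first event of the next file.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: a relay log file is an ordered list of events.
struct RelayLogFile {
  std::string name;
  std::vector<std::string> events;
};

// Hypothetical position of the first event of an event group.
struct GroupStart {
  std::size_t file_idx;  // which relay log file the group starts in
  std::size_t offset;    // index of the first event within that file
};

// Re-read `count` events of one event group for a retry, starting at `start`.
// If the end of a relay log file is reached before the group is complete,
// move on to the next file instead of stopping.
std::vector<std::string> reread_group(const std::vector<RelayLogFile>& logs,
                                      GroupStart start, std::size_t count) {
  std::vector<std::string> out;
  std::size_t f = start.file_idx;
  std::size_t pos = start.offset;
  while (out.size() < count) {
    if (f >= logs.size())
      throw std::runtime_error("event group truncated: no more relay logs");
    if (pos >= logs[f].events.size()) {
      // End of this relay log file: continue with the next one.
      ++f;
      pos = 0;
      continue;
    }
    out.push_back(logs[f].events[pos++]);
  }
  return out;
}
```

The point of the sketch is only that the retry reader treats the sequence of relay log files as one continuous stream; this is also why a file cannot be purged while a group that starts in it might still be retried.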
2026-04-26 18:25:30 +02:00 · 2014-05-15 15:52:08 +02:00 · 2014-05-15 15:52:08 +02:00 · 787c470cef
commit 787c470cef
parent d60915692c
8 changed files with 269 additions and 49 deletions
--- a/sql/rpl_rli.h
+++ b/sql/rpl_rli.h
@@ -170,6 +170,7 @@ public:
  */
  inuse_relaylog *inuse_relaylog_list;
  inuse_relaylog *last_inuse_relaylog;
+  my_atomic_rwlock_t inuse_relaylog_atomic_lock;

  /*
    Needed to deal properly with cur_log getting closed and re-opened with
@@ -481,12 +482,26 @@ private:
  Each rpl_group_info has a pointer to one of those, corresponding to the
  first GTID event.

-  A reference count keeps track of how long a relay log is potentially in use.
+  A pair of reference counts keeps track of how long a relay log is potentially
+  in use. When the `completed' flag is set, all events have been read out of
+  the relay log, but the log might still be needed for retry in worker
+  threads.  As worker threads complete an event group, they atomically
+  increment `dequeued_count' by the number of events queued for that group.
+  Thus, when completed is set and dequeued_count equals queued_count, the
+  relay log file is finally done with and can be purged.
+
+  By separating the queued and dequeued count, only the dequeued_count needs
+  multi-thread synchronisation; the completed flag and queued_count fields
+  are only accessed by the SQL driver thread and need no synchronisation.
 */
 struct inuse_relaylog {
  inuse_relaylog *next;
-  uint64 queued_count;
-  uint64 dequeued_count;
+  /* Number of events in this relay log queued for worker threads. */
+  int64 queued_count;
+  /* Number of events completed by worker threads. */
+  volatile int64 dequeued_count;
+  /* Set when all events have been read from a relaylog. */
+  bool completed;
  char name[FN_REFLEN];
 };
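The counting scheme described in the `inuse_relaylog` comment can be sketched as follows. This is a minimal illustration under stated assumptions: it swaps in C++11 `std::atomic<int64_t>` for the server's `volatile int64` plus `my_atomic_rwlock_t`, and `InuseRelaylogSketch`, `driver_queue_events`, `driver_mark_completed`, `worker_group_done` and `can_purge` are hypothetical names. As in the comment, only `dequeued_count` is shared with worker threads; `queued_count` and `completed` are assumed to be touched only by the SQL driver thread, including the purge check.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for struct inuse_relaylog, using std::atomic
// instead of volatile int64 + my_atomic_rwlock_t.
struct InuseRelaylogSketch {
  std::string name;
  int64_t queued_count = 0;                // SQL driver thread only
  std::atomic<int64_t> dequeued_count{0};  // incremented by worker threads
  bool completed = false;                  // SQL driver thread only
};

// SQL driver thread: `n` events from this relay log were queued to a worker.
void driver_queue_events(InuseRelaylogSketch& rl, int64_t n) {
  rl.queued_count += n;
}

// SQL driver thread: every event has now been read out of this relay log.
void driver_mark_completed(InuseRelaylogSketch& rl) { rl.completed = true; }

// Worker thread: an event group of `n` events has completed.
void worker_group_done(InuseRelaylogSketch& rl, int64_t n) {
  rl.dequeued_count.fetch_add(n, std::memory_order_release);
}

// SQL driver thread (assumed): is the relay log file finally done with?
bool can_purge(const InuseRelaylogSketch& rl) {
  return rl.completed &&
         rl.dequeued_count.load(std::memory_order_acquire) == rl.queued_count;
}
```

For example, after queuing groups of 3 and 2 events and marking the log completed, `can_purge` stays false until workers have reported both groups done. Keeping the two counters separate means workers never write the fields that the driver thread updates without synchronisation, so a single atomic counter is enough.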