Commit graph

610 commits

Author SHA1 Message Date
Sergei Golubchik
152921174d fix sporadic versioning.rpl_row failures 2022-06-16 09:57:31 +02:00
Sergei Golubchik
bf2bdd1a1a Merge branch '10.8' into 10.9 2022-05-19 14:07:55 +02:00
Sergei Golubchik
99a433ed1c Merge branch '10.6' into 10.7 2022-05-18 10:34:38 +02:00
Sergei Golubchik
b2187662bc Merge branch '10.5' into 10.6 2022-05-18 10:30:47 +02:00
Sergei Golubchik
7970ac7fe8 Merge branch '10.4' into 10.5 2022-05-18 09:50:26 +02:00
Sergei Golubchik
23ddc3518f Merge branch '10.3' into 10.4 2022-05-18 01:25:30 +02:00
Aleksey Midenkov
107623c5c5 MDEV-28552 Assertion `inited==RND' failed in handler::ha_rnd_end
We cannot permanently change bits in read_partitions in the middle of
processing because ha_rnd_init()/ha_rnd_end() depends on that.
2022-05-18 01:22:29 +02:00
Sergei Golubchik
fd132be117 Merge branch '10.6' into 10.7 2022-05-11 11:25:33 +02:00
Sergei Golubchik
3bc98a4ec4 Merge branch '10.5' into 10.6 2022-05-10 14:01:23 +02:00
Sergei Golubchik
ef781162ff Merge branch '10.4' into 10.5 2022-05-09 22:04:06 +02:00
Sergei Golubchik
a70a1cf3f4 Merge branch '10.3' into 10.4 2022-05-08 23:03:08 +02:00
Aleksey Midenkov
706a8232da MDEV-25477 Auto-create breaks replication when triggering event was not replicated
If UPDATE/DELETE does not change data it is skipped from
replication. We now force replication of such events when they trigger
partition auto-creation.

For ROLLBACK it is as simple as set OPTION_KEEP_LOG
flag. trans_cannot_safely_rollback() does the rest.

For UPDATE/DELETE .. LIMIT 0 we make additional binlog_query() calls
at the early points of return.

As a safety measure we also convert row format into statement if it is
needed. The condition is decided by
binlog_need_stmt_format(). Basically if there are some row events in
cache we don't need that: table open of row event will trigger
auto-creation anyway.

Multi-update/delete works via mysql_select(). There is no early points
of return, so binlogging is always checked by
send_eof()/abort_resultset(). But we must comply with the above
measure of converting into statement.
2022-05-06 15:11:02 +03:00
Aleksey Midenkov
92bfc0e8c4 MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
:: Syntax change ::

Keyword AUTO enables history partition auto-creation.

Examples:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
    STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;

Or with explicit partitions:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
    (PARTITION p0 HISTORY, PARTITION pn CURRENT);

To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

    # Disables auto-creation:
    ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;

    # Enables auto-creation:
    ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

If the rest of partitioning specification is identical to CREATE TABLE
no repartitioning will be done (for details see MDEV-27328).

:: Description ::

Before executing history-generating DML command (see the list of commands below)
add N history partitions, so that N would be sufficient for potentially
generated history. N > 1 may be required when history partitions are switched
by INTERVAL and current_timestamp is N times further than the interval
boundary of the last history partition.

If the last history partition equals or exceeds LIMIT records then new history
partition is created and selected as the working partition. According to
MDEV-28411 partitions cannot be switched (or created) while the command is
running. Thus LIMIT does not carry strict limitation and the history partition
size must be planned as LIMIT value plus average number of history one DML
command can generate.

Auto-creation is implemented by synchronous fast_alter_partition_table() call
from the thread of the executed DML command before the command itself is run
(by the fallback and retry mechanism similar to Discovery feature,
see Open_table_context).

The name for newly added partitions are generated like default partition names
with extension of MDEV-22155 (which avoids name clashes by extending assignment
counter to next free-enough gap).

These DML commands can trigger auto-creation:

    DELETE (including multitable DELETE, excluding DELETE HISTORY)
    UPDATE (including multitable UPDATE)
    REPLACE (including REPLACE .. SELECT)
    INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
    LOAD DATA .. REPLACE

:: Bug fixes ::

MDEV-23642 Locking timeout caused by auto-creation affects original DML

    The reasons for this are:

    - Do not disrupt main business process (the history is auxiliary service);

    - Consequences are non-fatal (history is not lost, but comes into wrong
      partition; fixed by partitioning rebuild);

    - There is more freedom for application to fail in this case or not: it may
      read warning info and find corresponding error number.

    - While non-failing command is easy to handle by an application and fail it,
      the opposite is hard to handle: there is no automatic actions to fix
      failed command and retry, DBA intervention is required and until then
      application is non-functioning.

MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers

    Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it is
    not possible in locked tables mode.

    LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
    of the box as fast_alter_partition_table() can reopen tables via
    locked_tables_list.

    In LTM_PRELOCKED we reopen and relock table manually.

:: More fixes ::

* some_table_marked_for_reopen flag fix

  some_table_marked_for_reopen affets only reopen of
  m_locked_tables. I.e. Locked_tables_list::reopen_tables() reopens only
  tables from m_locked_tables.

* Unused can_recover_from_failed_open() condition

  Is recover_from_failed_open() can be really used after
  open_and_process_routine()?

:: Reviewed by ::

Sergei Golubchik <serg@mariadb.org>
2022-05-06 15:11:02 +03:00
Aleksey Midenkov
75ede427e4 MDEV-27328 Change of SYSTEM_TIME partitioning options is not possible without data copy
When we need to add/remove or change LIMIT, INTERVAL, AUTO we have to
recreate partitioning from scratch (via data copy). Such operations
should be done fast. To remove options like LIMIT or INTERVAL one
should write:

  alter table t1 partition by system_time;

The command checks whether it is new or existing SYSTEM_TIME
partitioning. And in the case of new it behaves as CREATE would do:
adds default number of partitions (2). If SYSTEM_TIME partitioning
already existed it just changes its options: removes unspecified ones
and adds/changes those specified explicitly. In case when partitions
list was supplied it behaves as usual: does full repartitioning.

Examples:

  create or replace table t1 (x int) with system versioning
  partition by system_time limit 100 partitions 4;

  # Change LIMIT
  alter table t1 partition by system_time limit 33;

  # Remove LIMIT
  alter table t1 partition by system_time;

  # This does full repartitioning
  alter table t1 partition by system_time limit 33 partitions 4;

  # This does data copy as pruning will require records in correct partitions
  alter table t1 partition by system_time interval 1 hour
  starts '2000-01-01 00:00:00';

  # But this works fast, LIMIT will apply to DML commands
  alter table t1 partition by system_time limit 33;

To sum up, ALTER for SYSTEM_TIME partitioning does full repartitioning
when:

  - INTERVAL was added or changed;
  - partition list or partition number was specified;

Otherwise it does fast alter table.

Cleaned up dead condition in set_up_default_partitions().

Reviewed by:

  Oleksandr Byelkin <sanja@mariadb.com>
  Nikita Malyavin <nikitamalyavin@gmail.com>
2022-05-06 10:45:17 +03:00
Aleksey Midenkov
ddc416c606 MDEV-20077 Warning on full history partition is delayed until next DML statement
Moved LIMIT warning from vers_set_hist_part() to new call
vers_check_limit() at table unlock phase. At that point
read_partitions bitmap is already pruned by DML code (see
prune_partitions(), find_used_partitions()) so we have to set
corresponding bits for working history partition.

Also we don't do my_error(ME_WARNING|ME_ERROR_LOG), because at that
point it doesn't update warnings number, so command reports 0 warnings
(but warning list is still updated). Instead we do
push_warning_printf() and sql_print_warning() separately.

Under LOCK TABLES external_lock(F_UNLCK) is not executed. There is
start_stmt(), but no corresponding "stop_stmt()". So for that mode we
call vers_check_limit() directly from close_thread_tables().

Test result has been changed according to new LIMIT and warning
printing algorithm. For convenience all LIMIT warnings are marked with
"You see warning above ^".

TODO MDEV-20345 fixed. Now vers_history_generating() contains
fine-grained list of DML-commands that can generate history (and TODO
mechanism worked well).
2022-04-29 13:31:42 +03:00
Aleksey Midenkov
ea2f09979f MDEV-28271 Assertion on TRUNCATE PARTITION for PARTITION BY SYSTEM_TIME
Like in MDEV-27217 vers_set_hist_part() for LIMIT depends on all
partitions selected in read_partitions. That bugfix just disabled
partition selection for DELETE with this check:

  if (table->pos_in_table_list &&
      table->pos_in_table_list->partition_names)
  {
    return HA_ERR_PARTITION_LIST;
  }

ALTER TABLE TRUNCATE PARTITION is a different story. First, it doesn't
update pos_in_table_list->partition_names, but
thd->lex->alter_info.partition_names. But we cannot depend on that
since alter_info will be stale for DML. Second, we should not disable
TRUNCATE PARTITION for that to be consistent with TRUNCATE TABLE
behavior.

Now we don't do vers_set_hist_part() for ALTER TABLE as this command
is not DML, so it does not produce history.
2022-04-29 13:31:41 +03:00
Marko Mäkelä
638afc4acf Merge 10.6 into 10.7 2022-04-26 18:59:40 +03:00
Aleksey Midenkov
9286c9e647 MDEV-28254 Wrong position for row_start, row_end after adding column to implicit versioned table
Implicit system-versioned table does not contain system fields in SHOW
CREATE. Therefore after mysqldump recovery such table has system
fields in the last place in frm image. The original table meanwhile
does not guarantee these system fields on last place because adding
new fields via ALTER TABLE places them last. Thus the order of fields
may be different between master and slave, so row-based replication
may fail.

To fix this on ALTER TABLE we now place system-invisible fields always
last in frm image. If the table was created via old revision and has
an incorrect order of fields it can be fixed via any copy operation of
ALTER TABLE, f.ex.:

  ALTER TABLE t1 FORCE;

To check the order of fields in frm file one can use hexdump:

  hexdump -C t1.frm

Note, the replication fails only when all 3 conditions are met:

  1. row-based or mixed mode replication;
  2. table has new fields added via ALTER TABLE;
  3. table was rebuilt on some, but not all nodes via mysqldump image.

Otherwise it will operate properly even with incorrect order of
fields.
2022-04-22 15:49:37 +03:00
Aleksey Midenkov
88a9f13a90 MDEV-25546 LIMIT partitioning does not respect ROLLBACK
vers_info->hist_part retained stale value after ROLLBACK. The
algorithm in vers_set_hist_part() continued iteration from that value.

The simplest solution is to process partitions each time from start
for LIMIT in vers_set_hist_part().
2022-04-22 15:49:37 +03:00
Marko Mäkelä
fae0ccad6e Merge 10.5 into 10.6 2022-04-21 17:46:40 +03:00
Daniel Black
580cbd18b3 Merge branch 10.4 into 10.5
A few of constaint -> constraint
2022-04-21 15:47:03 +10:00
Oleg Smirnov
39cc2545af MDEV-24529 Assertion failed in vers_select_conds_t::print
This commit adds processing of SYSTEM_TIME_BEFORE and SYSTEM_TIME_HISTORY
to vers_select_conds_t::print().
2022-04-18 11:19:34 +03:00
Marko Mäkelä
2d8e38bc94 Merge 10.6 into 10.7 2022-04-06 13:00:09 +03:00
Marko Mäkelä
9d94c60f2b Merge 10.5 into 10.6 2022-04-06 12:08:30 +03:00
Marko Mäkelä
cacb61b6be Merge 10.4 into 10.5 2022-04-06 10:06:39 +03:00
Marko Mäkelä
d6d66c6e90 Merge 10.3 into 10.4 2022-04-06 08:59:09 +03:00
Sergei Golubchik
d7fd76456e MDEV-19525 fix the test for embedded
followup for 58cd2a8ded
2022-04-05 13:09:44 +02:00
Aleksey Midenkov
1e859d4abc MDEV-22973 Assertion in compare_record upon multi-update involving versioned table via view
records_are_comparable() requires this condition:

  bitmap_is_subset(table->write_set, table->read_set)

On first iteration vers_update_fields() changes write_set and
read_set. On second iteration the above condition fails.

Added missing read bit for ROW_START. Also reorganized
bitmap_set_bit() so it is called only when needed.
2022-03-29 13:44:14 +03:00
Aleksey Midenkov
58cd2a8ded MDEV-19525 remove ER_VERS_FIELD_WRONG_TYPE from init_from_binary_frm_image()
Throw ER_NOT_FORM_FILE if this is wrong FRM data (warning with
ER_VERS_FIELD_WRONG_TYPE is still printed for deeper knowledge of what
was happened).

Keep ER_VERS_FIELD_WRONG_TYPE for creating partitioned table with
trx-versioning. Tested by MDEV-15951 in trx_id.test
2022-03-29 13:44:14 +03:00
Daniel Black
8b92e346b1 Merge 10.6 into 10.7 2022-03-25 14:31:59 +11:00
Daniel Black
ec62f46a61 Merge 10.5 to 10.6 2022-03-25 11:31:49 +11:00
Marko Mäkelä
75b7cd680b MDEV-23974 Tests fail due to [Warning] InnoDB: Trying to delete tablespace
A few regression tests invoke heavy flushing of the buffer pool
and may trigger warnings that tablespaces could not be deleted
because of pending writes. Those warnings are to be expected
during the execution of such tests.

The warnings are also frequently seen with Valgrind or MemorySanitizer.
For those, the global suppression in have_innodb.inc does the trick.
2022-03-23 16:42:43 +02:00
Marko Mäkelä
af87186c1d Merge 10.6 into 10.7 2022-03-08 09:51:31 +02:00
Vlad Lesin
202316a38f Merge 10.5 into 10.6 2022-03-07 18:42:47 +03:00
Vlad Lesin
0b92c7b0e0 Merge 10.4 into 10.5 2022-03-07 17:16:11 +03:00
Vlad Lesin
1ec3205703 Merge 10.3 into 10.4 2022-03-07 16:46:00 +03:00
Vlad Lesin
86c1bf118a MDEV-27992 DELETE fails to delete record after blocking is released
MDEV-27025 allows to insert records before the record on which DELETE is
locked, as a result the DELETE misses those records, what causes serious ACID
violation.

Revert MDEV-27025, MDEV-27550. The test which shows the scenario of ACID
violation is added.
2022-03-07 16:42:05 +03:00
Vlad Lesin
f6f055a191 Merge 10.3 into 10.4 2022-02-21 14:10:27 +03:00
Vlad Lesin
5f001bd7b8 MDEV-27025 insert-intention lock conflicts with waiting ORDINARY lock
The code was backported from 10.5 be8113861c
commit. See that commit message for details.
2022-02-21 12:49:54 +03:00
Oleksandr Byelkin
9ed8deb656 Merge branch '10.6' into 10.7 2022-02-04 14:11:46 +01:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Marko Mäkelä
5e6fd4e804 Merge 10.6 into 10.7 2022-01-20 08:02:58 +02:00
Vlad Lesin
be8113861c MDEV-27025 insert-intention lock conflicts with waiting ORDINARY lock
The code was backported from 10.6 bd03c0e516
commit. See that commit message for details.

Apart from the above commit trx_lock_t::wait_trx was also backported from
MDEV-24738. trx_lock_t::wait_trx is protected with lock_sys.wait_mutex
in 10.6, but that mutex was implemented only in MDEV-24789. As there is no
need to backport MDEV-24789 for MDEV-27025,
trx_lock_t::wait_trx is protected with the same mutexes as
trx_lock_t::wait_lock.

This fix should not break innodb-lock-schedule-algorithm=VATS. This
algorithm uses an Eldest-Transaction-First (ETF) heuristic, which prefers
older transactions over new ones. In this fix we just insert granted lock
just before the last granted lock of the same transaction, what does not
change transactions execution order.

The changes in lock_rec_create_low() should not break Galera Cluster,
there is a big "if" branch for WSREP. This branch is necessary to provide
the correct transactions execution order, and should not be changed for
the current bug fix.
2022-01-18 18:15:10 +03:00
Vlad Lesin
bd03c0e516 MDEV-27025 insert-intention lock conflicts with waiting ORDINARY lock
When lock is checked for conflict, ignore other locks on the record if
they wait for the requesting transaction.

lock_rec_has_to_wait_in_queue() iterates not all locks for
the page, but only the locks located before the waiting lock in the
queue. So there is some invariant - any lock in the queue can wait only
lock which is located before the waiting lock in the queue.

In the case when conflicting lock waits for the transaction of
requesting lock, we need to place the requesting lock before the waiting
lock in the queue to preserve the invariant. That is why we are looking
for the first waiting for requesting transation lock and place the new
lock just after the last granted requesting transaction lock before the
first waiting for requesting transaction lock.

Example:

trx1 waiting lock, trx1 granted lock, ..., trx2 lock - waiting for trx1
place new lock here -----------------^

There are also implicit locks which are lazily converted to explicit
ones, and we need to place the newly created explicit lock to the correct
place in a queue. All explicit locks converted from implicit ones are
placed just after the last non-waiting lock of the same transaction before
the first waiting for the transaction lock.

Code review and cleanup was made by Marko Mäkelä.
2022-01-18 15:18:42 +03:00
Aleksey Midenkov
585cb18ed1 MDEV-27452 TIMESTAMP(0) system field is allowed for certain creation of system-versioned table
First, we do not add VERS_UPDATE_UNVERSIONED_FLAG for system field and
that fixes SHOW CREATE result.

Second, we have to call check_sys_fields() for any CREATE TABLE and
there correct type is checked for system fields.

Third, we update system_time like as_row structures for ALTER TABLE
and that makes check_sys_fields() happy for ALTER TABLE when we make
system fields hidden.
2022-01-13 23:35:17 +03:00
Aleksey Midenkov
241ac79e49 MDEV-26778 row_start is not updated in current row for InnoDB
Update was skipped (need_update was false) because compare_record()
used HA_PARTIAL_COLUMN_READ branch and it skipped row_start check
has_explicit_value() was false. When we set bit for row_start in
has_value_set the row is updated with new row_start value.

The bug was caused by combination of MDEV-23446 and 3789692d17. The
latter one says:

  ... But generated columns that are written to the table are always
  deterministic and cannot change unless normal non-generated columns
  were changed. ...

Since MDEV-23446 generated row_start can change while non-generated
columns are not changed.

Explicit value flag came from HAS_EXPLICIT_DEFAULT which was used to
distinguish default-generated value from user-supplied one.
2022-01-13 23:35:17 +03:00
Aleksey Midenkov
4d5ae2b325 MDEV-27217 DELETE partition selection doesn't work for history partitions
LIMIT history switching requires the number of history partitions to
be marked for read: from first to last non-empty plus one empty. The
least we can do is to fail with error message if the needed partition
was not marked for read. As this is handler interface we require new
handler error code to display user-friendly error message.

Switching by INTERVAL works out-of-the-box with
ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET error.
2022-01-13 23:35:16 +03:00
Aleksey Midenkov
f9f6b190cc Versioning test suite cleanups
Merged truncate_privilege and sysvars-notembedded into not_embedded.test
Merged partition_innodb into trx_id.test
2022-01-13 23:35:16 +03:00