mariadb/mysql-test/suite
Monty 232533978f MDEV-36290: ALTER TABLE with multi-master can cause data loss
One can have data loss in multi-master setups when 1) both masters
update the same table, 2) ALTER TABLE is run on one master which
re-arranges the column ordering, and 3) transactions are binlogged
in ROW binlog_format.

This is because the slave assumes that all columns are in the same
order on the master and slave and that all columns on the master also
exist on the slave. It makes this assumption even if
binlog_row_metadata=FULL is used.  If the assumption does not hold,
the result is silent data loss.

A new option for the slave_type_conversions bit field,
ERROR_IF_MISSING_FIELD, has been added. This allows the user to choose
whether the slave should abort replication if it is missing some field
that existed on the master. This option is off by default to keep things
compatible with earlier versions.
If a field is missing on the slave and log_warnings >= 1, a warning
will be logged to the error log.

This patch fixes the problem, when binlog_row_metadata=FULL is used on
the master, by mapping fields with identical names on the master and slave.
If the slave has fields that do not exist in the row event, these will
be set to their default values.

The main idea is that we added two conversion tables:
m_tabledef.master_to_slave_map[master_column_index] -> slave_column_index
and m_tabledef.master_to_slave_error[master_column_index], which contains
an error number if the master column does not exist on the slave or
if it is not possible to convert the master data to the slave column.
master_to_slave_error[#] contains 0 if the column exists and is compatible.
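
As a rough illustration of the two tables described above, here is a
minimal standalone sketch. It is not the code from the patch: the names
build_column_maps, ColumnMaps, NO_SLAVE_COLUMN and ERR_MISSING_COLUMN
are hypothetical, names are compared as exact strings, and the type
compatibility check is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical constants for this sketch; they are not taken from the patch.
constexpr int NO_SLAVE_COLUMN = -1;    // master column has no slave counterpart
constexpr int ERR_MISSING_COLUMN = 1;  // stand-in for a real error number

struct ColumnMaps {
  std::vector<int> master_to_slave_map;    // master index -> slave index
  std::vector<int> master_to_slave_error;  // 0 if the column exists
};

// Match master columns to slave columns by name. The master names are
// assumed to come from the row event metadata (binlog_row_metadata=FULL).
ColumnMaps build_column_maps(const std::vector<std::string> &master_cols,
                             const std::vector<std::string> &slave_cols) {
  std::unordered_map<std::string, int> slave_index;
  for (std::size_t i = 0; i < slave_cols.size(); ++i)
    slave_index.emplace(slave_cols[i], static_cast<int>(i));

  ColumnMaps maps;
  maps.master_to_slave_map.assign(master_cols.size(), NO_SLAVE_COLUMN);
  maps.master_to_slave_error.assign(master_cols.size(), 0);
  for (std::size_t m = 0; m < master_cols.size(); ++m) {
    auto it = slave_index.find(master_cols[m]);
    if (it != slave_index.end())
      maps.master_to_slave_map[m] = it->second;
    else
      maps.master_to_slave_error[m] = ERR_MISSING_COLUMN;
  }
  assert(maps.master_to_slave_map.size() == maps.master_to_slave_error.size());
  return maps;
}
```

With master columns (a, b, c) and slave columns (c, a), this sketch maps
a to slave index 1 and c to slave index 0, and records an error for b.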

General code changes:
- Instead of looping over row fields in the order of the slave table,
  we now loop over fields in the order of the binary log.
- We are using table->write_set to know which fields should be updated
  on the slave. This is reflected in unpack_row().
- We are calling TABLE::mark_columns_per_binlog_row_image() to ensure
  that rpl_write_set is properly set. This is needed if the slave is
  also doing binary logging.
- Previously, replication was aborted if the master and slave tables
  were too different.  Now replication is only aborted if the row
  actually uses columns that do not exist on the slave (and
  ALLOW_MISSING_FIELDS is not used) or uses columns that cannot be
  converted.
  - Instead of giving errors in compatible_with(), which is called when
    the table is first accessed by a row event, we now give errors
    when we examine a row event and notice that it is accessing
    a nonexistent or incompatible field.
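
The changes listed above can be pictured with the following simplified
sketch. It is an assumption-laden model, not unpack_row() itself:
apply_row_event, ApplyResult and the abort_on_missing flag are
hypothetical, values are plain strings indexed by master column, and a
missing slave column is recognised by a negative map entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ApplyResult {
  int error;                     // 0 on success, else the error number hit
  std::vector<std::string> row;  // slave row after applying the event
  std::vector<bool> write_set;   // slave columns written from the event
};

// Walk the columns in binary log (master) order, not in slave table order.
// abort_on_missing is a hypothetical stand-in for the slave option that
// decides whether a missing slave column aborts replication.
ApplyResult apply_row_event(const std::vector<std::string> &master_values,
                            const std::vector<bool> &master_cols_used,
                            const std::vector<int> &master_to_slave_map,
                            const std::vector<int> &master_to_slave_error,
                            const std::vector<std::string> &slave_defaults,
                            bool abort_on_missing) {
  assert(master_values.size() == master_cols_used.size());
  assert(master_values.size() == master_to_slave_map.size());
  assert(master_values.size() == master_to_slave_error.size());

  // Start from the defaults; columns never written keep their default.
  ApplyResult res{0, slave_defaults,
                  std::vector<bool>(slave_defaults.size(), false)};
  for (std::size_t m = 0; m < master_values.size(); ++m) {
    if (!master_cols_used[m])
      continue;  // the row does not use this column, so it cannot fail
    const int s = master_to_slave_map[m];
    if (master_to_slave_error[m] != 0) {
      if (s < 0 && !abort_on_missing)
        continue;  // missing on the slave and tolerated: drop the value
      res.error = master_to_slave_error[m];
      return res;  // error is raised only when the row uses the column
    }
    res.row[static_cast<std::size_t>(s)] = master_values[m];
    res.write_set[static_cast<std::size_t>(s)] = true;
  }
  return res;
}
```

The point of the design is that an incompatible or missing column only
matters for events that actually touch it; other events on the same
table still apply.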

Other code changes:
- Removed conv_table argument from compatible_with() and store it
  directly in RPL_TABLE_LIST->m_conv_table
- table_def::compatible_with() now returns 1 on error (not 0).
- Removed the m_width and skip arguments from prepare_record() as we are
  now using table->write_set to check which elements need a default
  value.
- Moved DBUG_ENTER() to its proper place (after variable
  declarations) in a few functions.
- Some changes in unpack_row():
  - Replaced null_mask and null_ptr with an indexed bit check for
    simplicity.
  - Removed the checks of rgi == null and table_found, which never worked.
  - Updated comments to reflect current code.
  - Indentation changes as the code now uses 'continue' instead of
    'if-else' in the main loop.
  - The code to throw away 'extra master fields' is no longer needed as
    we are now looping over fields in the binary log, not over fields
    in the slave table.
- fill_extra_persistent_columns() is now using table->cond_set to know
  which columns were not updated from the binlog.
- Simplified get_table_data(TABLE *table_arg) by returning found
  table_list.
- Errors for row events are now initialized in compatible_with(),
  checked in check_wrong_column_usage() and reported in
  give_compatibility_error().
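
For the unpack_row() item about replacing null_mask and null_ptr, an
indexed bit check can look like the sketch below. The helper name
null_bit_is_set is hypothetical and the least-significant-bit-first
layout is an assumption of this example; the patch may well use an
existing server macro instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Test bit number `index` in a packed bitmap, least significant bit first.
// This replaces a running (pointer, mask) pair with a direct lookup.
inline bool null_bit_is_set(const std::uint8_t *null_bits, std::size_t index) {
  assert(null_bits != nullptr);
  return ((null_bits[index / 8] >> (index % 8)) & 1U) != 0;
}
```

Because the bit position is computed from the column index, the loop can
skip columns with 'continue' without having to keep a mask in step.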

Test cases and some code patches provided by Brandon Nesterenko
<brandon.nesterenko@mariadb.com>
2025-05-19 19:57:47 +03:00
archive test: archive-big test too big for msan 2025-04-07 11:04:53 +02:00
atomic MDEV-36666 - atomic.alter_table still times out often 2025-04-25 10:40:47 +04:00
binlog Merge branch '10.5' into 10.6 2025-01-29 11:17:38 +01:00
binlog_encryption Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
client
compat Merge branch '10.5' into 10.6 2025-01-29 11:17:38 +01:00
csv Backporting bugs fixes fixed by MDEV-31340 from 11.5 2024-05-21 14:58:01 +04:00
encryption MDEV-36180 Doublewrite recovery of innodb_checksum_algorithm=full_crc32 page_compressed pages does not work 2025-03-26 12:03:44 +01:00
engines Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
federated MDEV-31846: enable cursor protocol for test federatedx_create_handlers 2025-04-07 11:04:53 +02:00
funcs_1 Merge branch '10.5' into 10.6 2024-12-17 11:06:09 +11:00
funcs_2 Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
galera Merge branch '10.5' into 10.6 2025-04-26 10:41:52 +02:00
galera_3nodes MDEV-36516 : galera_3nodes.galera_gtid_2_cluster test failed on 10.5 2025-04-22 20:44:10 +02:00
galera_3nodes_sr galera mtr tests: synchronization between branches and editions 2025-04-02 04:50:11 +02:00
galera_sr Merge branch '10.5' into '10.6' 2025-04-02 04:43:24 +02:00
gcol Merge branch '10.5' into 10.6 2025-04-21 10:43:17 +02:00
handler Merge branch '10.5' into 10.6 2024-12-17 11:06:09 +11:00
heap Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
innodb MDEV-36639 innodb_snapshot_isolation=1 gives error for not committed row changes 2025-04-22 20:41:43 +03:00
innodb_fts MDEV-36420 Assertion failure in SET GLOBAL innodb_ft_aux_table 2025-03-28 09:05:20 +02:00
innodb_gis Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
innodb_i_s
innodb_zip Merge branch '10.5' into 10.6 2025-01-29 11:17:38 +01:00
jp
json MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
large_tests fix failing large_tests.maria_recover_encrypted 2024-04-22 17:22:11 +02:00
maria MDEV-35469 Heap tables are calling mallocs to often 2025-01-05 16:40:11 +02:00
mariabackup Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
mtr/t Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
mtr2
multi_source MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure 2024-12-12 18:02:00 +02:00
optimizer_unfixed_bugs
parts Merge 10.5 into 10.6 2025-03-26 17:09:57 +02:00
perfschema Suppress processist_state='buffer pool load' 2025-02-03 08:29:52 +02:00
perfschema_stress
period Remove dates from all rdiff files 2025-01-05 16:40:11 +02:00
plugins Merge branch '10.5' into 10.6 2025-04-21 10:43:17 +02:00
roles Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
rpl MDEV-36290: ALTER TABLE with multi-master can cause data loss 2025-05-19 19:57:47 +03:00
s3 Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
sql_sequence Merge branch '10.5' into 10.6 2025-04-26 10:41:52 +02:00
storage_engine
stress MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 2024-09-20 20:26:43 +05:30
sys_vars MDEV-36290: ALTER TABLE with multi-master can cause data loss 2025-05-19 19:57:47 +03:00
sysschema Merge 10.5 into 10.6 2024-03-12 09:19:57 +02:00
unit
vcol Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
versioning Merge branch '10.5' into '10.6' 2025-04-02 04:43:24 +02:00
wsrep Merge branch '10.5' into '10.6' 2025-04-02 04:43:24 +02:00