Main problem was that no log-event print function checked for disk
full error on the IO_CACHE.
All changes in this patch only affects mysqlbinlog, not the server!
- Changed all log-event print functions to return 1 on error
- Fixed memory usage when not using --flashback.
- Added printing of number of rows in row events. Can be disabled with
--print-row-count=0
- Print annotated rows when using mysqlbinlog --short-form
- Fixed that mysqlbinlog --debug works
- Fixed create_drop_binlog.test test failure
- Reorganized fields in PRINT_EVENT_INFO to be according to size to
optimize storage
- Don't change print_row_event_position or print_row_counts if set by user
- Remove some testing of argument to my_free is 0
- base64-output=never is now supported and works in all context
- Updated help information for --base64-output and --short-form
- print_row_count is now on by default. Reset automatically if --short-form
is used
- Removed obsolote warning for mysql 5.6.0
- More DBUG_PRINT for mysqltest.cc
- my_b_write_byte() now checks for flush failures. This fixed a memory
overrun on disk full
- my_b_printf() now returns 1 on failure, 0 on ok. This simplifies code
and no old code was using the old return value of my_b_printf().
- my_b_Write_backtick_quote() now returns 1 on failure and 0 on ok
- Fixed some error conditions in log printing that was not previously
handled.
- Slave_rows_error_report() can now handle longlong positions
- Write_on_release_cache() rewritten so that we can detect errors
on flush. Not depending on automatic release anymore.
- Changed types for Pos and End_log_pos to 64 bit in SHOW BINLOG EVENTS
- Fixed that copy_event_cache_to_string_and_reinit() works with strings
longer than 4G (Changed to use LEX_STRING instead of String)
- Restricted binlog_rows_event_max_size to UINT32_MAX-1 as String's are
anyway restricted to UINT32_MAX
- Fixed bug in rpl_binlog_state::write_to_iocache() which hide write
failures (duplicate variable name)
- Fixed bug in String::append if original string was not allocated
- Stop mysqlbinlog output at once if there is an error.
- Before printing error message, flush result file. This ensures that
the error message is printed last. (Easier to find)
Problem:- Gtid are not transferred in Galera Cluster.
Solution:- We need to transfer gtid in the case on either when cluster is
slave/master in async replication. In normal Gtid replication gtid are generated on
recieving node itself and it is always on sync with other nodes. Because galera keeps
node in sync , So all nodes get same no of event groups. So the issue arises when
say galera is slave in async replication.
A
| (Async replication)
D <-> E <-> F {Galera replication}
So what should happen is that all node should apply the master gtid but this does
node happen, becuase node E, F does not recieve gtid from D in write set , So what E(or F)
does is that it applies wsrep_gtid_domain_id, D server-id , E gtid next seq no. This
generated gtid does not always work when say A has different domain id.
So In this commit, on galera node when we see that this event is recieved from master
we simply write Gtid_Log_Event in write_set and send it to other nodes.
This was missing bug fix from MySQL wsrep i.e. Galera.
Problem was that if stored procedure declares a handler that
catches deadlock error, then the error may have been
cleared in method sp_rcontext::handle_sql_condition().
Use wsrep_conflict_state correctly to determine is the
error already sent to client.
Add test case for both this bug and MDEV-12837: WSREP: BF
lock wait long. Test requires both fixes to pass.
This was missing bug fix from MySQL wsrep i.e. Galera.
Problem was that if stored procedure declares a handler that
catches deadlock error, then the error may have been
cleared in method sp_rcontext::handle_sql_condition().
Use wsrep_conflict_state correctly to determine is the
error already sent to client.
Add test case for both this bug and MDEV-12837: WSREP: BF
lock wait long. Test requires both fixes to pass.
With combination of --log-bin and Galera the server may crash
reporting two characteristic stacks:
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xc7)[0x7f182a8e2cb7]
/usr/sbin/mysqld(binlog_background_thread+0x2b5)[0x7f182a8e3275]
or
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7ff395b2dafd]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7ff395b2db91]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7ff395b300b2]
The reason of the failure appears to be non-matching decrements for
`xid_count_per_binlog::xid_count`
which can occur when a transaction is executed having its connection issued
`SET @@sql_log_bin=0`. In such case the xid count is not incremented but
its decrements still runs to turn `binlog_xid_count_list` into improper state
which the following FLUSH BINARY LOGS exposes through the crash.
*Note_1*: the regression test reuses an existing galera.sql_log_bin
which does not run stably (even in its base form) by mtr with --log-bin.
*Note_2*: 10.0-galera branch is free of this issue having missed MDEV-7205
fixes.
* created tests focusing in multi-master conflicts during cascading foreign key
processing
* in row0upd.cc, calling wsrep_row_ups_check_foreign_constraints only when
running in cluster
* in row0ins.cc fixed regression from MW-369, which caused crash with MW-402.test
It is possible for a stored procedure that has an error handler
that catches SQLEXCEPTION to call thd->clear_error() on a thd
that failed certification. And because the error is cleared,
wsrep patch proceeds with the normal path and may try to commit
statements that should actually abort.
This patch catches the situation where wsrep_conflict_state is
still set, but the thd's error has been cleared, and rolls back
the statement in such cases.
This was done to get more information about where time is spent.
Now we can get proper timing for time spent in commit, rollback,
binlog write etc.
Following stages was added:
- Commit
- Commit_implicit
- Rollback
- Rollback implicit
- Binlog write
- Init for update
- This is used instead of "Init" for insert, update and delete.
- Staring cleanup
Following stages where changed:
- "Unlocking tables" stage reset stage to previous stage at end
- "binlog write" stage resets stage to previous stage at end
- "end" -> "end of update loop"
- "cleaning up" -> "Reset for next command"
- Added stage_searching_rows_for_update when searching for rows
to be deleted.
Other things:
- Renamed all stages to start with big letter (before there was no
consitency)
- Increased performance_schema_max_stage_classes from 150 to 160.
- Most of the test changes in performance schema comes from renaming of
stages.
- Removed duplicate output of variables and inital state in a lot of
performance schema tests.
This was done to make it easier to change a default value for a
performance variable without affecting all tests.
- Added start_server_variables.test to check configuration
- Removed some duplicate "closing tables" stages
- Updated position for "stage_init_update" and "stage_updating" for
delete, insert and update to be just before update loop (for more
exact timing).
- Don't set "Checking permissions" twice in a row.
- Remove stage_end stage from creating views (not done for create table
either).
- Updated default performance history size from 10 to 20 because of new
stages
- Ensure that ps_enabled is correct (to be used in a later patch)
Problem:- This crash happens because of thd = NULL , and while checking
for wsrep_on , we no longer check for thd != NULL (MDEV-7955). So this
problem is regression of MDEV-7955. However this patch not only solves
this regression , It solves all regression caused by MDEV-7955 patch.
To get all possible cases when thd can be null , assert(thd)/
assert(trx->mysql_thd) is place just before all wsrep_on and innodb test
suite is run. And the assert which caused failure are removed with a physical
check for thd != NULL. Rest assert are removed. Hopefully this method will
remove all current/potential regression of MDEV-7955.
* created tests focusing in multi-master conflicts during cascading foreign key
processing
* in row0upd.cc, calling wsrep_row_ups_check_foreign_constraints only when
running in cluster
* in row0ins.cc fixed regression from MW-369, which caused crash with MW-402.test
galera.galera_kill_applier: Make the test less likely to fail
by adding sleep time.
galera.query_cache: Remove data truncation.
Part of the test file looks like it has been misinterpreted as latin1
and wrongly converted to UTF-8 encoding. In MariaDB 10.1, the server would
only warn about data truncation and not issue an error. 10.2 is stricter.
(The test should be carefully reviewed if it really makes sense.)
For running the Galera tests, the variable my_disable_leak_check
was set to true in order to avoid assertions due to memory leaks
at shutdown.
Some adjustments due to MDEV-13625 (merge InnoDB tests from MySQL 5.6)
were performed. The most notable behaviour changes from 10.0 and 10.1
are the following:
* innodb.innodb-table-online: adjustments for the DROP COLUMN
behaviour change (MDEV-11114, MDEV-13613)
* innodb.innodb-index-online-fk: the removal of a (1,NULL) record
from the result; originally removed in MySQL 5.7 in the
Oracle Bug #16244691 fix
377774689b
* innodb.create-index-debug: disabled due to MDEV-13680
(the MySQL Bug #77497 fix was not merged from 5.6 to 5.7.10)
* innodb.innodb-alter-autoinc: MariaDB 10.2 behaves like MySQL 5.6/5.7,
while MariaDB 10.0 and 10.1 assign different values when
auto_increment_increment or auto_increment_offset are used.
Also MySQL 5.6/5.7 exhibit different behaviour between
LGORITHM=INPLACE and ALGORITHM=COPY, so something needs to be tested
and fixed in both MariaDB 10.0 and 10.2.
* innodb.innodb-wl5980-alter: disabled because it would trigger an
InnoDB assertion failure (MDEV-13668 may need additional effort in 10.2)
Merged from mysql-wsrep-bugs following:
GCF-1058 MTR test galera.MW-86 fails on repeated runs
Wait for the sync point sync.wsrep_apply_cb to be reached before
executing the test and clearing the debug flag sync.wsrep_apply_cb.
The race scenario:
Intended behavior:
node2: set sync.wsrep_apply_cb in order to start waiting in the background INSERT
node1: INSERT start
node2 (background): INSERT start
node1: INSERT end
node2: send signal to background INSERT: "stop waiting and continue executing"
node2: clear sync.wsrep_apply_cb as no longer needed
node2 (background): consume the signal
node2 (background): INSERT end
node2: DROP TABLE
node2: check no pending signals are left - ok
What happens occasionally (unexpected):
node2: set sync.wsrep_apply_cb in order to start waiting in the background INSERT
node1: INSERT start
node2 (background): INSERT start
node1: INSERT end
// The background INSERT still has _not_ reached the place where it starts
// waiting for the signal:
// DBUG_EXECUTE_IF("sync.wsrep_apply_cb", "now wait_for...");
node2: send signal to background INSERT: "stop waiting and continue executing"
node2: clear sync.wsrep_apply_cb as no longer needed
// The background INSERT reaches DBUG_EXECUTE_IF("sync.wsrep_apply_cb", ...)
// but sync.wsrep_apply_cb has already been cleared and the "wait" code is not
// executed. The signal remains unconsumed.
node2 (background): INSERT end
node2: DROP TABLE
node2: check no pending signals are left - failure, signal.wsrep_apply_cb is
pending (not consumed)
Remove MW-360 test case as it is not intended for MariaDB (uses
MySQL GTID).