Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Sequence storage engine is not transactionl so cache will be written in
stmt_cache that is not replicated in cluster. To fix this replicate
what is available in both trans_cache and stmt_cache.
Sequences will only work when NOCACHE keyword is used when sequnce is
created. If WSREP is enabled and we don't have this keyword report error
indicting that sequence will not work correctly in cluster.
When binlog is enabled statement cache will be cleared in transaction
before COMMIT so cache generated from sequence will not be replicated.
We need to keep cache until replication.
Tests are re-recorded because of replication changes that were
introducted with this PR.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
The implementation for MDEV-17048 apperas to be direct copy from mysql version.
The group commit works differently in mariadb and the assert in wsrep_unregister_from_group_commit() is too strict.
The reason is that in: Wsrep_high_priority_service::log_dummy_write_set(), the transaction will undergo full rollback:
{
cs.before_rollback();
cs.after_rollback();
}
After that, the client's transaction state is set to be: wsrep::transaction::s_aborted.
The execution then continues execution by:
...
wsrep_register_for_group_commit(m_thd);
...
wsrep_unregister_from_group_commit(m_thd);
The bogus assert in wsrep_unregister_from_group_commit() allows only transactions states of :s_ordered_commit or s_aborting.
As the fix, I brought back the same assert as is present in MariaDB 10.4 version.
Handling of write sets, which fail in certification happens differently than
with write sets which pass certification. When certification fails, the
write set applying can be skipped and applier needs only to take care of
wsrep XID checkpointing. With current implementation, this can rush ahead
of wsrep XID checkpointing of successful write sets.
The fix in this PR registers wsrep XID checkpointing of certification failure cases in group commit,
which guarantees that XID ceckpointing order is synchronized with real committing transactions.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
* Remove dead code
* MDEV-21675 Data inconsistency after multirow insert rollback
This patch fixes data inconsistencies that happen after rollback of
multirow inserts, with binlog disabled.
For example, statements such as `INSERT INTO t1 VALUES (1,'a'),(1,'b')`
that fail with duplicate key error. In such cases the whole statement
is rolled back. However, with wsrep_emulate_binlog in effect, the
IO_CACHE would not be truncated, and the pending rows events would be
replicated to the rest of the cluster. In the above example, it would
result in row (1,'a') being replicated, whereas locally the statement
is rolled back entirely. Making the cluster inconsistent.
The patch changes the code so that prior to statement rollback,
pending rows event are removed and the stmt cache reset.
That patch also introduces MTR tests that excercise multirow insert
statements for regular, and streaming replication.
* Collect and pass apply error data to provider
* Rollback failed transaction and continue operation if provider returns
SUCCESS
* MTR tests for inconsistency voting
* MDEV-16509 Improve wsrep commit performance with binlog disabled
Release commit order critical section early after trx_commit_low() if
binlog is not transaction coordinator. In order to avoid two phase commit,
binlog_hton is not registered for THD during IO_CACHE population.
Implemented a test which verifies that the transactions release
commit order early.
This optimization will change behavior during recovery as the commit
is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25
and wsrep-recover to match the behavior.
* MDEV-18730 Ordering for wsrep binlog group commit
Previously out of order execution was allowed for wsrep commits.
Established proper ordering by populating wait_for_commit
for every wsrep THD and making group commit leader to wait for
prior commits before proceeding to trx_group_commit_leader().
* MDEV-18730 Added a test case to verify correct commit ordering
* MDEV-16509, MDEV-18730 Review fixes
Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton
should be registered. Whitespace/syntax fixes and cleanups.
* MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test
If the commit to InnoDB is done in one phase, the native InnoDB behavior
is that the transaction is committed in memory before it is persisted to
disk. This means that the innodb_disallow_writes=ON may not prevent
transaction to become visible to other readers before commit is completely
over. On the other hand, if the commit is two phase (as it is with binlog),
the transaction will be blocked in prepare phase.
Fixed the test to use binlog, which enforces two phase commit, which
in turn makes commit to block before the changes become visible to
other connections. This guarantees that the test produces expected
result.
snprintf returns the number of bytes it wrote (or would have written) NOT
counting the \0 terminal character.
The buffer size it accepts as argument DOES COUNT the \0 character.
Pass the right parameter value.
"my_snprintf(NULL, 0, ...)" does not follow what snprintf does. Change
the call in wsrep_dump_rbr_buf_with_header to use snprintf (note that
wsrep_dump_rbr_buf was using snprintf all the way).
Fix an off-by-one error in comparison so it actually prints the output
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed a lot of include my_global.h from include files
- Removed include's of some files that my_global.h automatically includes
- Removed duplicated include's of my_sys.h
- Replaced include my_config.h with my_global.h
Other things
- Ensure that ut_d() is set to EXPR if ut_ad() is DEBUG_ASSERT()
If not, we will get a crash in purge_sys_t::~purge_sys_t() as
this ut_ad() code expect's that the ut_d() codes has been executed
This happens because the master writes a table_map event to the binary log, but no row event.
The slave has a check that there should always be a row event if there was a table_map event, which
causes a crash.
Fixed by remembering in the cache what kind of events are logged
and ignore cached statements which is just a table map event.
... causes MariaDB to crash
On error, the wsrep replication buffer (binlog) is dumped to a file
to aid investigations. In order to also include the binlog header,
FDLE object is also needed. This object is only available for wsrep-
threads.
Fix: Instantiate an FDLE object for non-wsrep threads.
Introduce Log_event_writer() that encapsulates
writing data to an IO_CACHE with automatic checksum calculation.
Now all events properly checksum themselves as needed.
Use Log_event_writer in MYSQL_BIN_LOG::write_cache() instead
of copy-pasting its logic all over.
Later Log_event_writer will also do encryption.
* remove unused (and not implemented) WRITE_NET type
* remove cast in my_b_write() macro. my_b_* macros are
function-like, casts are responsibility of the caller
* replace hackish _my_b_write(info,0,0) with the explicit
my_b_flush_io_cache() in my_b_write_byte()
* remove unused my_b_fill_cache()
* replace pbool -> my_bool
* make internal IO_CACHE functions static
* reformat comments, correct typos, remove obsolete comments (ISAM)
* assert valid cache type in init_functions()
* use IO_ROUND_DN() macro where appropriate
* remove unused DBUG_EXECUTE_IF in _my_b_cache_write()
* remove unnecessary __attribute__((unused))
* fix goto error in parse_file.cc
* remove redundant reinit_io_cache() in uniques.cc
* don't do reinit_io_cache() if the cache was not initialized
in ma_check.c
* extract duplicate functionality from various _my_b_*_read
functions into a common wrapper. Same for _my_b_*_write
* create _my_b_cache_write_r instead of having if's in
_my_b_cache_write (similar to existing _my_b_cache_read and
_my_b_cache_read_r)
* don't call mysql_file_write() from my_b_flush_io_cache(),
call info->write_function() instead