This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to roll back and abort.
The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().
However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference a deallocated GCO, which could lead
to a lost wakeup (causing slave threads to hang) or other corruption.
This patch makes the error case respect commit order as well. This way, the
error case also gets the GCO lifetime correct, and the hang no longer occurs.
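The ordering rule above can be illustrated with a minimal sketch. It is not
the server code: Txn, Orderer and run_in_commit_order() are hypothetical names,
and a single mutex/condition variable stands in for
register_wait_for_prior_commit() / wait_for_prior_commit(). The point is only
that a failing transaction still waits for its predecessor before it finishes,
so "T2 is done" keeps implying "T1 is done".

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one event group (transaction) in commit order.
struct Txn {
  Txn *prior = nullptr;     // transaction that must finish first, if any
  bool failed = false;      // this worker hit an error on its own
  bool rolled_back = false; // set in finish(): own failure or earlier failure
  bool done = false;        // set once committed or rolled back
};

struct Orderer {
  std::mutex mu;
  std::condition_variable cv;
  std::vector<int> order;   // ids in the order they finished
  int rolled_back = 0;

  // Called in BOTH the normal and the error path (the point of the sketch).
  void wait_for_prior(Txn &t) {
    std::unique_lock<std::mutex> lk(mu);
    cv.wait(lk, [&] { return t.prior == nullptr || t.prior->done; });
  }

  void finish(Txn &t, int id) {
    {
      std::lock_guard<std::mutex> lk(mu);
      // An error propagates to every later transaction.
      t.rolled_back = t.failed || (t.prior && t.prior->rolled_back);
      if (t.rolled_back) ++rolled_back;
      t.done = true;
      order.push_back(id);
    }
    cv.notify_all();
  }
};

struct Result {
  std::vector<int> order;
  int rolled_back;
};

// Runs n transactions on n threads, started in reverse order; transaction
// `failing` takes the error path.
Result run_in_commit_order(int n, int failing) {
  std::vector<Txn> txns(n);
  for (int i = 0; i < n; ++i) {
    txns[i].prior = (i > 0) ? &txns[i - 1] : nullptr;
    txns[i].failed = (i == failing);
  }
  Orderer ord;
  std::vector<std::thread> workers;
  for (int i = n - 1; i >= 0; --i) {
    workers.emplace_back([&, i] {
      ord.wait_for_prior(txns[i]); // not skipped when the transaction failed
      ord.finish(txns[i], i);      // commit or rollback, still in order
    });
  }
  for (auto &w : workers) w.join();
  return Result{ord.order, ord.rolled_back};
}
```

With four transactions and transaction 1 failing, the finish order is still
0, 1, 2, 3, so anything whose lifetime is tied to "the last one finished"
cannot be released while an earlier transaction is still running.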
When a transaction in parallel replication needs to retry (e.g. because of
deadlock kill), first wait for all prior transactions to commit before doing
the retry. This way, we avoid the retry once again conflicting with a prior
transaction, requiring yet another retry.
Without this patch, we saw "in the wild" that transactions had to be retried
more than 10 times to succeed, which exceeds the default
--slave_transaction_retries value and is in any case undesirable.
(We already do this in 10.1 in "optimistic" parallel replication mode; this
patch just makes the code use the same logic for "conservative" mode, the only
mode in 10.0.)
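A rough sketch of the retry policy described above, under stated assumptions:
apply_with_retry(), Outcome and the two callbacks are hypothetical names, not
server functions, and max_retries only plays the role of
--slave_transaction_retries.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of one attempt at applying a transaction.
enum class Outcome { Ok, DeadlockKilled };

// After a deadlock kill, first wait for all prior transactions to commit,
// then retry. Returns the number of attempts used, or -1 if the transaction
// still fails after max_retries retries.
int apply_with_retry(const std::function<Outcome()> &attempt,
                     const std::function<void()> &wait_for_prior_commits,
                     int max_retries) {
  int attempts = 0;
  for (;;) {
    ++attempts;
    if (attempt() == Outcome::Ok) return attempts;
    if (attempts > max_retries) return -1; // retry budget exhausted
    // Once the prior transactions have committed, the retry cannot conflict
    // with them again.
    wait_for_prior_commits();
  }
}
```

If the only source of conflict is a prior transaction, one wait is enough and
the second attempt succeeds, instead of burning through the retry limit.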
MDEV-6218 Wrong result of CHAR_LENGTH(non-BMP-character) with 3-byte utf8
- Moving get_text() as a method to Lex_input_stream.
- Moving the unescaping part into a separate function;
this piece of code will most likely go to /strings later.
- Removing Lex_input_stream::yytoklen, as it's not needed any more.
On big-endian platforms the build fails with:
[ 109s] /home/abuild/rpmbuild/BUILD/mariadb-10.0.17/storage/cassandra/ha_cassandra.cc:890:22: error: invalid conversion from 'longlong* {aka long long int*}' to 'long long int' [-fpermissive]
[ 109s] value->x.long_value= (longlong *)*cass_data;
[ 109s] ^
This commit fixes it.
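The message does not show the actual change. As an illustration only: the
failing expression dereferences cass_data first (yielding a single char) and
then casts that value to a pointer, whereas the assignment needs a 64-bit
value read from the buffer. The helper below is a hypothetical sketch of such
a read, using memcpy rather than a pointer cast; read_native_longlong() is not
a function from ha_cassandra.cc.

```cpp
#include <cassert>
#include <cstring>

typedef long long longlong; // stand-in for the server's longlong typedef

// Reads a native-endian 64-bit value from a raw byte buffer.
// Copying the bytes avoids the cast/dereference mix-up and also works when
// the buffer is not aligned for a longlong.
longlong read_native_longlong(const char *cass_data) {
  longlong value;
  std::memcpy(&value, cass_data, sizeof(value));
  return value;
}
```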
Signed-off-by: Dinar Valeev <dvaleev@suse.com>
The patch for optimistic parallel replication as a memory optimisation moved
the gco->installed field into a bit in gco->flags. However, that is just plain
wrong. The gco->flags field is owned by the SQL driver thread, but
gco->installed is used by the worker threads, so this will cause a race
condition.
The user-visible problem might be conflicts between transactions and/or slave
threads hanging.
So revert this part of the optimistic parallel replication patch, going back
to using a separate field gco->installed like in 10.0.
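A minimal sketch of why the separate field matters. The Gco struct here is a
hypothetical, trimmed-down stand-in, and the server's own locking is omitted.
In C++, two distinct non-bit-field members are separate memory locations, so
one thread updating flags cannot clobber another thread's write to installed.
Packing installed into a bit of flags would turn both updates into
read-modify-write operations on the same location.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical GCO with the two members in separate storage.
struct Gco {
  std::uint8_t flags = 0; // written only by the SQL driver thread
  bool installed = false; // written by worker threads
};

// The driver toggles a bit in flags while a worker sets installed. Each
// member has exactly one writer, and the members do not share storage.
Gco run_driver_and_worker(int toggles) {
  Gco gco;
  std::thread driver([&] {
    for (int i = 0; i < toggles; ++i) gco.flags ^= 0x1;
  });
  std::thread worker([&] { gco.installed = true; });
  driver.join();
  worker.join();
  return gco;
}
```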
Currently crypt data is always written to the file space. Use
that to obtain a random IV for every object (file).
Beautify code to conform to InnoDB coding style.
Conflicts:
storage/innobase/fil/fil0crypt.cc
storage/xtradb/fil/fil0crypt.cc
Fix for failing tests.
* Update mysql_system_tables_fix.sql to make up for the differences in system
tables in 5.1.17 (main.system_mysql_db_fix50117)
* Removed system_mysql_db tests for versions 5.0.30 & 4.1.23.
Currently crypt data is always written to the file space. Use
that to obtain a random IV for every object (file).
Beautify code to conform to InnoDB coding style.
innodb_buffer_pool_pages_total depends on page size. On Power8 it is 64k
(65536 bytes) compared to 4k on Intel. As we round allocations up to the page
size, we may get slightly more memory for the buffer pool.
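The rounding effect can be sketched as follows. round_up_to_page() is a
hypothetical helper, not the InnoDB allocator, and the request size used in
the note below is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Round `bytes` up to the next multiple of `page_size`.
std::size_t round_up_to_page(std::size_t bytes, std::size_t page_size) {
  return (bytes + page_size - 1) / page_size * page_size;
}
```

For a 10000-byte request this yields 12288 bytes with 4k pages but 65536 bytes
with 64k pages, so the same configuration can end up with more usable memory,
and hence a different page count, on Power8.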
Sort XA RECOVER output, as the row order depends on endianness.