This commit fixes a crash reported as MDEV-28377 and a number
of other crashes in automated tests with mtr that are related
to broken .cnf files in galera and galera_3nodes suites, which
happened when automatically migrating MDEV-26171 from 10.3 to
subsequent higher versions.
This commit fixes problems with parsing ipv6 addresses given via
the wsrep_sst_receive_address and wsrep_node_address options.
Also, this commit removes extra lines in the configuration files
in the mtr test suites for Galera related to these parameters.
This commit adds a mtr test for reproducing a test scenario where despite of
innodb_disallow_writes blocking, writes to file system can still happen.
The test launches a garbd node, which triggers one of the cluster node to switch to
SST donor state. In this state, all disk activity should be halted, and e.g.
innodb_disallow_writes has been set. The test records md5sum aggregate over mariadb
data directory when the node enters the donor state, and records another md5sum
when the node leaves the donor state. If there is no IO activity in data directory, these
hashes should be equal.
For this test, the Donor state processing, has beeen instrumented so that, SST donor thread can be
stopped when entering the donor state. The test uses this new dbug sync point,
to control when to record the md5sums.
New SST script was added: wsrep_sst_backup, and garbd uses backup method to lauch the donor
node to call this script, and to enter in donor state.
The backup script could be later extended as general purpose backup method for the cluster.
This commit fixes also one race condition happening in wsrep_sst_rsync, like this:
* wsrep_rsync_sst script requests for flush tables,
and then waits in a loop until mariadbd has created file tables_flushed,
as confirmation that FLUSH TABLES has completed
* mariadbd's SST donor thread, wakes for the flush table request and then performs FTWRL,
and after this it creates the tables_flushed file
* note that SST script will now continue to startup rsync sending
* mariadbd's SST donor thread now calls for sst_disallow_writes(),
so that innodb would setup disk IO blockage, however rsyncing may already be ongoing at this point
This race condition is fixed in this commit, by performing all disk IO blocking before
creating the tables_flushed file.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Variable `wsrep_new_cluster` now will be TRUE also when there is only `gcomm://` used
in configuration. This configuration, even without --wsrep-new-cluster,
is considered to bootstrap new cluster.
Updated galera GTID test to ignore warning message when non bootstrap
node have server-id different thant one cluster is initialized with.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This is the first part of the fixes for MDEV-24097. This commit
contains the fixes for instability when testing Galera and when
restarting nodes quickly:
1) Protection against a "stuck" old SST process during the execution
of the new SST (after restarting the node) is now implemented for
mariabackup / xtrabackup, which should help to avoid almost all
conflicts due to the use of the same ports - both during testing
with mtr, so and when restarting nodes quickly in a production
environment.
2) Added more protection to scripts against unexpected return of
the rc != 0 (in the commands for deleting temporary files, etc).
3) Added protection against unexpected crashes during binlog transfer
(in SST scripts for rsync).
4) Spaces and some special characters in binlog filenames shouldn't
be a problem now (at the script level).
5) Daemon process termination tracking has been made more robust
against crashes due to unexpected termination of the previous SST
process while new scripts are running.
6) Reading ssl encryption parameters has been moved from specific
SST scripts to a common wsrep_sst_common.sh script, which allows
unified error handling, unified diagnostics and simplifies script
revisions in the future.
7) Improved diagnostics of errors related to the use of openssl.
8) Corrections have been made for xtrabackup-v2 (both in tests and in
the script code) that restore the work of xtrabackup with updated
versions of innodb.
9) Fixed some tests for galera_3nodes, although the complete solution
for the problem of starting three nodes at the same time on fast
machines will be done in a separate commit.
No additional tests are required as this commit fixes problems with
existing tests.
1. Galera SST scripts should use ssl_capath (not ssl_ca) for CA
directory. The current implementation tries to automatically
detect the path using the trailing slash in the ssl_ca variable
value, but this approach is not compatible with the server
configuration. Now, by analogy with the server, SST scripts
also use a separate ssl_capath variable. In addition, a similar
tcapath variable has been added for the old-style configuration
(in the "sst" section).
2. Openssl utility detection made more reliable.
3. Removed extra spaces in automatically generated command lines -
to simplify debugging of the SST scripts.
4. In general, the code for detecting the presence or absence of
auxiliary utilities has been improved - it is made more reliable
in some configurations (and for shells other than bash).
1. Galera SST scripts should use ssl_capath (not ssl_ca) for CA
directory. The current implementation tries to automatically
detect the path using the trailing slash in the ssl_ca variable
value, but this approach is not compatible with the server
configuration. Now, by analogy with the server, SST scripts
also use a separate ssl_capath variable. In addition, a similar
tcapath variable has been added for the old-style configuration
(in the "sst" section).
2. Openssl utility detection made more reliable.
3. Removed extra spaces in automatically generated command lines -
to simplify debugging of the SST scripts.
4. In general, the code for detecting the presence or absence of
auxiliary utilities has been improved - it is made more reliable
in some configurations (and for shells other than bash).
1. Galera SST scripts should use ssl_capath (not ssl_ca) for CA
directory. The current implementation tries to automatically
detect the path using the trailing slash in the ssl_ca variable
value, but this approach is not compatible with the server
configuration. Now, by analogy with the server, SST scripts
also use a separate ssl_capath variable. In addition, a similar
tcapath variable has been added for the old-style configuration
(in the "sst" section).
2. Openssl utility detection made more reliable.
3. Removed extra spaces in automatically generated command lines -
to simplify debugging of the SST scripts.
4. In general, the code for detecting the presence or absence of
auxiliary utilities has been improved - it is made more reliable
in some configurations (and for shells other than bash).
Actual problem was that we tried to calculate persistent statistics
to wsrep_schema tables in this case wsrep_streaming_log. These tables
should not have persistent statistics. Therefore, in table creation
tables should be created with STATS_PERSISTENT=0 table option. During
rolling-upgrade tables naturally already exists, thus we need to
alter them to contain STATS_PERSISTENT=0 table option.
The following features have been added:
1) Automatic addition of the pf = ip6 option for socat
when it can be recognized by the format of the connection
address;
2) Automatically add or remove extra commas at the beginning
and at the end of sockopt, for example, sockopt='pf=ip6'
and sockopt=',pf=ip6' work equally well;
Also, due to interference in the code of the get_transfer()
function, I also refactored it and now:
3) encrypt = 4 is supported not only for xtrabackup-v2,
but also for mariabackup - this can help with migration
from Percona;
4) Improved setting of 'commonname' option for encrypt=3
and encrypt=4 modes;
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
Another batch of changes that should make the SST process
more reliable in all scenarios:
1) Added hostname or CN verification when stunnel is used
with certificate chain verification (verifyChain = yes);
2) Added check for the absence of the stunnel utility for
mtr tests;
3) Deletion of working files before and after SST is done
more accurately;
4) rsync on joiner can be run even if the path to its
configuration file contains spaces;
5) More accurate directory creation (for data files and
for logs);
6) IST with mysqldump no longer turns off statement logging;
7) Reset password for mysqldump when password is empty but
username is specified;
8) More reliable quoting when generating statements in
wsrep_sst_mysqldump;
9) Added explicit generation of 2048-bit Diffie-Hellman
parameters for sockat < 1.7.3, by analogy with xtrabackup;
10) Compression parameters for qpress are read from all
suitable server groups in configuration file, as well as
from the [sst] and [xtrabackup] groups;
11) Added a test that checks compression using qpress;
12) Checking for optional utilities is modified to work even
if they implemented as built-in shell commands (unlikely
on real systems, but more reliable).
galera_var_wsrep_on_off : Add wait conditions to make sure DDL is
replicated before continuing.
wsrep.[variables|variables_debug] : Remove unnecessary parts
and add check to correct number of variables or skip
galera_ssl_reload: Add version check and SSL checks.
After switching to the new mariabackup interface (instead of
the outdated innobackupex interface, which is supported for
compatibility), we need to explicitly pass a path to the datadir
directory as a parameter, since in the new interface the value
of this option is not automatically set in such a way that it
always matches the SST/IST logic. This commit adds passing this
option as an explicit parameter to mariabackup. This commit also
removed unnecessary options that are not used and not supported
by mariabackup.
Also, numerous flaws in the common wsrep_sst_common script have
been fixed:
1) There are many bash-specific constructs in the script that
may not be supported by other interpreters, which can lead
to the most unexpected errors during SST, because failures
in the interpretation of bash-specific constructs lead to
incorrect parsing of arguments;
2) There is parse_cnf() function which is often called by other
scripts for the "mysqld" or "--mysqld" group, but it does not
take into account the default group suffix, which leads to
reading values only from the default group, which then leads
to errors due to reading the default values instead of the
values for a specific group;
3) Some options such as --user, --innodb-data-home-dir or --datadir
are not removed from the --mysqld-args list, although they are
processed inside scripts (and passing of these options funther
may cause problems for mariabackup);
4) If an argument that the script understands is present in
the --mysqld-args list twice, then this causes SST to fail,
instead of reading the most recent value;
5) The "--host" parameter is technically still supported among
the arguments of the SST scripts, but in reality scripts do not
work with it as expected, especially if it has an IPv6 address;
6) If the port number is absent in the --address parameter value,
but the port number is explicitly passed through the --port
argument, then the scripts for mariabackup and xtrabackup-v2
fail;
7) If a new address interface is used (with the --address parameter),
then automatic default port substitution is not performed, although
it is supported for the legacy --host/--port interface.
8) If there are spaces in the parameter values after --mysqld_args,
then their further transfer does not occur correctly, which
causes mariabackup to fail during SST - the space splits
the argument in such a way that it breaks the parsing of the
following parameters;
9) If most of the parameters that are names or paths to the files
or directories contain spaces, then SST scripts fail in an
unpredictable way due to incorrect variable substitutions;
10) If the --log-bin option is passed among the arguments of myqlds
(--mysqld-args) without a parameter, and the --binlog option
is not specified, then the script cannot substitute the default
name for binlog and cannot construct binlog name using the
--log-basename argument (which is against server specifications);
11) Tail slashes are not removed from the directory names, which,
upon further substitution, leads to the appearance of a double
slash in the file paths;
12) The explicit --binlog parameter (which is now always transmitted
from the server side) and the "hidden" --log-bin parameter in the
list of arguments after --mysqld-args are perceived as two different
parameters in different parts of the scripts, and if they are do not
match for some reason, this will lead to failures during SST;
Also, all new changes from the 10.6 branch have been migrated here,
including the latest pull requests for authentication (only the part
that concerns SST scripts).
It also fixes dozens of other bugs in all SST scripts.
Trigger `socket.ssl_reload` when FLUSH SSL is issued. To triger reloading
of certificate, key and CA, files needs to be physically changed.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
`WSREP_CLIENT` is used as condition for starting ALTER/OPTIMIZE/REPAIR TOI.
Using this condition async replicated affected DDL's will not be replicated.
Fixed by removing this condition.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
* Remove usage of wsrep_provider variable in galera_ist_restart_joiner
* Rename galera_load_provider.inc and galera_unload_provider.inc to
galera_stop_replication.inc and galera_start_replication.inc. Their
original names were no longer reflecting what these include files do.
followup for ce3a2a688d
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>