The problem was that when using clang + asan, we do not get a correct value
for the thread stack as some local variables are not allocated at the
normal stack.
It looks like that for example clang 18.1.3, when compiling with
-O2 -fsanitize=addressan it puts local variables and things allocated by
alloca() in other areas than on the stack.
The following code shows the issue
Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
(connect=0x5080000027b8,
put_in_cache=<optimized out>) at sql/sql_connect.cc:1399
THD *thd;
1399 thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0
The address of thd is 24M away from the stack pointer
(gdb) info reg
...
rsp 0x7fffef4e7bc0 0x7fffef4e7bc0
...
r13 0x7fffedee7060 140737185214560
r13 is pointing to the address of the thd. Probably some kind of
"local stack" used by the sanitizer
I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects was stored in a local heap,
not on the stack.
To solve this issue in a portable way, I have added two functions:
my_get_stack_pointer() returns the address of the current stack pointer.
The code is using asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we are using _AddressOfReturnAddress()
As a fallback for other compilers/arch we use the address of a local
variable.
my_get_stack_bounds() that will return the address of the base stack
and stack size using pthread_attr_getstack() or NtCurrentTed() with
fallback to using the address of a local variable and user provided
stack size.
Server changes are:
- Moving setting of thread_stack to THD::store_globals() using
my_get_stack_bounds().
- Removing setting of thd->thread_stack, except in functions that
allocates a lot on the stack before calling store_globals(). When
using estimates for stack start, we reduce stack_size with
MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
before calling store_globals().
I also added a unittest, stack_allocation-t, to verify the new code.
Reviewed-by: Sergei Golubchik <serg@mariadb.org>
This commit fixes GTID inconsistency which was injected by mariabackup SST.
Donor node now writes new info file: donor_galera_info, which is streamed
along the mariabackup donation to the joiner node. The donor_galera_info
file contains both GTID and gtid domain_id, and joiner will use these to
initialize the GTID state.
Commit has new mtr test case: galera_3nodes.galera_gtid_consistency, which
exercises potentially harmful mariabackup SST scenarios. The test has also
scenario with IST joining.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit checks the validity of value change of wsrep_sst_method variable.
The validity check is same as happens in donor node when incoming SST request
is parsed.
The commit has also a mtr test: wsrep.wsrep_variables_sst_method which verifies
that wsrep_sst_method can be succesfully changed to acceptable values and that
the SET command results in error if invalid value was entered.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Do not allow setting wsrep_sst_donor as NULL as it is
incorrect value. User can use value '' (default) that represents
same as NULL. Setting wsrep_cluster_address to NULL is
already handled correctly.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Updated wsrep-lib to version in which server_state
wait_until_state() and sst_received() were changed to report
errors via return codes instead of throwing exceptions. Added
error handling accordingly.
Tested manually that failure in sst_received() which was
caused by server misconfiguration (unknown configuration variable
in server configuration) does not cause crash due to uncaught
exception.
There are separate flags DBUG_OFF for disabling the DBUG facility
and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility.
Let us allow debug builds without DEBUG_SYNC.
Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to
define ENABLED_DEBUG_SYNC.
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This commit fixes problems with parsing ipv6 addresses given via
the wsrep_sst_receive_address and wsrep_node_address options.
Also, this commit removes extra lines in the configuration files
in the mtr test suites for Galera related to these parameters.
We will remove the parameter innodb_disallow_writes because it is badly
designed and implemented. The parameter was never allowed at startup.
It was only internally used by Galera snapshot transfer.
If a user executed
SET GLOBAL innodb_disallow_writes=ON;
the server could hang even on subsequent read operations.
During Galera snapshot transfer, we will block writes
to implement an rsync friendly snapshot, as follows:
sst_flush_tables() will acquire a global lock by executing
FLUSH TABLES WITH READ LOCK, which will block any writes
at the high level.
sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
will suspend or disable InnoDB background tasks or threads that could
initiate writes. As part of this, log_make_checkpoint() will be invoked
to ensure that anything in the InnoDB buf_pool.flush_list will be written
to the data files. This has the nice side effect that the Galera joiner
will avoid crash recovery.
The changes to sql/wsrep.cc and to the tests are based on a prototype
that was developed by Jan Lindström.
Reviewed by: Jan Lindström
This commit adds a mtr test for reproducing a test scenario where despite of
innodb_disallow_writes blocking, writes to file system can still happen.
The test launches a garbd node, which triggers one of the cluster node to switch to
SST donor state. In this state, all disk activity should be halted, and e.g.
innodb_disallow_writes has been set. The test records md5sum aggregate over mariadb
data directory when the node enters the donor state, and records another md5sum
when the node leaves the donor state. If there is no IO activity in data directory, these
hashes should be equal.
For this test, the Donor state processing, has beeen instrumented so that, SST donor thread can be
stopped when entering the donor state. The test uses this new dbug sync point,
to control when to record the md5sums.
New SST script was added: wsrep_sst_backup, and garbd uses backup method to lauch the donor
node to call this script, and to enter in donor state.
The backup script could be later extended as general purpose backup method for the cluster.
This commit fixes also one race condition happening in wsrep_sst_rsync, like this:
* wsrep_rsync_sst script requests for flush tables,
and then waits in a loop until mariadbd has created file tables_flushed,
as confirmation that FLUSH TABLES has completed
* mariadbd's SST donor thread, wakes for the flush table request and then performs FTWRL,
and after this it creates the tables_flushed file
* note that SST script will now continue to startup rsync sending
* mariadbd's SST donor thread now calls for sst_disallow_writes(),
so that innodb would setup disk IO blockage, however rsyncing may already be ongoing at this point
This race condition is fixed in this commit, by performing all disk IO blocking before
creating the tables_flushed file.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
wsrep_sst_common did not correctly set name for binlog index
file if custom binlog name was used and this name was
not added to script command line.
Added test case for both log_basename and log_binlog.
wsrep_sst_common did not correctly set name for binlog index
file if custom binlog name was used and this name was
not added to script command line.
Added test case for both log_basename and log_binlog.
Fixed bugs caused by inaccuracies in automatic merging
from other branches:
1) Authentication information is not removed from the connection
address, which causes some tests to fail;
2) wsrep_debug=on should be replaced with wsrep_debug=1;
3) Added missing "connection" lines to test result file;
4) Some tests have been corrected for Galera 4.x (10.4+).
1) This commit implements reading all sections from configuration
files while looking for the current value of any server variable,
which were previously only read from the [mysqld.suffix] group and
from [mysqld], but not from other groups such as [mariadb.suffix],
[mariadb] or, for example, [server].
2) This commit also fixes misrecognition of some parameters when
parsing a command line containing a special marker for the end
of the list of options ("--") or when short option names (such
as "-s", "-a" and "-h arg") chained together (like a "-sah arg").
Such parameters can be passed to the SST script in the list of
arguments after "--mysqld-args" if the server is started with a
complex set of options - this was revealed during manual testing
of changes to read configuration files.
3) The server-side preparation code for the "--mysqld-args"
option list has also been simplified to make it easier to change
in the future (if needed), and has been improved to properly
handle the special backquote ("`") character in the argument
values.
Fixed bugs caused by inaccuracies in automatic merging
from other branches:
1) Authentication information is not removed from the connection
address, which causes some tests to fail;
2) wsrep_debug=on should be replaced with wsrep_debug=1;
3) Added missing "connection" lines to test result file;
4) Some tests have been corrected for Galera 4.x (10.4+).
1) This commit implements reading all sections from configuration
files while looking for the current value of any server variable,
which were previously only read from the [mysqld.suffix] group and
from [mysqld], but not from other groups such as [mariadb.suffix],
[mariadb] or, for example, [server].
2) This commit also fixes misrecognition of some parameters when
parsing a command line containing a special marker for the end
of the list of options ("--") or when short option names (such
as "-s", "-a" and "-h arg") chained together (like a "-sah arg").
Such parameters can be passed to the SST script in the list of
arguments after "--mysqld-args" if the server is started with a
complex set of options - this was revealed during manual testing
of changes to read configuration files.
3) The server-side preparation code for the "--mysqld-args"
option list has also been simplified to make it easier to change
in the future (if needed), and has been improved to properly
handle the special backquote ("`") character in the argument
values.
1) This commit implements reading all sections from configuration
files while looking for the current value of any server variable,
which were previously only read from the [mysqld.suffix] group and
from [mysqld], but not from other groups such as [mariadb.suffix],
[mariadb] or, for example, [server].
2) This commit also fixes misrecognition of some parameters when
parsing a command line containing a special marker for the end
of the list of options ("--") or when short option names (such
as "-s", "-a" and "-h arg") chained together (like a "-sah arg").
Such parameters can be passed to the SST script in the list of
arguments after "--mysqld-args" if the server is started with a
complex set of options - this was revealed during manual testing
of changes to read configuration files.
3) The server-side preparation code for the "--mysqld-args"
option list has also been simplified to make it easier to change
in the future (if needed), and has been improved to properly
handle the special backquote ("`") character in the argument
values.
and configuration.
1. Pass joiner's authentication information to donor together with address
in State Transfer Request. This allows joiner to authenticate donor on
connection. Previously joiner would accept data from anywhere.
2. Deprecate custom SSL configuration variables tca, tcert and tkey in favor
of more familiar ssl-ca, ssl-cert and ssl-key. For backward compatibility
tca, tcert and tkey are still supported.
3. Allow falling back to server-wide SSL configuration in [mysqld] if no SSL
configuration is found in [sst] section of the config file.
4. Introduce ssl-mode variable in [sst] section that takes standard values
and has following effects:
- old-style SSL configuration present in [sst]: no effect
otherwise:
- ssl-mode=DISABLED or absent: retains old, backward compatible behavior
and ignores any other SSL configuration
- ssl-mode=VERIFY*: verify joiner's certificate and CN on donor,
verify donor's secret on joiner
(passed to donor via State Transfer Request)
BACKWARD INCOMPATIBLE BEHAVIOR
- anything else enables new SSL configuration convetions but does not
require verification
ssl-mode should be set to VERIFY only in a fully upgraded cluster.
Examples:
[mysqld]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
[sst]
-- server-wide SSL configuration is ignored, SST does not use SSL
[mysqld]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
[sst]
ssl-mode=REQUIRED
-- use server-wide SSL configuration for SST but don't attempt to
verify the peer identity
[sst]
ssl-cert=/path/to/cert
ssl-key=/path/to/key
ssl-ca=/path/to/ca
ssl-mode=VERIFY_CA
-- use SST-specific SSL configuration for SST and require verification
on both sides
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
* Table should have primary key
* Enable wsrep_sync_wait before final selects
* Enable autocommit before final selects.
* Fix joiner monitoring in case of mysqldump.
* Add wait_conditions to stabilize
The test requires adaptation for MariaDB, which is done
in this patch. In addition, this patch includes a fix for
the SST script startup code that adds escaping for special
characters on the command line (in case they are contained
in the arguments to mysqld). The fix does not require
separate tests, as the required tests are already part
of the mtr suite for Galera.