49 commits

Author SHA1 Message Date
Monty
bddbef3573 MDEV-34533 asan error about stack overflow when writing record in Aria
The problem was that when using clang + asan, we do not get a correct value
for the thread stack as some local variables are not allocated at the
normal stack.

It looks like clang 18.1.3, for example, when compiling with
-O2 -fsanitize=address, puts local variables and memory allocated by
alloca() in areas other than the stack.

The following gdb session shows the issue:

Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection
    (connect=0x5080000027b8,
    put_in_cache=<optimized out>) at sql/sql_connect.cc:1399

THD *thd;
1399      thd->thread_stack= (char*) &thd;
(gdb) p &thd
(THD **) 0x7fffedee7060
(gdb) p $sp
(void *) 0x7fffef4e7bc0

The address of thd is about 22 MiB (0x1600b60 bytes) away from the stack pointer.

(gdb) info reg
...
rsp            0x7fffef4e7bc0      0x7fffef4e7bc0
...
r13            0x7fffedee7060      140737185214560

r13 points to the address of thd, which is probably in some kind of
"local stack" used by the sanitizer.

I have verified this with gdb on a recursive call that calls alloca()
in a loop. In this case all objects were stored in a local heap,
not on the stack.

To solve this issue in a portable way, I have added two functions:

my_get_stack_pointer() returns the address of the current stack pointer.
The code is using asm instructions for intel 32/64 bit, powerpc,
arm 32/64 bit and sparc 32/64 bit.
Supported compilers are gcc, clang and MSVC.
For MSVC 64 bit we are using _AddressOfReturnAddress()

As a fallback for other compilers/arch we use the address of a local
variable.

my_get_stack_bounds() returns the base address of the stack and the
stack size using pthread_attr_getstack() or NtCurrentTeb(), falling
back to the address of a local variable and a user-provided
stack size.

Server changes are:

- Moving setting of thread_stack to THD::store_globals() using
  my_get_stack_bounds().
- Removing setting of thd->thread_stack, except in functions that
  allocate a lot on the stack before calling store_globals().  When
  using estimates for stack start, we reduce stack_size by
  MY_STACK_SAFE_MARGIN (8192) to take into account the stack used
  before calling store_globals().
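
The fallback path described above (address of a local variable plus a user-provided stack size, reduced by a safety margin) could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `StackBounds`, `estimate_stack_bounds()`, `would_overrun()` and `kStackSafeMargin` are hypothetical and not taken from the server code, and the stack is assumed to grow toward lower addresses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not the MariaDB implementation.
constexpr std::size_t kStackSafeMargin = 8192;  // value quoted in the message

struct StackBounds {
  std::uintptr_t base;  // estimated stack start (highest address)
  std::size_t size;     // usable bytes below base
};

// Fallback estimate: use the address of a caller's local variable as the
// stack start and shrink the user-provided size by the safety margin.
inline StackBounds estimate_stack_bounds(const void *local_addr,
                                         std::size_t user_stack_size) {
  const std::size_t size = user_stack_size > kStackSafeMargin
                               ? user_stack_size - kStackSafeMargin
                               : 0;
  return StackBounds{reinterpret_cast<std::uintptr_t>(local_addr), size};
}

// True if the bytes already used plus `margin` would reach the estimated
// limit. Assumes the stack grows toward lower addresses.
inline bool would_overrun(const StackBounds &b, std::uintptr_t sp,
                          std::size_t margin) {
  const std::size_t used = b.base >= sp ? b.base - sp : 0;
  return used + margin >= b.size;
}
```

The point of the margin is that the estimate is taken some distance below the real stack start, so the usable size is deliberately under-reported rather than over-reported.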

I also added a unittest, stack_allocation-t, to verify the new code.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2024-10-16 17:24:46 +03:00
Jan Lindström
9091afdc55 MDEV-31173 : Server crashes when setting wsrep_cluster_address after adding invalid value to wsrep_allowlist table
Problem was that wsrep_schema tables were not marked as
category information. Fix allows access to wsrep_schema
tables even when node is detached.

This is 10.4-10.9 version of fix.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-08-29 13:41:23 +02:00
Oleksandr Byelkin
9b18275623 Merge branch '10.4' into 10.5 2024-04-16 11:04:14 +02:00
Kristian Nielsen
16aa4b5f59 Merge from 10.4 to 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-15 17:46:49 +02:00
Daniele Sciascia
a618ff2b1c MDEV-33216 stack-use-after-return in Wsrep_schema_impl::open_table()
Fix a case of stack-use-after-return reported by ASAN in
Wsrep_schema_impl::open_table(). This function had a stack-allocated
TABLE_LIST object and returned TABLE_LIST::table to the caller.
Changed the function to take a TABLE_LIST pointer as argument.
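
The general shape of this kind of fix can be sketched as below. The types `Table` and `TableList` and the back-pointer `owner` are hypothetical stand-ins chosen for illustration; they are not the server's TABLE and TABLE_LIST definitions, and the assumption that the returned object refers back to the list is mine.

```cpp
// Hypothetical miniature types; not the server's TABLE / TABLE_LIST.
struct TableList;
struct Table {
  TableList *owner = nullptr;  // assumed back-reference to the list entry
};
struct TableList {
  Table *table = nullptr;
};

// Problematic shape (kept as a comment): the list lives in the callee's
// frame, yet the returned table still refers to it after the return.
//
//   Table *open_table_bad(Table *t) {
//     TableList l; l.table = t; t->owner = &l; return l.table;
//   }

// Safer shape: the caller supplies the list, so it outlives the call.
// Returns true on error, false on success.
inline bool open_table(Table *storage, TableList *list) {
  if (storage == nullptr || list == nullptr)
    return true;
  list->table = storage;
  storage->owner = list;
  return false;
}
```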

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-03-28 10:55:37 +01:00
Daniele Sciascia
e0c8165487 MDEV-33509 Failed to apply write set with flags=(rollback|pa_unsafe)
Fix function `remove_fragment()` in wsrep_schema so that no error is
raised if the fragment to be removed is not found in the
wsrep_streaming_log table. This is necessary to handle the case where
a streaming transaction in idle state is BF aborted. In that case the
rollbacker thread may successfully remove the transaction's fragments
before the applier attempts to remove the same fragments, which caused
the node to leave the cluster after reporting a "Failed to apply
write set" error.
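
A minimal sketch of the "not found is not an error" behaviour, using an in-memory map as a stand-in for the streaming log table. The names `FragmentKey`, `StreamingLog` and this `remove_fragment()` signature are hypothetical and only illustrate the idea.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the streaming log, keyed by two integers.
using FragmentKey = std::pair<unsigned long long, long long>;
using StreamingLog = std::map<FragmentKey, std::string>;

// Returns true if a fragment was actually deleted and false if it was
// already gone. Both outcomes count as success: another thread may have
// removed the fragment first.
inline bool remove_fragment(StreamingLog &log, const FragmentKey &key) {
  auto it = log.find(key);
  if (it == log.end())
    return false;  // already removed, nothing to do
  log.erase(it);
  return true;
}
```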

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-03-26 05:56:37 +01:00
Oleksandr Byelkin
f52954ef42 Merge commit '10.4' into 10.5 2023-07-20 11:54:52 +02:00
Jan Lindström
9f909e546e MDEV-30197 : Missing DBUG_RETURN or DBUG_VOID_RETURN macro in function "Wsrep_schema::restore_view()"
Here user is starting server with unsupported client charset.
We need to create wsrep_schema tables using explicit latin1
charset to avoid errors in restoring view.
2023-05-23 01:10:19 +02:00
Jan Lindström
956d6c4af9 MDEV-21479 : Galera 4 unable to query cluster state if not primary component
Set mysql.wsrep_cluster and mysql.wsrep_cluster_members to
TABLE_CATEGORY_INFORMATION, like mysql.wsrep_streaming_log,
so that they can be queried even if the node is not in the primary
component.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-16 13:11:44 +02:00
Oleksandr Byelkin
10e135b679 Merge branch 'bb-10.4-release' into bb-10.5-release 2023-05-02 15:47:10 +02:00
Daniele Sciascia
ef227762b1 MDEV-30838 Assertion `m_thd == _current_thd()'
- Update wsrep-lib which contains fix for the assertion
- Fix error handling for appending fragment to streaming log,
  make sure tables are closed after rollback.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-05-02 03:42:39 +02:00
Marko Mäkelä
c41c79650a Merge 10.4 into 10.5 2023-02-10 12:02:11 +02:00
Vicențiu Ciorbaru
08c852026d Apply clang-tidy to remove empty constructors / destructors
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .

Code style changes have been done on top. The result of this change
leads to the following improvements:

1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
  ~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.

2. Compiler can better understand the intent of the code, thus it leads
   to more optimization possibilities. Additionally it enabled detecting
   unused variables that had an empty default constructor but not marked
   so explicitly.

   A particular change was required in sql/opt_range.cc following this patch:

   result_keys, an unused variable of the template class Bitmap, now correctly
   triggers an unused-variable warning.

   Setting Bitmap template class constructor to default allows the compiler
   to identify that there are no side-effects when instantiating the class.
   Previously the compiler could not issue the warning as it assumed Bitmap
   class (being a template) would not be performing a NO-OP for its default
   constructor. This prevented the "unused variable warning".
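
The difference between an empty user-provided constructor and a defaulted one can be checked with a small sketch. The struct names are hypothetical; the type trait is from the standard `<type_traits>` header.

```cpp
#include <type_traits>

// Empty, but user-provided: the type is not trivially default constructible,
// so the compiler has to treat construction as potentially meaningful.
struct UserProvided {
  UserProvided() {}
};

// Defaulted on first declaration: the constructor is trivial here.
struct Defaulted {
  Defaulted() = default;
};

static_assert(!std::is_trivially_default_constructible<UserProvided>::value,
              "user-provided empty constructor is not trivial");
static_assert(std::is_trivially_default_constructible<Defaulted>::value,
              "defaulted constructor is trivial");
```

Compilers commonly suppress unused-variable warnings for types with non-trivial constructors, which is consistent with the warning appearing only after the change.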
2023-02-09 16:09:08 +02:00
Jan Lindström
ba987a46c9 Merge 10.4 into 10.5 2022-09-05 13:28:56 +03:00
Daniele Sciascia
2917bd0d2c Reduce compilation dependencies on wsrep_mysqld.h
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even though very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2022-08-31 11:05:23 +03:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Daniele Sciascia
49e3bd2cbc MDEV-27553 Assertion `inited==INDEX' failed: in ha_index_end()
In wsrep_schema code, call ha_index_end() only if the corresponding
ha_index_init() call succeeded.
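
The pattern can be sketched with a mock handler. `MockHandler` and `scan()` are hypothetical and only mimic the init/end pairing; they are not the server's handler API.

```cpp
#include <cassert>

// Hypothetical mock that mimics the init/end pairing of an index scan.
struct MockHandler {
  bool inited = false;

  // Returns 0 on success, non-zero on failure.
  int index_init(bool simulate_failure) {
    if (simulate_failure)
      return 1;
    inited = true;
    return 0;
  }

  // Must only be called after a successful index_init().
  void index_end() {
    assert(inited);
    inited = false;
  }
};

// Calls index_end() only when index_init() succeeded.
inline int scan(MockHandler &h, bool simulate_failure) {
  if (h.index_init(simulate_failure))
    return 1;  // init failed: do not call index_end()
  // ... the index would be read here ...
  h.index_end();
  return 0;
}
```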

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2022-01-24 09:46:21 +02:00
Jan Lindström
690c472591 MDEV-21613 : galera_sr.GCF-1018B MTR failed: Failed to open table mysql.wsrep_streaming_log for writing
The query may already have been BF aborted earlier, in which case we
should not even try to open the table.
2021-09-27 15:09:50 +03:00
Marko Mäkelä
7e2b42324c Merge 10.4 into 10.5 2021-09-24 08:42:23 +03:00
Jan Lindström
913efaa328 MDEV-26566 : galera.galera_var_cluster_address MTR failed: InnoDB: Assertion failure in file row0ins.cc line 3206
The actual problem was that we tried to calculate persistent statistics
for wsrep_schema tables, in this case wsrep_streaming_log. These tables
should not have persistent statistics. Therefore, the tables should be
created with the STATS_PERSISTENT=0 table option. During a
rolling upgrade the tables naturally already exist, thus we need to
alter them to contain the STATS_PERSISTENT=0 table option.
2021-09-23 12:59:39 +03:00
Monty
b4f24c745a Merge branch '10.4' into 10.5
Fixed also an error in suite/perfschema/t/transaction_nested_events-master.opt
2021-09-15 20:23:07 +03:00
Daniele Sciascia
5527fc5861 MDEV-21613 Failed to open table mysql.wsrep_streaming_log for writing
Fix sporadic failure for MTR test galera_sr.GCF-1018B. The test
sometimes fails due to an error that is logged to the error log
unnecessarily.
A deterministic test case (included in this patch) shows that the
error is logged when a transaction is BF aborted right before it
opens the streaming log table to perform fragment removal. When that
happens, the attempt to open the table fails and consequently an error
is logged. There is no need to log this error, as an ER_LOCK_DEADLOCK
error is returned to the client.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-09-14 11:38:03 +03:00
Marko Mäkelä
7b48da4d7e Merge 10.4 into 10.5 2021-04-08 07:47:49 +03:00
Jan Lindström
5b71e0424c MDEV-21402 : sql_safe_updates breaks Galera 4
Added handling for sql_safe_updates, i.e. we disable it while
we do wsrep_schema operations.
2021-04-06 15:33:13 +03:00
Daniel Black
86d60fc9e7 Merge remote-tracking branch 'origin/10.4' into 10.5 2021-02-26 13:23:13 +11:00
Jan Lindström
d1eeb4b839 MDEV-24964 : Heap-buffer-overflow on wsrep_schema.cc ::remove_fragments
Problem was that we used heap allocated key using too small
array. Fixed by using dynamic memory allocation using actual
needed size.
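
A minimal sketch of sizing a key buffer from the length actually needed rather than from a fixed-size array. The helper `make_key_buffer()` is hypothetical and not the code from this fix.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: allocate exactly key_length bytes and copy the key
// into it, so the buffer can never be smaller than the key.
inline std::vector<unsigned char> make_key_buffer(const unsigned char *src,
                                                  std::size_t key_length) {
  std::vector<unsigned char> key(key_length);
  if (key_length != 0)
    std::memcpy(key.data(), src, key_length);
  return key;
}
```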
2021-02-24 17:21:18 +02:00
Sergei Golubchik
901bcde2dd galera.galera_gra_log crashes
reset thd->lex->query_tables_own_last,
because open_table() uses it and will try to dereference
whatever garbage it might have
2021-02-17 17:10:53 +02:00
Sergei Golubchik
ae7989ca20 galera.galera_gra_log crashes
reset thd->lex->query_tables_own_last,
because open_table() uses it and will try to dereference
whatever garbage it might have
2021-02-16 01:11:41 +01:00
Marko Mäkelä
961c7938bb Merge 10.4 into 10.5 2021-01-25 12:44:24 +02:00
sjaakola
9377e9ba0c MDEV-21153 Replica nodes crash due to indexed virtual columns and FK cascading delete
Fix for MDEV-23033 fixes a problem in replication applying of transactions, which contain cascading foreign key delete for a table, which has indexed virtual column.
This fix adds slave_fk_event_map flag for table, to mark when the prelocking is needed for applying of a transaction.
See commit 608b0ee52e for more details.
However, this fix is targeted for async replication only, Rows_log_event::do_apply_event() has condition to rule out galera replication from the fix domain, and use cases suffering from MDEV-23033 and related MDEV-21153 will fail in galera cluster.

The fix in this commit removes the condition to rule out the setting of slave_fk_event_map flag from galera replication, and makes the fix in MDEV-23033 effective for galera replication as well.

However, the above fix has caused regressions for some galera_sr suite tests, which run tests for streaming replication.
This regression can be observed e.g. by: /mtr galera_sr.galera_sr_multirow_rollback  --mysqld=--slave_run_triggers_for_rbr=yes
These galera_sr suite tests were failing in last phase of replication applying, where actual transaction is already applied, and streaming replication related meta data needs to be updated in wsrep system tables.
Opening the wsrep system tables failed for corrupt data in THD::lex:query_tables_list. The fix in this commit uses back query table list for the duration of fragment update operation.

Finally, a mtr test for virtual column support has been added. galera.galera_virtual_column.test has as first test a scenario from MDEV-21153

new fix

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-01-20 08:11:13 +02:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
24ec8eaf66 MDEV-15532 after-merge fixes from Monty
The Galera tests were massively failing with debug assertions.
2020-12-02 16:16:29 +02:00
Marko Mäkelä
882ce206db Merge 10.4 into 10.5 2020-09-23 11:32:43 +03:00
Jan Lindström
98ac2d425e MDEV-21170 : Galera test failure on galera_sr.GCF-1043[A|B]
Add error printout when mysql.wsrep_streaming_log lock
fails. However, the tests are very nondeterministic and not
suitable for mtr environment. Thus, they are removed.
2020-09-22 11:29:24 +03:00
Marko Mäkelä
97a4a3872e Merge 10.4 into 10.5 2020-08-26 12:02:07 +03:00
Jan Lindström
88e70f4cae MDEV-23558: Galera heap-buffer-overflow at wsrep_schema.cc:1067
Key buffer needs to contain max field widths i.e. add MAX_FIELD_WIDTH.
2020-08-25 06:53:52 +03:00
Monty
61c15ebe32 Remove String::lex_string() and String::lex_cstring()
- Better to use 'String *' directly.
- Added String::get_value(LEX_STRING*) for the few cases where we want to
  convert a String to LEX_CSTRING.

Other things:
- Use StringBuffer for some functions to avoid mallocs
2020-07-23 10:54:32 +03:00
mkaruza
a940151eec MDEV-21988: Assertion failure mysqld: bool trans_commit_stmt(THD*): Assertion `thd->in_active_multi_stmt_transaction() || thd->m_transaction_psi == __null' failed. (#1476)
Set temporary `SERVER_STATUS_IN_TRANS` so assert is not triggered in `trans_commit_stmt`.
2020-03-24 09:47:41 +02:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Daniele Sciascia
72a5a4f1d5 MDEV-20780 Fixes for failures on galera_sr_ddl_master (#1425)
Test galera_sr_ddl_master would sometimes fail due to leftover
streaming replication fragments. Rollbacker thread would attempt to
open streaming_log table to remove the fragments, but would fail in
check_stack_overrun(). Ultimately the check_stack_overrun() failure
was caused by the rollbacker failing to switch the victim's THD thread
stack to the rollbacker's thread stack.

Also in this patch:
- Remove duplicate functionality in rollbacker helper functions,
  and extract rollbacker fragment removal into function
  wsrep_remove_streaming_fragments()
- Reuse open_for_write() in wsrep_schema::remove_fragments
- Partially revert changes to galera_sr_ddl_master test from
  commit 44a11a7c08. Removed unnecessary
  wait condition and isolation level setting
2019-12-11 14:08:06 +02:00
Marko Mäkelä
780d2bb8a7 Merge 10.4 into 10.5 2019-09-06 14:25:20 +03:00
Teemu Ollakka
9487e0b259 MDEV-19826 10.4 seems to crash with "pool-of-threads" (#1370)
MariaDB 10.4 was crashing when thread-handling was set to
pool-of-threads and wsrep was enabled.

There were two apparent reasons for the crash:
- Connection handling in threadpool_common.cc was missing calls to
  control wsrep client state.
- Thread specific storage which contains thread variables (THR_KEY_mysys)
  was not handled appropriately by wsrep patch when pool-of-threads
  was configured.

This patch addresses the above issues in the following way:
- Wsrep client state open/close was moved in thd_prepare_connection() and
  end_connection() to have common handling for one-thread-per-connection
  and pool-of-threads.
- Thread local storage handling in wsrep patch was reworked by introducing
  set of wsrep_xxx_threadvars() calls which replace calls to
  THD store_globals()/reset_globals() and deal with thread handling
  specifics internally.

Wsrep-lib was updated to version which relaxes internal concurrency
related sanity checks.

Rollback code from wsrep_rollback_process() was extracted to separate calls
for better readability.

Post rollback thread was removed as it was completely unused.
2019-08-30 08:42:24 +03:00
Alexey Yurchenko
41fa564c88 MDEV-17048 Inconsistency voting support (#1373)
* Collect and pass apply error data to provider
* Rollback failed transaction and continue operation if provider returns
  SUCCESS
* MTR tests for inconsistency voting
2019-08-28 09:19:24 +03:00
Monty
97dd057702 Fixed issues when running mtr with --valgrind
- Note that some issues were also fixed in 10.2 and 10.4. I also fixed them
  here to be able to continue with making 10.5 valgrind safe again
- Disable connection threads warnings when doing shutdown
2019-08-23 22:03:54 +02:00
Jan Lindström
7b4de10477 MDEV-20378: Galera uses uninitialized memory
Problem was that the wsrep thread argument was deleted in the wrong
place. Furthermore, the scan method incorrectly used unsafe c_ptr().
Finally, fixed wsrep thread initialization to correctly set
up thread_id and pass the correct argument to functions, and
fixed a signedness problem causing compiler errors.
2019-08-20 10:32:04 +03:00
Alexey Yurchenko
819c40d694 - wsrep-lib update (SR cleanups and voting support) (#1359)
- TOI error ignoring fix (wsrep_ignore_apply_errors)
2019-07-22 16:34:12 +03:00
seppo
785092ee23 LOCK_thread_count and COND_thread_count removed from wsrep modules (#1197)
Refactored wsrep patch to not use LOCK_thread_count and COND_thread_count anymore.
This has partially been replaced by using old LOCK_wsrep_slave_threads mutex.
For slave thread count change waiting, new COND_wsrep_slave_threads signal has been added

Added LOCK_wsrep_cluster_config mutex to control that cluster address change cannot happen in parallel

Protected wsrep_slave_threads variable changes with LOCK_cluster_config mutex
This is for avoiding concurrent slave thread count and cluster joining operations to happen

Fixes according to Teemu's review
2019-02-26 13:39:05 -05:00
Daniele Sciascia
047754a728 Cleanup wsrep_schema and remove all references to wsrep_thd_pool
* Removed all references related to wsrep_thd_pool (which was removed)

* Removed unused declarations in wsrep_schema.h

* The following would result in invalid reads in
  Wsrep_schema::replay_transaction():
  ```
  frag_table->field[4]->val_str(&buf);

  Wsrep_schema_impl::end_index_scan(frag_table);
  Wsrep_schema_impl::finish_stmt(thd);
  ret= wsrep_apply_events(thd, rli, buf.c_ptr_safe(), buf.length());
  ```

  because `buf` was accessed after closing the table. The fix is to
  perform storage reads using a different THD.

* In Wsrep_schema::recover_sr_transactions(), cluster_table was opened
  for write, however it is only read here. And frag_table was opened
  for read, whereas write is potentially needed.
  Also, avoid the copy caused by String::c_ptr() to zero-terminate the C
  string, use c_ptr_quick instead.
2019-02-14 09:55:14 +01:00
Brave Galera Crew
36a2a185fe Galera4 2019-01-23 15:30:00 +04:00