Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)
Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.
The idea of this fix is from Michael 'Monty' Widenius.
HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.
trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.
trx_t::is_started(): Replaces trx_is_started().
ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.
ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().
The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.
trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.
trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.
fts_trx_create(): Do not copy previous savepoints.
fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.
Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
Most resource limit information is excessive, particularly
limits that aren't limited.
We restructure the output by considering the Linux format
of /proc/limits which had its soft limits beginning at offset
26. "u"limited lines are skipped.
Example output:
Resource Limits (excludes unlimited resources):
Limit Soft Limit Hard Limit Units
Max stack size 8388608 unlimited bytes
Max processes 127235 127235 processes
Max open files 32198 32198 files
Max locked memory 8388608 8388608 bytes
Max pending signals 127235 127235 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
This is 8 lines less that what was before.
The FreeBSD limits file was /proc/curproc/rlimit and a different format
so a loss of non-Linux proc enabled OSes isn't expected.
Provide bug url in addition to how to report the bug.
Remove obsolete information like key_buffers and used connections
as they haven't meaningfully added value to a bug report for quite
a while. Remove information that comes from long fixed interfaces
in glibc/kernel.
Encourage the use of a full backtrace from the core with debug
symbols.
Lets be realistic about the error messages, its the users we are addressing
not developers so wording around getting the information communicated
is the key aspect.
All the user readable text and instructions are in on place, as
non-understandable is the end of the reading process for the user.
Remove the duplicate printing of the query.
Use my_progname rather than "mysqld" to reflex the program name.
So the signal handler output is now in the form:
1. User instructions
2. Server Information
3. Stacktrace
4. connection/query/optimizer_switch
5. Core information and resource limits
6. Kernel information
The segfault in wsrep_check_sequence is due to a
null pointer deference on:
db_type= thd->lex->create_info.db_type->db_type;
Where create_info.db_type is null. This occured under
a used_engine==true condition which is set in the calling
function based on create_info.used_fields==HA_CREATE_USED_ENGINE.
However the create_info.used_fields was a left over
from the parsing of the previous failed CREATE TABLE where
because of its failure, db_type wasn't populated.
This is corrected by cleaning the create_info when we start
to parse ALTER SEQUENCE statements.
Other paths to wsrep_check_sequence is via CREATE SEQUENCE
and CREATE TABLE LIKE which both initialize the create_info
correctly.
buf_dblwr_t::recover(): Correct a debug assertion failure that had
been added in commit bb47e575de (MDEV-34830).
The server may have been killed while a log write was in progress, and
therefore recv_sys.scanned_lsn may be up to RECV_PARSING_BUF_SIZE bytes
ahead of recv_sys.recovered_lsn.
Thanks to Matthias Leich for providing "rr replay" traces and
testing this.
fil_space_t::create(): Instead of invoking the default fil_space_t
constructor on a zero-filled buffer, allocate an uninitialized buffer
and invoke an explicitly defined constructor on it. Also, specify
initializer expressions for all constant data members, so that all of them
will be initialized in the constructor.
fil_space_t::being_imported: Replaces part of fil_space_t::purpose.
fil_space_t::is_being_imported(), fil_space_t::is_temporary():
Replaces fil_space_t::purpose.
fil_space_t:🆔 Changed the type from ulint to uint32_t to reduce
incompatibility with later branches that include
commit ca501ffb04 (MDEV-26195).
fil_space_t::try_to_close(): Do not attempt to close files that are
in an I/O bound phase of ALTER TABLE…IMPORT TABLESPACE.
log_file_op, first_page_init: recv_spaces_t:
Use uint32_t for the tablespace id.
Reviewed by: Debarun Banerjee
os_innodb_umask was of the incorrect type resulting in warnings
in clang-19. The correct type is mode_t.
As os_innodb_umask was set during innnodb_init from my_umask,
corrected the type there along with its companion my_umask_dir.
Because of this, the defaults mask values in innodb never
had an effect.
The resulting change allow found signed differences in
my_create{,_nosymlink}, open_nosymlinks:
mysys/my_create.c:47:20: error: operand of ?: changes signedness from ‘int’ to ‘mode_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Werror=sign-compare]
47 | CreateFlags ? CreateFlags : my_umask);
Ref: clang-19 warnings:
[55/123] Building CXX object storage/innobase/CMakeFiles/innobase.dir/os/os0file.cc.o
storage/innobase/os/os0file.cc:1075:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
1075 | file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
| ~~~~ ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1249:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
1249 | file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
| ~~~~ ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1381:45: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
1381 | file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
| ~~~~ ^~~~~~~~~~~~~~~
Threads can normally exit without a explicit pthread_exit call.
There seem to date to old glibc bugs, many around 2.2.5.
The semi related bug was https://bugs.mysql.com/bug.php?id=82886.
To improve safety in the signal handlers DBUG_* code was removed.
These where also needed to avoid some MSAN unresolved stack issues.
This is effectively a backport of 2719cc4925.
These tests rely on THR_KEY_mysys but it is not initialized. On
Linux, the corresponding thread variable is null, but on macOS it has a
nonzero value. In all cases, initialize the variable explicitly by
calling MY_INIT and my_end appropriately.
Problem:
=======
InnoDB wrongly stores the primary key field in externally
stored off page during bulk insert operation. This leads
to assert failure.
Solution:
========
row_merge_buf_blob(): Should store the primary key fields
inline. Store the variable length field data externally
based on the row format of the table.
row_merge_buf_write(): check whether the record size exceeds
the maximum record size.
row_merge_copy_blob_from_file(): Construct the tuple based on
the variable length field
In commit 6acada713a the
logic for treating the file system of /dev/shm
as if it were persistent memory was broken.
Let us restore the original logic, so that we will have
some more CI coverage of the memory-mapped redo log interface.
Check the user privileges and fail the command, even if there are no slaves
that need starting respectively stopping.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The slave replication should accept not supported table options (eg.
"transactional" for MyISAM), as such options can end up being set from the
master in binlogged CREATE TABLE.
This was already handled in report_unknown_option(), which skips the error
in slave threads. But in mysql_prepare_create_table_finalize() there was still
a warning given, and this warning gets converted into an error when
STRICT_(ALL|TRANS)_TABLES. So skip this warning for replication also.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The assertion occurred in the SQL thread if an event group was incompletely
written, missing the end XID or COMMIT event, and immediately followed by a
new event group. This could also lead to the incomplete event group being
committed, and with the wrong GTID.
Fix by rolling back any active transaction from a prior event group when
applying the following GTID event.
Getting an incomplete event like this is somewhat rare to happen. If the
server crashes in the middle of writing an event group, the server restart
will write a new format description event, which makes the slave roll back
the partial event group. But presumably it could happen if the master
experiences temporary write errors in the binlog, like intermittent disk
full for example.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The FLUSH TABLE WITH READ LOCK briefly set the state (in PROCESSLIST) to
"Waiting while replication worker thread pool is busy", even if there was
nothing to wait for. This is somewhat confusing on a server that might not
even have any replication configured, let alone replication workers.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
If both do_gco_wait() and do_ftwrl_wait() had to wait, the state was not restored correctly.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Disallow changing @@gtid_domain_id while a temporary table is open in
STATEMENT or MIXED binlog mode. Otherwise, a slave may try to replicate
events refering to the same temporary table in parallel, using domain-based
out-of-order parallel replication. This is not valid, temporary tables are
only available for use within a single thread at a time.
One concrete consequence seen from this bug was a ROLLBACK on an
InnoDB temporary table running in one domain in parallel with DROP
TEMPORARY TABLE in another domain, causing an assertion inside InnoDB:
InnoDB: Failing assertion: table->get_ref_count() == 0 in
dict_sys_t::remove.
Use an existing error code that's somewhat close to the real issue
(ER_INSIDE_TRANSACTION_PREVENTS_SWITCH_GTID_DOMAIN_ID_SEQ_NO), to not add a
new error code in a GA release. When this is merged to the next GA release,
we could optionally introduce a new and more precise error code for an
attempt to change the domain_id while temporary tables are open.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The Write_rows_log_event originally allocated the m_rows_buf up-front, and
thus is_valid() checks that the buffer is allocated correctly. But at some
point this was changed to allocate the buffer lazily on demand. This means
that a a valid event can now have m_rows_buf==NULL. The is_valid() code was
not changed, and thus is_valid() could return false on a valid event.
This caused a bug for REPLACE INTO t() VALUES(), () which generates a
write_rows event with no after image; then the m_rows_buf was never
allocated and is_valid() incorrectly returned false, causing an error in
some other parts of the code.
Also fix a couple of missing special cases in the code for mysqlbinlog to
correctly decode (in comments) row events with missing after image.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The partitioning error handling code was looking at
thd->lex->alter_info.partition_flags in non-alter-table cases, in which cases
the value is stale and contains whatever was set by any earlier ALTER TABLE.
This could cause the wrong error code to be generated, which then in some cases
can cause replication to break with "different errorcode" error.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Limit SHOW BINLOG/RELAYLOG EVENTS in show_rpl_debug_info.inc to 200 lines.
Reviewed-by: Daniel Black <daniel@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Fractional part < 100000 microseconds was printed without leading zeros,
causing such timestamps to be applied incorrectly in mariadb-binlog | mysql
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
RISC-V and Clang produce rdcycle for __builtin_readcyclecounter.
Since Linux kernel 6.6 this is a privileged instruction not available
to userspace programs.
The use of __builtin_readcyclecounter is excluded from RISCV falling
back to the rdtime/rdtimeh instructions provided in MDEV-33435.
Thanks Alexander Richardson for noting it should be linux only in the
code and noting FreeBSD RISC-V permits rdcycle.
Author: BINSZ on JIRA
MDEV-32329 (patch) pushdown from having into where: Server crashes at sub_select
When generating an Item_equal with a Item_ref that refers to a field
outside of a subselect, remove_item_direct_ref() causes the dependency
(depended_from) on the outer select to be lost, which causes trouble
for code downstream that can no longer determine the scope of the Item.
Not calling remove_item_direct_ref() retains the Item's dependency.
Test cases from MDEV-32395 and MDEV-32329 are included.
Some fixes from other developers:
Monty:
- Fixed wrong code in Item_equal::create_pushable_equalities()
that could cause wrong item to be used if there was no matching items.
Daniel Black:
- Added test cases from MDEV-32329
Igor Babaev:
- Provided fix for removing call to remove_item_direct_ref() in
eliminate_item_equal()
MDEV-32395: update_depend_map_for_order: SEGV at /mariadb-11.3.0/sql/sql_select.cc:16583
Include test cases from MDEV-32329.
gcc 7.5.0 does not understand __attribute__((no_sanitize("undefined"))
I moved the usage of this attribute from sql/set_var.h to
include/my_attribute.h and created a macro for it depending on
compiler used.
- mariadb-dump utility performs logical backups by producing
set of sql statements that can be executed. By enabling this
no-autocommit option, InnoDB can load the data in an efficient
way and writes the only one undo log for the whole operation.
Only first insert statement undergoes bulk insert operation,
remaining insert statement doesn't write undo log and undergoes
normal insert code path.
Parallel slave failed to retry in retry_event_group() with error
WSREP: Parallel slave worker failed at wsrep_before_command() hook
Fix wsrep transaction cleanup/restart in retry_event_group() to properly
clean up previous transaction by calling wsrep_after_statement().
Also move call to reset error after call to wsrep_after_statement()
to make sure that it remains effective.
Add a MTR test galera_as_slave_parallel_retry to reproduce the error
when the fix is not present.
Other issues which were detected when testing with sysbench:
Check if parallel slave is killed for retry before waiting for prior
commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This
is required with slave-parallel-mode=optimistic to avoid deadlock
when a slave later in commit order manages to reach prepare phase
before a lock conflict is detected.
Suppress wsrep applier specific warning for slave threads.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Fix the regular expression that determines which statements
can use the Prepared Statement API, when --ps-protocol is
used. The current regular expression allows COMMIT only if
it is followed by a whitespace.
Meaning that statement "COMMIT ;" is allowed to run with
prepared statements, while "COMMIT;" is not.
Fix the filter so that both are allowed.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Test failed sporadically when --ps-protocol was enabled:
a transaction that was BF aborted on COMMIT would succeed
instead of reporting the expected deadlock error.
The reason for the failure was that, depending on timing,
the transaction was BF aborted while the COMMIT statement
was being prepared through a COM_STMT_PREPARE command.
In the failing cases, the transaction was BF aborted
after COM_STMT_PREPARE had already disabled the diagnostics
area of the client. Attempt to override the deadlock error
towards the end of dispatch_command() would be skipped,
resulting in a successful COMMIT even if the transaction
is aborted.
This bug affected the following MTR tests:
- galera_insert_multi
- galera_nopk_unicode
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>