The TIMESTAMP-related code did not handle AUTO_SEC_PART_DIGITS.
FROM_UNIXTIME() sets its member 'decimals' to AUTO_SEC_PART_DIGITS,
so some scripts involving FROM_UNIXTIME() crashed on an assertion in
debug builds and returned unexpected results in release builds.
Renaming implicit system versioning fields to explicit
row_start/row_end columns:
When ALTER TABLE adds both system fields with the same type (length,
unsigned flag) as the old implicit system fields, rename the implicit
system fields to the ones specified in the ALTER and remove the
SYSTEM_INVISIBLE flag in that case. A correct PERIOD clause must be
specified in the ALTER as well.
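For illustration, a hedged sketch of the kind of ALTER this covers
(table and column names are hypothetical):
  CREATE TABLE t1 (x INT) WITH SYSTEM VERSIONING; -- implicit row_start/row_end
  ALTER TABLE t1
    ADD COLUMN rs TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
    ADD COLUMN re TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
    ADD PERIOD FOR SYSTEM_TIME(rs, re);
Since rs/re match the type of the implicit fields, the implicit fields
are renamed in place and lose SYSTEM_INVISIBLE instead of new columns
being added.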
MDEV-34904 Inplace alter for implicit to explicit versioning is broken
Whether an ALTER goes inplace, and how it goes inplace, depends on
handler_flags, which is derived from alter_info->flags by this logic:
ha_alter_info->handler_flags|= (alter_info->flags & ~flags_to_remove);
ALTER_VERS_EXPLICIT was not in flags_to_remove, and its value
(1ULL << 35) clashed with ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX.
ALTER_VERS_EXPLICIT must not affect inplace: it is SQL-only, so we
remove it from handler_flags.
The tests fail on the assertion
ut_ad(!wsrep_is_wsrep_xid(&trx->xid));
in `innobase_recover_rollback_by_xid()`.
The fix is to avoid async rollback for prepared transactions
when wsrep is ON or wsrep recovery is in progress. The rationale
is that the rollback of prepared transactions must complete
before the node starts applying write sets after SST, or in
case of wsrep recovery, the recovery must complete before the
process exits.
Change the assertion into the stronger
ut_ad(!(WSREP_ON || wsrep_recovery));
to catch cases where the async rollback codepath is taken while
wsrep is enabled.
Single-table UPDATE/DELETE didn't provide an outer_lookup_keys value
for subqueries. This made it impossible to make a meaningful choice
between the IN->EXISTS and Materialization strategies for subqueries.
Fix this:
* Make UPDATE/DELETE save Sql_cmd_dml::scanned_rows,
* Then, the subquery's JOIN::choose_subquery_plan() can fetch it from
there for outer_lookup_keys.
Details:
UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries()
twice, like SELECT does (SELECT first calls
optimize_constant_subqueries() in JOIN::optimize_inner(), then calls
optimize_unflattened_subqueries() in JOIN::optimize_stage2()):
1. Call it with const_only=true before any optimizations. This allows
the range optimizer and others to use the values of cheap const
subqueries.
2. Call it with const_only=false after the range optimizer, partition
pruning, etc. The outer_lookup_keys value is provided, so it's
possible to pick a good subquery strategy.
Note: PROTECT_STATEMENT_MEMROOT requires that the first SP execution
performs subquery optimization for all subqueries, even for degenerate
query plans like "Impossible WHERE". Due to that, we ensure that the
call to optimize_unflattened_subqueries() (with const_only=false)
still happens for degenerate query plans, as was the case before this
change.
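A hedged sketch of an affected statement shape (tables and columns
are hypothetical):
  UPDATE t1 SET t1.col= t1.col + 1
  WHERE t1.key1 IN (SELECT t2.key2 FROM t2 WHERE t2.col2 > 0);
The number of rows the optimizer expects to examine in t1 is now
passed to the subquery as outer_lookup_keys when weighing IN->EXISTS
against Materialization.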
During query execution, some sorting and grouping operations
on strings may be involved. The system variable max_sort_length
defines the maximum number of bytes to use when comparing strings
during sorting/grouping. Thus, the compared parts of strings may be
shorter than their actual size, so the results of the query may not be
sorted/grouped properly.
To indicate that some comparisons were done on truncated lengths,
a new warning has been introduced with this commit.
Step#1: fixing the return type of strnxfrm() from size_t to this structure:
typedef struct
{
  size_t m_output_length;
  size_t m_source_length_used;
  uint   m_warnings;
} my_strnxfrm_ret_t;
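A minimal sketch of a query that can now raise the new warning (the
exact warning text is an assumption):
  SET SESSION max_sort_length= 8;
  CREATE TABLE t1 (c TEXT);
  INSERT INTO t1 VALUES
    (CONCAT(REPEAT('a',8),'x')), (CONCAT(REPEAT('a',8),'y'));
  SELECT c FROM t1 ORDER BY c; -- compares only the first 8 bytes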
Adding support for the ROW data type in the stored function RETURNS clause:
- explicit ROW(..members...) for both sql_mode=DEFAULT and sql_mode=ORACLE
CREATE FUNCTION f1() RETURNS ROW(a INT, b VARCHAR(32)) ...
- anchored "ROW TYPE OF [db1.]table1" declarations for sql_mode=DEFAULT
CREATE FUNCTION f1() RETURNS ROW TYPE OF test.t1 ...
- anchored "[db1.]table1%ROWTYPE" declarations for sql_mode=ORACLE
CREATE FUNCTION f1() RETURN test.t1%ROWTYPE ...
Adding support for anchored scalar data types in RETURNS clause:
- "TYPE OF [db1.]table1.column1" for sql_mode=DEFAULT
CREATE FUNCTION f1() RETURNS TYPE OF test.t1.column1;
- "[db1.]table1.column1" for sql_mode=ORACLE
CREATE FUNCTION f1() RETURN test.t1.column1%TYPE;
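A hedged usage sketch of the new RETURNS ROW syntax
(sql_mode=DEFAULT; client delimiter handling omitted):
  CREATE FUNCTION f1() RETURNS ROW(a INT, b VARCHAR(32))
  BEGIN
    DECLARE r ROW(a INT, b VARCHAR(32));
    SET r.a= 1;
    SET r.b= 'b1';
    RETURN r;
  END;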
Details:
- Adding a new sql_mode_t parameter to
sp_head::create()
sp_head::sp_head()
sp_package::create()
sp_package::sp_package()
to guarantee early initialization of sp_head::m_sql_mode.
Before this change, this member was not initialized at all during
CREATE FUNCTION/PROCEDURE/PACKAGE statements, and was not used.
Now it needs to be initialized to properly write the
mysql.proc.returns column, according to the create-time sql_mode.
- Code refactoring to make things simpler and functions smaller:
* Adding a new method
Field_row::row_create_fields(THD *thd, List<Spvar_definition> *list)
to make a Virtual_tmp_table with Fields for ROW members
from an explicit definition.
* Adding a new method
Field_row::row_create_fields(THD *thd, const Spvar_definition &def)
to make a Virtual_tmp_table with Fields for ROW members
from an explicit or a table anchored definition.
* Adding a new method
Item_args::add_array_of_item_field(THD *thd, const Virtual_tmp_table &vtable)
to create an array of Item_field instances corresponding to all Field
instances in a Virtual_tmp_table.
* Removing Item_field_row::row_create_items(). It was decomposed
into the new methods described above.
* Moving the code from the loop body in sp_rcontext::init_var_items()
into a separate method Spvar_definition::make_item_field_row(),
to make the code clearer (smaller functions).
make_item_field_row() itself uses the new methods described above.
- Changing the data type of sp_head::m_return_field_def
from Column_definition to Spvar_definition.
So now it supports not only SQL column field types,
but also explicit ROW and anchored ROW data types,
as well as anchored column types.
- Adding a new Column_definition parameter to sp_head::create_result_field().
Before this patch, create_result_field() took the definition only
from m_return_field_def. Now it's also called with a local Column_definition
variable which contains the explicit definition resolved from an
anchored definition.
- Modifying sql_yacc.yy to support the new grammar.
Adding new helper methods:
* sf_return_fill_definition_row()
* sf_return_fill_definition_rowtype_of()
* sf_return_fill_definition_type_of()
- Fixing tests in:
* Virtual_tmp_table::setup_field_pointers() in sql_select.cc
* Send_field::normalize() in field.h
* store_column_type()
to prevent calling Type_handler_row::field_type(),
which is implemented as a DBUG_ASSERT(0).
Before this patch the affected methods and functions were called only
for scalar data types. Now ROW is also possible.
- Adding a new virtual method Field::cols()
- Overriding methods:
Item_func_sp::cols()
Item_func_sp::element_index()
Item_func_sp::check_cols()
Item_func_sp::bring_value()
to support the ROW data type.
- Extending the rule sp_return_type to support
* explicit ROW and anchored ROW data types
* anchored scalar data types
- Overriding Field_row::sql_type() to print
the data type of an explicit ROW.
Changing the return type of the following functions:
- CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), NOW()
- SYSDATE()
- FROM_UNIXTIME()
from DATETIME to TIMESTAMP.
Note, the old function NOW() returning DATETIME is still available
as LOCALTIMESTAMP or LOCALTIMESTAMP(), e.g.:
SELECT
LOCALTIMESTAMP, -- DATETIME
CURRENT_TIMESTAMP; -- TIMESTAMP
The change in the functions' return data type fixes some problems
that occurred near a DST change:
- Problem #1
INSERT INTO t1 (timestamp_field) VALUES (CURRENT_TIMESTAMP);
INSERT INTO t1 (timestamp_field) VALUES (COALESCE(CURRENT_TIMESTAMP));
could result in two different values being inserted.
- Problem #2
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526));
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526+3600));
could result in two equal TIMESTAMP values near a DST change.
Additional changes:
- FROM_UNIXTIME(0) now returns SQL NULL instead of '1970-01-01 00:00:00'
(assuming time_zone='+00:00')
- UNIX_TIMESTAMP('1970-01-01 00:00:00') now returns SQL NULL instead of 0
(assuming time_zone='+00:00')
These additional changes are needed for consistency with TIMESTAMP
fields, which cannot store '1970-01-01 00:00:00 +00:00'.
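Illustrative, with time_zone='+00:00':
  SELECT FROM_UNIXTIME(0);                      -- now NULL, was '1970-01-01 00:00:00'
  SELECT UNIX_TIMESTAMP('1970-01-01 00:00:00'); -- now NULL, was 0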
1. Binlog commit by rotate (MDEV-32014) should not be
used with Galera, yet while WSREP binlog emulation
is active, the code path could lead into
binlog_cache_data::write_prepare() in an invalid
state, leading to errors in MTR. To fix, an extra
check is added to ensure the binlog is actually
active before calling write_prepare().
2. If the #binlog_cache_files directory exists on a
mariadbd run without opt_log_bin, the directory
was treated as a table/database, leading to errors.
To fix, on startup, if opt_log_bin is disabled and
#binlog_cache_files exists (in the default log
directory), the directory is deleted (and an
informational message is written to the error
log).
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Rename binlog cache temporary file to binlog file for large transactions
Description
===========
When a transaction commits, it copies the binlog events from the
binlog cache to the binlog file. Very large transactions
(e.g. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.
The solution in this patch is to rename the binlog cache file to
a binlog file instead of copying it, if the committing transaction
has a large binlog cache. Rename is a very fast operation, so it
doesn't block other transactions for a long time.
Design
======
* binlog_large_commit_threshold
type: ulonglong
scope: global
dynamic: yes
default: 128MB
Only binlog cache temporary files larger than 128MB are
renamed to a binlog file.
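A hedged usage sketch of the variable:
  SET GLOBAL binlog_large_commit_threshold= 256*1024*1024; -- rename at 256MB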
* #binlog_cache_files directory
To support rename, all binlog cache temporary files are managed
as normal files now. The `#binlog_cache_files` directory is in the
same directory as the binlog files. It is created at server startup
if it doesn't exist; otherwise, all files in the directory are
deleted at startup.
The temporary files are named with an ML_ prefix plus the memory
address of the binlog_cache_data object, which guarantees uniqueness.
* Reserve space
To support the rename feature, enough space must be reserved at the
beginning of the binlog cache file. The space is required for the
Format description, Gtid list, checkpoint and Gtid events when
renaming it to a binlog file.
Since binlog_cache_data's cache_log is directly accessed by the
binlog, online alter and wsrep code, it is not easy to update all
the code. Thus the binlog cache will not reserve space if it is not
a session binlog cache or if a wsrep session is enabled.
- m_file_reserved_bytes
Stores the bytes reserved at the beginning of the cache file.
It is initialized in write_prepare() and cleared by reset().
The reserved file header is hidden from callers, so there is no
change for callers. E.g.
- get_byte_position() still gets the length of binlog data
written to the cache, not the file length.
- truncate(0) will truncate the file to m_file_reserved_bytes, not to 0.
- write_prepare()
write_prepare() is called every time anything is written
into the cache. It calls init_file_reserved_bytes() to create
the cache file (if it doesn't exist) and reserve suitable space if
the data written exceeds the buffer's size.
* Binlog_commit_by_rotate
It encapsulates the code for renaming a binlog cache
temporary file to a binlog file.
- should_commit_by_rotate()
It is called by write_transaction_to_binlog_events() to check if
a binlog cache should be renamed to a binlog file.
- commit()
This is the entry point to rename a binlog cache and commit the
transaction. Both rename and commit are protected by LOCK_log,
thus no other transaction can write anything into the renamed
binlog before it.
Rename happens during a rotation. After the new binlog file is
generated, replace_binlog_file() is called to:
- copy data from the new binlog file to its binlog cache file.
- write the gtid event.
- rename the binlog cache file to the binlog file.
After that, the rotation continues as usual. Then the transaction
is committed in a separate group by itself. Its cache file will be
detached and its cache log reset before calling
trx_group_commit_with_engines(). Thus only the Xid event is written.
One change is that if the port is not supplied or is out of bounds,
the old behaviour was to print 3306. The new behaviour is to not
print it (if not supplied) or to print the out-of-bounds value.
The existing syntax for CREATE SERVER
CREATE [OR REPLACE] SERVER [IF NOT EXISTS] server_name
FOREIGN DATA WRAPPER wrapper_name
OPTIONS (option [, option] ...)
option:
{ HOST character-literal
| DATABASE character-literal
| USER character-literal
| PASSWORD character-literal
| SOCKET character-literal
| OWNER character-literal
| PORT numeric-literal }
With this change we have:
option:
{ HOST character-literal
| DATABASE character-literal
| USER character-literal
| PASSWORD character-literal
| SOCKET character-literal
| OWNER character-literal
| PORT numeric-literal
| PORT quoted-numerical-literal
| identifier character-literal}
We store these options as a JSON field in the mysql.servers system
table. We retain the restriction that PORT needs to be a number, but
also allow it to be a quoted number, so that SHOW CREATE SERVER can be
used for dumping. Without an accompanying implementation of SHOW CREATE
SERVER, some mysqldump tests will fail. Therefore this commit should
be immediately followed by the one implementing SHOW CREATE SERVER,
with testing covering both.
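An illustrative statement under the extended grammar (server name and
option values are hypothetical):
  CREATE SERVER s1 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '127.0.0.1', PORT '3306', FOO 'bar');
PORT is accepted as a quoted number, and FOO is stored as a generic
identifier option in the JSON field.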
In specifying a derived table with a union, for example
CREATE TABLE t (c1 INT KEY,c2 INT,c3 INT) ENGINE=MyISAM;
SELECT * FROM (SELECT * FROM t UNION SELECT * FROM t) AS d (d1,d2);
we bypass an earlier check for the correct number of specified column
names, causing a crash.
Fixed by adding a check for the correct number of supplied column
names in st_select_lex_unit::rename_types_list().
Fix for MDEV-31466 - add optional derived table column names.
Column names within a SELECT_LEX structure can be left in a non-reparsable
state (as printed out from *::print) after JOIN::prepare. This caused
an incorrect view definition to be written into the .FRM file.
Fixed by resetting item list names in SELECT_LEX structures representing
derived tables before writing out the view definition.
Reviewed by Igor Babaev (igor@mariadb.com)
Extend derived table syntax to support column name assignment.
(subquery expression) [as|=] ident [comma separated column name list].
Prior to this patch, the optional comma separated column name list is
not supported.
Processing within the unit of the subquery expression will use
original column names, outside the unit will use the new names.
For example, in the query
select a1, a2 from
(select c1, c2, c3 from t1 where c2 > 0) as dt (a1, a2, a3)
where a2 > 10;
we see the second column of the derived table dt being used both within,
(where c2 > 0), and outside, (where a2 > 10), the specification.
Both conditions apply to t1.c2.
When multiple unit preparations are required, such as when being used within
a prepared statement or procedure, original column names are needed for
correct resolution. Original names are reset within mysql_derived_reinit().
Item_holder items, used for result tables in both TVC and union
preparations, are renamed before use within
st_select_lex_unit::prepare().
During wildcard expansion, if column names are present, item names
are set directly after creation.
Reviewed by Igor Babaev (igor@mariadb.com)
Update `SESSION_USER()` behaviour to be comparable with `CURRENT_USER()`.
`SESSION_USER()` will return the user and host columns from `mysql.user`
used to authenticate the user when the session was created.
Historically `SESSION_USER()` was an alias of the `USER()` function.
The main difference from `USER()` behaviour after this change is that
`SESSION_USER()` now returns the host column from `mysql.user` instead
of the client host or ip.
NOTE: `SESSION_USER_IS_USER` old mode is added to make the change
backward compatible.
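An illustrative session, assuming the client authenticated against
the account 'u1'@'%' from host 'client1':
  SELECT USER();         -- 'u1@client1' (client host/ip)
  SELECT SESSION_USER(); -- 'u1@%'       (host column from mysql.user)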
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer
Amazon Web Services, Inc.
In sql_mode=ORACLE, ignore the NOCOPY keyword in stored routine
parameters. The optimization it hints at (pass-by-reference instead
of pass-by-value, to avoid value copying) will be done in a separate
task when needed.
When calculate_cond_selectivity_for_table() takes into account multi-
column selectivities from range access, it tries to take into account
that the selectivity for some columns may have already been counted.
For example, for range access on IDX1 using {kp1, kp2}, the
selectivity of restrictions on "kp2" might have already been taken
into account to some extent.
So, the code tries to "discount" that using rec_per_key[] estimates.
This seems to be wrong and unreliable: the "discounting" may produce
a selectivity multiplier implying that the overall selectivity of
range access on IDX1 was greater than 1.
Do a conservative fix: if we arrive at the conclusion that the
selectivity of range access on the condition in IDX1 is greater than
1.0, clip it down to 1.
Analysis:
The value gets appended as a string instead of an unescaped JSON value.
Fix:
Append the JSON value to a temporary string and then store that in the
field, instead of directly storing it as a string.
Don't allow changing the referencing key column from NULL to NOT NULL
when
1) the foreign key constraint type is ON UPDATE SET NULL,
2) the foreign key constraint type is ON DELETE SET NULL, or
3) the foreign key constraint type is UPDATE CASCADE and the
referenced column is declared as NULL.
Don't allow changing the referenced key column from NOT NULL to NULL
when the foreign key constraint type is UPDATE CASCADE
and the referencing key columns don't allow NULL values.
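A hedged sketch of rule 2 (hypothetical tables):
  CREATE TABLE parent (a INT PRIMARY KEY) ENGINE=InnoDB;
  CREATE TABLE child (b INT, FOREIGN KEY (b) REFERENCES parent(a)
                      ON DELETE SET NULL) ENGINE=InnoDB;
  ALTER TABLE child MODIFY b INT NOT NULL; -- now rejected: SET NULL
                                           -- needs a nullable column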
get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.
fk_check_column_changes(): Enforce the above rules for COPY
algorithm
innobase_check_foreign_drop_col(): Checks whether the dropped
column exists in existing foreign key relation
innobase_check_foreign_low(): Enforce the above rules for
INPLACE algorithm
dict_foreign_t::check_fk_constraint_valid(): This is used
by CREATE TABLE statement to check nullability for foreign
key relation.
The commit cd5808eb introduced a union as storage for the format
argument passed to the internal API fmt::detail::make_arg. This was
done to solve the issue that the internal API no longer accepted
temporary variables.
However, it's generally better to avoid using internal APIs, as they are
more likely to have breaking changes in the future. Instead, we can use
the public API fmt::dynamic_format_arg_store to dynamically build the
argument list. This API accepts temporary variables, and its behavior is
more stable than the internal API. `libfmt.cmake` is updated to reflect
the change as well.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
The method was declared to return an unsigned integer, but it is
really a boolean (and used as such by all callers).
A secondary change is the addition of "const" and "noexcept" to this
method.
In ha_mroonga.cpp, I also added "inline" to the two helper methods of
referenced_by_foreign_key(). This allows the compiler to flatten the
method.
We have found that my_errno can be "passed" to the next command in
some cases. It is practically impossible to check/fix all cases of
my_errno in the server, plugins and engines, so we will reset it as
we reset other errors.
The test case will be fixed by the CSV engine fix, so it will be
added together with it (see part 2).
Added a new test scenario in the galera.galera_bf_kill
test to make the issue surface. The test scenario has
a multi-statement transaction containing a KILL command.
When the KILL is submitted, another transaction is
replicated, which causes a BF abort for the KILL command
processing. Handling the BF abort rollback while executing
the KILL command causes node hanging in this scenario.
The sql_kill() and sql_kill_user() functions are now fixed
to perform an implicit commit before starting the KILL command
execution. Because of the implicit commit, the KILL execution
no longer happens inside a transaction context.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
RESET MASTER waits for storage engines to reply to binlog checkpoint
requests. If a response is delayed for a long time for some reason,
then RESET MASTER can hang.
Fix this by forcing a log sync in all engines just before waiting for the
checkpoint reply.
(Waiting for old checkpoint responses is needed to preserve durability of
any commits that were synced to disk in the to-be-deleted binlog but not yet
synced in the engine.)
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
(Polished initial patch by Alexey Botchkov)
Make the code handle DEFAULT values of any datatype
- Make Json_table_column::On_response::m_default be Item*, not LEX_STRING.
- Change the parser to use string literal non-terminals for producing
the DEFAULT value
-- Also, stop updating json_table->m_text_literal_cs for the DEFAULT
value literals as it is not used.
It's read for every command execution, and during slave replication
for every applied event.
It's also planned to be used during write set applying, which means
mostly every server thread is going to compete for the mutex covering
this variable, especially considering how rarely it changes.
Converting wsrep_ready to an atomic relaxes things.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The crash report terminates prematurely when the Galera library was
not loaded.
As a fix, check whether the provider is loaded before shutting down
Galera connections.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Move memory allocations performed during Sys_var_gtid_binlog_state::do_check
to Sys_var_gtid_binlog_state::global_update where they will be freed before
the latter method returns.
A call to
dbug_print_join_prefix(join_positions, idx, s)
returns a const char* pointer to a string with the current join
prefix, including the table being added to it.
Each time a listener socket becomes ready, MariaDB calls accept() ten
times (MAX_ACCEPT_RETRY), even if all but the first call return EAGAIN
because there are no more pending connections. This causes unnecessary
CPU usage - on our server, the thread, which does nothing
but accept(), loads one CPU core by ~45%. The loop should stop
after the first EAGAIN.
Perf report:
11.01% mariadbd libc.so.6 [.] accept4
6.42% mariadbd [kernel.kallsyms] [k] finish_task_switch.isra.0
5.50% mariadbd [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
5.50% mariadbd [kernel.kallsyms] [k] syscall_enter_from_user_mode
4.59% mariadbd [kernel.kallsyms] [k] __fget_light
3.67% mariadbd [kernel.kallsyms] [k] kmem_cache_alloc
2.75% mariadbd [kernel.kallsyms] [k] fput
2.75% mariadbd [kernel.kallsyms] [k] mod_objcg_state
1.83% mariadbd [kernel.kallsyms] [k] __inode_wait_for_writeback
1.83% mariadbd [kernel.kallsyms] [k] __sys_accept4
1.83% mariadbd [kernel.kallsyms] [k] _raw_spin_unlock_irq
1.83% mariadbd [kernel.kallsyms] [k] alloc_inode
1.83% mariadbd [kernel.kallsyms] [k] call_rcu
Applied SR transaction on the child table was not BF aborted by TOI running
on the parent table for several reasons:
Although SR correctly collected FK-referenced keys to parent, TOI in Galera
disregards common certification index and simply sets itself to depend on
the latest certified write set seqno.
Since this write set was the fragment of SR transaction, TOI was allowed to
run in parallel with SR presuming it would BF abort the latter.
At the same time, DML transactions in the server don't grab MDL locks on
FK-referenced tables, thus parent table wasn't protected by an MDL lock from
SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction.
In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not
enough to trigger MDL conflict in Galera.
InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to
the fact that it was believed MDL locking should always produce conflicts correctly.
The fix brings conflict resolution rules similar to MDL-level checks to InnoDB,
thus accounting for the problematic case.
Apart from that, wsrep_thd_is_SR() is patched to return true only for executing
SR transactions. It should be safe as any other SR state is either the same as
for any single write set (thus making the two logically equivalent), or it reflects
an SR transaction as being aborting or prepared, which is handled separately in
BF-aborting logic, and for regular execution path it should not matter at all.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The code erroneously called sec_since_epoch() for dates with zeros,
e.g. '2024-00-01'.
Fix: add a check that the date does not have zeros before
calling TIME_to_native().
The failing test case validates Seconds_Behind_Master for a delayed
slave, while STOP SLAVE is executed during a delay. The test fixes
initially added to the test (commit b04c857596) added a table lock
to ensure a transaction could not finish before validating the
Seconds_Behind_Master field after SLAVE START, but did not address a
possibility that the transaction could finish before running the
STOP SLAVE command, which invalidates the validations for the rest
of the test case. Specifically, this would result in 1) a timeout in
“Waiting for table metadata lock” on the replica, which expects the
transaction to retry after slave restart and hit a lock conflict on
the locked tables (added in b04c857596), and 2) that
Seconds_Behind_Master should have increased, but did not.
The failure can be reproduced by synchronizing the slave to the master
before the MDEV-32265 echo statement (i.e. before the SLAVE STOP).
This patch fixes the test by adding a mechanism to use DEBUG_SYNC to
synchronize a MASTER_DELAY, rather than continually increase the
duration of the delay each time the test fails on buildbot. This is
to ensure that on slow machines, a delay does not pass before the
test gets a chance to validate results. Additionally, it decreases
overall test time because the test can continue immediately after
validation, thereby bypassing the remainder of a full delay for each
transaction.
A CHAR column cannot be longer than 1024 bytes, because
Binlog_type_info_fixed_string::Binlog_type_info_fixed_string
relies on this fact - it cannot store binlog metadata for longer
columns.
In the case of the filename character set, mbmaxlen is equal to 5,
so only 1024/5=204 characters can fit into the 1024-byte limit.
- In strict mode:
Disallow creation of a CHAR column with an octet length greater
than 1024.
- In non-strict mode:
Automatically convert a CHAR with octet length > 1024 into VARCHAR.
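An illustrative sketch, assuming the filename character set is usable
in DDL as in the server's own tests (205 chars * 5 bytes = 1025 octets):
  CREATE TABLE t1 (c CHAR(205) CHARACTER SET filename);
  -- strict mode: rejected; non-strict mode: converted to VARCHAR(205)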
my_b_encr_write(): Initialize also block_length, and at the same time
last_block_length, so that all 128 bits can be initialized with fewer
writes. This fixes an error that was caught in the test
encryption.tempfiles_encrypted.
test_my_safe_print_str(): Skip a test that would attempt to
display uninitialized data in the test unit.stacktrace.
Previously, our CI did not build unit tests with MemorySanitizer.
handle_delayed_insert(): Remove a redundant call to pthread_exit(0),
which would for some reason cause MemorySanitizer in clang-19 to
report a stack overflow in a RelWithDebInfo build. This fixes a
failure of several tests.
Reviewed by: Vladislav Vaintroub
The 10.5->10.6 merge commit 3bc98a4ec4 casts the arg to an int16
pointer in set_extraction_flag_processor(). This matched the previous
commit c76eabfb5e where set_extraction_flag was changed to have int16 arg
instead of int.
The commit a5e4c34991 for MDEV-29363 added a call to
set_extraction_flag_processor on IMMUTABLE_FL (MARKER_IMMUTABLE in 10.6).
The subsequent 10.5->10.6 merge f071b7620b did not cast the flag
to int16 when merging this change.
The result is that big-endian processors cleared the immutable
flag rather than setting it, leaving MDEV-29363
unfixed on big-endian processors.
(Variant 4, with @@optimizer_adjust_secondary_key_costs, reuse in two
places, and conditions are replaced with equivalent simpler forms in two more)
In best_access_path(), ReuseRangeEstimateForRef-3, the check
for whether
"all used key_part_i used key_part_i=const"
was incorrect: it could produce a "NO" answer for cases when we
had:
key_part1= const // some key parts are usable
key_part2= value_not_in_join_prefix // present but unusable
key_part3= non_const_value // unusable due to gap in key parts
This caused the optimizer to fail to apply the ReuseRangeEstimateForRef
heuristics. The consequence is a poor query plan choice when the index
in question has a very skewed data distribution.
The fix is enabled if the @@optimizer_adjust_secondary_key_costs flag
is set.
The memory leak happened on the second execution of a prepared
statement that runs an UPDATE statement with a correlated subquery in
the right-hand side of the SET clause. In this case, invocation of the
method
table->stat_records()
could return the value zero, which results in going into the 'if'
branch that handles an impossible WHERE condition. The issue is that
this branch missed saving the leaf tables, which has to be performed
as the first condition-optimization activity. Later the PS statement
memory root is marked as read only on finishing the first execution of
the prepared statement. The next time the same statement is executed,
it hits the assertion on an attempt to allocate memory on the PS
memory root marked as read only. This memory allocation takes place
via the sequence of the following invocations:
Prepared_statement::execute
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_update::execute_inner
Sql_cmd_update::update_single_table
st_select_lex::save_leaf_tables
List<TABLE_LIST>::push_back
To fix the issue, add the flag SELECT_LEX::leaf_tables_saved to
control whether the method SELECT_LEX::save_leaf_tables() has to be
called or has already been invoked, with no more invocations required.
A similar issue could take place on running a DELETE statement with
the LIMIT clause in PS/SP mode. The reason for the memory leak is the
same as in the UPDATE case, and it is fixed in the same way.
Rollback prepared transactions asynchronously during binlog crash recovery
Summary
=======
When doing server recovery, active transactions are rolled
back by the InnoDB background rollback thread automatically. Prepared
transactions are committed or rolled back accordingly
by binlog recovery. Binlog recovery is done in the main thread before
the server can provide service to users. If there is a big
transaction to roll back, the server will not be available for a long
time.
This patch provides a way to roll back the prepared transactions
asynchronously, so the rollback will not block server startup.
Design
======
- Handler::recover_rollback_by_xid()
This patch provides a new handler interface to rollback transactions
in the recover phase. InnoDB just sets the transaction's state to
active; the transaction will then be rolled back by the background
rollback thread.
- Handler::signal_tc_log_recover_done()
This function is called after the tc log (typically the binlog) has
been opened and its recovery is done. When this function is called,
all transactions to be rolled back have been reverted to ACTIVE state.
Thus it starts the rollback thread to roll back the transactions.
- Background rollback thread
With this patch, the background rollback thread is deferred until
binlog recovery is finished. It is started by
innobase_tc_log_recovery_done().
If a slave replicating an event has waited for more than
@@slave_abort_blocking_timeout for a conflicting metadata lock held by a
non-replication thread, the blocking query is killed to allow replication to
proceed and not be blocked indefinitely by a user query.
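A hedged usage sketch (global scope is an assumption):
  SET GLOBAL slave_abort_blocking_timeout= 60; -- kill blocking user
                                               -- queries after 60s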
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
MySQL-Connector-Net casts SEQ_IN_INDEX to uint and will
raise an exception if the type is a System.Int64.
As we don't support a huge number of columns in a multi-column
index, reducing to a uint is sufficient to represent
all values and maintains compatibility with MySQL-Connector-Net.
This matches the type (uint) returned by MySQL-8.3 and 8.0.
Reviewer: Alexander Barkov <bar@mariadb.com>
It's possible that MDL conflict handling code is called more
than once for a transaction when:
- it holds more than one conflicting MDL lock
- reschedule_waiters() is executed,
which results in repeated attempts to BF-abort already aborted
transaction.
In such situations, it might be that BF-aborting logic sees
a partially rolled back transaction and erroneously decides
on future actions for such a transaction.
The specific situation tested and fixed is when a SR transaction
applied in the node gets BF-aborted by a started TOI operation.
It's then caught with the server transaction already rolled back,
but with no MDL locks yet released. This caused wrong state
detection for such a transaction during repeated MDL conflict
handling code execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
(Variant 2b: call greedy_search() twice, correct handling for limited
search_depth)
Modify the join optimizer to specifically try to produce join orders that
can short-cut their execution for ORDER BY..LIMIT clause.
The optimization is controlled by @@optimizer_join_limit_pref_ratio.
The default value 0 means: don't construct short-cutting join orders.
Any other value means: construct a short-cutting join order, and
prefer it only if it promises a speedup of more than #value times.
In Optimizer Trace, look for these names:
* join_limit_shortcut_is_applicable
* join_limit_shortcut_plan_search
* join_limit_shortcut_choice
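A hedged usage sketch (session scope and tables are assumptions):
  SET SESSION optimizer_join_limit_pref_ratio= 10;
  SELECT * FROM t1, t2 WHERE t1.a = t2.a
  ORDER BY t1.b LIMIT 10; -- short-cutting order used only if it
                          -- promises a >10x speedup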
Replication of MyISAM and Aria DML is experimental and best
effort only. An earlier change made INSERT SELECT on both
MyISAM and Aria replicate using TOI and STATEMENT
replication. Replication should happen only if the user
has set the needed wsrep_mode setting.
Note: This commit contains additional changes compared
to those already made for the 10.5 branch.
+ small refactoring after main fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
It's possible that MDL conflict handling code is called more
than once for a transaction when:
- it holds more than one conflicting MDL lock
- reschedule_waiters() is executed,
which results in repeated attempts to BF-abort already aborted
transaction.
In such situations, it might be that BF-aborting logic sees
a partially rolled back transaction and erroneously decides
on future actions for such a transaction.
The specific situation tested and fixed is when a SR transaction
applied in the node gets BF-aborted by a started TOI operation.
It's then caught with the server transaction already rolled back,
but with no MDL locks yet released. This caused wrong state
detection for such a transaction during repeated MDL conflict
handling code execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
For TOI events specifically we have a situation where in case of the
same error different nodes may generate different messages. This may
be for two reasons:
- different locale setting between the current client session and
server default (we can reasonably require server locales to be
identical on all nodes, but user can change message locale for the
session)
- non-deterministic course of STATEMENT execution e.g. for ALTER TABLE
On the other hand we may reasonably expect TOI event failures since
they are executed after replication, so we must ensure that voting is
consistent. For that purpose error codes should be sufficiently unique
and deterministic for TOI event failures as DDLs normally deal with
a single object, so we can merely use MySQL error codes to vote on.
Notice that this problem does not happen with regular transactional
writesets, since the originator node will always vote success and
replica nodes are assumed to have the same global locale setting.
As such, different error messages indicate different errors even if
the error code is the same (e.g. ER_DUP_KEY can happen on different
rows or tables).
Use only MySQL error code (without the error message) for error voting
in case of TOI event failure.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
When handling fatal signal, shut down Galera networking
before printing out stack trace and writing core file.
This is to achieve fail-silent semantics on crashes which may
keep the process running for a long time, but not fully responding
e.g. due to core dumping or symbol resolving.
Also suppress all Galera/wsrep logging to avoid logging from
background threads garbling crash information from the signal handler.
Notice that for fully fail-silent crash, Galera 26.4.19 is needed.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that wsrep_schema tables were not marked as
category information. The fix allows access to wsrep_schema
tables even when the node is detached.
This is the 10.4-10.9 version of the fix.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The problem was that we did not detect that the table was
partitioned, in which case we should find the actual underlying
storage engine.
We should not use RSU for non-InnoDB tables.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
The option binlog_optimize_thread_scheduling was initially added
to provide a safe alternative for the newly added binlog group
commit logic, such that when 0, it would disable a leader thread
from performing the binlog write for all transactions that are a
part of the group commit. Any problems related to the binlog group
commit optimization should be sorted out by now, so we can
deprecate-to-eventually-remove the option altogether.
This commit performs the deprecation, and the removal is tracked
by MDEV-33745. Note, as the option is only able to be provided
via configuration at startup time, users will not see a
deprecation message unless looking through the CLI help
message.
Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Sergei Golubchik <serg@mariadb.org>
Changing the alias LOCALTIME->CURRENT_TIMESTAMP to LOCALTIME->CURRENT_TIME.
This changes the return type of LOCALTIME from DATETIME to TIME,
according to the SQL Standard.
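Illustrative:
  SELECT LOCALTIME;         -- now TIME, an alias of CURRENT_TIME
  SELECT CURRENT_TIMESTAMP; -- not affected by this change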
In commit fa8a46eb68 (MDEV-33613)
the parameter innodb_lru_flush_size ceased to have any effect.
Let us declare the parameter as deprecated and additionally as
MARIADB_REMOVED_OPTION, so that there will be a warning written
to the error log in case the option is specified in the command line.
Let us also do the same for the parameter
innodb_purge_rseg_truncate_frequency
that was deprecated and ignored earlier in MDEV-32050.
Reviewed by: Debarun Banerjee
Before doing mark_start_commit(), check that there is no pending deadlock
kill. If there is a pending kill, we won't commit (we will abort, roll back,
and retry). Then we should not mark the commit as started, since that could
potentially make the following GCO start too early, before we completed the
commit after the retry.
This condition could trigger in some corner cases, where InnoDB would take
temporarily table/row locks that are released again immediately, not held
until the transaction commits. This happens with dict_stats updates and
possibly auto-increment locks.
Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock
kill to be scheduled in the background. But since the blocking locks are
held only temporarily, they can be released before the background kill
happens. This way, the kill can be delayed until after mark_start_commit()
has been called. Thus we need to check the synchronous indication
rgi->killed_for_retry, not just the asynchronous thd->killed.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Discovered this while working on MDEV-34720: test_if_cheaper_ordering()
uses rec_per_key, while the original estimate for the access method
is produced in best_access_path() by using actual_rec_per_key().
Make test_if_cheaper_ordering() also use actual_rec_per_key().
Also make several getter functions "const" to make this compile.
Also adjusted the testcase to handle this (the change backported from
11.0)
MDEV-33582 (3541bd63f0) changed the "Could not write packet"
error message in net_serv.cc to use the function
sql_print_warning() instead of my_printf_error(). The flags
argument was not removed in this change though, so the old
flags were printed in place of the file descriptor, and all
other args were presented for the wrong field (and the length
was never shown).
This patch removes flags as a parameter to sql_print_warning().
Running an UPDATE statement in PS mode, with positional
parameter(s) bound to an array of actual values (that is,
prepared to run in bulk mode), results in incorrect behaviour
in the presence of an ON UPDATE trigger that also executes an UPDATE
statement. The same is true for handling a DELETE statement in the
presence of an ON DELETE trigger. Typically, the visible effect of
such incorrect behaviour is a wrong number of
updated/deleted rows in the target table. Additionally, in the case
of an UPDATE statement, the number of modified rows and the state
message returned by the statement contain wrong information about the
number of modified rows.
The reason for the incorrect number of updated/deleted rows is that
the data structure used for binding positional arguments to their
actual values is stored in THD (this is thd->bulk_param) and reused
on processing every INSERT/UPDATE/DELETE statement. This leads to the
actual values bound to the top-level UPDATE/DELETE statement being
consumed by other DML statements used in the triggers' bodies.
To fix the issue, temporarily reset thd->bulk_param to nullptr
before invoking triggers and restore its value when their execution
finishes.
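A hedged sketch of the affected scenario (hypothetical tables): a
bulk UPDATE on t1 fires a trigger whose own UPDATE used to consume
the parameter array still bound in thd->bulk_param:
  CREATE TRIGGER t1_bu BEFORE UPDATE ON t1 FOR EACH ROW
    UPDATE t2 SET cnt= cnt + 1 WHERE t2.id = NEW.id;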
The second part of the problem, the wrong value of affected
rows reported by the Connector/C API, is caused by the fact that the
diagnostics area is reused by the original DML statement and the
statement invoked by the trigger. This fact should be taken into
account when finalizing the state of the diagnostics area on
completion of running a statement.
Important remark: in case the macro DBUG_OFF is on, a call of the
method
Diagnostics_area::reset_diagnostics_area()
results in a reset of the data members
m_affected_rows and m_statement_warn_count.
Values of these data members of the class Diagnostics_area are used on
sending OK and EOF messages. In case a DML statement is executed in PS
bulk mode, such a reset results in sending wrong result values to a
client for affected rows in case the DML statement fires triggers. So,
reset these data members only in case the current statement being
processed is not run in bulk mode.
int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)
CREATE TABLE [SELECT|REPLACE SELECT] is CTAS, and the idea was that
we force ROW format for it. However, it was not correctly enforced,
and keys were appended before the wsrep transaction was started.
In THD::decide_logging_format we should force the statement binlog
format to ROW in the CTAS case and produce a warning if the
binlog format used was not ROW.
In ha_innodb::update_row we should not append keys, similarly to
ha_innodb::write_row, if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging in ::write_row, ::update_row and ::delete_row
if the wsrep key append fails.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
A mixture of a multi-byte *TEXT column and a short binary column
produced a column that was too large.
For example, COALESCE(tinytext_utf8mb4, short_varbinary)
produced a BLOB column instead of the expected TINYBLOB.
- Adding a virtual method Type_all_attributes::character_octet_length(),
returning max_length by default.
- Overriding Item_field::character_octet_length() to extract
the octet length from the underlying Field.
- Overriding Item_ref::character_octet_length() to extract
the octet length from the referenced Item (e.g. VIEW fields).
- Fixing Type_numeric_attributes::find_max_octet_length() to
take the octet length using the new method character_octet_length()
instead of accessing max_length directly.
fprintf() on Windows, when used on an unbuffered FILE*, writes
bytewise. This can make crash handler messages harder to read if they
are mixed up with other error log output.
Fixed, on Windows, by using a small buffer for formatting, and fwrite
instead of fprintf, if the buffer is large enough for the message.
Replication of MyISAM and Aria DML is experimental and best
effort only. An earlier change made INSERT SELECT on both
MyISAM and Aria replicate using TOI and STATEMENT
replication. Replication should happen only if the user
has set the needed wsrep_mode setting.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
New runtime type diagnostic (MDEV-34490) has detected that classes
Item_func_eq, Item_default_value and Item_date_literal_for_invalid_dates
incorrectly return an instance of its ancestor classes when being cloned.
This commit fixes that.
Additionally, it fixes a bug in Item_func_case_simple::do_build_clone()
which led to an endless loop of cloning function calls.
Reviewer: Oleksandr Byelkin <sanja@mariadb.com>