The memory leak happened on second execution of a prepared statement
that runs UPDATE statement with correlated subquery in right hand side of
the SET clause. In this case, invocation of the method
table->stat_records()
could return the zero value that results in going into the 'if' branch
that handles impossible where condition. The issue is that this condition
branch missed saving of leaf tables that has to be performed as first
condition optimization activity. Later the PS statement memory root
is marked as read only on finishing first time execution of the prepared
statement. Next time the same statement is executed it hits the assertion
on attempt to allocate a memory on the PS memory root marked as read only.
This memory allocation takes place by the sequence of the following
invocations:
Prepared_statement::execute
mysql_execute_command
Sql_cmd_dml::execute
Sql_cmd_update::execute_inner
Sql_cmd_update::update_single_table
st_select_lex::save_leaf_tables
List<TABLE_LIST>::push_back
To fix the issue, add the flag SELECT_LEX::leaf_tables_saved to control
whether the method SELECT_LEX::save_leaf_tables() has to be called or
it has been already invoked and no more invocation required.
Similar issue could take place on running the DELETE statement with
the LIMIT clause in PS/SP mode. The reason of memory leak is the same as for
UPDATE case and be fixed in the same way.
Caused by:
5d37cac7 MDEV-33348 ALTER TABLE lock waiting stages are indistinguishable.
In that commit, progress reporting was moved to
mysql_alter_table from copy_data_between_tables.
The temporary table case wasn't taken into the consideration,
where the execution of mysql_alter_table ends earlier than usual, by the
'end_temporary' label. There, thd_progress_end has been missing.
Fix:
Add missing thd_progress_end() call in mysql_alter_table.
The feedback plugin server_uid variable and the calculate_server_uid()
function is moved from feedback/utils.cc to sql/mysqld.cc
server_uid is added as a global variable (shown in 'show variables') and
is written to the error log on server startup together with server version
and server commit id.
We have an issue if a user have the following in a configuration file:
log_slow_filter="" # Log everything to slow query log
log_queries_not_using_indexes=ON
This set log_slow_filter to 'not_using_index' which disables
slow_query_logging of most queries.
In effect, on should never use log_slow_filter="" in config files but
instead use log_slow_filter=ALL.
Fixed by changing log_slow_filter="" that comes either from a
configuration file or from the command line, when starting to the server,
to log_slow_filter=ALL.
A warning will be printed when this happens.
Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
(backported from 10.6)
When getaddrinfo returns and error, the contents
of ai are invalid so we cannot continue based
on their data structures.
In the previous branch of the if statement, we
abort there if there is an error so for consistency
we abort here too.
The test case fixes the port number to UINTMAX32
for both an enumberated bind-address and the
default bind-address covering the two calls to
getaddrinfo.
Review thanks Sanja.
Item_func_hex::fix_length_and_dec() evaluated a too short data type
for signed numeric arguments, which resulted in a 'Data too long for column'
error on CREATE..SELECT.
Fixing the code to take into account that a short negative
numer can produce a long HEX value: -1 -> 'FFFFFFFFFFFFFFFF'
Also fixing Item_func_hex::val_str_ascii_from_val_real().
Without this change, MTR test with HEX with negative float point arguments
failed on some platforms (aarch64, ppc64le, s390-x).
InnoDB transactions may be reused after committed:
- when taken from the transaction pool
- during a DDL operation execution
In this case wsrep flag on trx object is cleared, which may cause wrong
execution logic afterwards (wsrep-related hooks are not run).
Make trx->wsrep flag initialize from THD object only once on InnoDB transaction
start and don't change it throughout the transaction's lifetime.
The flag is reset at commit time as before.
Unconditionally set wsrep=OFF for THD objects that represent InnoDB background
threads.
Make Wsrep_schema::store_view() operate in its own transaction.
Fix streaming replication transactions' fragments rollback to not switch
THD->wsrep value during transaction's execution
(use THD->wsrep_ignore_table as a workaround).
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Affects:
MDEV-34150 Assertion failure in Diagnostics_area::set_error_status upon binary
logging hitting tmp space limit
MDEV-9101 Limit size of created disk temporary files and tables
This bug was caused by moving flushing of the in-memory-row-events from
close_thread_tables() to binlog_commit() in MDEV-34150.
This was needed to be able to handle the case where binlog writes could
fail.
Galera have two case where the change caused problems:
- Row events in commit_one_phase_2() was not done in the case the standard
binary log was not enabled but Galera was using the binary log
internally.
- Galera disabled the call to binlog_commit_flush_stmt_cache() for not
ending transactions.
Fixed by adding code that flushes the in-memory-row-events to the binary
log (write, but now sync) in the two above cases if Galera is enabled.
As part of commit 685d958e38 (MDEV-14425)
the parameter innodb_log_write_ahead_size was removed, because it was
thought that determining the physical block size would be a sufficient
replacement.
However, we can only determine the physical block size on Linux or
Microsoft Windows. On some file systems, the physical block size
is not relevant. For example, XFS uses a block size of 4096 bytes
even if the underlying block size may be smaller.
On Linux, we failed to determine the physical block size if
innodb_log_file_buffered=OFF was not requested or possible.
This will be fixed.
log_sys.write_size: The value of the reintroduced parameter
innodb_log_write_ahead_size. To keep it simple, this is read-only
and a power of two between 512 and 4096 bytes, so that the previous
alignment guarantees are fulfilled. This will replace the previous
log_sys.get_block_size().
log_sys.block_size, log_t::get_block_size(): Remove.
log_t::set_block_size(): Ensure that write_size will not be less
than the physical block size. There is no point to invoke this
function with 512 or less, because that is the minimum value of
write_size.
innodb_params_adjust(): Add some disabled code for adjusting
the minimum value and default value of innodb_log_write_ahead_size
to reflect the log_sys.write_size.
log_t::set_recovered(): Mark the recovery completed. This is the
place to adjust some things if we want to allow write_size>4096.
log_t::resize_write_buf(): Refer to write_size.
log_t::resize_start(): Refer to write_size instead of get_block_size().
log_write_buf(): Simplify some arithmetics and remove a goto.
log_t::write_buf(): Refer to write_size. If we are writing less than
that, do not switch buffers, but keep writing to the same buffer.
Move some code to improve the locality of reference.
recv_scan_log(): Refer to write_size instead of get_block_size().
os_file_create_func(): For type==OS_LOG_FILE on Linux, always invoke
os_file_log_maybe_unbuffered(), so that log_sys.set_block_size() will
be invoked even if we are not attempting to use O_DIRECT.
recv_sys_t::find_checkpoint(): Read the entire log header
in a single 12 KiB request into log_sys.buf.
Tested with:
./mtr --loose-innodb-log-write-ahead-size=4096
./mtr --loose-innodb-log-write-ahead-size=2048
The log_event_old.cc is included by mysqlbinlog.cc.
With -DWITHOUT_SERVER the include path for the wsrep
include headers isn't there.
As these aren't needed by the mariadb-binlog, move
these to under a ifndef MYSQL_CLIENT preprocessor.
Caused by MDEV-18590
One of possible use cases that reproduces the memory leakage listed below:
set timestamp= unix_timestamp('2000-01-01 00:00:00');
create or replace table t1 (x int) with system versioning
partition by system_time interval 1 hour auto
partitions 3;
create table t2 (x int);
create trigger tr after insert on t2 for each row update t1 set x= 11;
create or replace procedure sp2() insert into t2 values (5);
set timestamp= unix_timestamp('2000-01-01 04:00:00');
call sp2;
set timestamp= unix_timestamp('2000-01-01 13:00:00');
call sp2; # <<=== Memory leak happens there. In case MariaDB server is built
with the option -DWITH_PROTECT_STATEMENT_MEMROOT,
the second execution would hit assert failure.
The reason of leaking a memory is that once a new partition be created
the table should be closed and re-opened. It results in calling the function
extend_table_list() that indirectly invokes the function sp_add_used_routine()
to add routines implicitly used by the statement that makes a new memory
allocation.
To fix it, don't remove routines and tables the statement implicitly depends
on when a table being closed for subsequent re-opening.
Executing an INSERT statement in PS mode having positional parameter
bound with an array could result in incorrect number of inserted rows
in case there is a BEFORE INSERT trigger that executes yet another
INSERT statement to put a copy of row being inserted into some table.
The reason for incorrect number of inserted rows is that a data structure
used for binding positional argument with its actual values is stored
in THD (this is thd->bulk_param) and reused on processing every INSERT
statement. It leads to consuming actual values bound with top-level
INSERT statement by other INSERT statements used by triggers' body.
To fix the issue, reset the thd->bulk_param temporary to the value nullptr
before invoking triggers and restore its value on finishing its execution.
Having Item_func_not items in item trees breaks assumptions during the
optimization phase about transformation possibilities in fix_fields().
Remove Item_func_not by extending normalization during parsing.
Reviewed by Oleksandr Byelkin (sanja@mariadb.com)
New error codes can only be added in the latest major version.
Adding ER_KILL_DENIED_HIGH_PRIORITY would shift by one all
error codes that were added in MariaDB Server 10.6 or later.
This amends commit 1001dae186
Suggested by: Sergei Golubchik
RAND() and UUID() are treated differently with respect to subquery
materialization both should be marked as uncacheable, forcing materialization.
Altered Create_func_uuid(_short)::create_builder().
Added comment in header about UNCACHEABLE_RAND meaning also unmergeable.
Apart from better performance when accessing thread local variables,
we'll get rid of things that depend on initialization/cleanup of
pthread_key_t variables.
Where appropriate, use compiler-dependent pre-C++11 thread-local
equivalents, where it makes sense, to avoid initialization check overhead
that non-static thread_local can suffer from.
There were erroneous calls for charpos() in key_hashnr() and key_buf_cmp().
These functions are never called with prefix segments.
The charpos() calls were wrong. Before the change BNHL joins
- could return wrong result sets, as reported in MDEV-34417
- were extremely slow for multi-byte character sets, because
the hash was calculated on string prefixes, which increased
the amount of collisions drastically.
This patch fixes the wrong result set as reported in MDEV-34417,
as well as (partially) the performance problem reported in MDEV-34352.
A user reported that MariaDB server got a signal 6 but still accepted new
connections and did not crash. I have not been able to find a way to
repeat this or find the cause of issue. However to make it easier to
notice that the server is unstable, I added code to disable new
connections when the handle_fatal_signal() handler has been called.
The problem is seen on CI, where TEMP pointed to directory outside of
the usual vardir, when testing mysql_install_db.exe
A likely cause for this error is that TEMP was periodically cleaned up
by some automation running on the host, perhaps by buildbot itself.
To fix, mysql_install_db.exe will now use datadir as --tmpdir
for the bootstrap run. This will minimize chances to run into any
environment problems.
This commit introduces a reset of password errors counter on any alter user
command for the altered user. This is done so as to not require a
complete privilege system reload.
Problem was that there was two non-conflicting local idle
transactions in node_1 that both inserted a key to primary key.
Then two transactions from other nodes inserted also
a key to primary key so that insert from node_2 conflicted
one of the local transactions in node_1 so that there would
be duplicate key if both are committed. For this insert
from other node tries to acquire S-lock for this record
and because this insert is high priority brute force (BF)
transaction it will kill idle local transaction.
Concurrently, second insert from node_3 conflicts the second
idle insert transaction in node_1. Again, it tries to acquire
S-lock for this record and kills idle local transaction.
At this point we have two non-conflicting high priority
transactions holding S-lock on different records in node_1.
For example like this: rec s-lock-node2-rec s-lock-node3-rec rec.
Because these high priority BF-transactions do not wait
each other insert from node3 that has later seqno compared
to insert from node2 can continue. It will try to acquire
insert intention for record it tries to insert (to avoid
duplicate key to be inserted by local transaction). Hower,
it will note that there is conflicting S-lock in same gap
between records. This will lead deadlock error as we have
defined that BF-transactions may not wait for record lock
but we can't kill conflicting BF-transaction because
it has lower seqno and it should commit first.
BF-transactions are executed concurrently because their
values to primary key are different i.e. they do not
conflict.
Galera certification will make sure that inserts from
other nodes i.e these high priority BF-transactions
can't insert duplicate keys. Local transactions naturally
can but they will be killed when BF-transaction
acquires required record locks.
Therefore, we can allow situation where there is conflicting
S-lock and insert intention lock regardless of their seqno
order and let both continue with no wait. This will lead
to situation where we need to allow BF-transaction
to wait when lock_rec_has_to_wait_in_queue is called
because this function is also called from
lock_rec_queue_validate and because lock is waiting
there would be assertion in ut_a(lock->is_gap()
|| lock_rec_has_to_wait_in_queue(cell, lock));
lock_wait_wsrep_kill
Add debug sync points for BF-transactions killing
local transaction.
wsrep_assert_no_bf_bf_wait
Print also requested lock information
lock_rec_has_to_wait
Add function to handle wsrep transaction lock wait
cases.
lock_rec_has_to_wait_wsrep
New function to handle wsrep transaction lock wait
exceptions.
lock_rec_has_to_wait_in_queue
Remove wsrep exception, in this function all
conflicting locks need to wait in queue.
Conflicts between BF and local transactions
are handled in lock_wait.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Changed error code for Galera unkillable threads to
be ER_KILL_DENIED_HIGH_PRIORITY giving message
This is a high priority thread/query and cannot be killed
without the compromising consistency of the cluster
also a warning is produced
Thread %lld is [wsrep applier|high priority] and cannot be killed
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
After MDEV-4013, the maximum length of replication passwords was extended to
96 ASCII characters. After a restart, however, slaves only read the first 41
characters of MASTER_PASSWORD from the master.info file. This lead to slaves
unable to reconnect to the master after a restart.
After a slave restart, if a master.info file is detected, use the full
allowable length of the password rather than 41 characters.
Reviewed By:
============
Sergei Golubchik <serg@mariadb.com>
In the error case of THD::register_slave(), there is undefined
behavior of Slave_info si because it is allocated via malloc()
(my_malloc), and cleaned up via delete().
This patch makes these consistent by switching si's cleanup
to use my_free.
Review followup: RANGE_OPT_PARAM statement_should_be_aborted()
checks for thd->is_fatal_error and thd->is_error(). The first is
redundant when the second is present.
The optimizer deals with Rowid Filters this way:
1. First, range optimizer is invoked. It saves information
about all potential range accesses.
2. A query plan is chosen. Suppose, it uses a Rowid Filter on
index $IDX.
3. JOIN::make_range_rowid_filters() calls the range optimizer
again to create a quick select on index $IDX which will be used
to populate the rowid filter.
The problem: KILL command catches the query in step #3. Quick
Select is not created which causes a crash.
Fixed by checking if query was killed. Note: the problem also
affects 10.6, even if error handling for
SQL_SELECT::test_quick_select is different there.
(Variant for 10.6: return error code from SQL_SELECT::test_quick_select)
The optimizer deals with Rowid Filters this way:
1. First, range optimizer is invoked. It saves information
about all potential range accesses.
2. A query plan is chosen. Suppose, it uses a Rowid Filter on
index $IDX.
3. JOIN::make_range_rowid_filters() calls the range optimizer
again to create a quick select on index $IDX which will be used
to populate the rowid filter.
The problem: KILL command catches the query in step #3. Quick
Select is not created which causes a crash.
Fixed by checking if query was killed.
Rowid Filter cannot be used with reverse-ordered scans, for the
same reason as IndexConditionPushdown cannot be.
test_if_skip_sort_order() already has logic to disable ICP when
setting up a reverse-ordered scan. Added logic to also disable
Rowid Filter in this case, factored out the code into
prepare_for_reverse_ordered_access(), and added a comment describing
the cause of this limitation.