If an UPDATE/DELETE does not change data, it is not written to the
binary log. We now force replication of such events when they trigger
partition auto-creation.
For ROLLBACK it is as simple as setting the OPTION_KEEP_LOG flag;
trans_cannot_safely_rollback() does the rest.
For UPDATE/DELETE .. LIMIT 0 we make additional binlog_query() calls
at the early points of return.
As a safety measure we also convert row format to statement format when
needed. The condition is decided by binlog_need_stmt_format().
Basically, if there are already some row events in the cache we don't
need the conversion: the table open of a row event will trigger
auto-creation anyway.
Multi-table UPDATE/DELETE works via mysql_select(). There are no early
points of return there, so binlogging is always checked by
send_eof()/abort_result_set(). But we must still comply with the above
row-to-statement conversion.
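A minimal sketch of the case being handled, using the AUTO syntax
described below (table and data are hypothetical):
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Change no rows, but must still be binlogged when they auto-create
# a history partition, so that the replica creates it too:
UPDATE t1 SET x = 1 LIMIT 0;
DELETE FROM t1 LIMIT 0;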
:: Syntax change ::
Keyword AUTO enables history partition auto-creation.
Examples:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
Or with explicit partitions:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
(PARTITION p0 HISTORY, PARTITION pn CURRENT);
To disable or enable auto-creation one can use ALTER TABLE by adding
or removing AUTO from partitioning specification:
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# Disables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;
# Enables auto-creation:
ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
If the rest of the partitioning specification is identical to that of
CREATE TABLE, no repartitioning will be done (for details see
MDEV-27328).
:: Description ::
Before executing a history-generating DML command (see the list of
commands below), add N history partitions, so that N is sufficient for
the potentially generated history. N > 1 may be required when history
partitions are switched by INTERVAL and current_timestamp is N intervals
beyond the interval boundary of the last history partition.
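A sketch of the N > 1 case (timestamps are hypothetical):
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
# If the newest history partition boundary is 12:00 and this runs at
# 15:30, several history partitions are added at once so the generated
# history falls within the interval boundaries:
DELETE FROM t1;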
If the last history partition reaches or exceeds LIMIT records, a new
history partition is created and selected as the working partition. Per
MDEV-28411, partitions cannot be switched (or created) while a command
is running. Thus LIMIT is not a strict bound, and the history partition
size must be planned as the LIMIT value plus the average number of
history rows one DML command can generate (for example, with LIMIT 1000
and DML statements that each generate about 50 history rows, partitions
should be expected to grow to roughly 1050 records).
Auto-creation is implemented by a synchronous
fast_alter_partition_table() call from the thread of the executed DML
command, before the command itself is run (via a fallback-and-retry
mechanism similar to the Discovery feature; see Open_table_context).
The names for newly added partitions are generated like default
partition names, with the extension of MDEV-22155 (which avoids name
clashes by extending the assignment counter to the next free-enough
gap).
These DML commands can trigger auto-creation:
DELETE (including multitable DELETE, excluding DELETE HISTORY)
UPDATE (including multitable UPDATE)
REPLACE (including REPLACE .. SELECT)
INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
LOAD DATA .. REPLACE
:: Bug fixes ::
MDEV-23642 Locking timeout caused by auto-creation affects original DML
A failure of auto-creation does not fail the original DML; it only
raises a warning. The reasons for this are:
- Do not disrupt the main business process (the history is an auxiliary
service);
- Consequences are non-fatal (the history is not lost, it merely goes
into the wrong partition; this is fixed by a partitioning rebuild);
- The application gets more freedom to decide whether to fail in this
case or not: it may read the warning info and find the corresponding
error number.
- While a non-failing command is easy for an application to detect and
turn into a failure, the opposite is hard to handle: there is no
automatic action to fix a failed command and retry it, DBA intervention
is required, and until then the application is non-functional.
MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers
Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION, because it is
not possible in locked tables mode.
LTM_LOCK_TABLES mode (and LTM_PRELOCKED_UNDER_LOCK_TABLES) works out
of the box, as fast_alter_partition_table() can reopen tables via
locked_tables_list.
In LTM_PRELOCKED we reopen and relock the table manually.
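A sketch of the scenario that previously failed (data hypothetical):
CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
LOCK TABLES t1 WRITE;
# May now auto-create a history partition even in locked tables mode:
UPDATE t1 SET x = x + 1;
UNLOCK TABLES;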
:: More fixes ::
* some_table_marked_for_reopen flag fix
some_table_marked_for_reopen affects only the reopen of
m_locked_tables, i.e. Locked_tables_list::reopen_tables() reopens only
tables from m_locked_tables.
* Unused can_recover_from_failed_open() condition
Can recover_from_failed_open() really be used after
open_and_process_routine()?
:: Reviewed by ::
Sergei Golubchik <serg@mariadb.org>
Previously, when we needed to add, remove, or change LIMIT, INTERVAL or
AUTO, we had to recreate the partitioning from scratch (via data copy).
Such operations should be fast. To remove options like LIMIT or INTERVAL
one should write:
alter table t1 partition by system_time;
The command checks whether the SYSTEM_TIME partitioning is new or
already exists. If it is new, the command behaves as CREATE would:
it adds the default number of partitions (2). If SYSTEM_TIME
partitioning already existed, it just changes its options: it removes
the unspecified ones and adds/changes those specified explicitly. If a
partition list was supplied, it behaves as usual: it does a full
repartitioning.
Examples:
create or replace table t1 (x int) with system versioning
partition by system_time limit 100 partitions 4;
# Change LIMIT
alter table t1 partition by system_time limit 33;
# Remove LIMIT
alter table t1 partition by system_time;
# This does full repartitioning
alter table t1 partition by system_time limit 33 partitions 4;
# This does data copy as pruning will require records in correct partitions
alter table t1 partition by system_time interval 1 hour
starts '2000-01-01 00:00:00';
# But this works fast, LIMIT will apply to DML commands
alter table t1 partition by system_time limit 33;
To sum up, ALTER of SYSTEM_TIME partitioning does a full repartitioning
when:
- INTERVAL was added or changed;
- a partition list or partition number was specified.
Otherwise it does a fast ALTER TABLE.
Cleaned up dead condition in set_up_default_partitions().
Reviewed by:
Oleksandr Byelkin <sanja@mariadb.com>
Nikita Malyavin <nikitamalyavin@gmail.com>
(cherry-pick into preview-10.9-MDEV-27021-explain tree)
Expression_cache_tmptable object uses an Expression_cache_tracker object
to report the statistics.
In the common scenario, the Expression_cache_tmptable destructor sets
tracker->cache=NULL. The tracker object survives after the expression
cache is deleted, and one may call cache_tracker->fetch_current_stats()
on it with no harm.
However, a degenerate cache with no parameters did not set
tracker->cache=NULL in the Expression_cache_tmptable destructor, which
resulted in an attempt to use freed data in the
cache_tracker->fetch_current_stats() call.
Fixed by setting tracker->cache to NULL and wrapping the assignment into
a function.
- Describe the lifetime of EXPLAIN data structures in
sql_explain.h:ExplainDataStructureLifetime.
- Make Item_field::set_field() call set_refers_to_temp_table()
when it refers to a temp. table.
- Introduce QT_DONT_ACCESS_TMP_TABLES flag for Item::print.
It directs Item_field::print to not try to access its temp table.
- Introduce Explain_query::notify_tables_are_closed()
and call it right before the query closes its tables.
- Make Explain data structures' print_explain_json() methods
accept a "no_tmp_tbl" parameter, which means: pass
QT_DONT_ACCESS_TMP_TABLES when printing items.
- Make Show_explain_request::call_in_target_thread() not call
set_current_thd(). This wasn't needed as the code inside
lex->print_explain() uses output->thd anyway. output->thd
refers to the SHOW command's THD object.
SHOW EXPLAIN/ANALYZE FORMAT=JSON tries to access items that have already been
freed by a call to free_items() during THD::cleanup_after_query().
The solution is to disallow APC calls including SHOW EXPLAIN/ANALYZE
just before the call to free_items().
1. Add an explicit indication that the output is produced by the
SHOW EXPLAIN/ANALYZE FORMAT=JSON command.
2. Remove the useless "r_total_time_ms" field from SHOW ANALYZE
FORMAT=JSON output when no timed statistics were gathered.
3. Add "r_query_time_in_progress_ms" to the output of SHOW ANALYZE
FORMAT=JSON (see the usage sketch below).
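A usage sketch (the connection id 3 is an assumption):
SHOW ANALYZE FORMAT=JSON FOR 3;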
EXPLAIN FOR CONNECTION is a MySQL-compatible syntax for SHOW EXPLAIN.
This commit also adds support for FORMAT=JSON to SHOW EXPLAIN,
so the possible options to get JSON-formatted output are:
- SHOW EXPLAIN FORMAT=JSON FOR $con
- EXPLAIN FORMAT=JSON FOR CONNECTION $con
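For example, assuming connection 3 is executing a query:
SHOW EXPLAIN FORMAT=JSON FOR 3;
EXPLAIN FORMAT=JSON FOR CONNECTION 3;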
Problem:
========
The test logic checked the wrong condition to validate that the
slave had caught up with the master. Specifically, it checked that
the IO and SQL thread stages were "Waiting for master to send event"
and "Slave has read all relay log; waiting for more updates",
respectively. The problem exposed by this MDEV is that these are also
the initial slave states before any data has been read from the
primary (whereas the intended state was having already read all
available events from the primary and now waiting for new events).
This made the MTR test validate data that it had not yet received,
and thereby fail.
Solution:
========
Instead of using the IO/SQL thread states, use the existing helper
functions save_master_gtid.inc and sync_with_master_gtid.inc. Note
that the test result file also needed to be updated to reflect
this fix.
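A minimal sketch of the resulting mysqltest pattern:
--connection master
--source include/save_master_gtid.inc
--connection slave
--source include/sync_with_master_gtid.inc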
Special thanks to Angelique Sklavounos for pointing out that
--stop-position was not specified in any buildbot failures, as this
led to an IF block in the MTR test that was the source of the test
failure.
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
This follows up the previous fix in
commit c3c53926c4 (MDEV-26554).
ha_innobase::delete_table(): Work around the insufficient
metadata locking (MDL) during DML operations by acquiring exclusive
InnoDB table locks on all child tables. Previously, this was only
done on TRUNCATE and ALTER.
ibuf_delete_rec(), btr_cur_optimistic_delete(): Do not invoke
lock_update_delete() during change buffer operations.
The revised trx_t::commit(std::vector<pfs_os_file_t>&) will
hold exclusive lock_sys.latch while invoking fil_delete_tablespace(),
which in turn may invoke ibuf_delete_rec().
dict_index_t::has_locking(): A new predicate, replacing the dummy
!dict_table_is_locking_disabled(index->table). Used for skipping lock
operations during ibuf_delete_rec().
trx_t::commit(std::vector<pfs_os_file_t>&): Release the locks
and remove the table from the cache while holding exclusive
lock_sys.latch.
trx_t::commit_in_memory(): Skip release_locks() if dict_operation holds.
trx_t::commit(): Reset dict_operation before invoking commit_in_memory()
via commit_persist().
lock_release_on_drop(): Release locks while lock_sys.latch is
exclusively locked.
lock_table(): Add a parameter for a pointer to the table.
We must not dereference the table before a lock_sys.latch has
been acquired. If the pointer to the table does not match the table
at that point, the table is invalid and DB_DEADLOCK will be returned.
row_ins_foreign_check_on_constraint(): Improve the checks.
Remove a bogus DB_LOCK_WAIT_TIMEOUT return that was needed
before commit c5fd9aa562 (MDEV-25919).
row_upd_check_references_constraints(),
wsrep_row_upd_check_foreign_constraints(): Simplify checks.
- InnoDB should avoid the bulk insert operation when the table has an
active DDL, because bulk insert writes only a single undo log record
(TRX_UNDO_EMPTY), while logging of concurrent DML happens at commit
time by parsing undo log records to get the values and operations.
- Removed ROW_T_EMPTY, ROW_OP_EMPTY and their associated functions
and also the test case which tries to log the ROW_OP_EMPTY
when table has active DDL.
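A sketch of the now-avoided combination, in a two-connection notation
(schema hypothetical):
connection con1:
ALTER TABLE t1 FORCE;
connection con2:
BEGIN;
# With the DDL active, InnoDB skips the TRX_UNDO_EMPTY bulk-insert
# path and writes regular undo log records for the concurrent DML:
INSERT INTO t1 VALUES (1), (2);
COMMIT;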
Analysis: When trying to find a path and handling a match for the path,
the value at the current index of array_counters is not set to 0. This
causes a wrong current step value, which eventually causes a wrong
cur_step->type value.
Fix: Set the value at the current index of array_counters to 0.
Analysis: The counter does not increment while sending rows for a table
value constructor, so row_number assumes the default value (0 in this
case).
Fix: Increment the counter so that it does not keep the default value.
Analysis: When comparing json paths, the array_sizes argument is NULL
at the start. But adding an offset to this NULL pointer when
recursively calling json_path_parts_compare() to handle a double
wildcard is undefined behaviour, and it makes the array_sizes variable
non-NULL (pointing at some arbitrary address). This eventually results
in a crash.
Fix: If the array_sizes variable is NULL, pass NULL in the recursive
call as well.
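A hypothetical query of the affected kind, with a double wildcard in
the path:
SELECT JSON_CONTAINS_PATH('{"a": {"b": {"c": 1}}}', 'one', '$**.c');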
This commit fixes a crash reported as MDEV-28377 and a number
of other crashes in automated tests with mtr that are related
to broken .cnf files in galera and galera_3nodes suites, which
happened when automatically migrating MDEV-26171 from 10.3 to
subsequent higher versions.
- InnoDB DDL results in a 'Duplicate entry' error if a concurrent DML
throws a duplicate key error. The following scenario explains the
problem:
connection con1:
ALTER TABLE t1 FORCE;
connection con2:
INSERT INTO t1(pk, uk) VALUES (2, 2), (3, 2);
In connection con2, InnoDB throws the 'DUPLICATE KEY' error because of
the unique index. The alter operation will throw the error when
applying the concurrent DML log.
- Inserting a duplicate key into a unique index logs the insert
operation for online ALTER TABLE. When the insertion fails, the
transaction rolls back, which leads to logging a delete operation
for online ALTER TABLE. While applying the insert log entries, the
alter operation encounters a 'DUPLICATE KEY' error.
- To avoid the above spurious duplicate scenario, InnoDB should not
write any log for online ALTER TABLE before the DML transaction
commits.
- The user thread which does the DML can apply the online log if
InnoDB ran out of online log space and the index is marked as
completed. Set the online log error if the apply phase encountered
any error. It also clears the logs of all other indexes and marks
the newly added indexes as corrupted.
- Removed the old online code which was a part of DML operations.
commit_inplace_alter_table(): Applies the online log for the last
batch of the secondary index log and frees the log for the
completed index.
trx_t::apply_online_log: Set to true while writing the undo
log if the modified table has active DDL
trx_t::apply_log(): Apply the DML changes to online DDL tables
dict_table_t::is_active_ddl(): Returns true if the table
has an active DDL
dict_index_t::online_log_make_dummy(): Assign a dummy value to the
clustered index online log to indicate that the secondary
indexes are being rebuilt.
dict_index_t::online_log_is_dummy(): Check whether the online
log has dummy value
ha_innobase_inplace_ctx::log_failure(): Handle the apply log
failure for online DDL transaction
row_log_mark_other_online_index_abort(): Clear out all other
online index log after encountering the error during
row_log_apply()
row_log_get_error(): Get the error happened during row_log_apply()
row_log_online_op(): Applies the online log if the index is completed
and the log ran out of memory. Returns false if applying the log fails.
UndorecApplier: Introduced a class to maintain the undo log record and
the latched undo buffer page, to parse the undo log record, and to
track the undo record type, info bits and update vector
UndorecApplier::get_old_rec(): Get the correct version of the
clustered index record that was modified by the current undo
log record
UndorecApplier::clear_undo_rec(): Clear the undo log related
information after applying the undo log record
UndorecApplier::log_update(): Handle the update, delete undo
log and apply it on online indexes
UndorecApplier::log_insert(): Handle the insert undo log
and apply it on online indexes
UndorecApplier::is_same(): Check whether the given roll pointer
is generated by the current undo log record information
trx_t::rollback_low(): Set apply_online_log for the transaction if the
partially rolled back transaction modified any table with an active DDL
prepare_inplace_alter_table_dict(): After allocating the online
log, InnoDB creates the fulltext common tables. A fulltext index
cannot be built online, so the dead code for online log removal
was removed.
Thanks to Marko Mäkelä for providing the initial prototype and
Matthias Leich for testing the issue patiently.
It suffices to test compression with one record. Restarting the
server is not really needed; we are exercising the log based recovery
in other tests, such as mariabackup.page_compression_level.
The purpose of the compress() wrapper my_compress_buffer() was twofold:
silence Valgrind warnings about uninitialized memory access before
zlib 1.2.4, and have PERFORMANCE_SCHEMA instrumentation of some zlib
related memory allocation. Because of PERFORMANCE_SCHEMA, we cannot
trivially replace my_compress_buffer() with compress().
az_open(): Remove a crc32() call. Any CRC of the empty string is 0.
This is a backport of commit 4489a89c71
in order to remove the test innodb.redo_log_during_checkpoint
that would cause trouble in the DBUG subsystem invoked by
safe_mutex_lock() via log_checkpoint(). Before
commit 7cffb5f6e8
these mutexes were of different type.
The following options were introduced in
commit 2e814d4702 (mariadb-10.2.2)
and have little use:
innodb_disable_resize_buffer_pool_debug had no effect even in
MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in
mysql/mysql-server@5c4094cf49
to work around a problem that was fixed in
mysql/mysql-server@2957ae4f99
(but the parameter was not removed).
innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug
are only used by the test innodb.redo_log_during_checkpoint
that will be removed as part of this commit.
innodb_dict_stats_disabled_debug is only used by that test,
and it is redundant because one could simply use
innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute
of the table in the test to achieve the same effect.
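For example, a test can achieve the same effect with:
SET GLOBAL innodb_stats_persistent=OFF;
# or, per table:
CREATE TABLE t (a INT) ENGINE=InnoDB STATS_PERSISTENT=0;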
Ever since commit 685d958e38
some Perl code in the test mariabackup.huge_lsn is writing names of
non-existing files to the InnoDB redo log and testing the recovery.
We do not need any debug instrumentation to duplicate that test.
The --skip-write-binlog message confusingly implied that the option
only has an effect when Galera is enabled. There are uses beyond
Galera, so we apply SET SESSION SQL_LOG_BIN=0, as implied by the
option, without making it conditional on the wsrep status.
With --skip-write-binlog we also now check the session @@WSREP_ON
variable rather than the global server variable.
Since 10.6, wsrep_mode can replicate Aria and MyISAM, in which case no
change of engine to InnoDB and back is needed.
By removing the conditions, we can use LOCK TABLES in the general case,
improving the load speed of Aria (MDEV-23326), regardless of the
skip-write-binlog flag. The only case where we don't use LOCK TABLES is
when we are replicating via InnoDB, because wsrep_on=1 and wsrep_mode
doesn't contain REPLICATE_ARIA{,MYISAM}; an InnoDB transaction is used
instead. When replicating via InnoDB we change the table engine type
back to what it was originally.
By removing the \d and other syntax that requires parsing by
the mariadb client, we can use the generated SQL more generally, like
in the embedded server.
We also save and restore the SQL_LOG_BIN and WSREP_ON session server
variables so that this output can be included in the same session as
other data without carrying over changes in state.
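A sketch of the save/restore pattern (variable names are hypothetical;
the generated script differs in detail):
SET @save_sql_log_bin = @@SESSION.SQL_LOG_BIN;
SET @save_wsrep_on    = @@SESSION.WSREP_ON;
SET SESSION SQL_LOG_BIN = 0;
# ... tzinfo data statements ...
SET SESSION SQL_LOG_BIN = @save_sql_log_bin;
SET SESSION WSREP_ON    = @save_wsrep_on;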
Remove wsrep.mysql_tzinfo_to_sql_symlink{,_skip} tests as they offered
no additional coverage beyond main.mysql_tzinfo_to_sql_symlink (no
server testing was done).
Add galera.mariadb_tzinfo_to_sql to actually test the replication
of tzinfo data through galera.
The conditional executable comment around /*M!100602 ...
START TRANSACTION .. LOCK TABLES.. */ is there so that we can provide
tzinfo files (MDEV-27113, MDBF-389) and, if a user runs the output on a
pre-10.6 server version, it will still work. Neither START TRANSACTION
nor LOCK TABLES is supported in prepared statements in MariaDB versions
earlier than 10.6.2.
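A sketch of the gating syntax only (not the literal generated output):
an /*M!100602 ... */ executable comment runs its body on MariaDB
10.6.2 and later, and is treated as a plain comment elsewhere:
/*M!100602 START TRANSACTION */;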
Reviewed by Brandon Nesterenko
- Simplified Chinese translation added
- Character encoding is gbk
-- gbk covers more characters
-- gbk includes both Simplified and Traditional
-- best option I think, though it may need to work along with other
locale settings
- Other cleanup
-- Within each error, messages are sorted according to language code
-- More consistent formatting (8 spaces preceding each translation)
-- jps removed as duplicate of jpn translation
This should be a good starting point. More refinement is appreciated,
and needed down the road.
English "containt" (sic) spelling fixes on ER_FK_NO_INDEX_{CHILD,PARENT}
resulting in mtr test case adjustments.
Edited/reviewed by Daniel Black
The following conditions have to be handled:
1) InnoDB fails to include the offset of the node pointer field in
non-leaf records for the redundant row format.
2) If a fixed-length field has only a prefix length, then calculate
the field's maximum size as the prefix length.
- Added a test case to test (2) and to check the maximum number of
fields that can exist in the index.
When we enable writes after a Galera SST, srv_n_fil_crypt_threads needs
to be temporarily set to 0 (as was done when writes were disabled) to
make sure that the encryption threads will really be started, based on
the old value of the encryption threads setting.
Fix provided by Marko Mäkelä.
Analysis: There are two server variables, "old_mode" and "old". "old"
is no longer needed, as "old_mode" has replaced it (however, it is
still used in some places in the code). "old_mode" and "old" have the
same purpose: emulating behavior from previous MariaDB versions. So
they can be merged to avoid confusion.
Fix: Deprecate the "old" variable and create another mode for
@@old_mode to mimic the behavior of the previous "old" variable.
Create specific modes for the specific tasks that the --old sql
variable was doing earlier, and use the new modes instead.
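A usage sketch (the specific mode name is an assumption; consult the
@@old_mode documentation):
SET @@SESSION.old = 1;  # now reports a deprecation warning
SET @@old_mode = 'COMPAT_5_1_CHECKSUM';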
New Feature:
============
Extend mariadb-binlog command-line tool to allow for filtering
events using GTID domain and server ids. The functionality mimics
that of a replica server’s DO_DOMAIN_IDS, IGNORE_DOMAIN_IDS, and
IGNORE_SERVER_IDS from CHANGE MASTER TO. For completeness, this
patch additionally adds the option --do-server-ids as an alias for
--server-id, which now accepts a list of server ids instead of a
single one.
Example usage:
mariadb-binlog --do-domain-ids=2,3,4 --do-server-ids=1,3
master-bin.000001
Functional Notes:
1. --do-domain-ids cannot be combined with --ignore-domain-ids
2. --do-server-ids cannot be combined with --ignore-server-ids
3. A domain id filter can be combined with a server id filter
4. When any new filter options are combined with the
--gtid-strict-mode option, events from excluded domains/servers are
not validated.
5. Domain/server id filters can be combined with GTID ranges (i.e.
specifications of --start-position and --stop-position). However,
because the --stop-position option implicitly undertakes filtering
to only output events within its range of domains, when combined
with --do-domain-ids or --ignore-domain-ids, output will consist of
the intersection between the filters. Specifically, with
--do-domain-ids and --stop-position, only events with domain ids
present in both argument lists will be output. Conversely, with
--ignore-domain-ids and --stop-position, only events with domain ids
present in the --stop-position and absent from the
--ignore-domain-ids options will be output.
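For example (file name and positions are assumptions), combining a
domain filter with --stop-position outputs only events from domain 2,
the intersection of the two domain lists:
mariadb-binlog --do-domain-ids=1,2 --stop-position=2-1-100
    master-bin.000001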
Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
The failing case involves a column generated using date_format() and
if().
vcol_info->expr is allocated on expr_arena at the parsing stage. Since
the expr item is allocated on expr_arena, all the items it contains
must be allocated on expr_arena too. Otherwise fix_session_expr() will
encounter a prematurely freed item.
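A sketch of the kind of table involved (the definition is hypothetical,
based on the failing case above):
CREATE TABLE t1 (
  d DATETIME,
  g VARCHAR(16) AS (IF(d IS NULL, 'none', DATE_FORMAT(d, '%W'))) VIRTUAL
);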
When a table is reopened from the table cache, vcol_info contains a
stale expression. We refresh the expression via
TABLE::vcol_fix_exprs(), but first we must prepare a proper context
(Vcol_expr_context) which meets some requirements:
1. As noted above, the expr update must be done on expr_arena, as new
items may be created. This was a bug in fix_session_expr_for_read()
that simply was not reproduced because there was no second refix. Now
refix is done in more cases, so it does reproduce. Tests affected:
vcol.binlog
2. Also, the name resolution context must be narrowed to the single
table.
Tested by: vcol.update main.default vcol.vcol_syntax gcol.gcol_bugfixes
3. sql_mode must be clean and must not make the expr update fail.
sql_mode flags such as MODE_NO_BACKSLASH_ESCAPES, MODE_NO_ZERO_IN_DATE,
etc. must not affect the vcol expression update. If the table was
created successfully, any further evaluation must not fail. Tests
affected: main.func_like
Reviewed by: Sergei Golubchik <serg@mariadb.org>