Now that MDEV-14717 made RENAME TABLE crash-safe within InnoDB,
it should be safe to drop the #sql- tables within InnoDB during
crash recovery. These tables can be one of two things:
(1) #sql-ib related to deferred DROP TABLE (follow-up to MDEV-13407)
or to table-rebuilding ALTER TABLE...ALGORITHM=INPLACE
(since MDEV-14378, only related to the intermediate copy of a table),
(2) #sql- related to the intermediate copy of a table during
ALTER TABLE...ALGORITHM=COPY
We will not drop tables whose name starts with #sql2, because
the server can be killed during an ALGORITHM=COPY operation at
a point where the original table was renamed to #sql2 but the
finished intermediate copy was not yet renamed from #sql-
to the original table name.
InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.
If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.
fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.
fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.
dict_table_rename_in_cache(): Invoke fil_name_write_rename().
trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.
trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.
row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.
row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.
Problem:
The command was:
find $paths -mindepth 1 -regex $cpat -prune -o -exec rm -rf {} \+
Which was supposed to work as
* skipping $paths directories themselves (-mindepth 1)
* see if the dir/file name matches $cpat (-regex)
* if yes - don't dive into the directory, skip it (-prune)
* otherwise (-o)
* remove it and everything inside (-exec)
Now -exec ... \+ works like this:
every new found path is appended to the end of the command line.
when accumulated command line length reaches `getconf ARG_MAX` (~2Gb)
it's executed, and find continues, appending to a new command line.
What happens here, find appends some directory to the command line,
then dives into it, and starts appending files from that directory.
At some point command line overflows, rm -rf gets executed and removes
the whole directory. Now find tries to continue scanning the directory
that was already removed.
Fix: don't dive into directories that will be recursively removed
anyway, use -prune for them. Basically, we should be pruning both paths
that have matched $cpat and paths that have not matched it. This is
achived by pruning unconditionally, before the regex is tested:
find $paths -mindepth 1 -prune -regex $cpat -o -exec rm -rf {} \+
Patch Credit:- Serg
Using systemd we can automate creating users and directories. So
generate and install the configuration files.
Signed-off-by: Vicențiu Ciorbaru <vicentiu@mariadb.org>
Small change in cmake/install_layout.cmake compared to original contributor
patch to also install SYSTEMD_SYSUSERS and SYSTEMD_TMPFILES directories. The
variables were being set, but the loop which defines the final install files
was not updated.
In the function make_sortkey a tmp buffer was defined and in the absence of
param->tmp_buffer, tmp buffer used the sort_keys buffer. sort_keys buffer
has a length defined in sort_field->length, while param->tmp_buffer is
stored in param->rec_length. Make sure to use the appropriate length
based on which buffer we are using otherwise we'll overflow.
Also added a type cast to size_t during the calculation of the sort keys
buffer size to avoid an oveflow if the buffer size exceeds 32 bits.
galera_events test shows a regression with the original fix for MW-416
Reason was that Events::drop_event() can be called also from inside event
execution, and there we have a speacial treatment for event, which executes
"DROP EVENT" statement, and runs TOI replication inside the event processing body.
This resulted in executing WSREP_TO_ISOLATION two times for such DROP EVENT statement.
Fix is to call WSREP_TO_ISOLATION_BEGIN only in Events::drop_event()
Changed return code for replicatio error to TRUE.
This is aligned with native mysql convention to return TRUE (defined to 1) or FALSE (defined to 0) from a bool function.
This is wrong, but follows the mysql conventiosn, at least...
The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.
Because the queue is not persistent, it is not crash-safe. We should
in some way ensure that the deferred-dropped tables will be dropped
after server restart.
The existence of two separate transactions complicates the error handling
of CREATE TABLE...SELECT. We should really not break locks in DROP TABLE.
Our solution to these problems is to rename the table to a temporary
name, and to drop such-named tables on InnoDB startup. Also, the
queue will use table IDs instead of names from now on.
check-testcase.test: Ignore #sql-ib*.ibd files, because tables may enter
the background DROP TABLE queue shortly before the test finishes.
innodb.drop_table_background: Test CREATE...SELECT and the creation of
tables whose file name starts with #sql-ib.
innodb.alter_crash: Adjust the recovery, now that the #sql-ib tables
will be dropped on InnoDB startup.
row_mysql_drop_garbage_tables(): New function, to drop all #sql-ib tables
on InnoDB startup.
row_drop_table_for_mysql_in_background(): Remove an unnecessary and
misplaced call to log_buffer_flush_to_disk(). (The call should have been
after the transaction commit. We do not care about flushing the redo log
here, because the table would be dropped again at server startup.)
Remove the entry from the list after the table no longer exists.
If server shutdown has been initiated, empty the list without actually
dropping any tables. They will be dropped again on startup.
row_drop_table_for_mysql(): Do not call lock_remove_all_on_table().
Instead, if locks exist, defer the DROP TABLE until they do not exist.
If the table name does not start with #sql-ib, rename it to that prefix
before adding it to the background DROP TABLE queue.
find_type_or_exit() client helper did exit(1) on error, exit(1) moved to
clients.
mysql_read_default_options() did exit(1) on error, error is passed through and
handled now.
my_str_malloc_default() did exit(1) on error, replaced my_str_ allocator
functions with normal my_malloc()/my_realloc()/my_free().
sql_connect.cc did many exit(1) on hash initialisation failure. Removed error
check since my_hash_init() never fails.
my_malloc() did exit(1) on error. Replaced with abort().
my_load_defaults() did exit(1) on error, replaced with return 2.
my_load_defaults() still does exit(0) when invoked with --print-defaults.
Problem was that crypt_data->min_key_version is not a reliable way
to detect is tablespace encrypted and could lead that in first page
of the second (page 192 and similarly for other files if more configured)
system tablespace file used key_version is replaced with zero leading
a corruption as in next startup page is though to be corrupted.
Note that crypt_data->min_key_version is updated only after all
pages from tablespace have been processed (i.e. key rotation is done)
and flushed.
fil_write_flushed_lsn
Use crypt_data->should_encrypt() instead.
Whenever we call merge_role_privileges on a role, we make use of
the role->counter variable to check if all it's children have had their
privileges merged. Only if all children have had their privileges merged,
do we update the privileges on parent. This is done to prevent extra work.
The same idea is employed during flush privileges. You only begin merging
from "leaf" roles. The recursive calls will merge their parents at some point.
A problem arises when we try to "re-merge" a parent. Take the following graph:
{noformat}
A (0) ---- C (2) ---- D (2) ---- USER
/ /
B (0) ----/ /
/
E (0) --------------/
{noformat}
In parentheses we have the "counter" value right before we start to iterate
through the roles hash and propagate values. It represents the number of roles
granted to the current role. The order in which we iterate through the roles
hash is alphabetical.
* First merge A, which leads to decreasing the counter for C to 1. Since C is
not 0, we don't proceed with merging into C.
* Second we merge B, which leads to decreasing the counter for C to 0. Now
we proceed with merging into C. This leads to reducing the counter for D to 1
as part of C merge process.
* Third as we iterate through the hash, we see that C has counter 0, thus we
start the merge process *again*. This leads to reducing the counter for
D to 0! We then attempt to merge D.
* Fourth we start merging E. When E sees D as it's parent (according to the code)
it attempts to reduce D's counter, which leads to overflow. Now D's counter is
a very large number, thus E's privileges are not forwarded to D yet.
To correct this behavior we must make sure to only start merging from initial
leaf nodes.
When granting a role to another role, DB privileges get propagated. If
the grantee had no previous DB privileges, an extra ACL_DB entry is created to
house those "indirectly received" privileges. If, afterwards, DB
privileges are granted to the grantee directly, we must make sure to not
create a duplicate ACL_DB entry.
During the user-defined variable defined by the recursive CTE handling procedure
check_dependencies_in_with_clauses that checks dependencies between the tables
that are defined in the CTE and find recursive definitions wasn't called.
The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.
Because the queue is not persistent, it is not crash-safe. In stable
versions of MariaDB, we can only try harder to drop all enqueued
tables before server shutdown.
row_mysql_drop_t::table_id: Replaces table_name.
row_drop_tables_for_mysql_in_background():
Do not remove the entry from the list as long as the table exists.
In this way, the table should eventually be dropped.
Moved TOI replication to happen after ACL checking for commands:
SQLCOM_CREATE_EVENT
SQLCOM_ALTER_EVENT
SQLCOM_DROP_EVENT
SQLCOM_CREATE_VIEW
SQLCOM_CREATE_TRIGGER
SQLCOM_DROP_TRIGGER
SQLCOM_INSTALL_PLUGIN
SQLCOM_UNINSTALL_PLUGIN
wrep_sst_common: Setting "-c ''" for my_print_defaults just takes no values from config at all. $MY_PRINT_DEFAULTS is already set at the top of the script to have --defaults-file and --defaults-extra-file. If WSREP_SST_OPT_CONF if set to "--defaults-file=/etc/my.cnf --defaults-extra-file=/etc/my.extra.cnf", then "my_print_defaults -c "" --defaults-file=/etc/my.cnf" succeeds, but if WSREP_SST_OPT_CONF is empty - no default values are taken at all.
wsrep_sst_xtrabackup-v2: innobackupex does not support --defaults-extra-file, so ${WSREP_SST_OPT_CONF} cannot be used as an argument, it has been changed to ${WSREP_SST_OPT_DEFAULT}. Removed --defaults-file= from INNOMOVE line, because WSREP_SST_OPT_CONF already includes it (INNOBACKUP was fine, INNOMOVE - not).
* The version of tcmalloc is written to the system variable
'version_malloc_library' if tcmalloc is used, similarly to
jemalloc
* Extracted method guess_malloc_library()
The 'data' field in the XA RECOVER resultset changed
to be charset_bin. It seems to me right and also
--binary-as-hex starts working. The XA RECOVER FORMAT='SQL' option
implemented. It returns the XID string that fits to be an argument for the
XA ... statements.
The error
"Unsupported collation on string indexed column %s Use
binary collation (latin1_bin, binary, utf8_bin)."
is misleading. Change it:
- It is now a warning
- It is printed only for collations that do not support index-only access
(reversible collations that use unpack_info are ok)
- The new warning text is:
Indexed column %s.%s uses a collation that does not allow index-only
access in secondary key and has reduced disk space efficiency
in primary key.