Problem was that link file (.isl) is also opened using O_DIRECT
mode and if this fails the whole create table fails on internal
error.
Fixed by not using O_DIRECT on link files as they are used only
on create table and startup and do not contain real data.
O_DIRECT failures are successfully ignored for data files
if O_DIRECT is not supported by file system on used
data directory.
When acting as a Galera receiver node, server startup may take
more than 30 secs (the current default) as it has to wait for
SST/IST operation to complete besides spending some time doing
wsrep recovery.
Fixed by raising the default value of MYSQLD_STARTUP_TIMEOUT
to 60 secs. Also sourced /etc/default/mariadb into the init
script so that it can be used to set MYSQLD_STARTUP_TIMEOUT.
Some statements are always replicated in STATEMENT binlog format.
So upon their execution, the current binlog format is temporarily
switched to STATEMENT even though the session's format is different.
This state, stored in THD's current_stmt_binlog_format, was getting
incorrectly masked by wsrep_forced_binlog_format, causing assertions
and unintended generation of row events.
Backported galera.galera_forced_binlog_format and added a test
specific to this case.
In RHEL7/RHEL7.1 libcrack behavior seem to have been modified so that
"foobar" password is considered bad (due to descending "ba") earlier than
expected. For details google for cracklib-2.9.0-simplistic.patch.
Adjusted affected passwords not to have descending and ascending sequences.
Analysis:
-- InnoDB has n (>0) redo-log files.
-- In the first page of redo-log there is 2 checkpoint records on fixed location (checkpoint is not encrypted)
-- On every checkpoint record there is up to 5 crypt_keys containing the keys used for encryption/decryption
-- On crash recovery we read all checkpoints on every file
-- Recovery starts by reading from the latest checkpoint forward
-- Problem is that latest checkpoint might not always contain the key we need to decrypt all the
redo-log blocks (see MDEV-9422 for one example)
-- Furthermore, there is no way to identify is the log block corrupted or encrypted
For example checkpoint can contain following keys :
write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ]
so over time we could have a checkpoint
write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ]
killall -9 mysqld causes crash recovery and on crash recovery we read as
many checkpoints as there is log files, e.g.
read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
This is problematic, as we could still scan log blocks e.g. from checkpoint 4 and we do
not know anymore the correct key.
CRYPT INFO: for checkpoint 14 search 4
CRYPT INFO: for checkpoint 13 search 4
CRYPT INFO: for checkpoint 12 search 4
CRYPT INFO: for checkpoint 11 search 4
CRYPT INFO: for checkpoint 10 search 4
CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND)
For every checkpoint, code generated a new encrypted key based on key
from encryption plugin and random numbers. Only random numbers are
stored on checkpoint.
Fix: Generate only one key for every log file. If checkpoint contains only
one key, use that key to encrypt/decrypt all log blocks. If checkpoint
contains more than one key (this is case for databases created
using MariaDB server version 10.1.0 - 10.1.12 if log encryption was
used). If looked checkpoint_no is found from keys on checkpoint we use
that key to decrypt the log block. For encryption we use always the
first key. If the looked checkpoint_no is not found from keys on checkpoint
we use the first key.
Modified code also so that if log is not encrypted, we do not generate
any empty keys. If we have a log block and no keys is found from
checkpoint we assume that log block is unencrypted. Log corruption or
missing keys is found by comparing log block checksums. If we have
a keys but current log block checksum is correct we again assume
log block to be unencrypted. This is because current implementation
stores checksum only before encryption and new checksum after
encryption but before disk write is not stored anywhere.
Make sure that on decrypt we do not try to reference
NULL pointer and if page contains undefined
FIL_PAGE_FILE_FLUSH_LSN field on when page is not
the first page or page is not in system tablespace,
clear it.
* only copy args[0] to args[2] after fix_fields (when all item
substitutions have already happened)
* change QT_ITEM_FUNC_NULLIF_TO_CASE (that allows to print NULLIF
as CASE) to QT_ITEM_ORIGINAL_FUNC_NULLIF (that prohibits it).
So that NULLIF-to-CASE is allowed by default and only disabled
explicitly for SHOW VIEW|FUNCTION|PROCEDURE and mysql_make_view.
By default it is allowed (in particular in error messages and
debug output, that can happen anytime before or after optimizer).
don't do special SUM_FUNC_ITEM treatment in NULLIF for views
(as before), but do it for derived tables (when
context_analysis_only == CONTEXT_ANALYSIS_ONLY_DERIVED)
During SST, since wsrep_sst_rsync waits for mysqld to create
"tables_flushed" file after it has successfully executed FTWRL,
it would wait forever if FTWRL fails.
Fixed by introducing a mechanism to report failure to the script.
- "Early NULLs filtering" optimization used to "peel off" Item_ref and
Item_direct_ref wrappers from an outside column reference before
adding "outer_table_col IS NOT NULL" into JOIN::outer_ref_cond.
- When this happened in a subquery that was evaluated in a post-GROUP-BY
context, attempt to evaluate JOIN::outer_ref_cond would fetch an
incorrect value of outer_table_col.
If any given variable the xtrabackup-v2 sst script looks for is specified
multiple times in cnf file then it tend to pick both of them causing
some of the follow-up command to fail.
Avoid this programatic mistake by honoring only the last variable assigned
setting as done by mysqld too.
Check https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1362830
Semantics:
---------
* Generally end-user will create a separate user with needed
privileges for
performing DONOR action.
* This user credentials are specified using wsrep_sst_auth.
* Along with this user there could be other user(s) created on the
server
that sysadmin may use for normal or other operations
* Credentials for these user(s) can be specified in same
cluster/server
cnf file as part of [client] section
When cluster act as DONOR and if wsrep_sst_auth is provided then it
should
strictly use it for performing SST based action.
What if end-user has same credentials for performing both SST action
and
normal admin work ?
* Then end-user can simply specify these credentials as part of
[client]
section in cnf file and skip providing wsrep_sst_auth.
Issue:
-----
MySQL client user/password parsing preference order is as follows:
* command line (through --user/--password)
* cnf file
* MYSQL_PWD enviornment variable.
Recent change tried passing sst user password through MYSQL_PWD
(and user though --user command line param as before).
On the system where-in admin had another user for performing non-SST
actions,
credentials for such user were present in cnf file under [client]
section.
Due to mysql client preference order, SST user name was used (as it
was
passed through command line) but password of other user (meant for
non-SST)
action was being used as it was passed through cnf file.
Password passed through MYSQL_PWD was completely ignored causing
user-name/password mismatch.
Solution:
---------
* If user has specified credentials for SST then pass them through
command
line so that they are used in priority.
(There could be security concern on passing things through command
line but
when I tried passing user-name and password through command line to
mysql
client and then did ps I saw this
./bin/mysql --user=sstuser --password=x xxxxxxxx -S /tmp/n1.sock
so seems like password is not shown)