cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
Rather than iterating global plugin collection, iterate THD local
collection. Removes two LOCK_plugin locks per connection.
Part of MDEV-19515 - Improve connect speed
Rather than parsing session_track_system_variables when thread starts, do
it when first trackable event occurs.
Benchmarked on a 2socket/20core/40threads Broadwell system using sysbench
connect brencmark @40 threads (with select 1 disabled):
101379.77 -> 143016.68 CPS, whereas 10.2 is currently at 137766.31 CPS.
Part of MDEV-14984 - regression in connect performance
Fixed by caching last binary log number used in last_used_log_number
Other things:
- Moved locking of LOCK_log form new_file_impl() to new_file(). This fixed
a bug where LOCK_log could have been unlocked even if 'need_lock' was
not set. Removed not anymore used argument need_lock.
- Made generate_new_name() virtual to simplify the code between
other logs and binary log.
Reviewed by Andrei Elkin
now we can afford it. Fix -Werror errors. Note:
* old gcc is bad at detecting uninit variables, disable it.
* time_t is int or long, cast it for printf's
This patch contains a fix for the MDEV-17262/17243 issues and
new mtr test.
These issues (MDEV-17262/17243) have two reasons:
1) After an intermediate commit, a transaction loses its status
of "transaction that registered in the MySQL for 2pc coordinator"
(in the InnoDB) due to the fact that since version 10.2 the
write_row() function (which located in the ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
the write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of the started transaction
(m_prebuilt->sql_stat_start) after intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the
wsrep_load_data_split() function (which located in sql_load.cc)
will also do this, but it will be too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost or function may be called with invalid transaction identifier.
2) If a transaction with the LOAD DATA statement is divided into
logical mini-transactions (of the 10K rows) and binlog is rotated,
then in rare cases due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
splitting of the LOAD DATA into mini-transactions is technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of binlog in the middle of processing LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
* MDEV-16509 Improve wsrep commit performance with binlog disabled
Release commit order critical section early after trx_commit_low() if
binlog is not transaction coordinator. In order to avoid two phase commit,
binlog_hton is not registered for THD during IO_CACHE population.
Implemented a test which verifies that the transactions release
commit order early.
This optimization will change behavior during recovery as the commit
is not two phase when binlog is off. Fixed and recorded wsrep-recover-v25
and wsrep-recover to match the behavior.
* MDEV-18730 Ordering for wsrep binlog group commit
Previously out of order execution was allowed for wsrep commits.
Established proper ordering by populating wait_for_commit
for every wsrep THD and making group commit leader to wait for
prior commits before proceeding to trx_group_commit_leader().
* MDEV-18730 Added a test case to verify correct commit ordering
* MDEV-16509, MDEV-18730 Review fixes
Use WSREP_EMULATE_BINLOG() macro to decide if the binlog_hton
should be registered. Whitespace/syntax fixes and cleanups.
* MDEV-16509 Require binlog for galera_var_innodb_disallow_writes test
If the commit to InnoDB is done in one phase, the native InnoDB behavior
is that the transaction is committed in memory before it is persisted to
disk. This means that the innodb_disallow_writes=ON may not prevent
transaction to become visible to other readers before commit is completely
over. On the other hand, if the commit is two phase (as it is with binlog),
the transaction will be blocked in prepare phase.
Fixed the test to use binlog, which enforces two phase commit, which
in turn makes commit to block before the changes become visible to
other connections. This guarantees that the test produces expected
result.
With wsrep_gtid_mode=ON, the appropriate commit hooks were not
called in all cases for applied streaming transactions.
As a fix, removed all special handling of commit order critical
section from Wsrep_high_priority_service and Wsrep_storage_service.
Now commit order critical section is always entered in ha_commit_trans().
Check for wsrep_run_commit_hook is now done in handler.cc, log.cc.
This makes it explicit that the transaction is an active wsrep
transaction which must go through commit hooks.
fill_status.
Also, remove LOCK_status around calc_sum_of_all_status()
Also, rename LOCK_show_status into LOCK_all_status_vars.
This reflects the variable the lock protects.
PROBLEM
-------
Memory sanitizer reports uninitialized comparisons
in log_in_use(), because strings are compared with
memcmp() instead of strncmp.
FIX
---
Use strncmp() to compare strings
extra/mariabackup/fil_cur.cc:361:42: warning: format specifies type 'unsigned long' but the argument has type 'ib_int64_t' (aka 'long long') [-Wformat]
extra/mariabackup/fil_cur.cc:376:9: warning: format specifies type 'unsigned long' but the argument has type 'ib_int64_t' (aka 'long long') [-Wformat]
sql/handler.cc:6196:45: warning: format specifies type 'unsigned long' but the argument has type 'wsrep_trx_id_t' (aka 'unsigned long long') [-Wformat]
sql/log.cc:1681:16: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
sql/log.cc:1687:16: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
sql/wsrep_sst.cc:1388:86: warning: format specifies type 'long' but the argument has type 'wsrep_seqno_t' (aka 'long long') [-Wformat]
sql/wsrep_sst.cc:232:86: warning: format specifies type 'long' but the argument has type 'wsrep_seqno_t' (aka 'long long') [-Wformat]
storage/connect/filamdbf.cpp:450:47: warning: format specifies type 'short' but the argument has type 'int' [-Wformat]
storage/connect/filamdbf.cpp:970:47: warning: format specifies type 'short' but the argument has type 'int' [-Wformat]
storage/connect/inihandl.cpp:197:16: warning: address of array 'key->name' will always evaluate to 'true' [-Wpointer-bool-conversion]
storage/innobase/btr/btr0scrub.cc:151:17: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/buf/buf0buf.cc:5085:8: warning: nonnull parameter 'bpage' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
storage/innobase/fil/fil0crypt.cc:2454:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/handler/ha_innodb.cc:18685:7: warning: format specifies type 'unsigned long' but the argument has type 'wsrep_trx_id_t' (aka 'unsigned long long') [-Wformat]
storage/innobase/row/row0mysql.cc:3319:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/innobase/row/row0mysql.cc:3327:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/maria/ma_norec.c:35:10: warning: implicit conversion from 'int' to 'my_bool' (aka 'char') changes value from 131 to -125 [-Wconstant-conversion]
storage/maria/ma_norec.c:42:10: warning: implicit conversion from 'int' to 'my_bool' (aka 'char') changes value from 131 to -125 [-Wconstant-conversion]
storage/maria/ma_test2.c:1009:12: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
storage/maria/ma_test2.c:1010:12: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
storage/mroonga/ha_mroonga.cpp:9189:44: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
storage/mroonga/vendor/groonga/lib/expr.c:4987:22: warning: comparison of constant -1 with expression of type 'grn_operator' is always false [-Wtautological-constant-out-of-range-compare]
storage/xtradb/btr/btr0scrub.cc:151:17: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/buf/buf0buf.cc:5047:8: warning: nonnull parameter 'bpage' will evaluate to 'true' on first encounter [-Wpointer-bool-conversion]
storage/xtradb/fil/fil0crypt.cc:2454:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/row/row0mysql.cc:3324:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
storage/xtradb/row/row0mysql.cc:3332:5: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
unittest/sql/mf_iocache-t.cc:120:35: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
unittest/sql/mf_iocache-t.cc:96:35: note: expanded from macro 'INFO_TAIL'
- Changed ERROR to WARNING for MyISAM/Aria message
that are warnings in the check utilities.
This affects for example "client is using or
hasn't closed the table properly".
- Print "Table is fixed" if check succeded in
fixing the table.
Changing data types for:
- seconds from longlong to ulonglong
- microseconds from long to ulong
in:
- parameters of calc_time_diff()
- parameters of calc_time_from_sec()
- Members of Sec6_add
This will help to reuse the code easier:
all other functions use ulonglong+long
for seconds/microsecond, e.g.:
- number_to_time()
- number_to_datetime()
- number_to_datetime_with_warn()
- Field_temporal_with_date::store_decimal()
- my_decimal2seconds()
- Item::get_seconds()
For the purpose of reporting an error to error log, shutdown thread was
attempting to access current_thd->variables.lc_messages->errmsgs->errmsgs.
Whereas current_thd was NULL.
We should log errors according to global lc_messages setting instead of
session setting.
Fixed by deleting the sequence if we where not able to initialize it
I also noticed that we didn't always set the error message when
check_killed(), which could lead to aborted queries without error
beeing properly set. Fixed by default setting error message if
check_error() noticed that killed had been called.
This allowed me to remove a lot of calls to thd->send_kill_message().
Modern compilers (such as GCC 8) emit warnings that the
'register' keyword is deprecated and not valid C++17.
Let us remove most use of the 'register' keyword.
Code in 'extra/' is not touched.
MyRocks internally will print non-critical messages to
sql_print_verbose_info() which will do what InnoDB does in similar cases:
check if (global_system_variables.log_warnings > 2).
For galera compatibility, the main thing is to ensure the FD 1, 2 are
not opened with O_CLOEXEC otherwise galera sst errors don't appear in
the error.log
Files without O_CLOEXEC from the test below:
0 -> /dev/pts/9
1 -> /tmp/error.log (intended)
2 -> /tmp/error.log (intended)
5 -> /tmp/datadir
6 -> /tmp/datadir/aria_log.00000001
(Innodb temp files)
8 -> /tmp/ibIIrhFL (deleted)
9 -> /tmp/ibfx1vai (deleted)
10 -> /tmp/ibAQUKFO (deleted)
11 -> /tmp/ibWBQSHR (deleted)
15 -> /tmp/ibEXEcfo (deleted)
20 -> /tmp/datadir/mysql/host.MYD
22 -> /tmp/datadir/mysql/user.MYD
... (rest of MYD files)
Test for this and the previous commit.
sql/mysqld --skip-networking --datadir=/tmp/datadir --log-bin=/tmp/datadir/mysqlbin --socket /tmp/s.sock --lc-messages-dir=${PWD}/sql/share --verbose --log-error=/tmp/error.log --general-log-file=/tmp/general.log --general-log=1 --slow-query-log-file=/tmp/slow.log --slow-query-log=1
180302 10:56:41 [Note] sql/mysqld (mysqld 5.5.60-MariaDB-wsrep) starting as process 26056 ...
$ cd /proc/26056
$ ls -la --sort=none fd
total 0
dr-x------. 2 dan dan 0 Mar 2 10:57 .
dr-xr-xr-x. 9 dan dan 0 Mar 2 10:56 ..
lrwx------. 1 dan dan 64 Mar 2 10:57 0 -> /dev/pts/9
l-wx------. 1 dan dan 64 Mar 2 10:57 1 -> /tmp/error.log
l-wx------. 1 dan dan 64 Mar 2 10:57 2 -> /tmp/error.log
lrwx------. 1 dan dan 64 Mar 2 10:57 3 -> /tmp/datadir/mysqlbin.index
lrwx------. 1 dan dan 64 Mar 2 10:57 4 -> /tmp/datadir/aria_log_control
lr-x------. 1 dan dan 64 Mar 2 10:57 5 -> /tmp/datadir
lrwx------. 1 dan dan 64 Mar 2 10:57 6 -> /tmp/datadir/aria_log.00000001
lrwx------. 1 dan dan 64 Mar 2 10:57 7 -> /tmp/datadir/ibdata1
lrwx------. 1 dan dan 64 Mar 2 10:57 8 -> /tmp/ibIIrhFL (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 9 -> /tmp/ibfx1vai (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 10 -> /tmp/ibAQUKFO (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 11 -> /tmp/ibWBQSHR (deleted)
lrwx------. 1 dan dan 64 Mar 2 10:57 12 -> /tmp/datadir/ib_logfile0
lrwx------. 1 dan dan 64 Mar 2 10:57 13 -> /tmp/datadir/ib_logfile1
l-wx------. 1 dan dan 64 Mar 2 10:57 14 -> /tmp/slow.log
lrwx------. 1 dan dan 64 Mar 2 10:57 15 -> /tmp/ibEXEcfo (deleted)
l-wx------. 1 dan dan 64 Mar 2 10:57 16 -> /tmp/general.log
lrwx------. 1 dan dan 64 Mar 2 10:57 17 -> socket:[1897356]
lrwx------. 1 dan dan 64 Mar 2 10:57 18 -> socket:[45335]
l-wx------. 1 dan dan 64 Mar 2 10:57 19 -> /tmp/datadir/mysqlbin.000004
lrwx------. 1 dan dan 64 Mar 2 10:57 20 -> /tmp/datadir/mysql/host.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 21 -> /tmp/datadir/mysql/host.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 22 -> /tmp/datadir/mysql/user.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 23 -> /tmp/datadir/mysql/user.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 24 -> /tmp/datadir/mysql/db.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 25 -> /tmp/datadir/mysql/db.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 26 -> /tmp/datadir/mysql/proxies_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 27 -> /tmp/datadir/mysql/proxies_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 28 -> /tmp/datadir/mysql/tables_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 29 -> /tmp/datadir/mysql/tables_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 30 -> /tmp/datadir/mysql/columns_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 31 -> /tmp/datadir/mysql/columns_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 32 -> /tmp/datadir/mysql/procs_priv.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 33 -> /tmp/datadir/mysql/procs_priv.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 34 -> /tmp/datadir/mysql/servers.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 35 -> /tmp/datadir/mysql/servers.MYI
lrwx------. 1 dan dan 64 Mar 2 10:57 36 -> /tmp/datadir/mysql/event.MYD
lrwx------. 1 dan dan 64 Mar 2 10:57 37 -> /tmp/datadir/mysql/event.MYI
O_CLOEXEC files are those with flags 02000000
/usr/include/bits/fcntl-linux.h:# define __O_CLOEXEC 02000000
/usr/include/bits/fcntl-linux.h:# define O_CLOEXEC __O_CLOEXEC /* Set close_on_exec. */
$ find fdinfo/ -type f -ls -exec cat {} \; | more
1924720 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/0
pos: 0
flags: 0100002
mnt_id: 25
1924721 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/1
pos: 9954
flags: 0102001
mnt_id: 82
1924722 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/2
pos: 10951
flags: 0102001
mnt_id: 82
1924723 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/3
pos: 116
flags: 02100002
mnt_id: 82
1924724 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/4
pos: 52
flags: 0100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866365 0 EOF
1924725 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/5
pos: 0
flags: 0100000
mnt_id: 82
1924726 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/6
pos: 16384
flags: 0100002
mnt_id: 82
1924727 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/7
pos: 0
flags: 02100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866491 0 EOF
1924728 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/8
pos: 0
flags: 0100002
mnt_id: 82
1924729 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/9
pos: 0
flags: 0100002
mnt_id: 82
1924730 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/10
pos: 0
flags: 0100002
mnt_id: 82
1924731 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/11
pos: 0
flags: 0100002
mnt_id: 82
1924732 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/12
pos: 0
flags: 02100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866492 0 EOF
1924733 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/13
pos: 0
flags: 02100002
mnt_id: 82
lock: 1: POSIX ADVISORY WRITE 26056 00:2c:1866493 0 EOF
1924734 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/14
pos: 763
flags: 02102001
mnt_id: 82
1924735 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/15
pos: 0
flags: 0100002
mnt_id: 82
1924736 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/16
pos: 473
flags: 02102001
mnt_id: 82
1924737 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/17
pos: 0
flags: 02000002
mnt_id: 9
1924738 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/18
pos: 0
flags: 02
mnt_id: 9
1924739 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/19
pos: 245
flags: 02100001
mnt_id: 82
1924740 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/20
pos: 0
flags: 0100002
mnt_id: 82
1924741 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/21
pos: 503
flags: 0500002
mnt_id: 82
1924742 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/22
pos: 324
flags: 0100002
mnt_id: 82
1924743 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/23
pos: 642
flags: 0500002
mnt_id: 82
1924744 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/24
pos: 880
flags: 0100002
mnt_id: 82
1924745 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/25
pos: 581
flags: 0500002
mnt_id: 82
1924746 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/26
pos: 1386
flags: 0100002
mnt_id: 82
1924747 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/27
pos: 498
flags: 0500002
mnt_id: 82
1924748 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/28
pos: 0
flags: 0100002
mnt_id: 82
1924749 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/29
pos: 513
flags: 0500002
mnt_id: 82
1924750 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/30
pos: 0
flags: 0100002
mnt_id: 82
1924751 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/31
pos: 494
flags: 0500002
mnt_id: 82
1924752 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/32
pos: 0
flags: 0100002
mnt_id: 82
1924753 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/33
pos: 535
flags: 0500002
mnt_id: 82
1924754 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/34
pos: 0
flags: 0100002
mnt_id: 82
1924755 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/35
pos: 396
flags: 0500002
mnt_id: 82
1924756 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/36
pos: 0
flags: 0100002
mnt_id: 82
1924757 0 -r-------- 1 dan dan 0 Mar 2 10:57 fdinfo/37
pos: 517
flags: 0500002
mnt_id: 82
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider,spider, sphinx and connect for now.
This will make it easier to how memory allocation is done when debugging
with either DBUG or gdb.
Will especially help when debugging stored procedures
Main change is a name argument as second argument to init_alloc_root()
init_sql_alloc()
Other things:
- Added DBUG_ENTER/EXIT to some Virtual_tmp_table functions
This preserves const str for constant strings
Other things
- A few variables where changed from LEX_STRING to LEX_CSTRING
- Incident_log_event::Incident_log_event and record_incident where
changed to take LEX_CSTRING* as an argument instead of LEX_STRING
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
Main problem was that no log-event print function checked for disk
full error on the IO_CACHE.
All changes in this patch only affects mysqlbinlog, not the server!
- Changed all log-event print functions to return 1 on error
- Fixed memory usage when not using --flashback.
- Added printing of number of rows in row events. Can be disabled with
--print-row-count=0
- Print annotated rows when using mysqlbinlog --short-form
- Fixed that mysqlbinlog --debug works
- Fixed create_drop_binlog.test test failure
- Reorganized fields in PRINT_EVENT_INFO to be according to size to
optimize storage
- Don't change print_row_event_position or print_row_counts if set by user
- Remove some testing of argument to my_free is 0
- base64-output=never is now supported and works in all context
- Updated help information for --base64-output and --short-form
- print_row_count is now on by default. Reset automatically if --short-form
is used
- Removed obsolote warning for mysql 5.6.0
- More DBUG_PRINT for mysqltest.cc
- my_b_write_byte() now checks for flush failures. This fixed a memory
overrun on disk full
- my_b_printf() now returns 1 on failure, 0 on ok. This simplifies code
and no old code was using the old return value of my_b_printf().
- my_b_Write_backtick_quote() now returns 1 on failure and 0 on ok
- Fixed some error conditions in log printing that was not previously
handled.
- Slave_rows_error_report() can now handle longlong positions
- Write_on_release_cache() rewritten so that we can detect errors
on flush. Not depending on automatic release anymore.
- Changed types for Pos and End_log_pos to 64 bit in SHOW BINLOG EVENTS
- Fixed that copy_event_cache_to_string_and_reinit() works with strings
longer than 4G (Changed to use LEX_STRING instead of String)
- Restricted binlog_rows_event_max_size to UINT32_MAX-1 as String's are
anyway restricted to UINT32_MAX
- Fixed bug in rpl_binlog_state::write_to_iocache() which hide write
failures (duplicate variable name)
- Fixed bug in String::append if original string was not allocated
- Stop mysqlbinlog output at once if there is an error.
- Before printing error message, flush result file. This ensures that
the error message is printed last. (Easier to find)
Problem:- Gtid are not transferred in Galera Cluster.
Solution:- We need to transfer gtid in the case on either when cluster is
slave/master in async replication. In normal Gtid replication gtid are generated on
recieving node itself and it is always on sync with other nodes. Because galera keeps
node in sync , So all nodes get same no of event groups. So the issue arises when
say galera is slave in async replication.
A
| (Async replication)
D <-> E <-> F {Galera replication}
So what should happen is that all node should apply the master gtid but this does
node happen, becuase node E, F does not recieve gtid from D in write set , So what E(or F)
does is that it applies wsrep_gtid_domain_id, D server-id , E gtid next seq no. This
generated gtid does not always work when say A has different domain id.
So In this commit, on galera node when we see that this event is recieved from master
we simply write Gtid_Log_Event in write_set and send it to other nodes.
and specifically the ack receiving functionality.
Semisync is turned to be static instead of plugin so its functions
are invoked at the same points as RUN_HOOKS.
The RUN_HOOKS and the observer interface remain to be removed by later
patch.
Todo:
React on killed status by repl_semisync_master.wait_after_sync(). Currently
Repl_semi_sync_master::commit_trx does not check the killed status.
There were few bugfixes found that are present in mysql and its unclear
whether/how they are covered. Those include:
Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS
Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL
HAS BAD PERFORMANCE
Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS
RUN_HOOK() is only called if semisync is enabled
As the server can't disable the hooks if something is in progress, I added
a new variable, run_hooks_enabled, that is set the first time semi sync is
used. This means that RUN_HOOK will have no overhead, unless semi sync
master or slave has been enabled once.
Some of the changes was just to get rid of warnings for embedded server
Part of MDEV-13073 AliSQL Optimize performance of semisync
The idea it to use a dedicated lock detecting if there is new data in
the master's binary log instead of the overused LOCK_log.
Changes:
- Use dedicated COND variables for the relay and binary log signaling.
This was needed as we where the old 'update_cond' variable was used
with different mutex's, which could cause deadlocks.
- Relay log uses now COND_relay_log_updated and LOCK_log
- Binary log uses now COND_bin_log_updated and LOCK_binlog_end_pos
- Renamed signal_cnt to relay_signal_cnt (as we now have two signals)
- Added some missing error handling in MYSQL_BIN_LOG::new_file_impl()
- Reformatted some comments with old style
- Renamed m_key_LOCK_binlog_end_pos to key_LOCK_binlog_end_pos
- Changed 'signal_update()' to update_binlog_end_pos() which works for
both relay and binary log
The crash (sometimes assert) in MYSQL_BIN_LOG::mark_xid_done was caused by a
fact that log.cc:binlog_background_thread_queue could become a cyclic list.
This possibility becomes real with two checkpoint capable engines that
may execute TC_LOG_BINLOG::commit_checkpoint_notify() in succession before
binlog_background thread gets control and eventually finds a freed memory
while otherwise endlessly looping in while(queue).
It is fixed with counting the notificaion kind instead of en-listing the same notificaion kind in commit_checkpoint_notify as formerly. The while(queue) of binlog background thread is refined to pay attention to the new counter. In effectno more access to free memory is possible.
- Remove not used thd_rpl_is_parallel()
- Remove not used mysql_notify_thread_having_shared_lock()
- Remove not needed LOCK_thread_count from MYSQL_BIN_LOG::reset_logs()
- LOCK_thread_count is not protecting against rollback, so this
code and comment is not needed
- Remove mutex_locks in slave.cc that are not needed.
Added THD::assert_not_linked() to ensure that it was safe to remove
- Fixed not repeatable test load_data_stmt_view
- Updated binlog_killed to test removal of mutex
(thanks to Andrei Elkin for test)
- More code comments
With combination of --log-bin and Galera the server may crash
reporting two characteristic stacks:
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xc7)[0x7f182a8e2cb7]
/usr/sbin/mysqld(binlog_background_thread+0x2b5)[0x7f182a8e3275]
or
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7ff395b2dafd]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7ff395b2db91]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7ff395b300b2]
The reason of the failure appears to be non-matching decrements for
`xid_count_per_binlog::xid_count`
which can occur when a transaction is executed having its connection issued
`SET @@sql_log_bin=0`. In such case the xid count is not incremented but
its decrements still runs to turn `binlog_xid_count_list` into improper state
which the following FLUSH BINARY LOGS exposes through the crash.
*Note_1*: the regression test reuses an existing galera.sql_log_bin
which does not run stably (even in its base form) by mtr with --log-bin.
*Note_2*: 10.0-galera branch is free of this issue having missed MDEV-7205
fixes.
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides being of annoyance to
clutter output in DBA console stale domains can prevent the slave
to connect the master as MDEV-12012 witnesses.
What domain is obsolete must be evaluated by the user (DBA) according
to whether the domain info is still relevant and will the domain ever
receive any update.
This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires no event group from such
domain present in existing binlog files though. If there are any the
containing logs must be first PURGEd in order for
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
succeed. Otherwise the command returns an error.
The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
The new DELETE_DOMAIN_ID featured FLUSH continues to rotate binlog
omitting the deleted domains from the active binlog file's Gtid_list.
Notice though when the command is ineffective - that none of requested to delete
domain exists in the binlog state - rotation does not occur.
Obsolete domain deletion is not harmful for connected slaves as long
as master side binlog files *purge* is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have the last event from purged files processed as usual,
in order not to bump later into requesting a gtid from a file which
was already gone.
While the command is not replicated (as ordinary FLUSH BINLOG LOGS is)
slaves, even though having extra domains, won't suffer from reconnection errors
thanks to master-slave gtid connection protocol allowing the master
to be ignorant about a gtid domain.
Should at failover such slave to be promoted into master role it may run
the ex-master's
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
to clean its own binlog state.
NOTES.
suite/perfschema/r/start_server_low_digest.result
is re-recorded as consequence of internal parser codes changes.
This was done to get more information about where time is spent.
Now we can get proper timing for time spent in commit, rollback,
binlog write etc.
Following stages was added:
- Commit
- Commit_implicit
- Rollback
- Rollback implicit
- Binlog write
- Init for update
- This is used instead of "Init" for insert, update and delete.
- Staring cleanup
Following stages where changed:
- "Unlocking tables" stage reset stage to previous stage at end
- "binlog write" stage resets stage to previous stage at end
- "end" -> "end of update loop"
- "cleaning up" -> "Reset for next command"
- Added stage_searching_rows_for_update when searching for rows
to be deleted.
Other things:
- Renamed all stages to start with big letter (before there was no
consitency)
- Increased performance_schema_max_stage_classes from 150 to 160.
- Most of the test changes in performance schema comes from renaming of
stages.
- Removed duplicate output of variables and inital state in a lot of
performance schema tests.
This was done to make it easier to change a default value for a
performance variable without affecting all tests.
- Added start_server_variables.test to check configuration
- Removed some duplicate "closing tables" stages
- Updated position for "stage_init_update" and "stage_updating" for
delete, insert and update to be just before update loop (for more
exact timing).
- Don't set "Checking permissions" twice in a row.
- Remove stage_end stage from creating views (not done for create table
either).
- Updated default performance history size from 10 to 20 because of new
stages
- Ensure that ps_enabled is correct (to be used in a later patch)
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.
Binlog_background_thread does not make a call to set_time(), And when
we call binlog_checkpoint_log_event->write() , we write the wrong timestamp.
In this patch we correct this by calling thd->set_time().
For running the Galera tests, the variable my_disable_leak_check
was set to true in order to avoid assertions due to memory leaks
at shutdown.
Some adjustments due to MDEV-13625 (merge InnoDB tests from MySQL 5.6)
were performed. The most notable behaviour changes from 10.0 and 10.1
are the following:
* innodb.innodb-table-online: adjustments for the DROP COLUMN
behaviour change (MDEV-11114, MDEV-13613)
* innodb.innodb-index-online-fk: the removal of a (1,NULL) record
from the result; originally removed in MySQL 5.7 in the
Oracle Bug #16244691 fix
377774689b
* innodb.create-index-debug: disabled due to MDEV-13680
(the MySQL Bug #77497 fix was not merged from 5.6 to 5.7.10)
* innodb.innodb-alter-autoinc: MariaDB 10.2 behaves like MySQL 5.6/5.7,
while MariaDB 10.0 and 10.1 assign different values when
auto_increment_increment or auto_increment_offset are used.
Also MySQL 5.6/5.7 exhibit different behaviour between
LGORITHM=INPLACE and ALGORITHM=COPY, so something needs to be tested
and fixed in both MariaDB 10.0 and 10.2.
* innodb.innodb-wl5980-alter: disabled because it would trigger an
InnoDB assertion failure (MDEV-13668 may need additional effort in 10.2)
Assertions failed due to incorrect handling of the --tc-heuristic-recover
option when InnoDB is in read-only mode either due to innodb_read_only=1
or innodb_force_recovery>3. InnoDB failed to refuse a XA COMMIT or
XA ROLLBACK operation, and there were errors in the error handling in
the upper layer.
This was fixed by making InnoDB XA operations respect the
high_level_read_only flag. The InnoDB part of the fix and
parts of the test main.tc_heuristic_recover were provided
by Marko Mäkelä.
LOCK_log mutex lock/unlock had to be added to fix MDEV-13438.
The measure is confirmed by mysql sources as well.
For testing of the conflicting option combination, mysql-test-run is
made to export a new $MYSQLD_LAST_CMD. It holds the very last value
generated by mtr.mysqld_start(). Even though the options have been
also always stored in $mysqld->{'started_opts'} there were no access
to them beyond the automatic server restart by mtr through the expect
file interface.
Effectively therefore $MYSQLD_LAST_CMD represents a more general
interface to $mysqld->{'started_opts'} which can be used in wider
scopes including server launch with incompatible options.
Notice another existing method to restart the server with incompatible
options relying on $MYSQLD_CMD is is aware of $mysqld->{'started_opts'}
(the actual options that the server is launched by mtr). In order to use
this method they would have to be provided manually.
NOTE: When merging to 10.2, the file search_pattern_in_file++.inc
should be replaced with the pre-existing search_pattern_in_file.inc.
This fixes MDEV-7742 and MDEV-8305 (Allow user to specify if stored
procedures should be logged in the slow and general log)
New functionality:
- Added new variables log_slow_disable_statements and log_disable_statements
that can be used to disable logging of certain queries to slow and
general log. Currently supported options are 'admin', 'call', 'slave'
and 'sp'.
Defaults are as before. Only 'sp' (stored procedure statements) is
disabled for slow and general_log.
- Slow log to files now includes the following new information:
- When logging stored procedure statements the name of stored
procedure is logged.
- Number of created tmp_tables, tmp_disk_tables and the space used
by temporary tables.
- When logging 'call', the logged status now contains the sum of all
included statements. Before only 'time' was correct.
- Added filsort_priority_queue as an option for log_slow_filter (this
variable existed before, but was not exposed)
- Added support for BIT types in my_getopt()
Mapped some old variables to bitmaps (old variables can still be used)
- Variable 'log_queries_not_using_indexes' is mapped to
log_slow_filter='not_using_index'
- Variable 'log_slow_slave_statements' is mapped to
log_slow_disabled_statements='slave'
- Variable 'log_slow_admin_statements' is mapped to
log_slow_disabled_statements='admin'
- All the above variables are changed to session variables from global
variables
Other things:
- Simplified LOGGER::log_command. We don't need to check for super if
OPTION_LOG_OFF is set as this flag can only be set if one is a super
user.
- Removed some setting of enable_slow_log as it's guaranteed to be set by
mysql_parse()
- mysql_admin_table() now sets thd->enable_slow_log
- Added prepare_logs_for_admin_command() to reset thd->enable_slow_log if
needed.
- Added new functions to store, restore and add slow query status
- Added new functions to store and restore query start time
- Reorganized Sub_statement_state according to types
- Added code in dispatch_command() to ensure that
thd->reset_for_next_command() is always called for a query.
- Added thd->last_sql_command to simplify checking of what was the type
of the last command. Needed when logging to slow log as lex->sql_command
may have changed before slow logging is called.
- Moved QPLAN_TMP_... to where status for tmp tables are updated
- Added new THD variable, affected_rows, to be able to correctly log
number of affected rows to slow log.
- Simplified use_trans_cache() to return at once if is_transactional is set
- Indentation and spelling errors fixed
- Don't call signal_update() if update_binlog_end_pos() is called as the
function already calls signal_update()
- Removed not used function wait_for_update_bin_log(), which would cause
errors if ever used.
- Simplified handler::clone() by always allocating 'ref' in ha_open(). To do
this I added an optional MEM_ROOT argument to ha_open() to be used when
allocating 'ref'
- Changed arguments to get_system_var() from LEX_CSTRING to LEX_CSTRING*
- Added THD as argument to create_select_for_variable(). Changed also char*
argument to LEX_CSTRING to avoid strlen() call.
- Change calls to append() to use LEX_CSTRING
- Now we have same thread_id in general log, slow long and error log
instead of a long meaningless thread number that may even change for
one user.
- Align error and slow log header with output
- Extend thread_id with one number to handle nice printing up to ten
million connections
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed a lot of include my_global.h from include files
- Removed include's of some files that my_global.h automatically includes
- Removed duplicated include's of my_sys.h
- Replaced include my_config.h with my_global.h
Problem:-
This crash happens because logged stmt is quite big and while writing
Annotate_rows_log_event it throws EFBIG error but we ignore this error
and do not call cache_data->set_incident().
Solution:-
When we normally write Binlog_log_event we check for error EFBIG, but we did
do this for Annotate_rows_log_event. We check for this error and call
cache_data->set_incident() accordingly.
# Conflicts:
# sql/log.cc
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
Do not silence uncertain cases, or fix any bugs.
The only functional change should be that ha_federated::extra()
is not calling DBUG_PRINT to report an unhandled case for
HA_EXTRA_PREPARE_FOR_DROP.
Binlog_background_thread does not make a call to set_time(), And when
we call binlog_checkpoint_log_event->write() , we write the wrong timestamp.
In this patch we correct this by calling thd->set_time().
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
accidently changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some fucntions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distingush between
items that was given an empty name and items that was not given a name
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*' is not anymore same as table_name.*
- removed not used function Item::rename()
- Added comparision of item->name_length before some calls to
my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)
This happens because the master writes a table_map event to the binary log, but no row event.
The slave has a check that there should always be a row event if there was a table_map event, which
causes a crash.
Fixed by remembering in the cache what kind of events are logged
and ignore cached statements which is just a table map event.
Define my_thread_id as an unsigned type, to avoid mismatch with
ulonglong. Change some parameters to this type.
Use size_t in a few more places.
Declare many flag constants as unsigned to avoid sign mismatch
when shifting bits or applying the unary ~ operator.
When applying the unary ~ operator to enum constants, explictly
cast the result to an unsigned type, because enum constants can
be treated as signed.
In InnoDB, change the source code line number parameters from
ulint to unsigned type. Also, make some InnoDB functions return
a narrower type (unsigned or uint32_t instead of ulint;
bool instead of ibool).
Protection added to reopen_file() and new_file_impl().
Without this we could get an assert in fn_format() as name == 0,
because the file was closed and name reset, atthe same time
new_file_impl() was called.
- Changed error handlers interface so that they can change error level in
the handler
- Give warnings and errors when calculating virtual columns
- On insert/update error is fatal in strict mode.
- SELECT and DELETE will only give a warning if a virtual field generates an error
- Added VCOL_UPDATE_FOR_DELETE and VCOL_UPDATE_INDEX_FOR_REPLACE to be able to
easily detect in update_virtual_fields() if we should use an error
handler to mask errors or not.
Add some event types for the compressed event, there are:
QUERY_COMPRESSED_EVENT,
WRITE_ROWS_COMPRESSED_EVENT_V1,
UPDATE_ROWS_COMPRESSED_EVENT_V1,
DELETE_POWS_COMPRESSED_EVENT_V1,
WRITE_ROWS_COMPRESSED_EVENT,
UPDATE_ROWS_COMPRESSED_EVENT,
DELETE_POWS_COMPRESSED_EVENT.
These events inheritance the uncompressed editor events. One of their constructor functions and write
function have been overridden for uncompressing and compressing. Anything but this is totally the same.
On slave, The IO thread will uncompress and convert them When it receiving the events from the master.
So the SQL and worker threads can be stay unchanged.
Now we use zlib as compress algorithm. It maybe support other algorithm in the future.
Merge feature into 10.2 from feature branch.
Delayed replication adds an option
CHANGE MASTER TO master_delay=<seconds>
Replication will then delay applying events with that many
seconds. This creates a replication slave that reflects the state of
the master some time in the past.
Feature is ported from MySQL source tree.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The original MySQL patch left some refactoring todo's, possibly
because of known conflicts with other parallel development (like
info-repository feature perhaps).
This patch fixes those todos/refactorings.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Initial merge of delayed replication from MySQL git.
The code from the initial push into MySQL is merged, and the
associated test case passes. A number of tasks are still pending:
1. Check full test suite run for any regressions or .result file updates.
2. Extend the feature to also work for parallel replication.
3. There are some todo-comments about future refactoring left from
MySQL, these should be located and merged on top.
4. There are some later related MySQL commits, these should be checked
and merged. These include:
e134b9362ba0b750d6ac1b444780019622d14aa5
b38f0f7857c073edfcc0a64675b7f7ede04be00f
fd2b210383358fe7697f201e19ac9779879ba72a
afc397376ec50e96b2918ee64e48baf4dda0d37d
5. The testcase from MySQL relies heavily on sleep and timing for
testing, and seems likely to sporadically fail on heavily loaded test
servers in buildbot or distro build farms.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.
In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.
Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.
In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
In the AFTER_SYNC case, semi-sync was taking the binlog file name from
the wrong place, so around binlog rotation it could be using the new
name with a position belonging to the previous binlog file name.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.
In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.
Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.
In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
- To ensure that mallocs are marked for the correct THD, even if it's
allocated in another thread, I added the thread_id to the THD constructor
- Added st_my_thread_var to thr_lock_info_init() to avoid a call to my_thread_var
- Moved things from THD::THD() to THD::init()
- Moved some things to THD::cleanup()
- Added THD::free_connection() and THD::reset_for_reuse()
- Added THD to CONNECT::create_thd()
- Added THD::thread_dbug_id and st_my_thread_var->dbug_id. These are needed
to ensure that we have a constant thread_id used for debugging with a THD,
even if it changes thread_id (=connection_id)
- Set variables.pseudo_thread_id in constructor. Removed not needed sets.
Revert following bug fix:
Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
This fix results in a deadlock between slave IO thread
and SQL thread.
(cherry picked from commit e3fea6c6dbb36c6ab21c4ab777224560e9608b53)
- Change some static variables to dynamic to ensure that we don't do any memory
allocations before server starts or stops
- Print more memory information on SIGHUP. Fixed output.
- Write out if memory was lost if run with --debug-at-exit
- Fixed wrong #ifdef in sql_cache.cc
FAILURES
Analysis:
=========
Test script is not ensuring that "assert_grep.inc" should be
called only after 'Disk is full' error is written to the
error log.
Test checks for "Queueing master event to the relay log"
state. But this state is set before invoking 'queue_event'.
Actual 'Disk is full' error happens at a very lower level.
It can happen that we might even reset the debug point
before even the actual disk full simulation occurs and the
"Disk is full" message will never appear in the error log.
In order to guarentee that we must have some mechanism where
in after we write "Disk is full" error messge into the error
log we must signal the test to execute SSS and then reset
the debug point. So that test is deterministic.
Fix:
===
Added debug sync point to make script deterministic.
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
Problem:
========
Currently SHOW SLAVE STATUS blocks if IO thread waits for
disk space. This makes automation tools verifying
server health block on taking relevant action. Finally this
will create SHOW SLAVE STATUS piles.
Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.
Fix:
====
Release the data_lock just for the duration of writing into
relay log.
Made change to ensure the following lock order is maintained
to avoid deadlocks.
data_lock, LOCK_log
data_lock is held during relay log rotations to protect
the description event.
Limit binlog recovery so that the wsrep position found from
storage engines is not exceeded. This is required to have consistent
position between wsrep position stored in innodb header and
recoverd binlog.
Creating a CONNECT object on client connect and pass this to the working thread which creates the THD.
Split LOCK_thread_count to different mutexes
Added LOCK_thread_start to syncronize threads
Moved most usage of LOCK_thread_count to dedicated functions
Use next_thread_id() instead of thread_id++
Other things:
- Thread id now starts from 1 instead of 2
- Added cast for thread_id as thread id is now of type my_thread_id
- Made THD->host const (To ensure it's not changed)
- Removed some DBUG_PRINT() about entering/exiting mutex as these was already logged by mutex code
- Fixed that aborted_connects and connection_errors_internal are counted in all cases
- Don't take locks for current_linfo when we set it (not needed as it was 0 before)