Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Trigger `socket.ssl_reload` when FLUSH SSL is issued. To triger reloading
of certificate, key and CA, files needs to be physically changed.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
MDEV-20945: BACKUP UNLOCK + FTWRL assertion failure | SIGSEGV in I_P_List
from MDL_context::release_lock on INSERT w/ BACKUP LOCK (on optimized
builds) | Assertion `ticket->m_duration == MDL_EXPLICIT' failed
BACKUP LOCK behavior is modified so it won't be used wrong:
- BACKUP LOCK should commit any active transactions.
- BACKUP LOCK should not be allowed in stored procedures.
- When BACKUP LOCK is active, don't allow any DDL's for that connection.
- FTWRL is forbidden on the same connection while BACKUP LOCK is active.
Reviewed-by: monty@mariadb.com
Reverted original patch (c2e0a0b).
For consistency with "LOCK TABLE <table_name> READ" and "FLUSH TABLES
WITH READ LOCK", which are forbidden under "BACKUP STAGE", forbid "FLUSH
TABLE <table_name> FOR EXPORT" and "FLUSH TABLE <table_name> WITH READ
LOCK" as well.
It'd allow consistent fixes for problems like MDEV-18643.
Part of MDEV-5336 Implement LOCK FOR BACKUP
- Changed check of Global_only_lock to also include BACKUP lock.
- We store latest MDL_BACKUP_DDL lock in thd->mdl_backup_ticket to be able
to downgrade lock during copy_data_between_tables()
Part of MDEV-5336 Implement LOCK FOR BACKUP
- Added new locks to MDL_BACKUP for all stages of backup locks and
a new MDL lock needed for backup stages.
- Renamed MDL_BACKUP_STMT to MDL_BACKUP_DDL
- flush_tables() takes a new parameter that decides what should be flushed.
- InnoDB, Aria (transactional tables with checksums), Blackhole, Federated
and Federatedx tables are marked to be safe for online backup. We are
using MDL_BACKUP_TRANS_DML instead of MDL_BACKUP_DML locks for these
which allows any DML's to proceed for these tables during the whole
backup process until BACKUP STAGE COMMIT which will block the final
commit.
Part of MDEV-5336 Implement LOCK FOR BACKUP
Originally both table metadata lock and global read lock protection
were acquired before getting TABLE from table cache. This will be
reordered in a future commit with MDL_BACKUP_XXX locks so that we
first take table metadata lock, then get TABLE from table cache, then
acquire analogue of global read lock.
This patch both simplifies FLUSH TABLES code, makes FLUSH TABLES to
lock less and also enables FLUSH TABLES code to be used with backup
locks.
The usage of FLUSH TABLES changes slightly:
- FLUSH TABLES without any arguments will now only close not used tables
and tables locked by the FLUSH TABLES connection. All not used table
shares will be closed.
Tables locked by the FLUSH TABLES connection will be reopened and
re-locked after all others has stoped using the table (as before).
If there was no locked tables, then FLUSH TABLES is instant and will
not cause any waits.
FLUSH TABLES will not wait for any in use table.
- FLUSH TABLES with a table list, will ensure that the tables are closed
before statement returns. The code is now only using MDL locks and not
table share versions, which simplices the code greatly. One visible
change is that the server will wait for the end of the transaction that
are using the tables. Before FLUSH TABLES only waited for the statements
to end.
Signed-off-by: Monty <monty@mariadb.org>
Part of MDEV-5336 Implement LOCK FOR BACKUP
The idea is that instead of waiting in close_cached_tables() for all
tables to be closed, we instead call flush_tables() that does:
- Flush not used objects in table cache to free memory
- Collect all tables that are open
- Call HA_EXTRA_FLUSH on the objects, to get them into "closed state"
- Added HA_EXTRA_FLUSH support to archive and CSV
- Added multi-user protection to HA_EXTRA_FLUSH in MyISAM and Aria
The benefit compared to old code is:
- FTWRL doesn't have to wait for long running read operations or
open HANDLER's
When "FLUSH TABLE ... FOR EXPORT" fails, the SQL layer should rollback
the statement. Otherwise we hit an assert when we try to close the
tables while having a non-empty list of statement transaction participants.
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db
Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because process list_db wasn't always
correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
As reported in MDEV-11969 "there's no way to ditch knowledge" about some
domain that is no longer updated on a server. Besides being of annoyance to
clutter output in DBA console stale domains can prevent the slave
to connect the master as MDEV-12012 witnesses.
What domain is obsolete must be evaluated by the user (DBA) according
to whether the domain info is still relevant and will the domain ever
receive any update.
This patch introduces a method to discard obsolete gtid domains from
the server binlog state. The removal requires no event group from such
domain present in existing binlog files though. If there are any the
containing logs must be first PURGEd in order for
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
succeed. Otherwise the command returns an error.
The list of obsolete domains can be computed through
intersecting two sets - the earliest (first) binlog's Gtid_list
and the current value of @@global.gtid_binlog_state - and extracting
the domain id components from the intersection list items.
The new DELETE_DOMAIN_ID featured FLUSH continues to rotate binlog
omitting the deleted domains from the active binlog file's Gtid_list.
Notice though when the command is ineffective - that none of requested to delete
domain exists in the binlog state - rotation does not occur.
Obsolete domain deletion is not harmful for connected slaves as long
as master side binlog files *purge* is synchronized with FLUSH-DELETE_DOMAIN_ID.
The slaves must have the last event from purged files processed as usual,
in order not to bump later into requesting a gtid from a file which
was already gone.
While the command is not replicated (as ordinary FLUSH BINLOG LOGS is)
slaves, even though having extra domains, won't suffer from reconnection errors
thanks to master-slave gtid connection protocol allowing the master
to be ignorant about a gtid domain.
Should at failover such slave to be promoted into master role it may run
the ex-master's
FLUSH BINARY LOGS DELETE_DOMAIN_ID=(list-of-domains)
to clean its own binlog state.
NOTES.
suite/perfschema/r/start_server_low_digest.result
is re-recorded as consequence of internal parser codes changes.
- Added sql/mariadb.h file that should be included first by files in sql
directory, if sql_plugin.h is not used (sql_plugin.h adds SHOW variables
that must be done before my_global.h is included)
- Removed a lot of include my_global.h from include files
- Removed include's of some files that my_global.h automatically includes
- Removed duplicated include's of my_sys.h
- Replaced include my_config.h with my_global.h
Problem:- In the case of multisource replication/named slave
when we run "FLUSH LOGS" , it does not flush logs.
Solution:- A new function Master_info_index->flush_all_relay_logs()
is created which will rotate relay logs for all named slave.
This will be called in reload_acl_and_cache function when
connection_name.length == 0
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
accidently changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some fucntions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distingush between
items that was given an empty name and items that was not given a name
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*' is not anymore same as table_name.*
- removed not used function Item::rename()
- Added comparision of item->name_length before some calls to
my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)
The reason for this is that stop slave takes LOCK_active_mi over the
whole operation while some slave operations will also need LOCK_active_mi
which causes deadlocks.
Fixed by introducing object counting for Master_info and not taking
LOCK_active_mi over stop slave or even stop_all_slaves()
Another benefit of this approach is that it allows:
- Multiple threads can run SHOW SLAVE STATUS at the same time
- START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves
- Simpler interface for handling get_master_info()
- Added some missing unlock of 'log_lock' in error condtions
- Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end
of stop_slave() to not have to use LOCK_active_mi inside
terminate_slave_threads()
- Changed argument for remove_master_info() to Master_info, as we always
have this available
- Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel
replication. Problem was that waiting for pause_for_ftwrl was not done
when deleting rpt->current_owner after a force_abort.
.. file '/var/log/mysql/mariadb-bin.000001' not found in binlog
index, needed for recovery. Aborting.
In Galera cluster, while preparing for rsync/xtrabackup based
SST, the donor node takes an FTWRL followed by (REFRESH_ENGINE_LOG
in rsync based state transfer and) REFRESH_BINARY_LOG. The latter
rotates the binary log and logs Binlog_checkpoint_log_event
corresponding to the penultimate binary log file into the new file.
The checkpoint event for the current file is later logged
synchronously by binlog_background_thread.
Now, since in rsync/xtrabackup based snapshot state transfer methods,
only the last binary log file is transferred to the joiner node; the
file could get transferred even before the checkpoint event for the
same file gets written to it. As a result, the joiner node would fail
to start complaining about the missing binlog file needed for recovery.
In order to fix this, a mechanism has been put in place to make
REFRESH_BINARY_LOG operation wait for Binlog_checkpoint_log_event
to be logged for the current binary log file if the node is part of
a Galera cluster. As further safety, during rsync based state transfer
the donor node now acquires and owns LOCK_log for the duration of file
transfer during SST.
- To ensure that mallocs are marked for the correct THD, even if it's
allocated in another thread, I added the thread_id to the THD constructor
- Added st_my_thread_var to thr_lock_info_init() to avoid a call to my_thread_var
- Moved things from THD::THD() to THD::init()
- Moved some things to THD::cleanup()
- Added THD::free_connection() and THD::reset_for_reuse()
- Added THD to CONNECT::create_thd()
- Added THD::thread_dbug_id and st_my_thread_var->dbug_id. These are needed
to ensure that we have a constant thread_id used for debugging with a THD,
even if it changes thread_id (=connection_id)
- Set variables.pseudo_thread_id in constructor. Removed not needed sets.
Revert following bug fix:
Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
This fix results in a deadlock between slave IO thread
and SQL thread.
(cherry picked from commit e3fea6c6dbb36c6ab21c4ab777224560e9608b53)
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
Problem:
========
Currently SHOW SLAVE STATUS blocks if IO thread waits for
disk space. This makes automation tools verifying
server health block on taking relevant action. Finally this
will create SHOW SLAVE STATUS piles.
Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.
Fix:
====
Release the data_lock just for the duration of writing into
relay log.
Made change to ensure the following lock order is maintained
to avoid deadlocks.
data_lock, LOCK_log
data_lock is held during relay log rotations to protect
the description event.