mariadb/sql/log_cache.cc

142 lines
3.9 KiB
C++
Raw Permalink Normal View History

MDEV-32014 Rename binlog cache temporary file to binlog file for large transaction Description =========== When a transaction commits, it copies the binlog events from binlog cache to binlog file. Very large transactions (eg. gigabytes) can stall other transactions for a long time because the data is copied while holding LOCK_log, which blocks other commits from binlogging. The solution in this patch is to rename the binlog cache file to a binlog file instead of copy, if the commiting transaction has large binlog cache. Rename is a very fast operation, it doesn't block other transactions a long time. Design ====== * binlog_large_commit_threshold type: ulonglong scope: global dynamic: yes default: 128MB Only the binlog cache temporary files large than 128MB are renamed to binlog file. * #binlog_cache_files directory To support rename, all binlog cache temporary files are managed as normal files now. `#binlog_cache_files` directory is in the same directory with binlog files. It is created at server startup if it doesn't exist. Otherwise, all files in the directory is deleted at startup. The temporary files are named with ML_ prefix and the memorary address of the binlog_cache_data object which guarantees it is unique. * Reserve space To supprot rename feature, It must reserve enough space at the begin of the binlog cache file. The space is required for Format description, Gtid list, checkpoint and Gtid events when renaming it to a binlog file. Since binlog_cache_data's cache_log is directly accessed by binlog log, online alter and wsrep. It is not easy to update all the code. Thus binlog cache will not reserve space if it is not session binlog cache or wsrep session is enabled. - m_file_reserved_bytes Stores the bytes reserved at the begin of the cache file. It is initialized in write_prepare() and cleared by reset(). The reserved file header is hide to callers. Thus there is no change for callers. E.g. - get_byte_position() still get the length of binlog data written to the cache, but not the file length. - truncate(0) will truncate the file to m_file_reserved_bytes but not 0. - write_prepare() write_prepare() is called everytime when anything is being written into the cache. It will call init_file_reserved_bytes() to create the cache file (if it doesn't exist) and reserve suitable space if the data written exceeds buffer's size. * Binlog_commit_by_rotate It is used to encapsulate the code for remaing a binlog cache tempoary file to binlog file. - should_commit_by_rotate() it is called by write_transaction_to_binlog_events() to check if a binlog cache should be rename to a binlog file. - commit() That is the entry to rename a binlog cache and commit the transaction. Both rename and commit are protected by LOCK_log, Thus not other transactions can write anything into the renamed binlog before it. Rename happens in a rotation. After the new binlog file is generated, replace_binlog_file() is called to: - copy data from the new binlog file to its binlog cache file. - write gtid event. - rename the binlog cache file to binlog file. After that the rotation will continue to succeed. Then the transaction is committed in a seperated group itself. Its cache file will be detached and cache log will be reset before calling trx_group_commit_with_engines(). Thus only Xid event be written.
2024-09-05 00:16:35 +08:00
#include "my_global.h"
#include "log_cache.h"
#include "handler.h"
#include "my_sys.h"
#include "mysql/psi/mysql_file.h"
#include "mysql/service_wsrep.h"
const char *BINLOG_CACHE_DIR= "#binlog_cache_files";
char binlog_cache_dir[FN_REFLEN];
extern uint32 binlog_cache_reserved_size();
bool binlog_cache_data::init_file_reserved_bytes()
{
// Session's cache file is not created, so created here.
if (cache_log.file == -1)
{
char name[FN_REFLEN];
/* Cache file is named with PREFIX + binlog_cache_data object's address */
snprintf(name, FN_REFLEN, "%s/%s_%llu", cache_log.dir, cache_log.prefix,
(ulonglong) this);
if ((cache_log.file=
mysql_file_open(0, name, O_CREAT | O_RDWR, MYF(MY_WME))) < 0)
{
sql_print_error("Failed to open binlog cache temporary file %s", name);
cache_log.error= -1;
return true;
}
}
#ifdef WITH_WSREP
/*
WSREP code accesses cache_log directly, so don't reserve space if WSREP is
on.
*/
if (unlikely(wsrep_on(current_thd)))
return false;
#endif
m_file_reserved_bytes= binlog_cache_reserved_size();
cache_log.pos_in_file= m_file_reserved_bytes;
cache_log.seek_not_done= 1;
return false;
}
void binlog_cache_data::detach_temp_file()
{
mysql_file_close(cache_log.file, MYF(0));
cache_log.file= -1;
reset();
}
extern void ignore_db_dirs_append(const char *dirname_arg);
bool init_binlog_cache_dir()
{
size_t length;
uint max_tmp_file_name_len=
2 /* prefix */ + 10 /* max len of thread_id */ + 1 /* underline */;
/*
Even if the binary log is disabled (and thereby we wouldn't use the binlog
cache), we need to try to build the directory name, so if it exists while
the binlog is off (e.g. due to a previous run of mariadbd, or an SST), we
can delete it.
*/
dirname_part(binlog_cache_dir,
opt_bin_log ? log_bin_basename : opt_log_basename, &length);
MDEV-32014 Rename binlog cache temporary file to binlog file for large transaction Description =========== When a transaction commits, it copies the binlog events from binlog cache to binlog file. Very large transactions (eg. gigabytes) can stall other transactions for a long time because the data is copied while holding LOCK_log, which blocks other commits from binlogging. The solution in this patch is to rename the binlog cache file to a binlog file instead of copy, if the commiting transaction has large binlog cache. Rename is a very fast operation, it doesn't block other transactions a long time. Design ====== * binlog_large_commit_threshold type: ulonglong scope: global dynamic: yes default: 128MB Only the binlog cache temporary files large than 128MB are renamed to binlog file. * #binlog_cache_files directory To support rename, all binlog cache temporary files are managed as normal files now. `#binlog_cache_files` directory is in the same directory with binlog files. It is created at server startup if it doesn't exist. Otherwise, all files in the directory is deleted at startup. The temporary files are named with ML_ prefix and the memorary address of the binlog_cache_data object which guarantees it is unique. * Reserve space To supprot rename feature, It must reserve enough space at the begin of the binlog cache file. The space is required for Format description, Gtid list, checkpoint and Gtid events when renaming it to a binlog file. Since binlog_cache_data's cache_log is directly accessed by binlog log, online alter and wsrep. It is not easy to update all the code. Thus binlog cache will not reserve space if it is not session binlog cache or wsrep session is enabled. - m_file_reserved_bytes Stores the bytes reserved at the begin of the cache file. It is initialized in write_prepare() and cleared by reset(). The reserved file header is hide to callers. Thus there is no change for callers. E.g. - get_byte_position() still get the length of binlog data written to the cache, but not the file length. - truncate(0) will truncate the file to m_file_reserved_bytes but not 0. - write_prepare() write_prepare() is called everytime when anything is being written into the cache. It will call init_file_reserved_bytes() to create the cache file (if it doesn't exist) and reserve suitable space if the data written exceeds buffer's size. * Binlog_commit_by_rotate It is used to encapsulate the code for remaing a binlog cache tempoary file to binlog file. - should_commit_by_rotate() it is called by write_transaction_to_binlog_events() to check if a binlog cache should be rename to a binlog file. - commit() That is the entry to rename a binlog cache and commit the transaction. Both rename and commit are protected by LOCK_log, Thus not other transactions can write anything into the renamed binlog before it. Rename happens in a rotation. After the new binlog file is generated, replace_binlog_file() is called to: - copy data from the new binlog file to its binlog cache file. - write gtid event. - rename the binlog cache file to binlog file. After that the rotation will continue to succeed. Then the transaction is committed in a seperated group itself. Its cache file will be detached and cache log will be reset before calling trx_group_commit_with_engines(). Thus only Xid event be written.
2024-09-05 00:16:35 +08:00
/*
Must ensure the full name of the tmp file is shorter than FN_REFLEN, to
avoid overflowing the name buffer in write and commit.
*/
if (length + strlen(BINLOG_CACHE_DIR) + max_tmp_file_name_len >= FN_REFLEN)
{
sql_print_error("Could not create binlog cache dir %s%s. It is too long.",
binlog_cache_dir, BINLOG_CACHE_DIR);
return true;
}
memcpy(binlog_cache_dir + length, BINLOG_CACHE_DIR,
strlen(BINLOG_CACHE_DIR));
binlog_cache_dir[length + strlen(BINLOG_CACHE_DIR)]= 0;
MY_DIR *dir_info= my_dir(binlog_cache_dir, MYF(0));
/*
If the binlog cache dir exists, yet binlogging is disabled, delete the
directory and skip the initialization logic.
*/
if (!opt_bin_log)
{
if (dir_info)
{
sql_print_information(
"Found binlog cache dir '%s', yet binary logging is "
"disabled. Deleting directory.",
binlog_cache_dir);
my_dirend(dir_info);
my_rmtree(binlog_cache_dir, MYF(0));
}
memset(binlog_cache_dir, 0, sizeof(binlog_cache_dir));
return false;
}
ignore_db_dirs_append(BINLOG_CACHE_DIR);
MDEV-32014 Rename binlog cache temporary file to binlog file for large transaction Description =========== When a transaction commits, it copies the binlog events from binlog cache to binlog file. Very large transactions (eg. gigabytes) can stall other transactions for a long time because the data is copied while holding LOCK_log, which blocks other commits from binlogging. The solution in this patch is to rename the binlog cache file to a binlog file instead of copy, if the commiting transaction has large binlog cache. Rename is a very fast operation, it doesn't block other transactions a long time. Design ====== * binlog_large_commit_threshold type: ulonglong scope: global dynamic: yes default: 128MB Only the binlog cache temporary files large than 128MB are renamed to binlog file. * #binlog_cache_files directory To support rename, all binlog cache temporary files are managed as normal files now. `#binlog_cache_files` directory is in the same directory with binlog files. It is created at server startup if it doesn't exist. Otherwise, all files in the directory is deleted at startup. The temporary files are named with ML_ prefix and the memorary address of the binlog_cache_data object which guarantees it is unique. * Reserve space To supprot rename feature, It must reserve enough space at the begin of the binlog cache file. The space is required for Format description, Gtid list, checkpoint and Gtid events when renaming it to a binlog file. Since binlog_cache_data's cache_log is directly accessed by binlog log, online alter and wsrep. It is not easy to update all the code. Thus binlog cache will not reserve space if it is not session binlog cache or wsrep session is enabled. - m_file_reserved_bytes Stores the bytes reserved at the begin of the cache file. It is initialized in write_prepare() and cleared by reset(). The reserved file header is hide to callers. Thus there is no change for callers. E.g. - get_byte_position() still get the length of binlog data written to the cache, but not the file length. - truncate(0) will truncate the file to m_file_reserved_bytes but not 0. - write_prepare() write_prepare() is called everytime when anything is being written into the cache. It will call init_file_reserved_bytes() to create the cache file (if it doesn't exist) and reserve suitable space if the data written exceeds buffer's size. * Binlog_commit_by_rotate It is used to encapsulate the code for remaing a binlog cache tempoary file to binlog file. - should_commit_by_rotate() it is called by write_transaction_to_binlog_events() to check if a binlog cache should be rename to a binlog file. - commit() That is the entry to rename a binlog cache and commit the transaction. Both rename and commit are protected by LOCK_log, Thus not other transactions can write anything into the renamed binlog before it. Rename happens in a rotation. After the new binlog file is generated, replace_binlog_file() is called to: - copy data from the new binlog file to its binlog cache file. - write gtid event. - rename the binlog cache file to binlog file. After that the rotation will continue to succeed. Then the transaction is committed in a seperated group itself. Its cache file will be detached and cache log will be reset before calling trx_group_commit_with_engines(). Thus only Xid event be written.
2024-09-05 00:16:35 +08:00
if (!dir_info)
{
/* Make a dir for binlog cache temp files if not exist. */
if (my_mkdir(binlog_cache_dir, 0777, MYF(0)) < 0)
{
sql_print_error("Could not create binlog cache dir %s.",
binlog_cache_dir);
return true;
}
return false;
}
/* Try to delete all cache files in the directory. */
for (uint i= 0; i < dir_info->number_of_files; i++)
{
FILEINFO *file= dir_info->dir_entry + i;
if (strncmp(file->name, LOG_PREFIX, strlen(LOG_PREFIX)))
{
sql_print_warning("%s is in %s/, but it is not a binlog cache file",
file->name, BINLOG_CACHE_DIR);
continue;
}
char file_path[FN_REFLEN];
fn_format(file_path, file->name, binlog_cache_dir, "",
MYF(MY_REPLACE_DIR));
my_delete(file_path, MYF(0));
}
my_dirend(dir_info);
return false;
}