mariadb

mirror of https://github.com/MariaDB/server.git synced 2026-05-02 13:15:32 +02:00

Author	SHA1	Message	Date
Marko Mäkelä	898521e2dd	Merge 10.4 into 10.5	2020-10-30 11:15:30 +02:00
Marko Mäkelä	199863d72b	MDEV-23991 fixup: Initialize the memory Also, revert the work-around for the test that was attempted in commit `85613a3247`. This issue was caught by MemorySanitizer as well as on the Microsoft Windows debug builds, thanks to /MD being used starting with 10.4. The code fix will also be applied to 10.2 because the regression was introduced in commit `afc9d00c66`.	2020-10-30 11:04:16 +02:00
Jan Lindström	9936235985	MDEV-23659: Update Galera disabled.def file Disable galera_var_replicate_myisam until fixed on 10.4	2020-10-30 08:54:05 +02:00
Jan Lindström	5485671474	Remove test that does not apply for 10.4.	2020-10-30 08:52:10 +02:00
Daniel Black	571bcf9aaa	deb: logrotate - fix my_print_defaults arg Corrects: `7803601dcb`	2020-10-30 15:09:25 +11:00
Monty	eb38e7ef60	MDEV-22879 SIGSEGV (or hang) in free/my_free This bug was already fixed in a previous commit. Added test case from the MDEV to prove it's fixed.	2020-10-29 19:20:10 +02:00
Marko Mäkelä	85613a3247	After-merge fix: main,innodb_ext_key,off For some reason, in the test main,innodb_ext_key,off we frequently get unexpected EXPLAIN output, in particular on Microsoft Windows debug builders. Let us comment out that EXPLAIN statement for now.	2020-10-29 16:27:04 +02:00
Marko Mäkelä	6d3356c12e	MDEV-24053 MSAN use-of-uninitialized-value in tpool::simulated_aio::simulated_aio_callback() Starting with commit `ef3f71fa74` MemorySanitizer would complain that we are writing uninitialized data via the doublewrite buffer. buf_dblwr_t::add_to_batch(): Zero out any unused part of the doublewrite buffer, for PAGE_COMPRESSED and ROW_FORMAT=COMPRESSED tables. Reviewed by: Eugene Kosov	2020-10-29 15:55:07 +02:00
Vicențiu Ciorbaru	8cfdddac71	MYSQL_JSON: Update test case to omit .so or .dll extension	2020-10-29 15:01:33 +02:00
Vicențiu Ciorbaru	8b2800d076	Fix decimals to 0 for MySQL JSON This prevents the clash between NOT_FIXED_DEC differing between server and plugins if MYSQL_SERVER is not defined during plugin compilation.	2020-10-29 15:01:33 +02:00
Vicențiu Ciorbaru	f3c5a92490	Add type_mysql_json.so to debian packages	2020-10-29 15:01:33 +02:00
Vicențiu Ciorbaru	a041b94032	Move vers_type_timestamp within the CC file It's a virtual method and it can't be inlined anyway. This allows type plugins (mysql_json in particular) to use Type_handler_blob and / or subclass it, without needing to explicitly expose the vers_type_timestamp object.	2020-10-29 15:01:33 +02:00
Vicențiu Ciorbaru	76fabe816f	Expose utf8mb4_bin charset for plugins Cleanup other linker errors	2020-10-29 15:01:33 +02:00
Vicențiu Ciorbaru	17ec6d6ce1	Skip MYSQL_JSON related tests if the plugin is not compiled	2020-10-29 15:01:33 +02:00
Marko Mäkelä	7b2bb67113	Merge 10.3 into 10.4	2020-10-29 13:38:38 +02:00
Aleksey Midenkov	27b762e23d	MDEV-22805 SIGSEGV in check_fields on UPDATE Additional case for PS protocol: UPDATE is converted to multi-update in mysql_multi_update_prepare().	2020-10-29 13:47:50 +03:00
Sergei Golubchik	9a4398b048	update columnstore	2020-10-29 10:06:32 +01:00
Sergei Golubchik	05bd281697	SPIDER storage engine plugin -> Stable	2020-10-29 10:03:15 +01:00
Marko Mäkelä	7f04686a2a	MDEV-24049 InnoDB: Failing assertion: node->is_open() in fil_space_t::flush_low As part of MDEV-23855, we eliminated fil_system.LRU and changed the way how InnoDB data files are opened. We are also enforcing the innodb_open_files limit when new data files are created. The function fil_space_t::flush() would be invoked by row_quiesce_table_start(). If the table was already in clean state, it is possible that the data file is not open. fil_space_t::flush_low(): If the data file is not open, check with a debug assertion that there are no unflushed changes, and carry on. Reviewed by: Eugene Kosov and Thirunarayanan Balathandayuthapani	2020-10-29 09:15:35 +02:00
Marko Mäkelä	e33d452b4d	Fix bogus -Wmaybe-uninitialized in GCC 10.2.0 -Og If and only if read_variable_length() returns true, the variable blob_length will be uninitialized and not used. For some reason, GCC 10.2.0 -Og debug builds would issue a warning.	2020-10-29 08:16:44 +02:00
Marko Mäkelä	1e778a3b56	MDEV-21201 fixup: GCC 10.2.0 -Wparentheses An assertion inadvertently contained an assignment and an implicit comparison to zero. The intention was to test equality.	2020-10-29 08:02:33 +02:00
Marko Mäkelä	dee6902922	After-merge fix: sys_vars.sysvars_innodb,32bit	2020-10-28 18:48:14 +02:00
Vladislav Vaintroub	e451145aa9	MDEV-24040 Named pipe permission issue Tighten access control - deny FILE_CREATE_PIPE_INSTANCE permission to everyone except current user (the one that runs mysqld)	2020-10-28 14:24:10 +01:00
Vicențiu Ciorbaru	f6549e9544	MDEV-18323 Convert MySQL JSON type to MariaDB TEXT in mysql_upgrade This patch solves two key problems. 1. There is a type number clash between MySQL and MariaDB. The number 245, used for MariaDB Virtual Fields is the same as MySQL's JSON. This leads to corrupt FRM errors if unhandled. The code properly checks frm table version number and if it matches 5.7+ (until 10.0+) it will assume it is dealing with a MySQL table with the JSON datatype. 2. MySQL JSON datatype uses a proprietary format to pack JSON data. The patch introduces a datatype plugin which parses the format and converts it to its string representation. The intended conversion path is to only use the JSON datatype within ALTER TABLE <table> FORCE, to force a table recreate. This happens during mysql_upgrade or via a direct ALTER TABLE <table> FORCE.	2020-10-28 11:38:14 +02:00
Vicențiu Ciorbaru	85c686e2d1	cleanup: Static_binary_string need not take non-const double parameter Convert the parameter to const as the function won't modify the pointer value.	2020-10-28 11:38:14 +02:00
Marko Mäkelä	2b6f804490	Merge 10.2 into 10.3	2020-10-28 10:44:40 +02:00
Marko Mäkelä	a8de8f261d	Merge 10.2 into 10.3	2020-10-28 10:01:50 +02:00
Teemu Ollakka	ec0e9d6f76	MDEV-22681 EXECUTE IMMEDIATE crashes server if wsrep is on. A wsrep transaction was started for EXECUTE IMMEDIATE, which caused assertion failure when the executed statement was CREATE TABLE which should be executed in TOI mode. As a fix, don't start wsrep transaction for EXECUTE IMMEDIATE to let the wsrep state logic to be handled from inside stored procedure codepath. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-10-28 09:51:35 +02:00
Vladislav Vaintroub	9478368d81	MDEV-24037 Use NtFlushBuffersFileEx(FLUSH_FLAGS_FILE_DATA_SYNC_ONLY) on Windows This avoids flushing file metadata on NTFS, and writing to <drive>:\$Log file. With heavy write workload this can consume up to 1/3 of the server's IO bandwidth. Reviewed by: Marko	2020-10-28 08:30:31 +01:00
Marko Mäkelä	cc5f4428b8	MDEV-23693 fixup: Remove unused btr_search_t::withdraw_clock	2020-10-28 08:13:06 +02:00
Marko Mäkelä	527ade2590	MDEV-23163 Merge new release of InnoDB 5.7.32 to 10.2 All relevant InnoDB changes from MySQL 5.7.32 have been applied in preceding commits.	2020-10-28 07:27:18 +02:00
Varun Gupta	db56f9b852	MDEV-24015: SQL Error (1038): Out of sort memory when enough memory for the sort buffer is provided For a correlated subquery filesort is executed multiple times. During each execution, sortlength() computed total sort key length in Sort_keys::sort_length, without resetting it first. Eventually Sort_keys::sort_length got larger than @@sort_buffer_size, which caused filesort() to be aborted with error. Fixed by making sortlength() to compute lengths only during the first invocation. Subsequent invocations return pre-computed values.	2020-10-28 10:53:22 +05:30
Daniele Sciascia	46c273892e	MDEV-23623 - Fix assertion in MTR test galera_sr.GCF-1051 Fix assertion `thd->in_active_multi_stmt_transaction() || thd->m_transaction_psi == __null' failed on MTR test galera_sr.GCF-1051. Add a new MTR test MDEV-23623 that reproduces the issue deterministically and update wsrep-lib submodule, containing the actual fix. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-10-28 06:49:56 +02:00
Eugene Kosov	afc9d00c66	MDEV-23991 dict_table_stats_lock() has unnecessarily long scope Patch removes dict_index_t::stats_latch. Table/index statistics now protected with dict_sys->mutex. That way statistics computation can happen in parallel in several threads and dict_sys->mutex will be locked only for a short period of time. This patch is a joint work with Marko Mäkelä dict_index_t::lock: make mutable which allows to pass const pointer when only lock is touched in an object btr_height_get() btr_get_size(): make index argument const for better type safety btr_estimate_number_of_different_key_vals(): now returns computed values instead of setting fields in dict_index_t directly remove everything related to dict_index_t::stats_latch dict_stats_index_set_n_diff(): now returns computed values instead of setting fields in dict_index_t directly dict_stats_analyze_index(): now returns computed values instead of setting fields in dict_index_t directly Reviewed by: Marko Mäkelä	2020-10-27 19:09:20 +03:00
Sergei Golubchik	2cec0523eb	INET6 type plugin -> Beta	2020-10-27 16:45:35 +01:00
Anel Husakovic	e183aec1d7	MDEV-24018: SIGSEGV in Item_func_nextval::update_table on SELECT SETVAL Reviewed-by: wlad@mariadb.com	2020-10-27 15:17:54 +01:00
Marko Mäkelä	42e1815ad8	MDEV-16952 Introduce SET GLOBAL innodb_max_purge_lag_wait Let us introduce a dummy variable innodb_max_purge_lag_wait for waiting that the InnoDB history list length is below the user-specified limit. Specifically, SET GLOBAL innodb_max_purge_lag_wait=0; should wait for all history to be purged. This could be useful when upgrading from an older version to MariaDB 10.3 or later, to avoid hitting MDEV-15912. Note: the history cannot be purged if there exist transactions that may see old versions. Reviewed by: Vladislav Vaintroub	2020-10-27 15:47:18 +02:00
Alexey Botchkov	8761571a71	MDEV-22524 SIGABRT in safe_mutex_unlock with session_track_system_variables and max_relay_log_size. lock LOCK_global_system_variables around the get_one_variable() call in the Session_sysvars_tracker::store_variable().	2020-10-27 16:44:11 +04:00
Thirunarayanan Balathandayuthapani	bc540b8706	MDEV-23693 Failing assertion: my_atomic_load32_explicit(&lock->lock_word, MY_MEMORY_ORDER_RELAXED) == X_LOCK_DECR InnoDB frees the block lock during buffer pool shrinking when other thread is yet to release the block lock. While shrinking the buffer pool, InnoDB allows the page to be freed unless it is buffer fixed. In some cases, InnoDB releases the latch after unfixing the block. Fix: - InnoDB should unfix the block after releasing the latch. - Add more assertions to check buffer fix while accessing the page. - Introduced block_hint structure to store buf_block_t pointer and allow accessing the buf_block_t pointer only by passing a functor. It returns original buf_block_t* pointer if it is valid or nullptr if the pointer has become stale. - Replace buf_block_is_uncompressed() with buf_pool_t::is_block_pointer() This change is motivated by a change in mysql-5.7.32: mysql/mysql-server@46e60de444 Bug #31036301 ASSERTION FAILURE: SYNC0RW.IC:429:LOCK->LOCK_WORD	2020-10-27 18:30:00 +05:30
Dmitry Shulga	97b10b7fdc	MDEV-22805: SIGSEGV in check_fields on UPDATE For debug build of MariaDB server running of the following test case will hit the assert `thd->lex->sql_command == SQLCOM_UPDATE' in the function check_fields() on attempt to execute the UPDATE statement. CREATE TABLE t1 (a INT); UPDATE t1 FOR PORTION OF APPTIME FROM (SELECT 1 FROM t1) TO 2 SET a = 1; Stack trace to the fired assert statement DBUG_ASSERT(thd->lex->sql_command == SQLCOM_UPDATE) listed below: mysql_execute_command() -> mysql_multi_update_prepare() --> Multiupdate_prelocking_strategy::handle_end() --> check_fields() It's worth to note that this stack trace looks like a multi update statement is being executed. The fired assert is checked inside the function check_fields() in case table->has_period() returns the value true that in turns happens when temporal period specified in the UPDATE statement. Condition specified in the DBUG_ASSERT statement returns the false value since the data member thd->lex->sql_command have the value SQLCOM_UPDATE_MULTI. So, the main question is why a program control flow go to the path prescribed for handling MULTI update statement despite of the fact that the ordinary UPDATE statement being executed. The answer is a way that SQL grammar rules written. When the statement UPDATE t1 FOR PORTION OF APPTIME FROM (SELECT 1 FROM t1) TO 2 SET a = 1; being parsed an action for the rule 'table_primary_ident' (part of this action is listed below to simplify description) is invoked to handle the table name 't1' specified in the clause 'SELECT 1 FROM t1'. table_primary_ident: table_ident opt_use_partition opt_for_system_time_clause opt_table_alias_clause opt_key_definition { SELECT_LEX sel= Select; sel->table_join_options= 0; if (!($$= Select->add_table_to_list(thd, $1, $4, This action calls the method st_select_lex::add_table_to_list() to add the table name 't1' to the list of tables being used by the statement.
Later, an action for the following grammar rule update_table_list: table_ident opt_use_partition for_portion_of_time_clause opt_table_alias_clause opt_key_definition { SELECT_LEX sel= Select; sel->table_join_options= 0; if (!($$= Select->add_table_to_list(thd, $1, $4, is invoked to handle the clause 't1 FOR PORTION OF APPTIME FROM ... TO 2'. This action also calls the method st_select_lex::add_table_to_list() to add the table name 't1' to the list of tables being used by the statement. In result the table name 't1' contained twice in this list. Presence of duplicate names for the table 't1' in a list of table used by a statement leads to the fact that the function unique_table() called from the function mysql_update() returns the value true that forces implementation of the function mysql_update() to return the value 2 as a signal to fall through the case boundary of the switch statement placed in the function mysql_execute_command() and start handling of the case for sql_command SQLCOM_UPDATE_MULTI. The compound statement block for the case SQLCOM_UPDATE_MULTI invokes the function mysql_multi_update_prepare() that executes the statement set thd->lex->sql_command= SQLCOM_UPDATE_MULTI; and after that calls the method Multiupdate_prelocking_strategy::handle_end(). Finally, this method invokes the check_fields() function and assert is fired. The above analysis shows that update for a table that simultaneously specified both as a destination table of UPDATE statement and as a table taking part in subquery is actually treated by MariaDB server as multi-update statement. Taking into account that multi-update statement for temporal period table is not supported yet by MariaDB, correct way to fix the bug is to return the error ER_NOT_SUPPORTED_YET for this case.	2020-10-27 18:55:22 +07:00
mkaruza	6a614d6934	MDEV-22707: galera got stuck after flush tables Deadlock is possible between applier thread and local committing thread with active FLUSH TABLE. Applier thread should skip table share checks and locks when opening table. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2020-10-27 11:28:45 +02:00
Marko Mäkelä	00ddea4f2f	MDEV-24024 innodb.ibuf_not_empty failed in buildbot Probably due to the changes to page flushing in MDEV-23399 (commit `7cffb5f6e8`) the command CHECK TABLE would occasionally report a different number of rows for the corrupted secondary index. (The reported number was 991 instead of 990 on one occasion.) Let us map all numbers to 990 in the output. We only care that the injected corruption will be detected.	2020-10-27 09:52:42 +02:00
Marko Mäkelä	c27e53f459	MDEV-23855: Use normal mutex for log_sys.mutex, log_sys.flush_order_mutex With an unreasonably small innodb_log_file_size, the page cleaner thread would frequently acquire log_sys.flush_order_mutex and spend a significant portion of CPU time spinning on that mutex when determining the checkpoint LSN.	2020-10-26 17:53:55 +02:00
Marko Mäkelä	a5a2ef079c	MDEV-23855: Implement asynchronous doublewrite Synchronous writes and calls to fdatasync(), fsync() or FlushFileBuffers() would ruin performance. So, let us submit asynchronous writes for the doublewrite buffer. We submit a single request for the likely case that the two doublewrite buffers are contiguous in the system tablespace. buf_dblwr_t::flush_buffered_writes_completed(): The completion callback of buf_dblwr_t::flush_buffered_writes(). os_aio_wait_until_no_pending_writes(): Also wait for doublewrite batches. buf_dblwr_t::element::space: Remove. We can simply use element::request.node->space instead. Reviewed by: Vladislav Vaintroub	2020-10-26 17:53:55 +02:00
Marko Mäkelä	ef3f71fa74	MDEV-23399 fixup: Interleaved doublewrite batches Author: Vladislav Vaintroub	2020-10-26 17:53:54 +02:00
Marko Mäkelä	8cb01c51fb	MDEV-16264 fixup: Clean up asynchronous I/O os_aio_userdata_t: Remove. It was basically duplicating IORequest. buf_page_write_complete(): Take only IORequest as a parameter. os_aio_func(), pfs_os_aio_func(): Replaced with os_aio() that has no redundant parameters. There is only one caller, so there is no point to pass __FILE__, __LINE__ as a parameter.	2020-10-26 17:53:54 +02:00
Marko Mäkelä	118e258aaa	MDEV-23855: Shrink fil_space_t Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending. Change some more fil_space_t members to uint32_t to reduce the memory footprint. fil_space_t::add(), fil_ibd_create(): Attach the already opened handle to the tablespace, and enforce the fil_system.n_open limit. dict_boot(): Initialize fil_system.max_assigned_id. srv_boot(): Call srv_thread_pool_init() before anything else, so that files should be opened in the correct mode on Windows. fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like fil_node_open_file_low() does it. dict_table_t::is_accessible(): Replaces fil_table_accessible(). Reviewed by: Vladislav Vaintroub	2020-10-26 17:53:54 +02:00
Marko Mäkelä	45ed9dd957	MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contention Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of file is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	3a9a3be1c6	MDEV-23855: Improve InnoDB log checkpoint performance After MDEV-15053, MDEV-22871, MDEV-23399 shifted the scalability bottleneck, log checkpoints became a new bottleneck. If innodb_io_capacity is set low or innodb_max_dirty_pct_lwm is set high and the workload fits in the buffer pool, the page cleaner thread will perform very little flushing. When we reach the capacity of the circular redo log file ib_logfile0 and must initiate a checkpoint, some 'furious flushing' will be necessary. (If innodb_flush_sync=OFF, then flushing would continue at the innodb_io_capacity rate, and writers would be throttled.) We have the best chance of advancing the checkpoint LSN immediately after a page flush batch has been completed. Hence, it is best to perform checkpoints after every batch in the page cleaner thread, attempting to run once per second. By initiating high-priority flushing in the page cleaner as early as possible, we aim to make the throughput more stable. The function buf_flush_wait_flushed() used to sleep for 10ms, hoping that the page cleaner thread would do something during that time. The observed end result was that a large number of threads that call log_free_check() would end up sleeping while nothing useful is happening. We will revise the design so that in the default innodb_flush_sync=ON mode, buf_flush_wait_flushed() will wake up the page cleaner thread to perform the necessary flushing, and it will wait for a signal from the page cleaner thread. If innodb_io_capacity is set to a low value (causing the page cleaner to throttle its work), a write workload would initially perform well, until the capacity of the circular ib_logfile0 is reached and log_free_check() will trigger checkpoints. At that point, the extra waiting in buf_flush_wait_flushed() will start reducing throughput. 
The page cleaner thread will also initiate log checkpoints after each buf_flush_lists() call, because that is the best point of time for the checkpoint LSN to advance by the maximum amount. Even in 'furious flushing' mode we invoke buf_flush_lists() with innodb_io_capacity_max pages at a time, and at the start of each batch (in the log_flush() callback function that runs in a separate task) we will invoke os_aio_wait_until_no_pending_writes(). This tweak allows the checkpoint to advance in smaller steps and significantly reduces the maximum latency. On an Intel Optane 960 NVMe SSD on Linux, it reduced from 4.6 seconds to 74 milliseconds. On Microsoft Windows with a slower SSD, it reduced from more than 180 seconds to 0.6 seconds. We will make innodb_adaptive_flushing=OFF simply flush innodb_io_capacity per second whenever the dirty proportion of buffer pool pages exceeds innodb_max_dirty_pages_pct_lwm. For innodb_adaptive_flushing=ON we try to make page_cleaner_flush_pages_recommendation() more consistent and predictable: if we are below innodb_adaptive_flushing_lwm, let us flush pages according to the return value of af_get_pct_for_dirty(). innodb_max_dirty_pages_pct_lwm: Revert the change of the default value that was made in MDEV-23399. The value innodb_max_dirty_pages_pct_lwm=0 guarantees that a shutdown of an idle server will be fast. Users might be surprised if normal shutdown suddenly became slower when upgrading within a GA release series. innodb_checkpoint_usec: Remove. The master task will no longer perform periodic log checkpoints. It is the duty of the page cleaner thread. log_sys.max_modified_age: Remove. The current span of the buf_pool.flush_list expressed in LSN only matters for adaptive flushing (outside the 'furious flushing' condition). For the correctness of checkpoints, the only thing that matters is the checkpoint age (log_sys.lsn - log_sys.last_checkpoint_lsn). This run-time constant was also reported as log_max_modified_age_sync. 
log_sys.max_checkpoint_age_async: Remove. This does not serve any purpose, because the checkpoints will now be triggered by the page cleaner thread. We will retain the log_sys.max_checkpoint_age limit for engaging 'furious flushing'. page_cleaner.slot: Remove. It turns out that page_cleaner_slot.flush_list_time was duplicating page_cleaner.slot.flush_time and page_cleaner.slot.flush_list_pass was duplicating page_cleaner.flush_pass. Likewise, there were some redundant monitor counters, because the page cleaner thread no longer performs any buf_pool.LRU flushing, and because there only is one buf_flush_page_cleaner thread. buf_flush_sync_lsn: Protect writes by buf_pool.flush_list_mutex. buf_pool_t::get_oldest_modification(): Add a parameter to specify the return value when no persistent data pages are dirty. Require the caller to hold buf_pool.flush_list_mutex. log_buf_pool_get_oldest_modification(): Take the fall-back LSN as a parameter. All callers will also invoke log_sys.get_lsn(). log_preflush_pool_modified_pages(): Replaced with buf_flush_wait_flushed(). buf_flush_wait_flushed(): Implement two limits. If not enough buffer pool has been flushed, signal the page cleaner (unless innodb_flush_sync=OFF) and wait for the page cleaner to complete. If the page cleaner thread is not running (which can be the case during shutdown), initiate the flush and wait for it directly. buf_flush_ahead(): If innodb_flush_sync=ON (the default), submit a new buf_flush_sync_lsn target for the page cleaner but do not wait for the flushing to finish. log_get_capacity(), log_get_max_modified_age_async(): Remove, to make it easier to see that af_get_pct_for_lsn() is not acquiring any mutexes. page_cleaner_flush_pages_recommendation(): Protect all access to buf_pool.flush_list with buf_pool.flush_list_mutex. Previously there were some race conditions in the calculation. buf_flush_sync_for_checkpoint(): New function to process buf_flush_sync_lsn in the page cleaner thread.
At the end of each batch, we try to wake up any blocked buf_flush_wait_flushed(). If everything up to buf_flush_sync_lsn has been flushed, we will reset buf_flush_sync_lsn=0. The page cleaner thread will keep 'furious flushing' until the limit is reached. Any threads that are waiting in buf_flush_wait_flushed() will be able to resume as soon as their own limit has been satisfied. buf_flush_page_cleaner: Prioritize buf_flush_sync_lsn and do not sleep as long as it is set. Do not update any page_cleaner statistics for this special mode of operation. In the normal mode (buf_flush_sync_lsn is not set for innodb_flush_sync=ON), try to wake up once per second. No longer check whether srv_inc_activity_count() has been called. After each batch, try to perform a log checkpoint, because the best chances for the checkpoint LSN to advance by the maximum amount are upon completing a flushing batch. log_t: Move buf_free, max_buf_free possibly to the same cache line with log_sys.mutex. log_margin_checkpoint_age(): Simplify the logic, and replace a 0.1-second sleep with a call to buf_flush_wait_flushed() to initiate flushing. Moved to the same compilation unit with the only caller. log_close(): Clean up the calculations. (Should be no functional change.) Return whether flush-ahead is needed. Moved to the same compilation unit with the only caller. mtr_t::finish_write(): Return whether flush-ahead is needed. mtr_t::commit(): Invoke buf_flush_ahead() when needed. Let us avoid external calls in mtr_t::commit() and make the logic easier to follow by having related code in a single compilation unit. Also, we will invoke srv_stats.log_write_requests.inc() only once per mini-transaction commit, while not holding mutexes. log_checkpoint_margin(): Only care about log_sys.max_checkpoint_age. Upon reaching log_sys.max_checkpoint_age where we must wait to prevent the log from getting corrupted, let us wait for at most 1MiB of LSN at a time, before rechecking the condition. 
This should allow writers to proceed even if the redo log capacity has been reached and 'furious flushing' is in progress. We no longer care about log_sys.max_modified_age_sync or log_sys.max_modified_age_async. The log_sys.max_modified_age_sync could be a relic from the time when there was a srv_master_thread that wrote dirty pages to data files. Also, we no longer have any log_sys.max_checkpoint_age_async limit, because log checkpoints will now be triggered by the page cleaner thread upon completing buf_flush_lists(). log_set_capacity(): Simplify the calculations of the limit (no functional change). log_checkpoint_low(): Split from log_checkpoint(). Moved to the same compilation unit with the caller. log_make_checkpoint(): Only wait for everything to be flushed until the current LSN. create_log_file(): After checkpoint, invoke log_write_up_to() to ensure that the FILE_CHECKPOINT record has been written. This avoids ut_ad(!srv_log_file_created) in create_log_file_rename(). srv_start(): Do not call recv_recovery_from_checkpoint_start() if the log has just been created. Set fil_system.space_id_reuse_warned before dict_boot() has been executed, and clear it after recovery has finished. dict_boot(): Initialize fil_system.max_assigned_id. srv_check_activity(): Remove. The activity count is counting transaction commits and therefore mostly interesting for the purge of history. BtrBulk::insert(): Do not explicitly wake up the page cleaner, but do invoke srv_inc_activity_count(), because that counter is still being used in buf_load_throttle_if_needed() for some heuristics. (It might be cleaner to execute buf_load() in the page cleaner thread!) Reviewed by: Vladislav Vaintroub	2020-10-26 17:09:01 +02:00
Marko Mäkelä	bd67cb9284	MDEV-23399 fixup: Assertion bpage->in_file() failed buf_flush_remove_pages(), buf_flush_dirty_pages(): Because buf_page_t::state() is protected by buf_pool.mutex, which we are not holding, the state may be BUF_BLOCK_REMOVE_HASH when the page is being relocated. Let us relax these assertions similar to buf_flush_validate_low(). The other in_file() assertions in buf0flu.cc look valid.	2020-10-26 17:09:01 +02:00


191,071 commits