mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-29 02:05:57 +01:00

Author	SHA1	Message	Date
Sergei Golubchik	3f6038bc51	Merge branch '10.5' into 10.6	2024-01-31 18:04:03 +01:00
Sergei Golubchik	01f6abd1d4	Merge branch '10.4' into 10.5	2024-01-31 17:32:53 +01:00
Daniel Black	82e8633420	innodb: IO Error message missing space Noted by Susmeet Khaire - thanks.	2024-01-19 17:33:44 +11:00
Marko Mäkelä	a6290a5bc5	MDEV-33095 innodb_flush_method=O_DIRECT creates excessive errors on Solaris The directio(3C) function on Solaris is supported on NFS and UFS while the majority of users should be on ZFS, which is a copy-on-write file system that implements transparent compression and therefore cannot support unbuffered I/O. Let us remove the call to directio() and simply treat innodb_flush_method=O_DIRECT in the same way as the previous default value innodb_flush_method=fsync on Solaris. Also, let us remove some dead code around calls to os_file_set_nocache() on platforms where fcntl(2) is not usable with O_DIRECT. On IBM AIX, O_DIRECT is not documented for fcntl(2), only for open(2).	2024-01-19 15:34:33 +11:00
Marko Mäkelä	ee1407f74d	MDEV-32268: GNU libc posix_fallocate() may be extremely slow os_file_set_size(): Let us invoke the Linux system call fallocate(2) directly, because the GNU libc posix_fallocate() implements a fallback that writes to the file 1 byte every 4096 or fewer bytes. In one environment, invoking fallocate() directly would lead to 4 times the file growth rate during ALTER TABLE. Presumably, what happened was that the NFS server used a smaller allocation block size than 4096 bytes and therefore created a heavily fragmented sparse file when posix_fallocate() was used. For example, extending a file by 4 MiB would create 1,024 file fragments. When the file is actually being written to with data, it would be "unsparsed". The built-in EOPNOTSUPP fallback in os_file_set_size() writes a buffer of 1 MiB of NUL bytes. This was always used on musl libc and other Linux implementations of posix_fallocate().	2024-01-18 11:00:27 +02:00
Marko Mäkelä	f21a6cbf6e	MDEV-32939 If tables are frequently created, renamed, dropped, a backup cannot be restored During mariadb-backup --backup, a table could be renamed, created and dropped. We could have both oldname.ibd and oldname.new, and one of the files would be deleted before the InnoDB recovery starts. The desired end result would be that we will recover both oldname.ibd and newname.ibd. During normal crash recovery, at most one file operation (create, rename, delete) may require to be replayed from the write-ahead log before the DDL recovery starts. deferred_spaces.create(): In mariadb-backup --prepare, try to create the file in case it does not exist. fil_name_process(): Display a message about not found files not only if innodb_force_recovery is set, but also in mariadb-backup --prepare. If we are processing a FILE_RENAME for a tablespace whose recovery is deferred, suppress the message and adjust the file name in case fil_ibd_load() returns FIL_LOAD_NOT_FOUND or FIL_LOAD_DEFER. fil_ibd_load(): Remove a redundant file name comparison. The caller already compared that the file names are different. We used to wrongly return FIL_LOAD_OK instead of FIL_LOAD_ID_CHANGED if only the schema name differed, such as a/t1.ibd and b/t1.ibd. Tested by: Matthias Leich Reviewed by: Thirunarayanan Balathandayuthapani	2023-12-14 13:16:28 +02:00
Marko Mäkelä	625a150a86	Merge 10.5 into 10.6	2023-10-06 14:34:01 +03:00
Yuchen Pei	6b343de8ef	Merge branch '10.4' into 10.5	2023-09-25 13:06:57 +10:00
Vladislav Vaintroub	1ee0d09a2b	MDEV-32228 speedup opening tablespaces on Windows is_file_on_ssd() is more expensive than it should be. It caches the results by volume name, but still calls GetVolumePathName() every time, which, as procmon shows, opens multiple directories in filesystem hierarchy (db directory, datadir, and all ancestors) The fix is to cache SSD status by volume serial ID, which is cheap to retrieve with GetFileInformationByHandleEx()	2023-09-22 21:07:50 +02:00
Marko Mäkelä	72928e640e	MDEV-27593: Crashing on I/O error is unhelpful buf_page_t::write_complete(), buf_page_write_complete(), IORequest::write_complete(): Add a parameter for passing an error code. If an error occurred, we will release the io-fix, buffer-fix and page latch but not reset the oldest_modification field. The block would remain in buf_pool.LRU and possibly buf_pool.flush_list, to be written again later, by buf_flush_page_cleaner(). If all page writes start consistently failing, all write threads should eventually hang in log_free_check() because the log checkpoint cannot be advanced to make room in the circular write-ahead-log ib_logfile0. IORequest::read_complete(): Add a parameter for passing an error code. If a read operation fails, we report the error and discard the page, just like we would do if the page checksum was not validated or the page could not be decrypted. This only affects asynchronous reads, due to linear or random read-ahead or crash recovery. When buf_page_get_low() invokes buf_read_page(), that will be a synchronous read, not involving this code. This was tested by randomly injecting errors in write_io_callback() and read_io_callback(), like this: if (!ut_rnd_interval(100)) cb->m_err= 42;	2023-08-01 14:39:29 +03:00
Marko Mäkelä	f2c17cc9d9	MDEV-29911 InnoDB recovery and mariadb-backup --prepare fail to report detailed progress This is a 10.6 port of commit `2f9e264781` from MariaDB Server 10.9 that is missing some optimization due to a more complex redo log format and recovery logic (which was simplified in commit `685d958e38`). The progress reporting of InnoDB crash recovery was rather intermittent. Nothing was reported during the single-threaded log record parsing, which could consume minutes when parsing a large log. During log application, there only was progress reporting in background threads that would be invoked on data page read completion. The progress reporting here will be detailed like this: InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799 InnoDB: Read redo log up to LSN=1963895808 InnoDB: Multi-batch recovery needed at LSN 2534560930 InnoDB: Read redo log up to LSN=3312233472 InnoDB: Read redo log up to LSN=1599646720 InnoDB: Read redo log up to LSN=2160831488 InnoDB: To recover: LSN 2806789376/2806819840; 195082 pages InnoDB: To recover: LSN 2806789376/2806819840; 63507 pages InnoDB: Read redo log up to LSN=3195776000 InnoDB: Read redo log up to LSN=3687099392 InnoDB: Read redo log up to LSN=4165315584 InnoDB: To recover: LSN 4374395699/4374440960; 241454 pages InnoDB: To recover: LSN 4374395699/4374440960; 123701 pages InnoDB: Read redo log up to LSN=4508724224 InnoDB: Read redo log up to LSN=5094550528 InnoDB: To recover: 205230 pages The previous messages "Starting a batch to recover" or "Starting a final batch to recover" will be replaced by "To recover: ... pages" messages. If a batch lasts longer than 15 seconds, then there will be progress reports every 15 seconds, showing the number of remaining pages. For the non-final batch, the "To recover:" message includes two end LSN: that of the batch, and of the recovered log. This is the primary measure of progress. The batch will end once the number of pages to recover reaches 0. If recovery is possible in a single batch, the output will look like this, with a shorter "To recover:" message that counts only the remaining pages: InnoDB: Starting crash recovery from checkpoint LSN=628599973,5653727799 InnoDB: Read redo log up to LSN=1984539648 InnoDB: Read redo log up to LSN=2710875136 InnoDB: Read redo log up to LSN=3358895104 InnoDB: Read redo log up to LSN=3965299712 InnoDB: Read redo log up to LSN=4557417472 InnoDB: Read redo log up to LSN=5219527680 InnoDB: To recover: 450915 pages We will also speed up recovery by improving the memory management and implementing multi-threaded recovery of data pages that will not need to be read into the buffer pool ("fake read"). Log application in the "fake read" threads will be protected by an atomic being_recovered field and exclusive buf_page_t::lock. Recovery will reserve for data pages two thirds of the buffer pool, or 256 pages, whichever is smaller. Previously, we could only use at most one third of the buffer pool for buffered log records. This would typically mean that with large buffer pools, recovery unnecessary consisted of multiple batches. If recovery runs out of memory, it will "roll back" or "rewind" the current mini-transaction. The recv_sys.recovered_lsn and recv_sys.pages will correspond to the "out of memory LSN", at the end of the previous complete mini-transaction. If recovery runs out of memory while executing the final recovery batch, we can simply invoke recv_sys.apply(false) to make room, and resume parsing. If recovery runs out of memory before the final batch, we will scan the redo log to the end and check for any missing or inconsistent files. In this version of the patch, we will throw away any previously buffered recv_sys.pages and rescan the log from the checkpoint onwards. recv_sys_t::pages_it: A cached iterator to recv_sys.pages. recv_sys_t::is_memory_exhausted(): Remove. We will have out-of-memory handling deep inside recv_sys_t::parse(). recv_sys_t::rewind(), page_recv_t::recs_t::rewind(): Remove all log starting with a specific LSN. IORequest::write_complete(), IORequest::read_complete(): Replaces fil_aio_callback(). read_io_callback(), write_io_callback(): Replaces io_callback(). IORequest::fake_read_complete(), fake_io_callback(), os_fake_read(): Process a "fake read" request for concurrent recovery. recv_sys_t::apply_batch(): Choose a number of successive pages for a recovery batch. recv_sys_t::erase(recv_sys_t::map::iterator): Remove log records for a page whose recovery is not in progress. Log application threads will not invoke this; they will only set being_recovered=-1 to indicate that the entry is no longer needed. recv_sys_t::garbage_collect(): Remove all being_recovered=-1 entries. recv_sys_t::wait_for_pool(): Wait for some space to become available in the buffer pool. mlog_init_t::mark_ibuf_exist(): Avoid calls to recv_sys::recover_low() via ibuf_page_exists() and buf_page_get_low(). Such calls would lead to double locking of recv_sys.mutex, which depending on implementation could cause a deadlock. We will use lower-level calls to look up index pages. buf_LRU_block_remove_hashed(): Disable consistency checks for freed ROW_FORMAT=COMPRESSED pages. Their contents could be uninitialized garbage. This fixes an occasional failure of the test innodb.innodb_bulk_create_index_debug. Tested by: Matthias Leich	2023-05-19 15:20:07 +03:00
Marko Mäkelä	0976afec88	MDEV-31114 Assertion !...is_waiting() failed in os_aio_wait_until_no_pending_writes() os_aio_wait_until_no_pending_reads(), os_aio_wait_until_pending_writes(): Add a Boolean parameter to indicate whether the wait should be declared in the thread pool. buf_flush_wait(): The callers have already declared a wait, so let us avoid doing that again, just call os_aio_wait_until_pending_writes(false). buf_flush_wait_flushed(): Do not declare a wait in the rare case that the buf_flush_page_cleaner thread has been shut down already. buf_flush_page_cleaner(), buf_flush_buffer_pool(): In the code that runs during shutdown, do not declare waits. buf_flush_buffer_pool(): Remove a debug assertion that might fail. What really matters here is buf_pool.flush_list.count==0. buf_read_recv_pages(), srv_prepare_to_delete_redo_log_file(): Do not declare waits during InnoDB startup.	2023-04-24 09:57:58 +03:00
Marko Mäkelä	7e31a8e7fa	MDEV-26827 fixup: Fix os_aio_wait_until_no_pending_writes() io_callback(): Process the request before releasing the write slot. Before commit `a091d6ac4e` when we had a duplicated counter for writes, either ordering was fine. Now, correctness depends on os_aio_wait_until_no_pending_writes().	2023-04-20 14:08:48 +03:00
Marko Mäkelä	f50abab195	MDEV-31048 PERFORMANCE_SCHEMA lakcs InnoDB read_slots and write_slots tpool::cache::m_mtx: Add PERFORMANCE_SCHEMA instrumentation (wait/synch/mutex/innodb/tpool_cache_mutex). This covers the InnoDB read_slots and write_slots for asynchronous data page I/O.	2023-04-13 15:18:26 +03:00
Marko Mäkelä	a091d6ac4e	MDEV-26827 fixup: Do not duplicate io_slots::pending_io_count() os_aio_pending_reads_approx(), os_aio_pending_reads(): Replaces buf_pool.n_pend_reads. os_aio_pending_writes(): Replaces buf_dblwr.pending_writes(). buf_dblwr_t::write_cond, buf_dblwr_t::writes_pending: Remove.	2023-04-12 13:49:57 +03:00
Marko Mäkelä	5bada1246d	Merge 10.5 into 10.6	2023-04-11 16:15:19 +03:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Marko Mäkelä	dfa90257f6	MDEV-30936 clang 15.0.7 -fsanitize=memory fails massively handle_slave_io(), handle_slave_sql(), os_thread_exit(): Remove a redundant pthread_exit(nullptr) call, because it would cause SIGSEGV. mysql_print_status(): Add MEM_MAKE_DEFINED() to work around some missing instrumentation around mallinfo2(). que_graph_free_stat_list(): Invoke que_node_get_next(node) before que_graph_free_recursive(node). That is the logical and MSAN_OPTIONS=poison_in_dtor=1 compatible way of freeing memory. ins_node_t::~ins_node_t(): Invoke mem_heap_free(entry_sys_heap). que_graph_free_recursive(): Rely on ins_node_t::~ins_node_t(). fts_t::~fts_t(): Invoke mem_heap_free(fts_heap). fts_free(): Replace with direct calls to fts_t::~fts_t(). The failures in free_root() due to MSAN_OPTIONS=poison_in_dtor=1 will be covered in MDEV-30942.	2023-03-28 11:44:24 +03:00
Vlad Lesin	4c226c1850	MDEV-29050 mariabackup issues error messages during InnoDB tablespaces export on partial backup preparing The solution is to suppress error messages for missing tablespaces if mariabackup is launched with "--prepare --export" options. "mariabackup --prepare --export" invokes itself with --mysqld parameter. If the parameter is set, then it starts server to feed "FLUSH TABLES ... FOR EXPORT;" queries for exported tablespaces. This is "normal" server start, that's why new srv_operation value is introduced. Reviewed by Marko Makela.	2023-03-27 20:15:10 +03:00
Marko Mäkelä	4783f37cf7	MDEV-30069 fixup: Do not truncate files on recovery recv_sys_t::recover_deferred(): If the file has been determined to be large enough, skip the call to os_file_set_size(), which would use the current value of FSP_SIZE, which during a multi-batch recovery can be smaller than the actual file size. os_file_io(): Also display the file offset in the warning message about partial I/O.	2022-11-30 12:06:52 +02:00
Marko Mäkelä	15ab2e122d	MDEV-30132 Crash after recovery, with InnoDB: Tried to read ... os_file_read(): Merged with os_file_read_no_error_handling(). Crashing on a partial page read is as unhelpful as crashing on a corrupted page read (commit `0b47c126e3`). Report the file name if it is available via IORequest.	2022-11-30 10:54:03 +02:00
Marko Mäkelä	44fd2c4b24	Merge 10.5 into 10.6	2022-09-20 16:53:20 +03:00
Marko Mäkelä	0792aff161	Merge 10.4 into 10.5	2022-09-20 13:17:02 +03:00
Marko Mäkelä	0c0a569028	Merge 10.3 into 10.4	2022-09-20 12:38:25 +03:00
Marko Mäkelä	c22dff21a5	InnoDB cleanup: Replace UNIV_LINUX, UNIV_SOLARIS, UNIV_AIX Let us use the normal platform-specific preprocessor symbols __linux__, __sun__, _AIX instead of some homebrew ones. The preprocessor symbol UNIV_HPUX must have lost its meaning by `f6deb00a56` (note: the symbol UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).	2022-09-19 12:20:53 +03:00
Marko Mäkelä	bbf81b51f2	Correct typos in a function comment Thanks to Thirunarayanan Balathandayuthapani for spotting this.	2022-09-19 10:23:57 +03:00
Marko Mäkelä	76bb671e42	Merge 10.5 into 10.6	2022-08-25 16:02:44 +03:00
Vladislav Vaintroub	a3fd9e6b06	MDEV-29367 Refactor tpool::cache Removed use std::vector's ba push_back(), pop_back() to make it more obvious that memory in the vectors won't be reallocated. Also, "borrowed" elements can be debugged a little better now, they are put into the start of the m_cache vector.	2022-08-24 13:36:49 +02:00
Marko Mäkelä	30914389fe	Merge 10.5 into 10.6	2022-07-27 17:52:37 +03:00
Marko Mäkelä	098c0f2634	Merge 10.4 into 10.5	2022-07-27 17:17:24 +03:00
Marko Mäkelä	e5c4f4e590	Merge 10.3 into 10.4	2022-07-27 14:25:36 +03:00
Marko Mäkelä	0ee1082bd2	MDEV-28495 InnoDB corruption due to lack of file locking Starting with commit `da094188f6` (MDEV-24393), MariaDB will no longer acquire advisory file locks on InnoDB data files by default, because it would create a large number of entries in Linux /proc/locks. The motivation for acquiring the file locks is to prevent accidental concurrent startup of multiple server processes on the same data files. Such mistake still turns out to be relatively common, based on corruption bug reports from the community. To prevent corruption due to concurrent startup attempts, the Aria storage engine would unconditionally acquire an advisory lock on one of its log files. Solution: InnoDB will always lock its system tablespace files. (Ever since commit `685d958e38` the InnoDB log file will not necessarily be open while the server is running, because it can be accessed via memory-mapped I/O.) If more protection is desired, then the option --external-locking can be used. The mandatory advisory lock also fixes intermittent failures of some crash recovery tests. It turns out that when the mtr test harness kills and restarts the server, it will not actually ensure that the old process has terminated before starting the new one.	2022-07-27 14:15:14 +03:00
Marko Mäkelä	b9eb63618e	Merge 10.5 into 10.6	2022-07-26 11:37:36 +03:00
Daniel Black	fc456bc97e	MDEV-27593 InnoDB handle AIO errors - more detailed assertion Step 1 in handling InnoDB AIO assertions better is to get more detail of the cases of error. This doesn't resolve MDEV-27593, but increases the level of information in the assertion.	2022-07-09 12:34:02 +10:00
Vladislav Vaintroub	d96436c999	MDEV-28935 crash in io_slots::release Revert "TSAN: data race on vptr (ctor/dtor vs virtual call)" This reverts commit `78084fa747`. This commit was done to please TSAN, which falsely reported an error where there was none. Yet as consequence, it could cause a real error, a crash in os_aio_free on shutdown	2022-06-23 17:42:32 +02:00
Marko Mäkelä	0b47c126e3	MDEV-13542: Crashing on corrupted page is unhelpful The approach to handling corruption that was chosen by Oracle in commit `177d8b0c12` is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.	2022-06-06 14:03:22 +03:00
Marko Mäkelä	2f8d0af883	Merge 10.5 into 10.6	2022-06-02 17:39:13 +03:00
Marko Mäkelä	4b3c3e526e	Merge 10.4 into 10.5	2022-06-02 16:51:13 +03:00
Marko Mäkelä	96f4b4a55b	Merge 10.3 into 10.4	2022-06-02 16:34:17 +03:00
Marko Mäkelä	fde99e006d	MDEV-28716: Portability: unlink() can return EPERM instead of EISDIR	2022-06-01 11:13:15 +03:00
Marko Mäkelä	d1edb011ee	Cleanup: Remove os0thread Let us use the common pthread_t wrapper for Microsoft Windows. This fixes up commit `dbe941e06f`	2022-04-19 13:49:52 +03:00
Marko Mäkelä	ff99413804	MDEV-25975: Merge 10.5 into 10.6	2022-04-06 12:45:14 +03:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	572e34304e	Merge 10.5 into 10.6	2022-03-14 10:59:46 +02:00
Marko Mäkelä	59359fb44a	MDEV-24841 Build error with MSAN use-of-uninitialized-value in comp_err The MemorySanitizer implementation in clang includes some built-in instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the interface to the stat() family of functions was changed. Until the MemorySanitizer interceptors are adjusted, any MSAN code builds will act as if that the stat() family of functions failed to initialize the struct stat. A fix was applied in https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a but it fails to cover the 64-bit variants of the calls. For now, let us work around the MemorySanitizer bug by defining and using the macro MSAN_STAT_WORKAROUND().	2022-03-14 09:28:55 +02:00
Daniel Black	bd1ba7801f	Merge branch 10.5 into 10.6	2022-03-12 16:16:03 +11:00
Daniel Black	d78173828e	MDEV-27900: aio handle partial reads/writes As btrfs showed, a partial read of data in AIO /O_DIRECT circumstances can really confuse MariaDB. Filipe Manana (SuSE)[1] showed how database programmers can assume O_DIRECT is all or nothing. While a fix was done in the kernel side, we can do better in our code by requesting that the rest of the block be read/written synchronously if we do only get a partial read/write. Per the APIs, a partial read/write can occur before an error, so reattempting the request will leave the caller with a concrete error to handle. [1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a Also spell synchronously correctly in other files.	2022-03-12 09:47:53 +11:00
Marko Mäkelä	6daf8f8a0d	Merge 10.5 into 10.6	2022-02-25 13:48:47 +02:00

1 2 3 4 5 ...

763 commits