mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 11:01:52 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	0792aff161	Merge 10.4 into 10.5	2022-09-20 13:17:02 +03:00
Marko Mäkelä	0c0a569028	Merge 10.3 into 10.4	2022-09-20 12:38:25 +03:00
Marko Mäkelä	c22dff21a5	InnoDB cleanup: Replace UNIV_LINUX, UNIV_SOLARIS, UNIV_AIX Let us use the normal platform-specific preprocessor symbols __linux__, __sun__, _AIX instead of some homebrew ones. The preprocessor symbol UNIV_HPUX must have lost its meaning by `f6deb00a56` (note: the symbol UNIV_HPUX10 is being checked for, but only UNIV_HPUX is defined).	2022-09-19 12:20:53 +03:00
Marko Mäkelä	bbf81b51f2	Correct typos in a function comment Thanks to Thirunarayanan Balathandayuthapani for spotting this.	2022-09-19 10:23:57 +03:00
Vladislav Vaintroub	a3fd9e6b06	MDEV-29367 Refactor tpool::cache Removed the use of std::vector's push_back() and pop_back(), to make it more obvious that memory in the vectors won't be reallocated. Also, "borrowed" elements can be debugged a little better now: they are put at the start of the m_cache vector.	2022-08-24 13:36:49 +02:00
Marko Mäkelä	098c0f2634	Merge 10.4 into 10.5	2022-07-27 17:17:24 +03:00
Marko Mäkelä	e5c4f4e590	Merge 10.3 into 10.4	2022-07-27 14:25:36 +03:00
Marko Mäkelä	0ee1082bd2	MDEV-28495 InnoDB corruption due to lack of file locking Starting with commit `da094188f6` (MDEV-24393), MariaDB will no longer acquire advisory file locks on InnoDB data files by default, because it would create a large number of entries in Linux /proc/locks. The motivation for acquiring the file locks is to prevent accidental concurrent startup of multiple server processes on the same data files. Such a mistake still turns out to be relatively common, based on corruption bug reports from the community. To prevent corruption due to concurrent startup attempts, the Aria storage engine already unconditionally acquires an advisory lock on one of its log files. Solution: InnoDB will always lock its system tablespace files. (Ever since commit `685d958e38` the InnoDB log file will not necessarily be open while the server is running, because it can be accessed via memory-mapped I/O.) If more protection is desired, then the option --external-locking can be used. The mandatory advisory lock also fixes intermittent failures of some crash recovery tests. It turns out that when the mtr test harness kills and restarts the server, it does not actually ensure that the old process has terminated before starting the new one.	2022-07-27 14:15:14 +03:00
Daniel Black	fc456bc97e	MDEV-27593 InnoDB handle AIO errors - more detailed assertion The first step towards handling InnoDB AIO assertions better is to get more detail about the error cases. This doesn't resolve MDEV-27593, but increases the level of information in the assertion.	2022-07-09 12:34:02 +10:00
Marko Mäkelä	4b3c3e526e	Merge 10.4 into 10.5	2022-06-02 16:51:13 +03:00
Marko Mäkelä	96f4b4a55b	Merge 10.3 into 10.4	2022-06-02 16:34:17 +03:00
Marko Mäkelä	fde99e006d	MDEV-28716: Portability: unlink() can return EPERM instead of EISDIR	2022-06-01 11:13:15 +03:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync-friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	59359fb44a	MDEV-24841 Build error with MSAN use-of-uninitialized-value in comp_err The MemorySanitizer implementation in clang includes some built-in instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the interface to the stat() family of functions was changed. Until the MemorySanitizer interceptors are adjusted, any MSAN builds will act as if the stat() family of functions failed to initialize struct stat. A fix was applied in https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a but it fails to cover the 64-bit variants of the calls. For now, let us work around the MemorySanitizer bug by defining and using the macro MSAN_STAT_WORKAROUND().	2022-03-14 09:28:55 +02:00
Daniel Black	d78173828e	MDEV-27900: aio handle partial reads/writes As btrfs showed, a partial read of data in AIO/O_DIRECT circumstances can really confuse MariaDB. Filipe Manana (SuSE)[1] showed how database programmers can assume O_DIRECT is all or nothing. While a fix was made on the kernel side, we can do better in our code by requesting that the rest of the block be read/written synchronously if we only get a partial read/write. Per the APIs, a partial read/write can occur before an error, so reattempting the request will leave the caller with a concrete error to handle. [1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a Also spell synchronously correctly in other files.	2022-03-12 09:47:53 +11:00
Marko Mäkelä	b791b942e1	Merge 10.4 into 10.5	2022-02-25 13:27:41 +02:00
Marko Mäkelä	f5ff7d09c7	Merge 10.3 into 10.4	2022-02-25 13:00:48 +02:00
Marko Mäkelä	00b70bbb51	Merge 10.2 into 10.3	2022-02-25 10:43:38 +02:00
Vlad Lesin	a112a80b47	Merge 10.4 into 10.5	2022-02-22 10:35:16 +03:00
Vladislav Vaintroub	24ec144c63	MDEV-27901 Windows: expensive system calls used to calculate file system block size The result is used only in the output of the InnoDB INFORMATION_SCHEMA, yet calculating it can take as much as 7% of CPU time in a benchmark. Fix: move the file system block size calculation to where it is used.	2022-02-20 22:00:42 +01:00
Vladislav Vaintroub	fa557986ac	MDEV-24175 Windows - fix detection of whether file is on SSD Fix the detection: storage is considered an SSD when it does not incur a seek penalty.	2022-02-17 22:55:08 +01:00
Marko Mäkelä	4c3ad24413	MDEV-27416 InnoDB hang in buf_flush_wait_flushed(), on log checkpoint InnoDB could sometimes hang when triggering a log checkpoint. This is due to commit `7b1252c03d` (MDEV-24278), which introduced an untimed wait to buf_flush_page_cleaner(). The hang was noticed through occasional failures of IMPORT TABLESPACE tests, such as innodb.innodb-wl5522, which would (unnecessarily) invoke log_make_checkpoint() from row_import_cleanup(). The reason for the hang was that buf_flush_page_cleaner() would enter untimed sleep despite buf_flush_sync_lsn being set. The exact failure scenario is unclear, because buf_flush_sync_lsn should actually be protected by buf_pool.flush_list_mutex. We prevent the hang by invoking buf_pool.page_cleaner_set_idle(false) whenever we are setting buf_flush_sync_lsn and signaling buf_pool.do_flush_list. The bulk of these changes was originally developed as a preparation for MDEV-26827, to invoke buf_flush_list() from fewer threads, and tested on 10.6 by Matthias Leich. This fix was tested by running 100 repetitions of 100 concurrent instances of the test innodb.innodb-wl5522 on a RelWithDebInfo build, using ext4fs and innodb_flush_method=O_DIRECT on a SATA SSD with 4096-byte block size. During the test, the call to log_make_checkpoint() in row_import_cleanup() was present. buf_flush_list(): Make static. buf_flush_wait(): Wait for buf_pool.get_oldest_modification() to reach a target, by work done in the buf_flush_page_cleaner. If buf_flush_sync_lsn is going to be set, we will invoke buf_pool.page_cleaner_set_idle(false). buf_flush_ahead(): If buf_flush_sync_lsn or buf_flush_async_lsn is going to be set and the page cleaner woken up, we will invoke buf_pool.page_cleaner_set_idle(false). buf_flush_wait_flushed(): Invoke buf_flush_wait(). buf_flush_sync(): Invoke recv_sys.apply() at the start in case crash recovery is active. Invoke buf_flush_wait(). buf_flush_sync_batch(): A lower-level variant of buf_flush_sync() that is only called by recv_sys_t::apply(). buf_flush_sync_for_checkpoint(): Do not trigger log apply or checkpoint during recovery. buf_dblwr_t::create(): Only initiate a buffer pool flush, not a checkpoint. row_import_cleanup(): Do not unnecessarily invoke log_make_checkpoint(). Invoking buf_flush_list_space() before starting to generate redo log for the imported tablespace should suffice. srv_prepare_to_delete_redo_log_file(): Set recv_sys.recovery_on in order to prevent buf_flush_sync_for_checkpoint() from initiating a checkpoint while the log is inaccessible. Remove a wait loop that is already part of buf_flush_sync(). Do not invoke fil_names_clear() if the log is being upgraded, because the FILE_MODIFY record is specific to the latest format. create_log_file(): Clear recv_sys.recovery_on only after calling log_make_checkpoint(), to prevent buf_flush_page_cleaner from invoking a checkpoint. innodb_shutdown(): Simplify the logic in mariadb-backup --prepare. os_aio_wait_until_no_pending_writes(): Update the function comment. Apart from row_quiesce_table_start() during FLUSH TABLES...FOR EXPORT, this is being called by buf_flush_list_space(), which is invoked by ALTER TABLE...IMPORT TABLESPACE as well as some encryption operations.	2022-01-04 07:40:31 +02:00
Marko Mäkelä	2d0847818d	Merge 10.4 into 10.5	2021-09-11 11:49:12 +03:00
Marko Mäkelä	101d10b883	Merge 10.3 into 10.4	2021-09-11 11:21:39 +03:00
Marko Mäkelä	bcd25e1066	Merge 10.2 into 10.3	2021-09-11 11:14:18 +03:00
Marko Mäkelä	d09426f9e6	MDEV-26537 InnoDB corrupts files due to incorrect st_blksize calculation The st_blksize returned by fstat(2) is not documented to be a power of 2, as we assumed in commit `58252fff15` (MDEV-26040). While on Linux, the st_blksize appears to report the file system block size (which hopefully is not smaller than the sector size of the underlying block device), on FreeBSD we observed st_blksize values that might have been something similar to st_size. Also IBM AIX was affected by this. A simple test case, running the following two commands concurrently, would lead to a crash when using the minimum innodb_buffer_pool_size=5m on both FreeBSD and AIX: `seq -f 'create table t%g engine=innodb select * from seq_1_to_200000;' 1 100|mysql test&` and `seq -f 'create table u%g engine=innodb select * from seq_1_to_200000;' 1 100|mysql test&`. We will fix this by not trusting st_blksize at all, and assuming that the smallest allowed write size (for O_DIRECT) is 4096 bytes. We hope that no storage systems with a larger block size exist. Anything larger than 4096 bytes should be unlikely, given that it is the minimum virtual memory page size of many contemporary processors. MariaDB Server on Microsoft Windows was not affected by this. While the 512-byte sector size of the venerable Seagate ST-225 is still in widespread use, the minimum innodb_page_size is 4096 bytes, and innodb_log_file_size can be set in integer multiples of 65536 bytes. The only occasion where InnoDB uses smaller data file block sizes than 4096 bytes is with ROW_FORMAT=COMPRESSED tables with KEY_BLOCK_SIZE=1 or KEY_BLOCK_SIZE=2 (or innodb_page_size=4096). For such tables, we will from now on preallocate space in integer multiples of 4096 bytes and let regular writes extend the file by 1024, 2048, or 3072 bytes. The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES.FS_BLOCK_SIZE should report the raw st_blksize. For page_compressed tables, the function fil_space_get_block_size() will map to 512 any st_blksize value that is larger than 4096. os_file_set_size(): Assume that the file system block size is 4096 bytes, and only support extending files to integer multiples of 4096 bytes. fil_space_extend_must_retry(): Round down the preallocation size to an integer multiple of 4096 bytes.	2021-09-10 19:15:41 +03:00
Marko Mäkelä	eb2f2c1e5f	MDEV-26547 fixup: Wait for read completion buf_load(): Wait for the submitted reads to finish before updating innodb_buffer_pool_load_status.	2021-09-07 08:55:08 +03:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Oleksandr Byelkin	7841a7eb09	Merge branch '10.3' into 10.4	2021-07-31 22:59:58 +02:00
Marko Mäkelä	f50eb0d398	Merge 10.2 into 10.3	2021-07-27 10:47:17 +03:00
Marko Mäkelä	da094188f6	MDEV-24393 InnoDB disregards --skip-external-locking On POSIX systems, InnoDB would unconditionally acquire advisory locks on the files that it opens. On Linux, this would be observable by a large number of entries in /proc/locks. Other storage engines would only acquire advisory locks on files based on the Boolean configuration parameter external_locking. Let InnoDB do the same. NOTE: The --skip-external-locking is activated by default. To have InnoDB acquire advisory locks, --external-locking must be specified. Reviewed by: Sergei Golubchik	2021-07-27 08:52:59 +03:00
Marko Mäkelä	15dcb8bd3e	Merge 10.4 into 10.5	2021-07-02 13:02:26 +03:00
Sergei Petrunia	eebe2090c8	Merge 10.3 -> 10.4	2021-06-30 18:41:46 +03:00
Sergei Petrunia	586870f9ef	Merge 10.2->10.3	2021-06-30 15:06:54 +03:00
Marko Mäkelä	58252fff15	MDEV-26040 os_file_set_size() may not work on O_DIRECT files os_file_set_size(): Trim the current size down to a multiple of the file system block size, to obey the constraints for unbuffered I/O.	2021-06-29 14:28:23 +03:00
Marko Mäkelä	6dfd44c828	MDEV-25954: Trim os_aio_wait_until_no_pending_writes() It turns out that we had some unnecessary waits for all outstanding write requests to complete. They were basically working around a bug that was fixed in MDEV-25953. On write completion callback, blocks will be marked clean. So, it is sufficient to consult buf_pool.flush_list to determine which writes have not been completed yet. On FLUSH TABLES...FOR EXPORT we must still wait for all pending asynchronous writes to complete, because buf_flush_file_space() would merely guarantee that writes will have been initiated.	2021-06-23 19:06:49 +03:00
Marko Mäkelä	db8fb40824	Merge 10.4 into 10.5	2021-05-19 08:39:39 +03:00
Marko Mäkelä	08b6fd9395	MDEV-25710: Dead code os_file_opendir() in the server The functions fil_file_readdir_next_file(), os_file_opendir(), os_file_closedir() became dead code in the server in MariaDB 10.4.0 with commit `09af00cbde` (the removal of the crash recovery logic for the TRUNCATE TABLE implementation that was replaced in MDEV-13564). os_file_opendir(), os_file_closedir(): Define as macros.	2021-05-18 12:13:18 +03:00
Marko Mäkelä	dd07cfcecd	MDEV-15756: Remove some garbage output os_aio_print(): Remove output that should have been removed in commit `5e62b6a5e0` (MDEV-16264).	2021-04-26 15:30:19 +03:00
Marko Mäkelä	e7ddf46632	MDEV-25211 Remove useless counter Innodb_buffered_aio_submitted One of the counters that was ported from XtraDB in commit `412533b4a7` (MDEV-18582) is useless. Innodb_buffered_aio_submitted would be 0 or 1, depending on whether is_linux_native_aio_supported() was executed to the point where it would be incremented. Let us remove this counter, because it has no practical value. Even if its value were 1, io_setup() can still fail and we may end up with innodb_use_native_aio=0.	2021-03-20 13:47:05 +02:00
Marko Mäkelä	be881ec457	Merge 10.4 into 10.5	2021-03-19 13:09:21 +02:00
Marko Mäkelä	44d70c01f0	Merge 10.3 into 10.4	2021-03-19 11:42:44 +02:00
Marko Mäkelä	190a8312f5	Merge 10.4 into 10.5	2021-03-18 15:07:01 +02:00
Marko Mäkelä	126725421e	MDEV-25121: innodb_flush_method=O_DIRECT fails on compressed tables Tests with 4096-byte sector size confirm that it is safe to use O_DIRECT with page_compressed tables. That had been disabled on Linux, in an attempt to fix MDEV-21584 which had been filed for the O_DIRECT problems earlier. The fil_node_t::block_size was being set mostly correctly until commit `10dd290b4b` (MDEV-17380) introduced a regression in MariaDB Server 10.4.4. fil_node_open_file(): Only avoid setting O_DIRECT on ROW_FORMAT=COMPRESSED tables that use KEY_BLOCK_SIZE=1 or 2 (1024 or 2048 bytes). fil_ibd_create(): Avoid setting O_DIRECT on ROW_FORMAT=COMPRESSED tables that use KEY_BLOCK_SIZE=1 or 2 (1024 or 2048 bytes). fil_node_t::find_metadata(): Require fstat() to be always invoked outside Microsoft Windows, so that fil_node_t::block_size can be set. fil_node_t::read_page0(): Rely on find_metadata() to assign block_size. Thanks to Vladislav Vaintroub for testing this on Microsoft Windows using an old-fashioned rotational hard disk with 4KiB sector size. Reviewed by: Vladislav Vaintroub This is a port of commit `00f620b27e` and commit `6505662c23` from 10.2.	2021-03-18 14:43:08 +02:00
Marko Mäkelä	19052b6deb	Merge 10.2 into 10.3	2021-03-18 12:34:48 +02:00
Vladislav Vaintroub	00f620b27e	MDEV-21584 - portability fix This patch implements OS_DATA_FILE_NO_O_DIRECT on Windows.	2021-03-18 12:24:35 +02:00
Marko Mäkelä	14a8b700f3	Cleanup: Remove unused OS_DATA_TEMP_FILE This had been originally added in mysql/mysql-server@192bb153b6 with the motivation to disable O_DIRECT for the dedicated tablespace for temporary tables. In MariaDB Server, commit `5eb539555b` (MDEV-12227) should be a better solution. The code became orphaned later in mysql/mysql-server@c61244c0e6 and it had been applied to MariaDB Server 10.2.2 in commit `2e814d4702` and commit `fec844aca8`. Thanks to Vladislav Vaintroub for spotting this.	2021-03-18 12:24:35 +02:00
Vladislav Vaintroub	d8373fea5f	MDEV-24685 - remove IO thread states output from SHOW ENGINE INNODB STATUS There are no IO threads anymore.	2021-01-29 18:02:14 +02:00
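
The fixed-capacity cache idea behind MDEV-29367 (`a3fd9e6b06`) can be sketched in C, assuming a simple index-based design: the pointer array is filled once and never resized, so no reallocation can happen, and borrowing or returning an element only moves an index. All names (`struct cache`, `cache_init()`, `cache_get()`, `cache_put()`, `CACHE_CAPACITY`) are hypothetical; this is a single-threaded illustration that returns NULL when everything is borrowed, not the tpool code.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAPACITY 4 /* hypothetical fixed size */

/* Slots at or above pos hold the free elements. Slots below pos still
   point at elements that were handed out (exactly the borrowed ones when
   elements are returned in LIFO order), which helps in a debugger. */
struct cache {
    int storage[CACHE_CAPACITY];
    int *slots[CACHE_CAPACITY];
    size_t pos;
};

/* Fill the pointer array once; it is never resized afterwards. */
static void cache_init(struct cache *c)
{
    for (size_t i = 0; i < CACHE_CAPACITY; i++)
        c->slots[i] = &c->storage[i];
    c->pos = 0;
}

/* Borrow an element, or return NULL if all elements are in use. */
static int *cache_get(struct cache *c)
{
    if (c->pos == CACHE_CAPACITY)
        return NULL;
    return c->slots[c->pos++];
}

/* Return a previously borrowed element to the free region. */
static void cache_put(struct cache *c, int *elem)
{
    assert(c->pos > 0);
    c->slots[--c->pos] = elem;
}
```

Because `get` and `put` only increment or decrement `pos`, the array is never pushed to or popped from, which mirrors the goal stated in the commit message.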
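
The advisory file locking discussed in MDEV-24393 (`da094188f6`) and MDEV-28495 (`0ee1082bd2`) can be illustrated with POSIX `fcntl()` record locks. This is a minimal sketch, assuming that a non-blocking exclusive lock on the whole file is what is wanted; `try_lock_whole_file()` is a hypothetical name, not an InnoDB function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to acquire an exclusive advisory lock on the whole file without
   blocking. Returns 0 on success, or -1 if another process holds a
   conflicting lock or fcntl() fails for another reason. */
static int try_lock_whole_file(int fd)
{
    struct flock lk = {0};

    assert(fd >= 0);
    lk.l_type = F_WRLCK;    /* exclusive lock; fd must be open for writing */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;           /* 0 = up to end of file, however large */
    if (fcntl(fd, F_SETLK, &lk) == -1) {
        if (errno == EACCES || errno == EAGAIN)
            fprintf(stderr, "file is locked by another process\n");
        return -1;
    }
    return 0;
}
```

POSIX record locks are owned by the process, so repeating the call from the same process succeeds. Only a different process, such as an accidentally started second server on the same data files, gets EACCES or EAGAIN.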
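
In MDEV-28716 (`fde99e006d`), the portability issue is that unlink() on a directory fails with EISDIR on Linux, while POSIX specifies EPERM for that case, and some other systems return that instead. The following C sketch shows a caller that accepts both. `delete_if_exists()` and its result codes are hypothetical and do not reproduce the actual MariaDB change. Because EPERM can have other causes, the sketch confirms the directory case with stat().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Outcome of the hypothetical delete_if_exists() helper. */
enum delete_result { DELETED, DID_NOT_EXIST, IS_DIRECTORY, DELETE_FAILED };

/* Delete a file if it exists, classifying the failure cases. */
static enum delete_result delete_if_exists(const char *path)
{
    struct stat st;
    int err;

    assert(path != NULL);
    if (unlink(path) == 0)
        return DELETED;
    err = errno;
    if (err == ENOENT)
        return DID_NOT_EXIST;
    /* Linux: EISDIR; POSIX and some other systems: EPERM. */
    if ((err == EISDIR || err == EPERM) &&
        stat(path, &st) == 0 && S_ISDIR(st.st_mode))
        return IS_DIRECTORY;
    return DELETE_FAILED;
}
```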
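
MDEV-27900 (`d78173828e`) completes a partial asynchronous read or write with a synchronous request for the remainder. The following C sketch applies that idea to reads using POSIX `pread()`. It assumes that the AIO layer reports how many bytes were already transferred. `complete_partial_read()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* An asynchronous read of len bytes at offset transferred only `done`
   bytes into buf. Read the rest synchronously. Returns the total number
   of bytes now in buf (less than len only at end of file), or -1 with
   errno set on an I/O error. */
static ssize_t complete_partial_read(int fd, char *buf, size_t len,
                                     off_t offset, size_t done)
{
    assert(done <= len);
    while (done < len) {
        ssize_t n = pread(fd, buf + done, len - done,
                          offset + (off_t) done);
        if (n < 0) {
            int err = errno;
            if (err == EINTR)
                continue;       /* interrupted: simply retry */
            perror("pread");    /* a concrete error for the caller */
            errno = err;        /* perror() may clobber errno */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

With O_DIRECT, the buffer, offset, and length of the follow-up request must still meet the alignment requirements, and this sketch does not check them. A write counterpart would loop over `pwrite()` in the same way.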
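
The block-size rules described in MDEV-26537 (`d09426f9e6`), MDEV-26040 (`58252fff15`) and MDEV-25121 (`126725421e`) reduce to simple arithmetic. The following C sketch assumes a fixed 4096-byte write unit, as MDEV-26537 does, and all of its names are hypothetical. Because 4096 is a power of two, rounding down is a bit mask. Passing st_blksize values of 4096 or less through unchanged is our assumption; the commit message only states that larger values map to 512.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed smallest O_DIRECT write unit (MDEV-26537 assumes 4096 bytes). */
#define ASSUMED_WRITE_UNIT 4096U

/* Round a size down to a multiple of ASSUMED_WRITE_UNIT, e.g. for a
   preallocation request or when trimming the current file size. */
static uint64_t round_down_to_unit(uint64_t size)
{
    /* The mask trick is only valid for a power of two. */
    assert((ASSUMED_WRITE_UNIT & (ASSUMED_WRITE_UNIT - 1)) == 0);
    return size & ~(uint64_t) (ASSUMED_WRITE_UNIT - 1);
}

/* page_compressed tables: report any raw st_blksize larger than 4096
   as 512; smaller values pass through (assumption). */
static unsigned long page_compressed_block_size(unsigned long raw_blksize)
{
    return raw_blksize > ASSUMED_WRITE_UNIT ? 512UL : raw_blksize;
}

/* ROW_FORMAT=COMPRESSED with KEY_BLOCK_SIZE=1 or 2 uses 1024- or
   2048-byte pages, smaller than the write unit, so O_DIRECT is avoided.
   zip_size 0 stands for an uncompressed table. */
static bool may_use_o_direct(unsigned zip_size)
{
    return zip_size != 1024 && zip_size != 2048;
}
```

For example, a 10000-byte request rounds down to 8192 bytes, and a raw st_blksize of 131072 on a page_compressed table would be reported as 512.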


685 commits