mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Oleksandr Byelkin	4fb2cb1a30	Merge branch '10.7' into 10.8	2022-02-04 14:50:25 +01:00
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	c04a203a10	Rocksdb result fix after merge	2022-01-31 08:37:33 +01:00
Oleksandr Byelkin	a576a1cea5	Merge branch '10.3' into 10.4	2022-01-30 09:46:52 +01:00
Oleksandr Byelkin	41a163ac5c	Merge branch '10.2' into 10.3	2022-01-29 15:41:05 +01:00
Monty	a85d942be9	Fixed result file for rocksdb.i_s_deadlock This failed because of MDEV-18918 which removed DEFAULT's	2022-01-27 19:15:02 +02:00
Sergei Golubchik	918f524490	RocksDB doesn't support DESC indexes yet disallow descending indexes in rocksdb	2022-01-26 18:43:06 +01:00
Alexander Barkov	216834b068	A cleanup for MDEV-18918/MDEV-20254 Adjusting rocksdb tests results.	2022-01-25 17:48:44 +04:00
Marko Mäkelä	685d958e38	MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.	2022-01-21 16:03:47 +02:00
Sergei Petrunia	fa7a67ff49	MDEV-27149: Add rocksdb_ignore_datadic_errors Add a --rocksdb_ignore_datadic_errors plugin option for MyRocks. The default is 0, and this means MyRocks will call abort() if it detects a DDL mismatch. Setting rocksdb_ignore_datadic_errors=1 makes MyRocks to try to ignore the errors and allow to start the server for repairs.	2022-01-21 09:31:16 +03:00
Sergei Petrunia	ad88c428c5	Avoid a crash on MyRocks data inconsistency. In ha_rocksdb::open(), check if the number of indexes seen from the SQL layer matches the number of indexes in the internal MyRocks data dictionary. Produce an error if there is a mismatch. (If we don't produce this error, we are likely to crash as soon as we attempt to use an index)	2022-01-20 22:32:55 +03:00
Monty	dfbfd39e85	Updated rocksdb.corrupted_data_reads_debug result file The change was because of new rocksdb error message.	2022-01-19 17:00:46 +02:00
Aleksey Midenkov	c81677bebb	rocksdb.tbl_opt_data_index_dir test fix Handler error codes in storage engine may change as they depend on volatile HA_ERR_LAST.	2022-01-14 12:06:15 +03:00
Rucha Deodhar	452c9a4d72	MDEV-26698: Incorrect row number upon INSERT .. SELECT from the same table: rows are counted twice Analysis: When the table we are trying to insert into and the SELECT table are same for INSERT ... SELECT, rows from the SELECT table are copied into internal temporary table and then to the INSERT table. We only want to count the rows when we start inserting into the table. Fix: Reset the counter to 1 before starting to copy from internal temporary table to select table and then increment the counter.	2022-01-03 18:14:59 +05:30
alexfanqi	5fd5e9fff3	improve checks for libatomic linking This code piece is adapted from `451e1415fd/cmake/modules/CheckAtomic.cmake (L23)` Fixes: `f502ccbcb5` Fixes: https://bugs.gentoo.org/828065 Tested-by: Yixun Lan <dlan@gentoo.org> Reviewed-by: Daniel Black	2021-12-30 16:20:29 +11:00
Marko Mäkelä	897d8c57b6	Merge 10.7 into 10.8	2021-11-29 11:58:15 +02:00
Marko Mäkelä	c22107fd90	Merge 10.6 into 10.7	2021-11-29 11:42:07 +02:00
Marko Mäkelä	51c89849d1	Merge 10.5 into 10.6	2021-11-29 11:39:34 +02:00
Marko Mäkelä	d4cb177603	Merge 10.4 into 10.5	2021-11-29 11:16:20 +02:00
Marko Mäkelä	4da2273876	Merge 10.3 into 10.4	2021-11-29 10:59:22 +02:00
Marko Mäkelä	289721de9a	Merge 10.2 into 10.3	2021-11-29 10:33:06 +02:00
Sergei Krivonos	73df7a3009	MDEV-27036: resolve duplicated key issues of JSON tracing outputs: MDEV-27036: repeated "table" key resolve for print_explain_json MDEV-27036: duplicated keys in best_access_path MDEV-27036: Explain_aggr_filesort::print_json_members: resolve duplicated "filesort" member in Json object MDEV-27036: Explain_basic_join:: print_explain_json_interns fixed start_dups_weedout case for main.explain_json test	2021-11-26 15:11:06 +02:00
Alexey Bychko	fe065f8d90	MDEV-22522 RPM packages have meaningless summary/description this patch moves cpack summury and description for optional packages to the appropriate CMakeLists.txt files	2021-11-23 11:29:24 +07:00
Marko Mäkelä	06988bdcaa	Merge 10.6 into 10.7	2021-11-09 09:40:29 +02:00
Marko Mäkelä	25ac047baf	Merge 10.5 into 10.6	2021-11-09 09:11:50 +02:00
Aleksey Midenkov	5cae401b00	MDEV-25555 Server crashes in tree_record_pos after INPLACE-recreating index on HEAP table Drop and add same key is considered rename (look ALTER_RENAME_INDEX in fill_alter_inplace_info()). But in this case order of keys may be changed, because mysql_prepare_alter_table() yet does not know about rename and treats 2 operations: drop and add. In that case we disable inplace algorithm for such engines as Memory, MyISAM and Aria with ALTER_INDEX_ORDER flag. These engines have no specialized check_if_supported_inplace_alter() and default handler::check_if_supported_inplace_alter() sees an unknown flag and returns HA_ALTER_INPLACE_NOT_SUPPORTED. ha_innobase::check_if_supported_inplace_alter() works differently and inplace is not disabled (with the help of modified INNOBASE_INPLACE_IGNORE). add_drop_v_cols fork was also tweaked as it wrongly failed with MSG_UNSUPPORTED_ALTER_ONLINE_ON_VIRTUAL_COLUMN when it seen ALTER_INDEX_ORDER. No-op operation must be still no-op no matter of ALTER_INDEX_ORDER presence, so we tweek its condition as well.	2021-11-03 12:31:47 +03:00
Marko Mäkelä	d7af7bfc2b	Merge 10.6 into 10.7	2021-10-28 09:14:51 +03:00
Marko Mäkelä	d8c6c53a06	Merge 10.5 into 10.6	2021-10-28 09:08:58 +03:00
Marko Mäkelä	a8ded39557	Merge 10.4 into 10.5	2021-10-28 08:48:36 +03:00
Marko Mäkelä	3a79e5fd31	Merge 10.3 into 10.4	2021-10-28 08:28:39 +03:00
Marko Mäkelä	657bcf928e	Merge 10.2 into 10.3	2021-10-28 07:50:05 +03:00
Sergei Golubchik	a010959a56	MDEV-26774 Compression provider unloading at runtime has no effect but doesn't produce a warning	2021-10-27 15:55:14 +02:00
Kartik Soneji	bf8b699f64	MDEV-12933 sort out the compression library chaos bzip2/lz4/lzma/lzo/snappy compression is now provided via services they're almost like normal services, but in include/providers/ and they're supposed to provide exactly the same interface as original compression libraries (but not everything, only enough of if for the code to compile). the services are implemented via dummy functions that return corresponding error values (LZMA_PROG_ERROR, LZO_E_INTERNAL_ERROR, etc). the actual compression libraries are linked into corresponding provider plugins. Providers are daemon plugins that when loaded replace service pointers to point to actual compression functions. That is, run-time dependency on compression libraries is now on plugins, and the server doesn't need any compression libraries to run, but will automatically support the compression when a plugin is loaded. InnoDB and Mroonga use compression plugins now. RocksDB doesn't, because it comes with standalone utility binaries that cannot load plugins.	2021-10-27 15:55:14 +02:00
Kartik Soneji	c356714d77	Change Find*.cmake modules to match conventions	2021-10-27 15:55:14 +02:00
Sergei Golubchik	9e32f229d4	plugin can signal that it cannot be unloaded by failing deinit() if plugin->deinit() returns a failure, it is no longer ignored, it means that the plugin isn't ready to be unloaded from memory yet. So it's marked "dying", deinitialized as much as possible, but stays in memory until shutdown. also: * increment MARIA_PLUGIN_INTERFACE_VERSION * rewrite ha_rocksdb to use the new approach, update the test	2021-10-27 15:55:14 +02:00
Sergei Petrunia	3a9967d757	Fix compile warning: ha_rocksdb.h:459:15: warning: 'table_type' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]	2021-10-27 10:51:08 +03:00
Marko Mäkelä	da46c37bc7	Merge 10.6 into 10.7	2021-10-27 10:24:08 +03:00
Marko Mäkelä	d4a89b9262	Merge 10.5 into 10.6	2021-10-27 10:06:02 +03:00
Sergei Golubchik	a398fcbff6	MDEV-26635 ROW_NUMBER is not 0 for errors not caused because of rows	2021-10-26 17:29:40 +02:00
Marko Mäkelä	f1ba07a044	Merge 10.4 into 10.5	2021-10-21 18:09:17 +03:00
Marko Mäkelä	36f8cca6f3	Merge 10.3 into 10.4	2021-10-21 18:06:31 +03:00
Marko Mäkelä	f9b856b052	Merge 10.2 into 10.3	2021-10-21 17:39:34 +03:00
Sergei Krivonos	7d6617e966	MDEV-19129: Xcode compatibility update: mysql-test-run.pl: rename $opt_vs_config to $multiconfig to use with other cmake multiconfig generators	2021-10-21 16:48:00 +03:00
Eric Herman	401ff6994d	MDEV-26221: DYNAMIC_ARRAY use size_t for sizes https://jira.mariadb.org/browse/MDEV-26221 my_sys DYNAMIC_ARRAY and DYNAMIC_STRING inconsistancy The DYNAMIC_STRING uses size_t for sizes, but DYNAMIC_ARRAY used uint. This patch adjusts DYNAMIC_ARRAY to use size_t like DYNAMIC_STRING. As the MY_DIR member number_of_files is copied from a DYNAMIC_ARRAY, this is changed to be size_t. As MY_TMPDIR members 'cur' and 'max' are copied from a DYNAMIC_ARRAY, these are also changed to be size_t. The lists of plugins and stored procedures use DYNAMIC_ARRAY, but their APIs assume a size of 'uint'; these are unchanged.	2021-10-19 16:00:26 +03:00
Marko Mäkelä	2255649939	Merge 10.6 into 10.7	2021-09-17 20:23:17 +03:00
Marko Mäkelä	03c09837fc	Merge 10.5 into 10.6	2021-09-16 20:17:12 +03:00
Monty	65cce297be	Updated rocksdb test result This was required as I added a new error code to my_base.h and rocksdb is adding it's own errors after the last official one Updated result file also for index_merge_rocksdb2. This is a big test and we have probably not before noticed that some optimizer changes caused a difference.	2021-09-16 14:06:29 +03:00
Marko Mäkelä	58aaa67064	Merge 10.6 into 10.7	2021-08-31 11:01:19 +03:00

1 2 3 4 5 ...

930 commits