mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-30 18:41:56 +01:00

Author	SHA1	Message	Date
Sergei Golubchik	7970ac7fe8	Merge branch '10.4' into 10.5	2022-05-18 09:50:26 +02:00
Sergei Golubchik	23ddc3518f	Merge branch '10.3' into 10.4	2022-05-18 01:25:30 +02:00
Sergei Golubchik	8609254f4f	cleanup:have_log_bin.inc prefer if/skip over require (works better with debugging, not affected by query log)	2022-05-18 01:22:29 +02:00
Sergei Golubchik	443c2a715d	Merge branch '10.7' into 10.8	2022-05-11 12:21:36 +02:00
Sergei Golubchik	fd132be117	Merge branch '10.6' into 10.7	2022-05-11 11:25:33 +02:00
Sergei Golubchik	3bc98a4ec4	Merge branch '10.5' into 10.6	2022-05-10 14:01:23 +02:00
Sergei Golubchik	ef781162ff	Merge branch '10.4' into 10.5	2022-05-09 22:04:06 +02:00
Sergei Golubchik	a70a1cf3f4	Merge branch '10.3' into 10.4	2022-05-08 23:03:08 +02:00
Oleg Smirnov	a0475cb9ca	MDEV-27021 Add explicit indication of SHOW EXPLAIN/ANALYZE. 1. Add explicit indication that the output is produced by SHOW EXPLAIN/ANALYZE FORMAT=JSON command. 2. Remove useless "r_total_time_ms" field from SHOW ANALYZE FORMAT=JSON output when there is no timed statistics gathered. 3. Add "r_query_time_in_progress_ms" to the output of SHOW ANALYZE FORMAT=JSON.	2022-04-29 10:48:25 +03:00
Aleksey Midenkov	9286c9e647	MDEV-28254 Wrong position for row_start, row_end after adding column to implicit versioned table Implicit system-versioned table does not contain system fields in SHOW CREATE. Therefore after mysqldump recovery such table has system fields in the last place in frm image. The original table meanwhile does not guarantee these system fields on last place because adding new fields via ALTER TABLE places them last. Thus the order of fields may be different between master and slave, so row-based replication may fail. To fix this on ALTER TABLE we now place system-invisible fields always last in frm image. If the table was created via old revision and has an incorrect order of fields it can be fixed via any copy operation of ALTER TABLE, f.ex.: ALTER TABLE t1 FORCE; To check the order of fields in frm file one can use hexdump: hexdump -C t1.frm Note, the replication fails only when all 3 conditions are met: 1. row-based or mixed mode replication; 2. table has new fields added via ALTER TABLE; 3. table was rebuilt on some, but not all nodes via mysqldump image. Otherwise it will operate properly even with incorrect order of fields.	2022-04-22 15:49:37 +03:00
Daniel Black	f0c52bfe3b	MDEV-28263: mariadb-tzinfo-to-sql (postfix) I missed include/no_protocol.inc already existed.	2022-04-21 19:13:58 +10:00
Daniel Black	13e77930e6	MDEV-28263: mariadb-tzinfo-to-sql improve wsrep and binlog cases The --skip-write-binlog message was confusing that it only had an effect if the galera was enabled. There are uses beyond galera so we apply SET SESSION SQL_LOG_BIN=0 as implied by the option without being conditional on the wsrep status. We also with --skip-write-binlog actually check the session @@WSREP_ON variable rather than the global server variable. Since 10.6, the wsrep_mode could replicate Aria and MyISAM, in which case no change to innodb and back is needed. By removing the conditions, we can use LOCK TABLES in a general case improving the load speed of Aria (MDEV-23326), regardless of the skip-write-binlog flag. The only case where we don't use LOCK TABLES is when we are replicating via Innodb, because wsrep_on=1 and wsrep_mode doesn't contain REPLICATE_ARIA{,MYISAM}. This uses an Innodb transaction instead. When replicating via InnoDB we change the table engine type back to what it was originally. By removing the \d and other syntax that requires parsing by the mariadb client, we can use the generated SQL more generally, like in the embedded server. We also save and restore the SQL_LOG_BIN and WSREP_ON session server variables so this can be included in the same session as other data without taking into changes in state. Remove wsrep.mysql_tzinfo_to_sql_symlink{,_skip} tests as they offered no additional coverage beyond main.mysql_tzinfo_to_sql_symlink (no server testing was done). Add galera.mariadb_tzinfo_to_sql to actually test the replication of tzinfo data through galera. The conditional executable comment around /M!100602 ... START TRANSACTION .. LOCK TABLES.. / is so that we can provide tzinfo files (MDEV-27113, MDBF-389) and in the case that a user uses it on a pre-10.6 server version it will still work. Both START TRANSACTION and LOCK TABLES are not supported in prepared statements in MariaDB versions earlier than 10.6.2. Reviewed by Brandon Nesterenko	2022-04-21 14:59:29 +10:00
Marko Mäkelä	6cb6ba8b7b	Merge 10.8 into 10.9	2022-04-06 13:33:33 +03:00
Marko Mäkelä	b2baeba415	Merge 10.7 into 10.8	2022-04-06 13:28:25 +03:00
Marko Mäkelä	2d8e38bc94	Merge 10.6 into 10.7	2022-04-06 13:00:09 +03:00
Marko Mäkelä	ff99413804	MDEV-25975: Merge 10.5 into 10.6	2022-04-06 12:45:14 +03:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	8680eedb26	Merge 10.8 into 10.9	2022-03-30 09:41:14 +03:00
Daniel Black	88ce8a3d8b	Merge 10.7 into 10.8	2022-03-25 15:06:56 +11:00
Daniel Black	8b92e346b1	Merge 10.6 into 10.7	2022-03-25 14:31:59 +11:00
Daniel Black	ec62f46a61	Merge 10.5 to 10.6	2022-03-25 11:31:49 +11:00
Marko Mäkelä	b101f19d29	MDEV-23974 fixup: rpl.rpl_gtid_stop_start fails The call mtr.add_suppression() that was added in commit `75b7cd680b` for MemorySanitizer and Valgrind runs is causing a result difference for the test rpl.rpl_gtid_stop_start. Let us disable the binlog for executing that statement. Also, the test perfschema.statement_program_lost_inst would fail due to the changes to have_innodb.inc in this commit. To compensate for that, we will make more --suite=perfschema tests run without InnoDB, and explicitly enable InnoDB in those tests that depend on a transactional storage engine.	2022-03-24 13:43:58 +02:00
Marko Mäkelä	75b7cd680b	MDEV-23974 Tests fail due to [Warning] InnoDB: Trying to delete tablespace A few regression tests invoke heavy flushing of the buffer pool and may trigger warnings that tablespaces could not be deleted because of pending writes. Those warnings are to be expected during the execution of such tests. The warnings are also frequently seen with Valgrind or MemorySanitizer. For those, the global suppression in have_innodb.inc does the trick.	2022-03-23 16:42:43 +02:00
Alexey Yurchenko	9d7e596ba6	MDEV-26971: JSON file interface to wsrep node state. Integration with status reporter in wsrep-lib. Status reporter reports changes in wsrep state and logged errors/ warnings to a json file which then can be read and interpreted by an external monitoring tool. Rationale: until the server is fully initialized it is unaccessible by client and the only source of information is an error log which is not machine-friendly. Since wsrep node can spend a very long time in initialization phase (state transfer), it may be a very long time that automatic tools can't easily monitor its liveness and progression. New variable: wsrep_status_file specifies the output file name. If not set, no file is created and no reporting is done. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2022-03-18 16:38:41 +01:00
Marko Mäkelä	0a8412b5cf	Merge 10.7 into 10.8	2022-03-01 11:58:32 +02:00
Marko Mäkelä	a43777cc92	Merge 10.6 into 10.7	2022-03-01 09:36:28 +02:00
Marko Mäkelä	07f4a4a6d1	Merge 10.5 into 10.6	2022-03-01 09:15:25 +02:00
Marko Mäkelä	9af4c7c6d7	MDEV-27964: A better work-around Already the detection part of have_crypt.inc must be skipped in WITH_MSAN builds until the SIGSEGV in the crypt() interceptor has been fixed.	2022-03-01 09:12:15 +02:00
Marko Mäkelä	50fa94ea2b	Merge 10.7 into 10.8	2022-02-23 16:42:59 +02:00
Marko Mäkelä	01bb003a3a	Merge 10.6 into 10.7	2022-02-23 16:22:34 +02:00
Marko Mäkelä	164a6aa41c	Merge 10.5 into 10.6	2022-02-23 16:19:45 +02:00
Marko Mäkelä	b91a123d8c	Extend have_sanitizer with ASAN+UBSAN and MSAN Disable some tests that are too slow or big for MSAN.	2022-02-23 15:48:08 +02:00
Marko Mäkelä	080676ca51	Merge 10.7 into 10.8	2022-02-17 14:57:49 +02:00
Marko Mäkelä	c76bdc57ff	Merge 10.6 into 10.7	2022-02-17 14:57:00 +02:00
Marko Mäkelä	f04b459fb7	Merge 10.5 into 10.6	2022-02-17 14:37:17 +02:00
Marko Mäkelä	cac995ec6f	Merge 10.4 into 10.5	2022-02-17 11:58:25 +02:00
Marko Mäkelä	f921db7aa5	Merge 10.3 into 10.4	2022-02-17 11:33:08 +02:00
Lena Startseva	6c3f1f661c	MDEV-27691: make working view-protocol Added ability to disable/enable (--disable_view_protocol/--enable_view_protocol) view-protocol in tests. When the option "--disable_view_protocol" is used util connections are closed. Added new test for checking view-protocol	2022-02-16 13:06:23 +07:00
Oleksandr Byelkin	4fb2cb1a30	Merge branch '10.7' into 10.8	2022-02-04 14:50:25 +01:00
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Oleksandr Byelkin	a576a1cea5	Merge branch '10.3' into 10.4	2022-01-30 09:46:52 +01:00
Oleksandr Byelkin	41a163ac5c	Merge branch '10.2' into 10.3	2022-01-29 15:41:05 +01:00
Monty	a1f630ccfe	Fixed result for embedded server - Revert wrongly record embedded result files. These were either recorded with normal server (not embedded) or an embedded server with not default compile option. This can be seen that the committed result file had replication variables which should never happen. - Reverted back change of include/is_embedded.inc. One cannot check for $MYSQL_EMBEDDED as this only tells if there exists an embedded server, not if the current server we are testing is the embedded server. This could easily be verified by doing 'mtr sys_vars.sysvars_server_embedded'. This would fail with a wrong result instead of being marked as skipped as --embedded was not used.	2022-01-28 16:31:53 +02:00
Sachin	0c5d1342ae	MDEV-11675 Lag Free Alter On Slave This commit implements two phase binloggable ALTER. When a new @@session.binlog_alter_two_phase = YES ALTER query gets logged in two parts, the START ALTER and the COMMIT or ROLLBACK ALTER. START Alter is written in binlog as soon as necessary locks have been acquired for the table. The timing is such that any concurrent DML:s that update the same table are either committed, thus logged into binary log having done work on the old version of the table, or will be queued for execution on its new version. The "COMPLETE" COMMIT or ROLLBACK ALTER are written at the very point of a normal "single-piece" ALTER that is after the most of the query work is done. When its result is positive COMMIT ALTER is written, otherwise ROLLBACK ALTER is written with specific error happened after START ALTER phase. Replication of two-phase binloggable ALTER is cross-version safe. Specifically the OLD slave merely does not recognized the start alter part, still being able to process and memorize its gtid. Two phase logged ALTER is read from binlog by mysqlbinlog to produce BINLOG 'string', where 'string' contains base64 encoded Query_log_event containing either the start part of ALTER, or a completion part. The Query details can be displayed with `-v` flag, similarly to ROW format events. Notice, mysqlbinlog output containing parts of two-phase binloggable ALTER is processable correctly only by binlog_alter_two_phase server. @@log_warnings > 2 can reveal details of binlogging and slave side processing of the ALTER parts. The current commit also carries fixes to the following list of reported bugs: MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528. Thanks to all people involved into early discussion of the feature including Kristian Nielsen, those who helped to design, implement and test: Sergei Golubchik, Andrei Elkin who took the burden of the implemenation completion, Sujatha Sivakumar, Brandon Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.	2022-01-27 21:25:07 +02:00
Daniel Black	68b3fa8865	MDEV-27289: mtr test for WITH_SERVER_EMBEDDED=ON reenable mtr is checking the wrong path for the embedded executable on out of tree builds. The is_embedded.inc tests are also checking the version rather than the MTR MYSQL_EMBEDDED environment variable. As a result, a few tests are out of date in the result recordings.	2022-01-27 10:36:39 +11:00
Marko Mäkelä	685d958e38	MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.	2022-01-21 16:03:47 +02:00

... 4 5 6 7 8 ...

3070 commits