mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	685d958e38	MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.	2022-01-21 16:03:47 +02:00
Sergei Petrunia	ce4956f322	Code cleanup	2022-01-19 18:14:07 +03:00
Sergei Petrunia	f7e49c98e6	Switch the default histogram_type to still be DOUBLE_PREC_HB MTR still uses JSON_HB as the default.	2022-01-19 18:10:12 +03:00
Sergei Petrunia	c2d2c1e727	MDEV-26519: Improved histograms Save extra information in the histogram: "target_histogram_size": nnn, "collected_at": "(date and time)", "collected_by": "(server version)",	2022-01-19 18:10:12 +03:00
Vladislav Vaintroub	e222e44d1b	Merge branch 'preview-10.8-MDEV-26713-Windows-i18-support' into 10.8	2022-01-18 21:37:52 +01:00
Marko Mäkelä	daf4fa5238	Merge 10.7 into 10.8	2022-01-04 10:30:45 +02:00
Marko Mäkelä	7dfaded962	Merge 10.6 into 10.7	2022-01-04 09:55:58 +02:00
Marko Mäkelä	3f5726768f	Merge 10.5 into 10.6	2022-01-04 09:26:38 +02:00
Julius Goryavsky	55bb933a88	Merge branch 10.4 into 10.5	2021-12-26 12:51:04 +01:00
Igor Babaev	42fea34d4a	MDEV-27262 Unexpected index intersection with full index scan for an index If when extracting a range condition for an index from the WHERE condition Range Optimizer sees that the range condition covers the whole index then such condition should be discarded because it cannot be used in any range scan. In some cases Range Optimizer really does it, but there remained some conditions for which it was not done. As a result the optimizer could produce index merge plans with the full index scan for one of the indexes participating in the index merge. This could be observed in one of the test cases from index_merge1.inc where a plan with index_merge_sort_union was produced and in the test case reported for this bug where a plan with index_merge_sort_intersect was produced. In both cases one of two index scans participating in index merge ran over the whole index. The patch slightly changes the original above mentioned test case from index_merge1.inc to be able to produce an intended plan employing index_merge_sort_union. The original query was left to show that index merge is not used for it anymore. It should be noted that for the plan with index_merge_sort_intersect could be chosen for execution only due to a defect in the InnoDB code that returns wrong estimates for the cardinality of big ranges. This bug led to serious problems in 10.4+ where the optimization using Rowid filters is employed (see mdev-26446). Approved by Sergey Petrunia <sergey@mariadb.com>	2021-12-23 19:12:58 -08:00
Monty	20f22dfa2f	Fixed some tests that failes when built with valgrind Example build: ./BUILD/compile-pentium64-valgrind-max Fixes: - sp-no-valgrind failed if binary was built for valgrind as in this case mem_root is allocated in very small hunks which the test cannot handle. Fixed by testing of valgrind build - truncate_notembedded failed in reap because of more memory used. Fixed by allowing reap to fail too	2021-12-15 23:29:04 +02:00
Vladislav Vaintroub	74f2e6c85e	MDEV-26713 Add test for mysql_install_db creating service, with i18	2021-12-15 19:13:57 +01:00
Vladislav Vaintroub	ba9d231b5a	MDEV-26713 Set activeCodePage=UTF8 for windows programs - Use corresponding entry in the manifest, as described in https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page - If if ANSI codepage is UTF8 (i.e for Windows 1903 and later) Use UTF8 as default client charset Set console codepage(s) to UTF8, in case process is using console - Allow some previously disabled MTR tests, that used Unicode for in "exec", for the recent Windows versions	2021-12-15 19:13:57 +01:00
Alexander Barkov	02de93d158	MDEV-27154 allkeys.txt based tests for Unicode-4.0.0 and 5.2.0	2021-12-02 05:35:35 +04:00
Sergei Golubchik	193314f402	show "dying" state in I_S.PLUGINS	2021-10-27 15:55:14 +02:00
Aleksey Midenkov	b7bba721ee	MDEV-22166 CONVERT PARTITION: move out partition into a table Syntax for CONVERT keyword ALTER TABLE tbl_name [alter_option [, alter_option] ...] \| [partition_options] partition_option: { ... \| CONVERT PARTITION partition_name TO TABLE tbl_name } Examples: ALTER TABLE t1 CONVERT PARTITION p2 TO TABLE tp2; New ALTER_PARTITION_CONVERT_OUT command for fast_alter_partition_table() is done in alter_partition_convert_out() function which basically does ha_rename_table(). Partition to extract is marked with the same flag as dropped partition: PART_TO_BE_DROPPED. Note that we cannot have multiple partitioning commands in one ALTER. For DDL logging basically the principle is the same as for other fast_alter_partition_table() commands. The only difference is that it integrates late Atomic DDL functions and introduces additional phase of WFRM_BACKUP_ORIGINAL. That is required for binlog consistency because otherwise we could not revert back after WFRM_INSTALL_SHADOW is done. And before DDL log is complete if we crash or fail the altered table will be already new but binlog will miss that ALTER command. Note that this is different from all other atomic DDL in that it rolls back until the ddl_log_complete() is done even if everything was done fully before the crash. Test cases added to: parts.alter_table \ parts.partition_debug \ versioning.partition \ atomic.alter_partition	2021-10-26 17:07:46 +02:00
Monty	267a07e846	MDEV-26307 multi-source-replication support mysql syntax(for channel) Author: woqutech Reviewer: monty@mariadb.org	2021-09-14 17:57:27 +03:00
Alexander Barkov	0629711db4	MDEV-26572 Improve simple multibyte collation performance on the ASCII range	2021-09-13 08:03:25 +04:00
Marko Mäkelä	9608773f75	MDEV-4750 follow-up: Reduce disabling innodb_stats_persistent This essentially reverts commit `4e89ec6692` and only disables InnoDB persistent statistics for tests where it is desirable. By design, InnoDB persistent statistics will not be updated except by ANALYZE TABLE or by STATS_AUTO_RECALC. The internal transactions that update persistent InnoDB statistics in background tasks (with innodb_stats_auto_recalc=ON) may cause nondeterministic query plans or interfere with some tests that deal with other InnoDB internals, such as the purge of transaction history.	2021-08-31 13:55:02 +03:00
Oleksandr Byelkin	6efb5e9f5e	Merge branch '10.5' into 10.6	2021-08-02 10:11:41 +02:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
Oleksandr Byelkin	7841a7eb09	Merge branch '10.3' into 10.4	2021-07-31 22:59:58 +02:00
Sergei Golubchik	2575eaa502	dissapear -> disappear	2021-07-26 12:40:01 +02:00
Vladislav Vaintroub	e7f4daf88c	merge 10.5 to 10.6	2021-07-16 22:12:09 +02:00
Dmitry Shulga	429382c29f	MDEV-26142: Fix failures of the tests main.features and sys_vars.stored_program_cache_func when they are run in PS mode These tests produced different results in case they were run with the option --ps-protocol. These tests produced different result sets since a value of Feature_subquery and handler_read_key status system variables are updated one time more for ps-protocol (the first time it is updated on Prepare phase and the second time on Execute phase of PS protocol) So different result sets are expected for both tests. To make tests successfully runnable both for case it is run with and without the option --ps-protocol the new protocol combination [ps, nm] and protocol specific result files have been added. Moreover, the perl script mysql-test/mariadb-test-run.pl has been updated to make the variable opt_ps_protocol visible outside perl file containing this variable.	2021-07-15 16:27:31 +07:00
Daniel Black	f84b3b8807	mtr: aix has no thread_pool	2021-07-06 15:50:58 +10:00
Daniel Black	a8136d13b2	mtr: aix - no pool of threads	2021-07-06 15:29:00 +10:00
Marko Mäkelä	0ad8a825a8	Merge 10.5 into 10.6	2021-07-02 17:00:05 +03:00
Marko Mäkelä	15dcb8bd3e	Merge 10.4 into 10.5	2021-07-02 13:02:26 +03:00
Daniel Black	6a3a046013	mtr: aix - no pool of threads	2021-07-02 17:17:19 +10:00
Daniel Black	2301093f8f	MDEV-25894: support AIX as a platform in mtr Parital backport of `48938c57c7` so platform dependent AIX tests can be done.	2021-07-02 17:17:19 +10:00
Daniel Black	0a9487b62b	mtr: aix - no pool of threads	2021-07-02 14:46:10 +10:00
Daniel Black	3f2c4758b0	MDEV-25894: support AIX as a platform in mtr Parital backport of `48938c57c7` so platform dependent AIX tests can be done.	2021-07-02 14:46:05 +10:00
Marko Mäkelä	b630f0b1b9	Merge 10.5 into 10.6	2021-06-18 09:16:20 +03:00
Dmitry Shulga	97e8d27bed	MDEV-16708: fix in test failures(added --enable_prepared_warnings/--disable_prepared_warnings)	2021-06-17 19:30:24 +02:00
Dmitry Shulga	b126c3f3fa	MDEV-16708: fixed issue with handling of the directive --enable-prepared-warnings in mysqltest	2021-06-17 19:30:24 +02:00
Daniel Black	48938c57c7	MDEV-25894: support AIX as a platform in mtr Add fixed for tests mysqld--help,aix.rdiff and sysvars_server_notembedded,aix.rdiff AIX couldn't compile in embedded mode so leaving sysvars_server_embedded for later (if required).	2021-06-16 15:44:18 +10:00
Marko Mäkelä	1bd681c8b3	MDEV-25506 (3 of 3): Do not delete .ibd files before commit This is a complete rewrite of DROP TABLE, also as part of other DDL, such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE. The background DROP TABLE queue hack is removed. If a transaction needs to drop and create a table by the same name (like TRUNCATE TABLE does), it must first rename the table to an internal #sql-ib name. No committed version of the data dictionary will include any #sql-ib tables, because whenever a transaction renames a table to a #sql-ib name, it will also drop that table. Either the rename will be rolled back, or the drop will be committed. Data files will be unlinked after the transaction has been committed and a FILE_RENAME record has been durably written. The file will actually be deleted when the detached file handle returned by fil_delete_tablespace() will be closed, after the latches have been released. It is possible that a purge of the delete of the SYS_INDEXES record for the clustered index will execute fil_delete_tablespace() concurrently with the DDL transaction. In that case, the thread that arrives later will wait for the other thread to finish. HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag. ha_innobase::truncate() now requires that all other references to the table be released in advance. This was implemented by Monty. ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected, we will "hijack" the current transaction, drop the table in the current transaction and commit the current transaction. This essentially fixes MDEV-21602. There is a FIXME comment about making the check less failure-prone. ha_innobase::truncate(), ha_innobase::delete_table(): Implement a fast path for temporary tables. We will no longer allow temporary tables to use the adaptive hash index. dict_table_t::mdl_name: The original table name for the purpose of acquiring MDL in purge, to prevent a race condition between a DDL transaction that is dropping a table, and purge processing undo log records of DML that had executed before the DDL operation. For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the dict_table_t::mdl_name will differ from dict_table_t::name. dict_table_t::parse_name(): Use mdl_name instead of name. dict_table_rename_in_cache(): Update mdl_name. For the internal FTS_ tables of FULLTEXT INDEX, purge would acquire MDL on the FTS_ table name, but not on the main table, and therefore it would be able to run concurrently with a DDL transaction that is dropping the table. Previously, the DROP TABLE queue hack prevented a race between purge and DDL. For now, we introduce purge_sys.stop_FTS() to prevent purge from opening any table, while a DDL transaction that may drop FTS_ tables is in progress. The function fts_lock_table(), which will be invoked before the dictionary is locked, will wait for purge to release any table handles. trx_t::drop_table_statistics(): Drop statistics for the table. This replaces dict_stats_drop_index(). We will drop or rename persistent statistics atomically as part of DDL transactions. On lock conflict for dropping statistics, we will fail instantly with DB_LOCK_WAIT_TIMEOUT, because we will be holding the exclusive data dictionary latch. trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory(). Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT in addition to DB_DUPLICATE_KEY. The call to fts_commit() is entirely misplaced here and may obviously break the consistency of transactions that affect FULLTEXT INDEX. It needs to be fixed separately. dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175). The counter was a work-around for missing meta-data locking (MDL) on the SQL layer, and not really needed in MariaDB. ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28. HA_ERR_TABLE_IN_FK_CHECK: Remove. row_ins_check_foreign_constraints(): Do not acquire dict_sys.latch either. The SQL-layer MDL will protect us. This was reviewed by Thirunarayanan Balathandayuthapani and tested by Matthias Leich.	2021-06-09 17:06:07 +03:00
Monty	9ec2129f71	Fixed bug in mtr that caused restart to fail if mysqld died to fast	2021-05-26 22:17:51 +03:00
Elena Stepanova	71e1ddda22	Race condition occurs upon server restart inside test cases Server restart is reported as failed (with exit code 0), and the whole MTR worker aborts	2021-05-25 21:20:08 +03:00
Monty	e5b6db0179	Speed up atomic test suite by improving wait_until_connected_again.inc and remove usage of RESET MASTER in loops. - Remove sleep of 0.1 second that was done even when not needed. - Don't call include/wait_wsrep_ready.inc if NO_WSREP is defined. - Added NO_WSREP=1 to all atomic tests. - Use 'select 1' instead of 'show status' to check is server is up. - Changed RESET MASTER to FLUSH BINARY LOGS to speed up atomic tests. To be able to do this, added a new parameter variable to show_events.inc to allow one to specify the name of the binary log in the output.	2021-05-24 21:04:40 +03:00
Rucha Deodhar	4e19539c14	MDEV-22189: Change error messages inside code to have mariadb instead of mysql Fix: Changed error messages, rerecorded results and changed other relevant files.	2021-05-24 11:38:13 +05:30
Monty	83e529eced	MDEV-18465 Logging of DDL statements during backup Many of the changes was needed to be able to collect and print engine name and table version id's in the ddl log.	2021-05-19 22:54:13 +02:00
Monty	7762ee5dbe	MDEV-25180 Atomic ALTER TABLE MDEV-25604 Atomic DDL: Binlog event written upon recovery does not have default database The purpose of this task is to ensure that ALTER TABLE is atomic even if the MariaDB server would be killed at any point of the alter table. This means that either the ALTER TABLE succeeds (including that triggers, the status tables and the binary log are updated) or things should be reverted to their original state. If the server crashes before the new version is fully up to date and commited, it will revert to the original table and remove all temporary files and tables. If the new version is commited, crash recovery will use the new version, and update triggers, the status tables and the binary log. The one execption is ALTER TABLE .. RENAME .. where no changes are done to table definition. This one will work as RENAME and roll back unless the whole statement completed, including updating the binary log (if enabled). Other changes: - Added handlerton->check_version() function to allow the ddl recovery code to check, in case of inplace alter table, if the table in the storage engine is of the new or old version. - Added handler->table_version() so that an engine can report the current version of the table. This should be changed each time the table definition changes. - Added ha_signal_ddl_recovery_done() and handlerton::signal_ddl_recovery_done() to inform all handlers when ddl recovery has been done. (Needed by InnoDB). - Added handlerton call inplace_alter_table_committed, to signal engine that ddl_log has been closed for the alter table query. - Added new handerton flag HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we should call hton->notify_tabledef_changed() during mysql_inplace_alter_table. This was required as MyRocks and InnoDB needed the call at different times. - Added function server_uuid_value() to be able to generate a temporary xid when ddl recovery writes the query to the binary log. This is needed to be able to handle crashes during ddl log recovery. - Moved freeing of the frm definition to end of mysql_alter_table() to remove duplicate code and have a common exit strategy. ------- InnoDB part of atomic ALTER TABLE (Implemented by Marko Mäkelä) innodb_check_version(): Compare the saved dict_table_t::def_trx_id to determine whether an ALTER TABLE operation was committed. We must correctly recover dict_table_t::def_trx_id for this to work. Before purge removes any trace of DB_TRX_ID from system tables, it will make an effort to load the user table into the cache, so that the dict_table_t::def_trx_id can be recovered. ha_innobase::table_version(): return garbage, or the trx_id that would be used for committing an ALTER TABLE operation. In InnoDB, table names starting with #sql-ib will remain special: they will be dropped on startup. This may be revisited later in MDEV-18518 when we implement proper undo logging and rollback for creating or dropping multiple tables in a transaction. Table names starting with #sql will retain some special meaning: dict_table_t::parse_name() will not consider such names for MDL acquisition, and dict_table_rename_in_cache() will treat such names specially when handling FOREIGN KEY constraints. Simplify InnoDB DROP INDEX. Prevent purge wakeup To ensure that dict_table_t::def_trx_id will be recovered correctly in case the server is killed before ddl_log_complete(), we will block the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS between ha_innobase::commit_inplace_alter_table(commit=true) (purge_sys.stop_SYS()) and purge_sys.resume_SYS(). The completion callback purge_sys.resume_SYS() must be between ddl_log_complete() and MDL release. -------- MyRocks support for atomic ALTER TABLE (Implemented by Sergui Petrunia) Implement these SE API functions: - ha_rocksdb::table_version() - hton->check_version = rocksdb_check_versionMyRocks data dictionary now stores table version for each table. (Absence of table version record is interpreted as table_version=0, that is, which means no upgrade changes are needed) - For inplace alter table of a partitioned table, call the underlying handlerton when checking if the table is ok. This assumes that the partition engine commits all changes at once.	2021-05-19 22:54:13 +02:00
Monty	e3cfb7c803	MDEV-23844 Atomic DROP TABLE (single table) Logging logic: - Log tables, just before they are dropped, to the ddl log - After the last table for the statement is dropped, log an xid for the whole ddl log event In case of crash: - Remove first any active DROP TABLE events from the ddl log that matches xids found in binary log (this mean the drop was successful and was propery logged). - Loop over all active DROP TABLE events - Ensure that the table is completely dropped - Write a DROP TABLE entry to the binary log with the dropped tables. Other things: - Added code to ha_drop_table() to be able to tell the difference if a get_new_handler() failed because of out-of-memory or because the handler refused/was not able to create a a handler. This was needed to get sequences to work as sequences needs a share object to be passed to get_new_handler() - TC_LOG_BINLOG::recover() was changed to always collect Xid's from the binary log and always call ddl_log_close_binlogged_events(). This was needed to be able to collect DROP TABLE events with embedded Xid's (used by ddl log). - Added a new variable "$grep_script" to binlog filter to be able to find only rows that matches a regexp. - Had to adjust some test that changed because drop statements are a bit larger in the binary log than before (as we have to store the xid) Other things: - MDEV-25588 Atomic DDL: Binlog query event written upon recovery is corrupt fixed (in the original commit).	2021-05-19 22:54:12 +02:00
Rucha Deodhar	2fdb556e04	MDEV-8334: Rename utf8 to utf8mb3 This patch changes the main name of 3 byte character set from utf8 to utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default, so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.	2021-05-19 06:48:36 +02:00
Marko Mäkelä	f09d33f521	Merge 10.5 into 10.6	2021-05-18 11:13:45 +03:00
Marko Mäkelä	cc2651b74c	Merge 10.4 into 10.5	2021-05-18 09:21:59 +03:00
Marko Mäkelä	4240704abc	Merge 10.3 into 10.4	2021-05-18 08:59:12 +03:00
Marko Mäkelä	ca3f497564	Merge 10.2 into 10.3, except MDEV-25682	2021-05-18 08:40:19 +03:00

1 2 3 4 5 ...

2871 commits