mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-17 04:22:27 +01:00

Author	SHA1	Message	Date
Eugene Kosov	0c2365c4e3	cleanup redo log Write log header just once when file is created, instead of writing to it on every log file wrap around. log_t::file::write_header_durable(): this one writes to log header log_write_buf(): this one stops writing to log header	2020-03-16 17:27:51 +03:00
Eugene Kosov	ce496d4f9e	cleanup redo log move statistics modification into one place	2020-03-16 17:27:51 +03:00
Marko Mäkelä	e5e95a287e	Merge 10.3 into 10.4	2020-03-16 16:24:36 +02:00
Vladislav Vaintroub	92d61c2229	fix typo on non-Linux/Windows	2020-03-16 11:19:41 +01:00
Marko Mäkelä	17080cbcf0	MDEV-21945 Assertion w==OPT failed in trx_purge_add_undo_to_history() By default, when redo log is being written for modifying a persistent data page, the data page must actually be changed. If the write can sometimes be optimized away, then the template parameter w=mtr_t::OPT should be passed in order to silence the debug assertion failure. InnoDB undo log pages can be reused without properly freeing or initializing them in between. In particular, the undo log header page field TRX_UNDO_TRX_NO could have been part of an undo log record page, and those bytes could accidentally have the desired value when the page is reused as an undo log header page of another transaction. Because the function trx_undo_set_state_at_finish() always changes the TRX_UNDO_STATE of the page, and because recovery is only reading TRX_UNDO_TRX_NO for pages that either have the correct TRX_UNDO_STATE or, in trx_rseg_array_init(), are attached to the TRX_SYS page, the garbage values in TRX_UNDO_TRX_NO do not seem to cause a problem. This assertion failure affects debug builds only.	2020-03-16 08:58:54 +02:00
Sergei Golubchik	79499b597a	update the test result for new perfschema	2020-03-16 01:13:01 +01:00
Kentoku SHIBA	2fde97119e	Merge branch '10.5' of github.com:MariaDB/server into 10.5	2020-03-16 08:42:50 +09:00
Kentoku SHIBA	5929e222e4	fix evaluating bitmap issue in spider	2020-03-16 08:39:49 +09:00
Otto Kekäläinen	c8388de2fd	Fix various spelling errors e.g. - dont -> don't - occurence -> occurrence - succesfully -> successfully - easyly -> easily Also remove trailing space in selected files. These changes span: - server core - Connect and Innobase storage engine code - OQgraph, Sphinx and TokuDB storage engines Related to MDEV-21769.	2020-03-16 00:10:50 +02:00
Vladislav Vaintroub	3c57693ff1	MDEV-21534 - Improve innodb redo log group commit performance Instrument new synchronization primitive with thd_wait_begin/end to inform threadpool about waits. This considerably improves performance on write benchmarks (e.g. sysbench update_index) with generic threadpool, of course the cost is possibility of many newly created threads.	2020-03-15 21:40:11 +01:00
Andrei Elkin	c8ae357341	MDEV-742 XA PREPAREd transaction survive disconnect/server restart Lifted long standing limitation to the XA of rolling it back at the transaction's connection close even if the XA is prepared. Prepared XA-transaction is made to sustain connection close or server restart. The patch consists of - binary logging extension to write prepared XA part of transaction signified with its XID in a new XA_prepare_log_event. The conclusion part - with Commit or Rollback decision - is logged separately as Query_log_event. That is in the binlog the XA consists of two separate groups of events. That makes the whole XA possibly interweaving in binlog with other XA:s or regular transaction but with no harm to replication and data consistency. Gtid_log_event receives two more flags to identify which of the two XA phases of the transaction it represents. With either flag set also XID info is added to the event. When binlog is ON on the server XID::formatID is constrained to 4 bytes. - engines are made aware of the server policy to keep up user prepared XA:s so they (Innodb, rocksdb) don't roll them back anymore at their disconnect methods. - slave applier is refined to cope with two phase logged XA:s including parallel modes of execution. This patch does not address crash-safe logging of the new events which is being addressed by MDEV-21469. CORNER CASES: read-only, pure myisam, binlog-, @@skip_log_bin, etc Are addressed along the following policies. 1. The read-only at reconnect marks XID to fail for future completion with ER_XA_RBROLLBACK. 2. binlog- filtered XA when it changes engine data is regarded as loggable even when nothing got cached for binlog. An empty XA-prepare group is recorded. Consequent Commit-or-Rollback succeeds in the Engine(s) as well as recorded into binlog. 3. The same applies to the non-transactional engine XA. 4. @@skip_log_bin=OFF does not record anything at XA-prepare (obviously), but the completion event is recorded into binlog to admit inconsistency with slave. The following actions are taken by the patch. At XA-prepare: when empty binlog cache - don't do anything to binlog if RO, otherwise write empty XA_prepare (assert(binlog-filter case)). At Disconnect: when Prepared && RO (=> no binlogging was done) set Xid_cache_element::error := ER_XA_RBROLLBACK keep XID in the cache, and rollback the transaction. At XA-"complete": Discover the error, if any don't binlog the "complete", return the error to the user. Kudos ----- Alexey Botchkov took to drive this work initially. Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of good recommendations. Sergei Voitovich made a magnificent review and improvements to the code. They all deserve a bunch of thanks for making this work done!	2020-03-14 22:45:48 +02:00
Monty	5754ea2eca	Fixed compiler failures with gcc 7.4.1 and new my_malloc code	2020-03-14 15:24:13 +02:00
Sergei Golubchik	91d1588d30	Merge branch 'github/10.5' into 10.5	2020-03-14 09:52:35 +01:00
Eugene Kosov	774fe8969a	cleanup redo log	2020-03-14 00:52:21 +03:00
Sergey Vojtovich	78cc9c9ebf	Pre-MDEV-742 InnoDB fixes 1. Refactored innobase_close_connection(). Transaction must've already been rolled back by this time. We should expect only transactions in the PREPARED state when MDEV-742 is done. 2. Added missing put_pins() to trx_disconnect_prepared(). Missing put_pins() wasn't a problem because trx_disconnect_prepared() is a dead code. But it will get revived in the main MDEV-742 patch. 3. Fixed missing reset of trx->mysql_log_file_name when RW transaction didn't emit any log records (zero-modification RW). The problem was detected by ASAN when disconnected XA transaction was trying to make use of inherited mysql_log_file_name pointing into binlog data of detached THD. This missing reset also had user-visible side effect, when trx_sys_print_mysql_binlog_offset() would report binlog position not of the most recently committed transaction. One of possible scenarios that is expected to misbehave is as following: thr1> CREATE TABLE t1(a INT) ENGINE=InnoDB; thr1> INSERT INTO t1 VALUES(1); thr1> BEGIN; thr1> UPDATE t1 SET a=1 thr1> COMMIT; -- zero-modification, misses to reset mysql_log_file_name thr2> BEGIN; thr2> INSERT INTO t1 VALUES(2); thr2> COMMIT; thr1> BEGIN; thr1> do-some-real-changes; thr1> ROLLBACK; -- will store binlog pos from previous COMMIT in thr1? In this case it means if binlog is replayed from position reported by trx_sys_print_mysql_binlog_offset(), t1 will end up with two records containing '2'. Part of MDEV-742 - XA PREPAREd transaction survive disconnect/server restart	2020-03-13 15:44:42 +04:00
Marko Mäkelä	5fe87ac413	Merge 10.2 into 10.3	2020-03-13 12:31:55 +02:00
Marko Mäkelä	ed21202a14	Fix GCC 10.0 -Wstringop-overflow myrg_open(): Reduce the scope of the variable 'end' and simplify the code. For some reason, I got no warning for this code in the 10.2 branch, only 10.3 or later. The ENGINE=MERGE is covered by the tests main.merge, main.merge_debug, and main.merge-big.	2020-03-13 12:09:19 +02:00
Thirunarayanan Balathandayuthapani	c58686447f	MDEV-21903 FTS optimize thread aborts during shutdown - This issue was caused by `5e62b6a5e0`. fts_optimize_callback() should free fts_optimize_wq and set it to NULL when it receives the FTS_MSG_STOP message, so that a subsequent fts_optimize_callback() doesn't fail with a segmentation fault.	2020-03-13 13:52:07 +05:30
Marko Mäkelä	fbe662a503	MDEV-15058: Remove buf_pool_get_dirty_pages_count() Starting with commit `1a6f708ec5` the function buf_pool_get_dirty_pages_count() is only used in a debug check. It was dead code for non-debug builds. buf_flush_dirty_pages(): Perform the debug check inline, and replace the assertion ut_ad(first || buf_pool_get_dirty_pages_count(id) == 0); with another one that is executed while holding the mutexes: ut_ad(id != bpage->id.space());	2020-03-13 10:09:15 +02:00
Marko Mäkelä	9f858f38c0	Fix clang 10 warnings _ma_fetch_keypage(): Correct an assertion that used to always hold. Thanks to clang -Wint-in-bool-context for flagging this. double_to_datetime_with_warn(): Suppress -Wimplicit-int-float-conversion by adding a cast. LONGLONG_MAX converted to double will actually be LONGLONG_MAX+1.	2020-03-13 08:37:22 +02:00
Marko Mäkelä	2e8b0c56a0	MDEV-21933 INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES accesses SYS_DATAFILES All tablespace metadata is buffered in fil_system. There is a LRU mechanism, but that only controls the opening and closing of fil_node_t::handle. It is much more efficient and less error-prone to access data file names by looking up the fil_space_t object rather than by essentially joining each row with an access to SYS_DATAFILES via the InnoDB internal SQL parser. dict_get_first_path(): Declare static. The function may only be needed when loading or updating the data dictionary. Also, change a condition in order to avoid a bogus GCC 10 -Wstringop-overflow warning for mem_strdupl() about len==ULINT_UNDEFINED. i_s_sys_tablespaces_fill_table(): Do not access other InnoDB internal dictionary tables than SYS_TABLESPACES.	2020-03-13 08:07:02 +02:00
Marko Mäkelä	a8566f727f	Fix GCC 10 -Wstringop-truncation	2020-03-13 07:39:14 +02:00
Marko Mäkelä	32904dc5fa	Merge 10.1 into 10.2	2020-03-13 07:20:36 +02:00
Marko Mäkelä	c57b207958	MDEV-21907: Fix or disable -Wconversion on GCC 5.3.0 i386 Fix or disable those -Wconversion that were missed by GCC 5.4.0 targeting AMD64.	2020-03-13 06:55:00 +02:00
Marko Mäkelä	f224525204	MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC The -Wconversion in GCC seems to be stricter than in clang. GCC at least since version 4.4.7 issues truncation warnings for assignments to bitfields, while clang 10 appears to only issue warnings when the sizes in bytes rounded to the nearest integer powers of 2 are different. Before GCC 10.0.0, -Wconversion required more casts and would not allow some operations, such as x<<=1 or x+=1 on a data type that is narrower than int. GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining about x|=y even when x and y are compatible types that are narrower than int. Hence, we must rewrite some x|=y as x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion. In GCC 6 and later, the warning for assigning wider to bitfields that are narrower than 8, 16, or 32 bits can be suppressed by applying a bitwise & with the exact bitmask of the bitfield. For older GCC, we must disable -Wconversion for GCC 4 or 5 in such cases. The bitwise negation operator appears to promote short integers to a wider type, and hence we must add explicit truncation casts around them. Microsoft Visual C does not allow a static_cast to truncate a constant, such as static_cast<byte>(1) truncating int. Hence, we will use the constructor-style cast byte(~1) for such cases. This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0, clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019) on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).	2020-03-12 19:46:41 +02:00
Marko Mäkelä	c7920fa8ff	MDEV-16264: Eliminate unsafe os_aio_userdata_t type cast	2020-03-12 19:43:45 +02:00
Marko Mäkelä	8be3794b42	MDEV-21924 Clean up InnoDB GIS record comparison The extension of the record comparison functions for SPATIAL INDEX in mysql/mysql-server@b66ad511b6 was suboptimal for multiple reasons: Some functions used unnecessary temporary variables of the int type, instead of the more appropriate size_t, causing type mismatch. Many functions unnecessarily required rec_get_offsets() to be computed, or a parameter for length, although the size of the minimum bounding rectangle (MBR) is hard-coded as SPDIMS * 2 * sizeof(double), or 32 bytes. In InnoDB SPATIAL INDEX records, there always is a 32-byte key followed by either a 4-byte child page number or the PRIMARY KEY value. The length parameters were not properly validated. The function cmp_geometry_field() was making an incorrect attempt at checking that the lengths are at least sizeof(double) (8 bytes), even though the function is accessing up to 32 bytes in both MBR. Functions that are called from only one compilation unit are defined in another compilation unit, making the code harder to follow and potentially slower to execute. cmp_dtuple_rec_with_gis(): FIXME: Correct the debug assertion and possibly the function TABLE_SHARE::init_from_binary_frm_image() or related code, which causes an unexpected length of DATA_MBR_LEN + 2 bytes to be passed to this function.	2020-03-12 18:13:53 +02:00
Eugene Kosov	5257bcfc7a	InnoDB: improve error message for checksum mismatch	2020-03-12 14:47:45 +03:00
Monty	cebf43e166	Fixed wrong assert (found by clang)	2020-03-11 22:04:06 +02:00
Oleksandr Byelkin	fad47df995	Merge branch '10.4' into 10.5	2020-03-11 17:52:49 +01:00
Eugene Kosov	7f36300df5	MDEV-21918 improve page_zip_verify_checksum() actually, page_zip_verify_checksum() generally allows all-zeroes checksums because our CRC32 checksum is something like crc_1 ^ crc_2 ^ crc_3 Also, all zeroes page is considered correct. As a side effect fix nasty reinterpret_cast<> UB Also, since `c0f47a4a58` innodb_checksum_algorithm=full_crc32 exists which computes CRC32 in one go (without bitwise arithmetic)	2020-03-11 18:02:12 +03:00
Oleksandr Byelkin	b7362d5fbc	Merge branch '10.3' into 10.4	2020-03-11 14:28:24 +01:00
Eugene Kosov	df88e7cefa	fix typedef-related warning and cleanup using namespace std	2020-03-11 16:27:37 +03:00
Oleksandr Byelkin	3c9bc0ce19	Merge branch '10.2' into 10.3	2020-03-11 14:05:41 +01:00
Marko Mäkelä	574d8b2940	MDEV-21907: Fix most clang -Wconversion in InnoDB Declare innodb_purge_threads as 4-byte integer (UINT) instead of 4-or-8-byte (ULONG) and adjust the documentation string.	2020-03-11 08:29:48 +02:00
Alexander Barkov	a1e330de5a	MDEV-21743 Split up SUPER privilege to smaller privileges	2020-03-10 23:49:47 +04:00
Sergei Golubchik	7180afa094	fix perfschema for pool-of-threads	2020-03-10 19:24:24 +01:00
Sergei Golubchik	cbede21d0d	cleanup: pass trxid by value	2020-03-10 19:24:23 +01:00
Sergei Golubchik	211421d5cc	cleanup: remove unused argument	2020-03-10 19:24:23 +01:00
Sergei Golubchik	c1c5222cae	cleanup: PSI key is always the first argument	2020-03-10 19:24:23 +01:00
Sergei Golubchik	7af733a5a2	perfschema compilation, test and misc fixes	2020-03-10 19:24:23 +01:00
Sergei Golubchik	81cffda2e6	perfschema transaction instrumentation related changes	2020-03-10 19:24:23 +01:00
Sergei Golubchik	6ded554fc2	perfschema thread instrumentation related changes	2020-03-10 19:24:23 +01:00
Sergei Golubchik	0d837e8153	perfschema table io instrumentation related changes	2020-03-10 19:24:23 +01:00
Sergei Golubchik	d5a0069702	perfschema socket instrumentation related changes	2020-03-10 19:24:23 +01:00
Sergei Golubchik	05779bc6f1	perfschema mdl related instrumentation changes	2020-03-10 19:24:22 +01:00
Sergei Golubchik	22b6d8487a	perfschema file instrumentation related changes	2020-03-10 19:24:22 +01:00
Sergei Golubchik	7c58e97bf6	perfschema memory related instrumentation changes	2020-03-10 19:24:22 +01:00
Sergei Golubchik	2ac3121af2	perfschema - various collateral cleanups and small changes	2020-03-10 19:24:22 +01:00
Sergei Golubchik	0ea717f51a	P_S 5.7.28	2020-03-10 19:24:22 +01:00
Marko Mäkelä	02343c4a54	MDEV-19740: Correct a type mismatch WITH_INNODB_EXTRA_DEBUG	2020-03-10 15:47:52 +02:00
Marko Mäkelä	561b5ce364	MDEV-21748 ASAN use-after-poison in PageBulk::insertPage() PageBulk::insertPage(): Check the array bounds before comparing. We used to read one byte beyond the end of the 'rec' payload. The incorrect logic was originally introduced in commit `7ae21b18a6`.	2020-03-10 09:53:29 +01:00
Marko Mäkelä	e2e2f89303	MDEV-15528: Minor cleanup buf_flush_freed_page(): Reformat in the common style, and simplify some code. Prefer to request all information from smaller data structures (buf_page_t) than from fil_space_t or the global variable srv_immediate_scrub_data_uncompressed. SysTablespace::open_or_create(): Assert that the temporary tablespace will not be created in page_compressed format, so that buf_flush_freed_page() can avoid checking that on every call. IORequest: Remove duplicated constructors, and do not explicitly declare a default constructor.	2020-03-10 09:53:23 +01:00
Thirunarayanan Balathandayuthapani	a5584b13d1	MDEV-15528 Punch holes when pages are freed The following parameters are deprecated: innodb-background-scrub-data-uncompressed innodb-background-scrub-data-compressed innodb-background-scrub-data-interval innodb-background-scrub-data-check-interval Removed scrubbing code completely(btr0scrub.h, btr0scrub.cc) Removed information_schema.innodb_tablespaces_scrubbing tables Removed the scrubbing logic from fil_crypt_thread()	2020-03-10 10:51:08 +05:30
Thirunarayanan Balathandayuthapani	a35b4ae898	MDEV-15528 Punch holes when pages are freed When an InnoDB data file page is freed, its contents become garbage, and any storage allocated in the data file is wasted. During flushing, InnoDB initializes the page with zeros if scrubbing is enabled. If the tablespace is compressed then InnoDB should punch a hole else ignore the flushing of the freed page. buf_page_t: - Replaced the variable file_page_was_freed, init_on_flush in buf_page_t with status enum variable. - Changed all debug assert of file_page_was_freed to DBUG_ASSERT of buf_page_t::status Removed buf_page_set_file_page_was_freed(), buf_page_reset_file_page_was_freed(). buf_page_free(): Newly added function which takes X-lock on the page before marking the status as FREED. So that InnoDB flush handler can avoid concurrent flush of the freed page. Also while flushing the page, InnoDB makes sure that the redo log which frees the page is also written to the disk. Currently, this function only marks the page as FREED if it is in buffer pool buf_flush_freed_page(): Newly added function which initializes zeros asynchronously if innodb_immediate_scrub_data_uncompressed is enabled. Punch a hole to the file synchronously if page_compressed is enabled. Reset the io_fix to NORMAL. Release the block from flush list and associated mutex before writing zeros or punch a hole to the file. buf_flush_page(): Removed the unnecessary usage of temporary variable "flush" fil_io(): Introduce new parameter called punch_hole. It allows fil_io() to punch the hole to the file for the given offset. buf_page_create(): Let the callers assign buf_page_t::status. Every caller should eventually invoke mtr_t::init(). fsp_page_create(): Remove the unused mtr_t parameter. In all other callers of buf_page_create() except fsp_page_create(), before invoking mtr_t::init(), invoke mtr_t::sx_latch_at_savepoint() or mtr_t::x_latch_at_savepoint(). mtr_t::init(): Initialize buf_page_t::status also for the temporary tablespace (when redo logging is disabled), to avoid assertion failures.	2020-03-10 10:51:08 +05:30
Monty	c037cdadf4	Added keyread_time() to HEAP The default keyread_time() was optimized for blocks and not suitable for HEAP. The effect was that HEAP preferred table scans over ranges for btree indexes. Fixed also get_sweep_read_cost() for HEAP tables.	2020-03-09 13:53:34 +02:00
Marko Mäkelä	276e042de3	MDEV-21893: Assertion failure on upgrade with innodb_encrypt_log recv_log_recover_10_4(): Add a missing bit pattern negation that was forgotten when commit `f8a9f90667` (MDEV-12353) removed the support for crash-upgrading.	2020-03-09 11:38:43 +02:00
Marko Mäkelä	adb4117631	MDEV-21892: Assertion ...row_get_rec_trx_id... failed on SELECT btr_cur_upd_rec_in_place(): Invoke page_zip_rec_set_deleted() for ROW_FORMAT=COMPRESSED pages, so that the change will be written to the redo log. This part of crash recovery was broken in commit `08ba388713` (MDEV-12353).	2020-03-09 11:38:34 +02:00
Marko Mäkelä	57c592f74d	Cleanup: Remove recv_sys.remove_extra_log_files create_log_file(): Delete all old redo log files where they used to be deleted, after the crash injection point innodb_log_abort_6, before commit `9ef2d29ff4` deprecated and ignored the setting innodb_log_files_in_group.	2020-03-07 14:47:15 +02:00
Marko Mäkelä	70f0dbe4d3	Cleanup: log upgrade and encryption log_crypt_101_read_checkpoint(), log_crypt_101_read_block(): Declare as ATTRIBUTE_COLD. These are only used when checking that a MariaDB 10.1 encrypted redo log is clean. log_block_calc_checksum_format_0(): Define in the only compilation unit where it is needed. This is only used when reading the checkpoint information from redo logs before MariaDB 10.2.2. crypt_info_t: Declare the byte arrays directly with alignas(). log_crypt(): Use memcpy_aligned instead of reinterpret_cast on integers.	2020-03-07 14:31:36 +02:00
Marko Mäkelä	522fbfcb5c	Cleanup: Remove recv_sys.buf_size Also, correctly document what recv_sys.mutex is protecting.	2020-03-07 12:01:12 +02:00
Sergei Petrunia	cbbe4971b6	MDEV-21887: federatedx crashes on SELECT ... INTO query in select_handler code - Don't try to push down SELECTs that have a side effect - In case the storage engine did support pushdown of SELECT with an INTO clause, write the rows we've got from it into select->join->result, and not thd->protocol. This way, SELECT ... INTO ... FROM smart_engine_table will put the result into where instructed, and NOT send it to the client.	2020-03-07 01:14:41 +03:00
Marko Mäkelä	23685378ba	MDEV-14425 preparation: Simplify redo log upgrade recv_log_recover_pre_10_2(): Merged from recv_find_max_checkpoint_0(), recv_log_format_0_recover().	2020-03-06 11:06:59 +02:00
Marko Mäkelä	a4ab54d70f	MDEV-14425 Cleanup: Use std::atomic for some log_sys members Some fields were protected by log_sys.mutex, which adds quite some overhead for readers. Some readers were submitting dirty reads. log_t::lsn: Declare private and atomic. Add wrappers get_lsn() and set_lsn() that will use relaxed memory access. Many accesses to log_sys.lsn are still protected by log_sys.mutex; we avoid the mutex for some readers. log_t::flushed_to_disk_lsn: Declare private and atomic, and move to the same cache line with log_t::lsn. log_t::buf_free: Declare as size_t, and move to the same cache line with log_t::lsn. log_t::check_flush_or_checkpoint_: Declare private and atomic, and move to the same cache line with log_t::lsn. log_get_lsn(): Define as an alias of log_sys.get_lsn(). log_get_lsn_nowait(), log_peek_lsn(): Remove. log_get_flush_lsn(): Define as an alias of log_sys.get_flush_lsn(). log_t::initiate_write(): Replaces log_buffer_sync_in_background().	2020-03-05 16:21:31 +02:00
Eugene Kosov	555f955a16	use O_DSYNC for InnoDB O_DSYNC is faster than O_SYNC because it syncs as little as needed (e.g. no timestamp changes) This change is similar to change fsync() -> fdatasync() in MDEV-21382	2020-03-05 11:35:09 +03:00
Marko Mäkelä	4b42fa3ce3	MDEV-14425: Remove the unused function mtr_write_log() This amends commit `37e7bde12a`	2020-03-05 08:51:40 +02:00
Marko Mäkelä	1312b4ebb6	MDEV-14425 preparation: Provide ut_crc32_low() The ut_crc32() function uses a hard-coded initial CRC-32C value of 0. Replace it with ut_crc32_low(), which allows to specify the initial checksum value, and provide an inlined compatibility wrapper ut_crc32(). Also, remove non-inlined wrapper functions on ARMv8 and POWER8, and remove dead code (the generic implementation) on POWER8. Note: The original AMD64 instruction set architecture in 2003 only included SSE2. The CRC-32C instructions are part of the SSE4.2 instruction set extension for IA-32 and AMD64, with first processors released in November 2007 (using the AMD Barcelona microarchitecture) and November 2008 (Intel Nehalem microarchitecture). It might be safe to assume that SSE4.2 is available on all currently used AMD64 based systems, but we are not taking that step yet.	2020-03-05 07:39:04 +02:00
Marko Mäkelä	6b317c1cc3	Remove some redundant code flagged by clang or GCC	2020-03-05 07:31:52 +02:00
Marko Mäkelä	64be4ab4a8	MDEV-21870 Deprecate and ignore innodb_scrub_log and innodb_scrub_log_speed The configuration parameter innodb_scrub_log never really worked, as reported in MDEV-13019 and MDEV-18370. Because MDEV-14425 is changing the redo log format, the innodb_scrub_log feature would have to be adjusted for it. Due to the known problems, it is easier to remove the feature for now, and to ignore and deprecate the parameters. If old log contents should be kept secret, then enabling innodb_encrypt_log or setting a smaller innodb_log_file_size could help.	2020-03-04 19:01:09 +02:00
Marko Mäkelä	8a25eb666d	MDEV-18214 cleanup: Remove redundant MONITOR_INC calls MONITOR_PENDING_CHECKPOINT_WRITE and MONITOR_LOG_IO track log_sys.n_pending_checkpoint_writes and log_sys.n_log_ios, respectively. The MONITOR_INC calls are redundant, because the values will be overwritten in srv_mon_process_existing_counter().	2020-03-04 13:05:22 +02:00
Marko Mäkelä	9e488653ae	Cleanup: Make MONITOR_LSN_CHECKPOINT_AGE a value. Compute MONITOR_LSN_CHECKPOINT_AGE on demand in srv_mon_process_existing_counter(). This allows us to remove the overhead of MONITOR_SET calls for the counter.	2020-03-04 12:59:20 +02:00
Marko Mäkelä	4383897a01	MDEV-14425 preparation: Remove log_header_read() The function log_header_read() was only used during server startup, and it will mostly be used only for reading checkpoint information from pre-MDEV-14425 format redo log files. Let us replace the function with more direct calls, so that it is clearer what is going on. It is not strictly necessary to hold any mutex during this operation, and because there will be only a limited number of operations during early server startup, it is not necessary to increment any I/O counters.	2020-03-04 10:08:33 +02:00
Marko Mäkelä	37e7bde12a	MDEV-14425 preparation: Remove log_t::append_on_checkpoint Simplify the logging of ALTER TABLE operations, by making use of the TRX_UNDO_RENAME_TABLE undo log record that was introduced in commit `0bc36758ba`. commit_try_rebuild(): Invoke row_rename_table_for_mysql() and actually rename the files before committing the transaction. fil_mtr_rename_log(), commit_cache_rebuild(), log_append_on_checkpoint(), row_merge_rename_tables_dict(): Remove. mtr_buf_copy_t, log_t::append_on_checkpoint: Remove. row_rename_table_for_mysql(): If !use_fk, ignore missing foreign keys. Remove a call to dict_table_rename_in_cache(), because trx_rollback_to_savepoint() should invoke the function if needed.	2020-03-03 22:25:20 +02:00
Marko Mäkelä	1ef10744ab	MDEV-21534: Fix -Wmaybe-uninitialized group_commit_lock::release(): Ensure that prev will be initialized, simplify a comparison, and fix some white space.	2020-03-03 15:00:36 +02:00
Marko Mäkelä	a736a2cbc4	MDEV-21724: Correctly invoke page_dir_split_slot() In commit `138cbec5f2`, we computed an incorrect parameter to page_dir_split_slot(), leading us to splitting the wrong directory slot, or an out-of-bounds access when splitting the supremum slot. This was once caught in the test innodb_gis.kill_server for inserting records to a clustered index root page. page_dir_split_slot(): Take the slot as a pointer, instead of a numeric index. page_apply_insert_redundant(), page_apply_insert_dynamic(): Rename slot to last_slot, and make owner_slot a pointer.	2020-03-03 14:41:32 +02:00
Marko Mäkelä	fae259f036	MDEV-12353: Introduce an EXTENDED record subtype TRIM_PAGES For undo log truncation, commit `055a3334ad` repurposed the MLOG_FILE_CREATE2 record with a nonzero page size to indicate that an undo tablespace will be shrunk in size. In commit `7ae21b18a6` the MLOG_FILE_CREATE2 record was replaced by a FILE_CREATE record. Now that the redo log encoding was changed, there is no actual need to write a file name in the log record; it suffices to write the page identifier of the first page that is not part of the file. This TRIM_PAGES record could allow us to shrink any data files in the future. For now, it will be limited to undo tablespaces. mtr_t::log_file_op(): Remove the parameter first_page_no, because it would always be 0 for file operations. mtr_t::trim_pages(): Replaces fil_truncate_log(). mtr_t::log_write(): Avoid same_page encoding if !bpage&&!m_last. fil_op_replay_rename(): Remove the constant parameter first_page_no=0.	2020-03-03 13:25:45 +02:00
Aleksey Midenkov	193725b81e	MDEV-7318 RENAME INDEX This patch adds support of RENAME INDEX operation to the ALTER TABLE statement. Code which determines if ALTER TABLE can be done in-place for "simple" storage engines like MyISAM, Heap, etc. was updated to handle ALTER TABLE ... RENAME INDEX as an in-place operation. Support for in-place ALTER TABLE ... RENAME INDEX for InnoDB was covered by MDEV-13301. Syntax changes ============== A new type of <alter_specification> is added: <rename index clause> ::= RENAME ( INDEX | KEY ) <oldname> TO <newname> Where <oldname> and <newname> are identifiers for old name and new name of the index. Semantic changes ================ The result of "ALTER TABLE t1 RENAME INDEX a TO b" is a table whose contents and structure are identical to the old version of 't1' with the only exception index 'a' being called 'b'. Neither <oldname> nor <newname> can be "primary". The index being renamed should exist and its new name should not be occupied by another index on the same table. Related to: WL#6555, MDEV-13301	2020-03-03 13:50:33 +03:00
mkaruza	d87c16be79	MDEV-20616: MariaDB-Galera 10.4.8 | Transaction aborted | Sig 6 Shutdown When connections go to same node and deadlock happens, BF abort should not happen for victim thread. Fixed by guarding `wsrep_handle_SR_rollback()` so that it is called only for SR transactions. Co-authored-by: Seppo Jaakola <seppo.jaakola@iki.fi> Co-authored-by: Daniele Sciascia <daniele.sciascia@galeracluster.com>	2020-03-03 10:29:45 +01:00
Marko Mäkelä	8511f04fdb	Cleanup: Remove srv_start_lsn Most of the time, we can refer to recv_sys.recovered_lsn.	2020-03-02 15:01:46 +02:00
Marko Mäkelä	55a5b5baf6	MDEV-12353 cleanup: Simplify mtr_t::undo_append()	2020-03-02 10:07:01 +02:00
Vlad Lesin	721ec44e2a	MDEV-14479: Do not acquire InnoDB record locks when covering table locks exist lock_rec_lock() does not set record lock if table lock is stronger or equal to the acquired record lock.	2020-03-02 09:09:51 +03:00
Vladislav Vaintroub	47d8fcf4cd	MDEV-21534 - fix debug build	2020-03-01 23:33:16 +01:00
Vladislav Vaintroub	30ea63b7d2	MDEV-21534 - Improve innodb redo log group commit performance Introduce special synchronization primitive group_commit_lock for more efficient synchronization of redo log writing and flushing. The goal is to reduce CPU consumption on log_write_up_to, to reduce the spurious wakeups, and improve the throughput in write-intensive benchmarks.	2020-03-01 19:02:21 +01:00
Sergey Vojtovich	607960c772	MDEV-21766 - Forbid XID with empty 'gtrid' XA specification doesn't permit empty gtrid. It is now enforced by this patch. This solution was agreed in favour of fixing InnoDB, which doesn't expect empty XID since early 10.5. Also fixed wrong assertion (and added test cases) that didn't permit 64 bytes gtrid + 64 bytes bqual.	2020-02-28 22:27:55 +04:00
Marko Mäkelä	8db623038f	Fix GCC -Wsign-compare	2020-02-27 18:19:31 +02:00
Marko Mäkelä	a263ca26db	Fix GCC -Wparentheses	2020-02-27 17:51:59 +02:00
Marko Mäkelä	138cbec5f2	MDEV-21724: Optimize page_cur_insert_low() redo logging Inserting a record into an index page involves updating multiple fields in the page header as well as updating the next-record links and potentially updating fields related to the sparse page directory. Let us cover the insert operations by higher-level log records, to avoid 'redundant' logging about the writes. The code for applying the high-level log records will check the consistency of the page thoroughly, to avoid crashes during recovery. We will refuse to replay the inserts if any inconsistency is detected. With innodb_force_recovery=1, recovery will continue, but the affected pages may be more inconsistent if some changes were omitted. mrec_ext_t: Introduce the EXTENDED record subtypes INSERT_HEAP_REDUNDANT, INSERT_REUSE_REDUNDANT, INSERT_HEAP_DYNAMIC, INSERT_REUSE_DYNAMIC. The record will explicitly identify the page type and whether the space will be allocated from PAGE_HEAP_TOP or reused from the PAGE_FREE list. It will also tell how many bytes to copy from the preceding record header and payload, and how to initialize the rest of the record header and payload. mtr_t::page_insert(): Write the high-level log records. log_phys_t::apply(): Parse the high-level log records. page_apply_insert_redundant(), page_apply_insert_dynamic(): Apply the high-level log records. page_dir_split_slot(): Introduce a variant that does not write log nor deal with ROW_FORMAT=COMPRESSED pages. page_mem_alloc_heap(): Remove the mtr_t parameter page_cur_insert_rec_low(): Write log only via mtr_t::page_insert().	2020-02-27 17:19:44 +02:00
Marko Mäkelä	dee6fb356b	MDEV-12353 Cleanup: Remove page_rec_get_base_extra_size() The function page_rec_get_base_extra_size() became dead code in commit `08ba388713`.	2020-02-27 17:15:20 +02:00
Marko Mäkelä	e15ae1cfe1	MDEV-12353: Improve page_cur_delete_rec() recovery This is a follow-up to commit `572d20757b` where we introduced the EXTENDED log record subtypes DELETE_ROW_FORMAT_REDUNDANT and DELETE_ROW_FORMAT_DYNAMIC. log_phys_t::apply(): If corruption was noticed, stop applying the log unless innodb_force_recovery is set.	2020-02-27 16:47:00 +02:00
Marko Mäkelä	4431144ae5	MDEV-12353: Make UNDO_APPEND more robust This is a follow-up to commit `84e3f9ce84` that introduced the EXTENDED log record of UNDO_APPEND subtype. mtr_t::undo_append(): Accurately enforce the mtr_buf_t::MAX_DATA_SIZE limit. Also, replace mtr_buf_t::push() with simpler code, to append 1 byte to the log. log_phys_t::undo_append(): Return whether the page was found to be in an inconsistent state. log_phys_t::apply(): If corruption was noticed, stop applying log unless innodb_force_recovery is set.	2020-02-27 16:47:00 +02:00
Marko Mäkelä	0eca30a70d	MDEV-21749: page_cur_insert_rec_low(): Assertion rdm - rd + bd <= insert_buf + rec_size failed. This bug was introduced in commit `7ae21b18a6` (the main commit of MDEV-12353). page_cur_insert_rec_low(): Before entering the comparison loop, make sure that the range does not exceed c_end already at the start of the loop. The loop is only comparing for pointer equality, and that condition cdm == c_end would never hold if the end was already exceeded in the beginning. Also, skip the comparison altogether if we could find at most 2 equal bytes. PageBulk::insertPage(): Apply a similar change. It seems that this code was correct, because the loop checks for cdm < c_end.	2020-02-24 16:12:48 +02:00
Marko Mäkelä	956e12d639	MDEV-12353: Fix cmake -DWITH_INNODB_EXTRA_DEBUG The compilation was accidentally broken in commit `22f649a67a`.	2020-02-24 15:13:00 +02:00
Marko Mäkelä	572d20757b	MDEV-12353: Reduce log volume of page_cur_delete_rec() mrec_ext_t: Introduce DELETE_ROW_FORMAT_REDUNDANT, DELETE_ROW_FORMAT_DYNAMIC. mtr_t::page_delete(): Write DELETE_ROW_FORMAT_REDUNDANT or DELETE_ROW_FORMAT_DYNAMIC log records. We log the byte offset of the preceding record, so that on recovery we can easily find everything to update. For DELETE_ROW_FORMAT_DYNAMIC, we must also write the header and data size of the record. We will retain the physical logging for ROW_FORMAT=COMPRESSED pages. page_zip_dir_balance_slot(): Renamed from page_dir_balance_slot(), and specialized for ROW_FORMAT=COMPRESSED only. page_rec_set_n_owned(), page_dir_slot_set_n_owned(), page_dir_balance_slot(): New variants that do not write any log. page_mem_free(): Take data_size, extra_size as parameters. Always zerofill the record payload. page_cur_delete_rec(): For other than ROW_FORMAT=COMPRESSED, only write log by mtr_t::page_delete().	2020-02-22 21:19:47 +02:00
Marko Mäkelä	96901d9545	Cleanup: Remove dict_ind_redundant There is no reason for the dummy index object dict_ind_redundant to exist any more. It was only being passed to btr_create(). btr_create(): If !index, assume that a ROW_FORMAT=REDUNDANT table is being created. We could pass ibuf.index, dict_sys.sys_tables->indexes.start and so on, if those objects had been initialized before the function btr_create() is called.	2020-02-20 22:00:43 +02:00
Eugene Kosov	6618fc2974	MDEV-21774 Innodb, Windows : restore file sharing logic in Innodb recv_sys_t opened redo log files along with log_sys_t. That's why I removed file sharing logic from InnoDB in `9ef2d29ff4`. But it was actually used to ensure that only one MariaDB instance will touch the same InnoDB files. os0file.cc: revert some changes done previously mapped_file_t::map(): now has arguments read_only, nvme file_io::open(): now has argument read_only class file_os_io: make final log_file_t::open(): now has argument read_only	2020-02-20 18:24:21 +03:00
Marko Mäkelä	84e3f9ce84	MDEV-12353: Reduce log volume by an UNDO_APPEND record We introduce an EXTENDED log record for appending an undo log record to an undo log page. This is equivalent to the MLOG_UNDO_INSERT record that was removed in commit `f802c989ec`, only using more compact encoding. mtr_t::log_write(): Fix a bug that affects longer log record writes in the !same_page && !have_offset case. Similar code is already implemented for the have_offset code path. The bug was unobservable before we started to write longer EXTENDED records. All !have_offset records (FREE_PAGE, INIT_PAGE, EXTENDED) that were written so far are short, and we never write RESERVED or OPTION records. mtr_t::undo_append(): Write an UNDO_APPEND record. log_phys_t::undo_append(): Apply an UNDO_APPEND record. trx_undo_page_set_next_prev_and_add(), trx_undo_page_report_modify(), trx_undo_page_report_rename(): Invoke mtr_t::undo_append() instead of emitting WRITE records.	2020-02-19 16:42:38 +02:00
Marko Mäkelä	86f262f1c7	MDEV-12353: Reduce log volume by an UNDO_INIT record We introduce an EXTENDED log record for initializing an undo log page. The size of the record will be 2 bytes plus the optional page identifier. The entire undo page will be initialized, except the space that is already reserved for TRX_UNDO_SEG_HDR in trx_undo_seg_create(). mtr_t::undo_create(): Write the UNDO_INIT record. trx_undo_page_init(): Initialize the undo page corresponding to the UNDO_INIT record. Unlike the former MLOG_UNDO_INIT record, we will initialize almost the entire page, including initializing the TRX_UNDO_PAGE_NODE to an empty list node, so that the subsequent call to flst_init() will avoid writing log for the undo page.	2020-02-19 15:52:16 +02:00
Eugene Kosov	29bb3744b4	fix libpmem InnoDB linking	2020-02-19 16:37:06 +03:00
Sergei Petrunia	adcfea710f	Fix compile failure, compare_key_parts in handler shadowed by MyRocks The two functions have different signature. Use "using ..." to prevent shadowing	2020-02-19 14:57:47 +03:00
Eugene Kosov	e62e285fc4	remove unused function	2020-02-19 12:51:08 +03:00
Eugene Kosov	9ef2d29ff4	MDEV-14425 deprecate and ignore innodb_log_files_in_group Now there can be only one log file instead of several which logically work as a single file. Possible names of redo log files: ib_logfile0, ib_logfile101 (for just created one) innodb_log_files_in_group: value of this variable is not used by InnoDB. Possible values are still 1..100, to not break upgrade LOG_FILE_NAME: add constant of value "ib_logfile0" LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile" get_log_file_path(): convenience function that returns full path of a redo log file SRV_N_LOG_FILES_MAX: removed srv_n_log_files: we can't remove this for compatibility reasons, but now server doesn't use this variable log_sys_t::file::fd: now just one, not std::vector log_sys_t::log_capacity: removed word 'group' find_and_check_log_file(): part of logic from huge srv_start() moved here recv_sys_t::files: file descriptors of redo log files. There can be several of those in case we're upgrading from older MariaDB version. recv_sys_t::remove_extra_log_files: whether to remove ib_logfile{1,2,3...} after successful upgrade. recv_sys_t::read(): open if needed and read from one of several log files recv_sys_t::files_size(): open if needed and return files count redo_file_sizes_are_correct(): check that redo log files sizes are equal. Just to log an error for a user. Corresponding check was moved from srv0start.cc namespace deprecated: put all deprecated variables here to prevent usage of it by us, developers	2020-02-19 12:21:59 +03:00
Eugene Kosov	df07e00a81	MDEV-20726 InnoDB: Assertion failure in file data0type.cc line 67 Do not rebuild an index when its key part is converted from utf8mb3 to utf8mb4 but the key part stays the same. dict_index_add_to_cache(): assert that prefix_len is divisible by mbmaxlen ha_innobase::compare_key_parts(): compare key part length in symbols instead of bytes.	2020-02-18 22:53:29 +03:00
Eugene Kosov	7ccc1710a0	cleanup: key parts comparison Engine specific code moved to engine.	2020-02-18 22:53:28 +03:00
Marko Mäkelä	9fd309498c	MDEV-12353 Cleanup: Rename INIT_INDEX_PAGE to EXTENDED We plan to use the redo log record main type code 0x20 for InnoDB specific index page operations. mrec_type_t: Rename INIT_INDEX_PAGE to EXTENDED. mrec_ext_t: The EXTENDED subtypes. This is a non-functional change: the redo log record encoding that was introduced in commit `7ae21b18a6` is not affected.	2020-02-18 12:08:33 +02:00
Marko Mäkelä	23de5b8f07	MDEV-21725 Optimize btr_page_reorganize_low() redo logging btr_page_reorganize_low(): Log only the changed data in the page. TODO: Do not copy the entire changed payload to the redo log. Emit a combination of MEMMOVE and WRITE records to reduce the log volume.	2020-02-18 10:54:28 +02:00
Marko Mäkelä	41fe972db7	MDEV-21744 Assertion `!rec_offs_nth_sql_null(offsets, n)' failed commit `08ba388713` of MDEV-12353 introduced an incorrect assumption, which was documented by the failing assertion. After instant ADD COLUMN, we can have a null (and in-place) UPDATE of NULL to NULL. No data needs to be written for such updates. For ROW_FORMAT=REDUNDANT, we reserve space for the NULL values, and to be compatible with existing behaviour, we will zerofill the unused data bytes when updating to NULL value.	2020-02-17 15:32:24 +02:00
Marko Mäkelä	055ce75d8b	MDEV-21174: Correct a debug assertion failure trx_purge_free_segment(): In some cases (observed when running the test innodb_zip.wl5522_debug_zip), there is no change to the TRX_UNDO_NEEDS_PURGE field. Add mtr_t::OPT to disable a debug check. The bogus debug check was introduced in commit `56f6dab1d0`.	2020-02-17 15:32:24 +02:00
Marko Mäkelä	22f649a67a	MDEV-12353: Reformat page_delete_rec_list_end() We add FIXME comments and some sketch code for the following cases: It is possible to write considerably less log for ROW_FORMAT=COMPRESSED pages. For now, we will delete the records one by one. It is also possible to treat 'deleting the last records' as a special case that would involve shrinking PAGE_HEAP_TOP. That should reduce the need of reorganizing pages.	2020-02-17 15:32:24 +02:00
Marko Mäkelä	09feb176e9	MDEV-12353: Optimize page_cur_delete_rec() logging further page_mem_free(): When deleting the very last record of the page, even if the record did not fully utilize all bytes in a former PAGE_FREE record, truncate the PAGE_HEAP_TOP and reduce PAGE_GARBAGE by the saved amount.	2020-02-17 15:32:24 +02:00
Marko Mäkelä	fc87698048	MDEV-12353: Write less log for BLOB pages fsp_page_create(): Always initialize the page. The logic to avoid initialization was made redundant and should have been removed in mysql/mysql-server@ce0a1e85e2 (MySQL 5.7.5). btr_store_big_rec_extern_fields(): Remove the redundant initialization of FIL_PAGE_PREV and FIL_PAGE_NEXT. An INIT_PAGE record will have been written already. Only write the ROW_FORMAT=COMPRESSED page payload from FIL_PAGE_DATA onwards. We were unnecessarily writing from FIL_PAGE_TYPE onwards, which caused an assertion failure on recovery: recv_sys_t::alloc(size_t): Assertion 'len <= srv_page_size' failed when running the following tests: ./mtr --no-reorder innodb_zip.blob,4k innodb_zip.bug56680,4k	2020-02-17 10:13:32 +02:00
Marko Mäkelä	5874aac71f	MDEV-12353: Fix a Galera assertion failure trx_rseg_write_wsrep_checkpoint(): Add missing mtr_t::OPT, and avoid an unnecessary call to mtr_t::memset(). This addresses a debug assertion failure in wsrep_info.plugin.	2020-02-16 17:22:28 +02:00
Marko Mäkelä	d657cd7465	MDEV-12353: Optimize page_delete_rec_list_end() logging	2020-02-16 15:45:12 +02:00
Marko Mäkelä	5876de19d0	MDEV-12353: Remove bogus conditions page_update_max_trx_id(), page_delete_rec_list_end(): Remove conditions on recv_recovery_is_on(). These conditions should have been removed in or before commit `f8a9f90667` (removing the support for crash-upgrade). The physical redo log based recovery will not call such high-level code.	2020-02-16 15:09:01 +02:00
Marko Mäkelä	3887daf826	MDEV-12353: Optimize page_cur_delete_rec() logging page_mem_free(): When deleting the last record of a page, do not add it to the PAGE_FREE list, but instead truncate the PAGE_HEAP_TOP. Modify the page header fields by writing fewer records. page_cur_delete_rec(): Let page_mem_free() reset the PAGE_LAST_INSERT. page_header_reset_last_insert(): Issue memset(), not memcpy(), for the ROW_FORMAT=COMPRESSED page.	2020-02-16 14:10:26 +02:00
Eugene Kosov	735c6ea3e6	fix Win build	2020-02-14 15:45:18 +03:00
Eugene Kosov	3daef523af	MDEV-17084 Optimize append only files for NVDIMM Optionally use libpmem for InnoDB redo log writing. When the server is built with -DWITH_PMEM=ON, InnoDB tries to detect that the redo log is located on persistent memory storage and uses a faster file access method. When the server is built with -DWITH_PMEM=OFF, the preprocessor is used to ensure that no slowdown occurs due to allocations and virtual function calls. So, we don't slow down the server in the common case. mapped_file_t: maps a file, unmaps a file, and returns the mapped memory buffer file_io: abstraction around memory mapped files and file descriptors. Allows writing, reading and flushing to files. file_io::writes_are_durable(): notable method of the class. When it returns true, writes are flushed immediately. file_os_io: file descriptor based file access. Depends on global state like srv_read_only_mode file_pmem_io: file access via libpmem This is a collaboration work with Sergey Vojtovich	2020-02-14 14:11:10 +03:00
Marko Mäkelä	d901919db2	MDEV-19747: Fix a warning In commit `fc2f2fa853` we replaced FlushObserver* with bool, but forgot to replace one NULL with false.	2020-02-14 11:03:11 +02:00
Marko Mäkelä	37dc087f58	MDEV-12353: Remove bogus comments and clean up code This is a fixup for commit `7ae21b18a6`. It turns out that even if we in the future made LSN count mini-transactions instead of bytes, we will need both start LSN and end LSN, which must exactly match between mtr_t::commit() and log_phys_t::apply(). log_rec_t::lsn: Restore the const qualifier. log_phys_t::append(): Remove the lsn parameter. Both the start and end LSN must remain unchanged. We can only append log from the same mini-transaction to a single log record snippet. If we combined the log from mini-transactions A and B, it could happen that the FIL_PAGE_LSN of the page is somewhere between A.start_lsn and B.start_lsn. In that case, also the log of B would be wrongly skipped. recv_sys_t::add(): Assert that if the start LSN matches, also the end LSN will match.	2020-02-14 10:57:52 +02:00
Marko Mäkelä	f8a9f90667	MDEV-12353: Remove support for crash-upgrade We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.	2020-02-13 19:13:45 +02:00
Marko Mäkelä	7ae21b18a6	MDEV-12353: Change the redo log encoding log_t::FORMAT_10_5: physical redo log format tag log_phys_t: Buffered records in the physical format. The log record bytes will follow the last data field, making use of alignment padding that would otherwise be wasted. If there are multiple records for the same page, also those may be appended to an existing log_phys_t object if the memory is available. In the physical format, the first byte of a record identifies the record and its length (up to 15 bytes). For longer records, the immediately following bytes will encode the remaining length in a variable-length encoding. Usually, a variable-length-encoded page identifier will follow, followed by optional payload, whose length is included in the initially encoded total record length. When a mini-transaction is updating multiple fields in a page, it can avoid repeating the tablespace identifier and page number by setting the same_page flag (most significant bit) in the first byte of the log record. The byte offset of the record will be relative to where the previous record for that page ended. Until MDEV-14425 introduces a separate file-level log for redo log checkpoints and file operations, we will write the file-level records in the page-level redo log file. The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT) will be removed in MDEV-14425, and one sequential scan of the page recovery log will suffice. Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags. If the information is needed, it can be parsed from WRITE records that modify FSP_SPACE_FLAGS. MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily as part of this work, before being replaced with WRITE (along with MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES). mtr_buf_t::empty(): Check if the buffer is empty. mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty. 
mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record, for the same_page encoding. page_recv_t::last_offset: Reflects mtr_t::m_last_offset. Valid values for last_offset during recovery should be 0 or above 8. (The first 8 bytes of a page are the checksum and the page number, and neither are ever updated directly by log records.) Internally, the special value 1 indicates that the same_page form will not be allowed for the subsequent record. mtr_t::page_create(): Take the block descriptor as parameter, so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE record will always be followed by a subtype byte, because same_page records must be longer than 1 byte. trx_undo_page_init(): Combine the writes in WRITE record. trx_undo_header_create(): Write 4 bytes using a special MEMSET record that includes 1 byte of length and 2 bytes of payload. flst_write_addr(): Define as a static function. Combine the writes. flst_zero_both(): Replaces two flst_zero_addr() calls. flst_init(): Do not inline the function. fsp_free_seg_inode(): Zerofill the whole inode. fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV,FIL_PAGE_NEXT to FIL_NULL when using the physical format. btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page() must have been invoked. fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE. fil_names_dirty_and_write(): Remove the parameter mtr. Write the records using a separate mini-transaction object, because any FILE_ records must be at the start of a mini-transaction log. recv_recover_page(): Add a fil_space_t* parameter. After applying log to a ROW_FORMAT=COMPRESSED page, invoke buf_zip_decompress() to restore the uncompressed page. buf_page_io_complete(): Remove the temporary hack to discard the uncompressed page of a ROW_FORMAT=COMPRESSED page. page_zip_write_header(): Remove. Use mtr_t::write() or mtr_t::memset() instead, and update the compressed page frame separately. 
trx_undo_header_add_space_for_xid(): Remove. trx_undo_seg_create(): Perform the changes that were previously made by trx_undo_header_add_space_for_xid(). btr_reset_instant(): New function: Reset the table to MariaDB 10.2 or 10.3 format when rolling back an instant ALTER TABLE operation. page_rec_find_owner_rec(): Merge with the only callers. page_cur_insert_rec_low(): Combine writes by using a local buffer. MEMMOVE data from the preceding record whenever feasible (copying at least 3 bytes). page_cur_insert_rec_zip(): Combine writes to page header fields. PageBulk::insertPage(): Issue MEMMOVE records to copy a matching part from the preceding record. PageBulk::finishPage(): Combine the writes to the page header and to the sparse page directory slots. mtr_t::write(): Only log the least significant (last) bytes of multi-byte fields that actually differ. For updating FSP_SIZE, we must always write all 4 bytes to the redo log, so that the fil_space_set_recv_size() logic in recv_sys_t::parse() will work. mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument instead of a numeric offset to the page frame. Only log the last bytes of multi-byte fields that actually differ. In fil_space_crypt_t::write_page0(), we must log also any unchanged bytes, so that recovery will recognize the record and invoke fil_crypt_parse(). Future work: MDEV-21724 Optimize page_cur_insert_rec_low() redo logging MDEV-21725 Optimize btr_page_reorganize_low() redo logging MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED	2020-02-13 19:12:17 +02:00
Marko Mäkelä	9869005201	Cleanup ibuf_page_exists(): Take simpler parameters	2020-02-13 18:19:15 +02:00
Marko Mäkelä	67c76704a8	MDEV-12353: Remove MLOG_INDEX_LOAD (innodb_log_optimize_ddl) NOTE: This may break crash-upgrade from a dataset that was created with innodb_log_optimize_ddl=ON. Also due to ROW_FORMAT=COMPRESSED pages, it will be easiest to disallow crash-upgrade. It would be more robust to disable the MDEV-12699 logic when crash-upgrading from old redo log format. log_optimized_ddl_op: Remove. fil_space_t::enable_lsn, file_name_t::enable_lsn: Remove. ddl_tracker_t::optimized_ddl: Remove. TODO: Remove ddl_tracker	2020-02-13 18:19:15 +02:00
Marko Mäkelä	f37a29dd66	MDEV-12353: Write log by mtr_t member functions only mtr_t::log_write_low(): Replaces mlog_write_initial_log_record_low(). mtr_t::log_file_op(): Replaces fil_op_write_log(). mtr_t::free(): Write MLOG_INIT_FREE_PAGE. mtr_t::init(): Write MLOG_INIT_FILE_PAGE2. mtr_t::page_create(): Write record about the partial initialization of an index page. mlog_catenate_ulint(), mlog_catenate_string(), mlog_open(), mlog_close(): Remove.	2020-02-13 18:19:15 +02:00
Marko Mäkelä	8a039ee107	MDEV-12353: Introduce mtr_t::zmemcpy() Exclusively write MLOG_ZIP_WRITE_STRING records by mtr_t::zmemcpy().	2020-02-13 18:19:15 +02:00
Marko Mäkelä	2e7a084283	MDEV-21174: Remove mlog_write_initial_log_record_fast() Pass buf_block_t* to all functions that write redo log. Specifically, replace the parameters page,page_zip with buf_block_t* block in page_zip_ functions.	2020-02-13 18:19:15 +02:00
Marko Mäkelä	498f84a87b	MDEV-12353: Remove mlog_open_and_write_index() Now that all logical log records have been replaced, the function mlog_parse_index() is only needed for crash-upgrading from older versions.	2020-02-13 18:19:15 +02:00
Marko Mäkelä	08ba388713	MDEV-12353: Replace MLOG_REC_INSERT,MLOG_COMP_REC_INSERT page_mem_alloc_free(), page_dir_set_n_heap(), page_ptr_set_direction(): Merge with the callers. page_direction_reset(), page_direction_increment(), page_zip_dir_insert(), page_zip_write_rec_ext(), page_zip_write_rec(): Add the parameter mtr, and write log. PageBulk::insert(), PageBulk::finish(): Write log for all changes. page_cur_rec_insert(), page_cur_insert_rec_write_log(), page_cur_insert_rec_write_log(): Remove. page_rec_set_next(), page_header_set_field(), page_header_set_ptr(): Remove. Use lower-level operations with or without logging. page_zip_dir_add_slot(): Move to the same compilation unit with its only caller, page_cur_insert_rec_zip(). page_cur_insert_rec_zip(): Mark pieces of code that must be skipped once this task is completed. btr_defragment_chunk(): Before starting a mini-transaction that is writing (a lot), invoke log_free_check(). This should allow the test innodb.innodb_defrag_concurrent to pass with the mtr default_mysqld.cnf setting of innodb_log_file_size=10M. MLOG_BUF_MARGIN: Remove.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	2c4d5aa0fe	MDEV-12353: Replace MLOG_ZIP_PAGE_COMPRESS page_zip_compress_write_log(): Write MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_WRITE_STRING records instead of MLOG_ZIP_PAGE_COMPRESS. This depends on the changes to buf_page_io_complete() and friends in the parent commit.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	2a77b2a510	MDEV-12353: Replace MLOG_LIST__DELETE and MLOG_*REC_DELETE No longer write the following redo log records: MLOG_COMP_LIST_END_DELETE, MLOG_LIST_END_DELETE, MLOG_COMP_LIST_START_DELETE, MLOG_LIST_START_DELETE, MLOG_REC_DELETE,MLOG_COMP_REC_DELETE. Each individual deleted record will be logged separately using physical log records. page_dir_slot_set_n_owned(), page_zip_rec_set_owned(), page_zip_dir_delete(), page_zip_clear_rec(): Add the parameter mtr, and write redo log. page_dir_slot_set_rec(): Remove. Replaced with lower-level operations that write redo log when necessary. page_rec_set_n_owned(): Replaces rec_set_n_owned_old(), rec_set_n_owned_new(). rec_set_heap_no(): Replaces rec_set_heap_no_old(), rec_set_heap_no_new(). page_mem_free(), page_dir_split_slot(), page_dir_balance_slot(): Add the parameter mtr. page_dir_set_n_slots(): Merge with the caller page_dir_split_slot(). page_dir_slot_set_rec(): Merge with the callers page_dir_split_slot() and page_dir_balance_slot(). page_cur_insert_rec_low(), page_cur_insert_rec_zip(): Suppress the logging of lower-level operations. page_cur_delete_rec_write_log(): Remove. page_cur_delete_rec(): Do not tolerate mtr=NULL. rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_comp(): Replace rec_set_heap_no_old() and rec_set_heap_no_new() with direct access that does not involve redo logging. mtr_t::memcpy(): Do allow non-redo-logged writes to uncompressed pages of ROW_FORMAT=COMPRESSED pages. buf_page_io_complete(): Evict the uncompressed page of a ROW_FORMAT=COMPRESSED page after recovery. Because we no longer write logical log records for deleting index records, but instead write physical records that may refer directly to the compressed page frame of a ROW_FORMAT=COMPRESSED page, and because on recovery we will only apply the changes to the ROW_FORMAT=COMPRESSED page, the uncompressed page frame can be stale until page_zip_decompress() is executed. 
recv_parse_or_apply_log_rec_body(): After applying MLOG_ZIP_WRITE_STRING, ensure that the FIL_PAGE_TYPE of the uncompressed page matches the compressed page, because buf_flush_init_for_writing() assumes that field to be valid. mlog_init_t::mark_ibuf_exist(): Invoke page_zip_decompress(), because the uncompressed page after buf_page_create() is not necessarily up to date. buf_LRU_block_remove_hashed(): Bypass a page_zip_validate() check during redo log apply. recv_apply_hashed_log_recs(): Invoke mlog_init.mark_ibuf_exist() also for the last batch, to ensure that page_zip_decompress() will be called for freshly initialized pages.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	d00185c40d	MDEV-12353: Replace MLOG_PAGE_CREATE_RTREE, MLOG_PAGE_COMP_CREATE_RTREE page_create(): Create normal B-tree pages. Callers that create R-tree pages will set FIL_PAGE_TYPE and reset the split sequence number afterwards. The creation of ROW_FORMAT=COMPRESSED pages is unaffected; they will be logged as compressed page images. page_create_low(): Take const buf_block_t* as a parameter. Let the callers invoke buf_block_modify_clock_inc().	2020-02-13 18:19:14 +02:00
Marko Mäkelä	b3d02a1fcf	MDEV-12353: Replace DELETE_MARK redo log records with MLOG_WRITE_STRING btr_cur_upd_rec_sys(): Replaces row_upd_rec_sys_fields() and implements redo logging. row_upd_rec_sys_fields_in_recovery(): Remove, and merge to the only remaining caller btr_cur_parse_update_in_place(). btr_cur_del_mark_set_clust_rec_log(), btr_cur_del_mark_set_sec_rec_log(), btr_cur_set_deleted_flag_for_ibuf(): Remove, and replace with btr_rec_set_deleted<bool>(). page_zip_rec_set_deleted(): Add the parameter mtr, and write a MLOG_ZIP_WRITE_STRING record to the log.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	f3230111fc	MDEV-12353: Introduce MLOG_ZIP_WRITE_STRING Log the low-level operations for ROW_FORMAT=COMPRESSED index pages using a new record, MLOG_ZIP_WRITE_STRING. We will still use MLOG_1BYTE,..., MLOG_8BYTES or MLOG_WRITE_STRING for operations on other than index pages (such as the page allocation bitmap pages). We will stop writing the record MLOG_ZIP_PAGE_COMPRESS later, after replacing all MLOG_REC_ and MLOG_COMP_REC_ that update index pages.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	db5cdc3195	MDEV-12353: Replace MLOG_PAGE_REORGANIZE, MLOG_COMP_PAGE_REORGANIZE Log page reorganize as a series of insert operations. This will make the redo log volume proportional to the page payload size. btr_page_reorganize_low(): Add template <bool recovery=false> btr_page_reorganize_block(): Remove the parameter 'bool recovery'	2020-02-13 18:19:14 +02:00
Marko Mäkelä	276f996af9	MDEV-12353: Replace MLOG_*_END_COPY_CREATED Instead of writing the high-level redo log records MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED write log for each individual insert of a record. page_copy_rec_list_end_to_created_page(): Remove. This will improve the fill factor of some pages. Adjust some tests accordingly. PageBulk::init(), PageBulk::finish(): Avoid setting bogus limits to PAGE_HEAP_TOP and PAGE_N_DIR_SLOTS. Avoid accessor functions that would enforce these limits before the correct ones are set at the end of PageBulk::finish().	2020-02-13 18:19:14 +02:00
Marko Mäkelä	acd265b69b	MDEV-12353: Exclusively use page_zip_reorganize() for ROW_FORMAT=COMPRESSED page_zip_reorganize(): Restore the page on failure. In callers, omit now-redundant calls to page_zip_decompress(). btr_page_reorganize_low(): Define in static scope only, and remove the z_level parameter. Assert that ROW_FORMAT is not COMPRESSED. btr_page_reorganize_block(), btr_page_reorganize(): Invoke page_zip_reorganize() for ROW_FORMAT=COMPRESSED.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	f802c989ec	MDEV-12353: Replace MLOG_UNDO_INSERT trx_undof_page_add_undo_rec_log(): Remove. trx_undo_page_set_next_prev_and_add(), trx_undo_page_report_modify(), trx_undo_page_report_rename(): Write lower-level redo log records.	2020-02-13 18:19:14 +02:00
Marko Mäkelä	e0bc29df18	MDEV-12353: Replace MLOG_UNDO_HDR_CREATE trx_undo_header_create(): Emit lower-level records instead of writing MLOG_UNDO_HDR_CREATE records.	2020-02-13 18:19:13 +02:00
Marko Mäkelä	737b701786	MDEV-12353: Remove trx_undo_erase_page_end() MariaDB stopped writing the record MLOG_UNDO_ERASE_END in commit `0fd3def284` (10.3.3). Merge trx_undo_erase_page_end() with its callers.	2020-02-13 18:19:13 +02:00
Marko Mäkelä	07d39cde92	MDEV-12353: Replace MLOG_UNDO_INIT trx_undo_page_init(): Write lower-level redo log records by invoking mtr_t::write().	2020-02-13 18:19:13 +02:00
Marko Mäkelä	5bea43f5e0	MDEV-12353: Deprecate and ignore innodb_log_compressed_pages page_zip_compress_write_log_no_data(): Remove. We no longer write the MLOG_ZIP_PAGE_COMPRESS_NO_DATA record. Instead, we will write MLOG_ZIP_PAGE_COMPRESS records.	2020-02-13 18:19:13 +02:00
Marko Mäkelä	600eae9179	MDEV-12353: Remove MTR_LOG_SHORT_INSERTS No longer emit the redo log records MLOG_LIST_END_COPY_CREATED, MLOG_COMP_LIST_END_COPY_CREATED.	2020-02-13 18:19:13 +02:00
Eugene Kosov	c400a73d7a	micro optimization: avoid std::string copy	2020-02-13 16:26:47 +03:00
Vicențiu Ciorbaru	5aebd78e27	MDEV-18650: Options deprecated in previous versions - mroonga_default_parser Variable is marked as deprecated since 10.1.6. Update tests to not make use of it.	2020-02-13 13:42:01 +02:00
Marko Mäkelä	20a7f75fbf	MDEV-15058: Revert the changes to INFORMATION_SCHEMA For compatibility with diagnostic software, let us return a dummy buffer pool identifier 0 and restore the columns that were initially deleted in commit `1a6f708ec5`: information_schema.innodb_buffer_page.pool_id information_schema.innodb_buffer_page_lru.pool_id information_schema.innodb_buffer_pool_stats.pool_id information_schema.innodb_cmpmem.buffer_pool_instance information_schema.innodb_cmpmem_reset.buffer_pool_instance Thanks to Vladislav Vaintroub for pointing this out.	2020-02-12 20:54:59 +02:00
Marko Mäkelä	1a6f708ec5	MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances Our benchmarking efforts indicate that the reasons for splitting the buf_pool in commit `c18084f71b` have mostly gone away, possibly as a result of mysql/mysql-server@ce6109ebfd or similar work. Only in one write-heavy benchmark where the working set size is ten times the buffer pool size, the buf_pool->mutex would be less contended with 4 buffer pool instances than with 1 instance, in buf_page_io_complete(). That contention could be alleviated further by making more use of std::atomic and by splitting buf_pool_t::mutex further (MDEV-15053). We will deprecate and ignore the following parameters: innodb_buffer_pool_instances innodb_page_cleaners There will be only one buffer pool and one page cleaner task. In a number of INFORMATION_SCHEMA views, columns that indicated the buffer pool instance will be removed: information_schema.innodb_buffer_page.pool_id information_schema.innodb_buffer_page_lru.pool_id information_schema.innodb_buffer_pool_stats.pool_id information_schema.innodb_cmpmem.buffer_pool_instance information_schema.innodb_cmpmem_reset.buffer_pool_instance	2020-02-12 14:45:21 +02:00
Marko Mäkelä	0448c614c8	MDEV-16264: Remove unused page_cleaner_t::is_started	2020-02-12 11:32:09 +02:00
Marko Mäkelä	2a6fa1c42b	MDEV-21132: Use memcpy_aligned, memset_aligned	2020-02-12 11:32:09 +02:00
Oleksandr Byelkin	4b087e1754	Merge branch '10.4' into 10.5	2020-02-12 08:55:17 +01:00
Marko Mäkelä	fc2f2fa853	MDEV-19747: Deprecate and ignore innodb_log_optimize_ddl During native table rebuild or index creation, InnoDB used to skip redo logging and write MLOG_INDEX_LOAD records to inform crash recovery and Mariabackup of the gaps in redo log. This is fragile and prohibits some optimizations, such as skipping the doublewrite buffer for newly (re)initialized pages (MDEV-19738). row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD records any more. Instead, we write full redo log. FlushObserver: Remove. fseg_free_page_func(): Remove the parameter log. Redo logging cannot be disabled. fil_space_t::redo_skipped_count: Remove. We cannot remove buf_block_t::skip_flush_check, because PageBulk will temporarily generate invalid B-tree pages in the buffer pool.	2020-02-11 18:44:26 +02:00
Marko Mäkelä	8ccb3caafb	MDEV-17491 micro optimize page_id_t further Let us define page_id_t as a thin wrapper of uint64_t so that the comparison operators can be simplified. This is a follow-up to the original commit `14be814380`. The comparison operator for recv_sys.pages.emplace() turned out to be a busy spot in a recovery benchmark. That data structure was introduced in MDEV-19586 in commit `177a571e01`.	2020-02-11 18:03:19 +02:00
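
The page_id_t change described in commit `8ccb3caafb` can be illustrated with a minimal sketch. It is not the MariaDB source: the class name page_id_sketch and the bit layout (tablespace identifier in the high 32 bits, page number in the low 32 bits) are assumptions made for illustration. The point is only that, once both fields are packed into one uint64_t, each comparison operator reduces to a single integer comparison.

```cpp
#include <cstdint>

// Hypothetical sketch, not the MariaDB source: a page identifier stored
// as one 64-bit word. The layout (tablespace id in the high 32 bits,
// page number in the low 32 bits) is an assumption for illustration.
class page_id_sketch
{
public:
  constexpr page_id_sketch(std::uint32_t space, std::uint32_t page_no)
    : m_id(std::uint64_t{space} << 32 | page_no) {}

  constexpr std::uint32_t space() const
  { return static_cast<std::uint32_t>(m_id >> 32); }
  constexpr std::uint32_t page_no() const
  { return static_cast<std::uint32_t>(m_id); }

  // Comparing the packed values gives the same order as comparing
  // (space, page_no) lexicographically, with one integer comparison.
  constexpr bool operator==(const page_id_sketch &o) const
  { return m_id == o.m_id; }
  constexpr bool operator!=(const page_id_sketch &o) const
  { return m_id != o.m_id; }
  constexpr bool operator<(const page_id_sketch &o) const
  { return m_id < o.m_id; }

private:
  std::uint64_t m_id;
};

// Compile-time checks of the ordering property claimed above.
static_assert(page_id_sketch(0, 0xFFFFFFFFu) < page_id_sketch(1, 0),
              "the tablespace id dominates the ordering");
static_assert(page_id_sketch(2, 3) == page_id_sketch(2, 3),
              "equal fields compare equal");
```

An ordered container keyed by such a type (the commit message mentions recv_sys.pages.emplace() as the busy spot) would then perform one 64-bit comparison per key comparison instead of comparing two fields in turn.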

23161 commits