mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-16 03:52:35 +01:00

Author	SHA1	Message	Date
Monty	52c29f3bdc	MDEV-35469 Heap tables are calling mallocs to often Heap tables are allocated blocks to store rows according to my_default_record_cache (mapped to the server global variable read_buffer_size). This causes performance issues when the record length is big (> 1000 bytes) and the my_default_record_cache is small. Changed to instead split the default heap allocation to 1/16 of the allowed space and not use my_default_record_cache anymore when creating the heap. The allocation is also aligned to be just under a power of 2. For some test that I have been running, which was using record length=633, the speed of the query doubled thanks to this change. Other things: - Fixed calculation of max_records passed to hp_create() to take into account padding between records. - Updated calculation of memory needed by heap tables. Before we did not take into account internal structures needed to access rows. - Changed block sized for memory_table from 1 to 16384 to get less fragmentation. This also avoids a problem where we need 1K to manage index and row storage which was not counted for before. - Moved heap memory usage to a separate test for 32 bit. - Allocate all data blocks in heap in powers of 2. Change reported memory usage for heap to reflect this. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2025-01-05 16:40:11 +02:00
Vlad Lesin	8c7786e7d5	MDEV-34690 lock_rec_unlock_unmodified() causes deadlock lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock() or under a combination of lock_sys.rd_lock() + record locks hash table cell latch. It also requests page latch to check if locked records were changed by the current transaction or not. Usually InnoDB requests page latch to find the certain record on the page, and then requests lock_sys and/or record lock hash cell latch to request record lock. lock_rec_unlock_unmodified() requests the latches in the opposite order, what causes deadlocks. One of the possible scenario for the deadlock is the following: thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table cell latch, the latch is acquired; thread 2 - purge thread acquires page latch and tries to remove delete-marked record, it invokes lock_update_delete(), which requests locks hash table cell latch, held by thread 1; thread 1 - requests page latch, held by thread 2. To fix it we need to release lock_sys.latch and/or lock hash cell latch, acquire page latch and re-acquire lock_sys related latches. When lock_sys.latch and/or lock hash cell latch are released in lock_release_on_prepare() and lock_release_on_prepare_try(), the page on which the current lock is held, can be merged. In this case the bitmap of the current lock must be cleared, and the new lock must be added to the end of trx->lock.trx_locks list, or bitmap of already existing lock must be changed. The new field trx_lock_t::set_nth_bit_calls indicates if new locks (bits in existing lock bitmaps or new lock objects) were created during the period when lock_sys was released in trx->lock.trx_locks list iteration loop in lock_release_on_prepare() or lock_release_on_prepare_try(). And, if so, we traverse the list again. The block can be freed during pages merging, what causes assertion failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page get mode to it. That's why page_get_mode parameter was added to btr_block_get() to pass BUF_GET_POSSIBLY_FREED from lock_release_on_prepare() and lock_release_on_prepare_try() to buf_page_get_gen(). As searching for id of trx, which modified secondary index record, is quite expensive operation, restrict its usage for master. System variable was added to remove the restriction for testing simplifying. The variable exists only either for debug build or for build with -DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the probability of catching bugs for release build with RQG. Note that the code, which does primary index lookup to find out what transaction modified secondary index record, is necessary only when there is no primary key and no unique secondary key on replica with row based replication, because only in this case extra X locks on unmodified records can be set during scan phase. Reviewed by Marko Mäkelä.	2024-10-23 12:36:17 +03:00
Sergei Golubchik	3a1cf2c85b	MDEV-34679 ER_BAD_FIELD uses non-localizable substrings	2024-10-17 21:37:37 +02:00
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Sergei Petrunia	1cda4726ca	MDEV-34993, part2: backport optimizer_adjust_secondary_key_costs ...and make the fix for MDEV-34993 switchable. It is enabled by default and controlled with @optimizer_adjust_secondary_key_costs=fix_card_multiplier	2024-10-02 10:52:09 +03:00
Sergei Petrunia	8166a5d33d	MDEV-34993: Incorrect cardinality estimation causes poor query plan When calculate_cond_selectivity_for_table() takes into account multi- column selectivities from range access, it tries to take-into account that selectivity for some columns may have been already taken into account. For example, for range access on IDX1 using {kp1, kp2}, the selectivity of restrictions on "kp2" might have already been taken into account to some extent. So, the code tries to "discount" that using rec_per_key[] estimates. This seems to be wrong and unreliable: the "discounting" may produce a rselectivity_multiplier number that hints that the overall selectivity of range access on IDX1 was greater than 1. Do a conservative fix: if we arrive at conclusion that selectivity of range access on condition in IDX1 >1.0, clip it down to 1.	2024-10-02 10:52:09 +03:00
Lena Startseva	0a5e4a0191	MDEV-31005: Make working cursor-protocol Updated tests: cases with bugs or which cannot be run with the cursor-protocol were excluded with "--disable_cursor_protocol"/"--enable_cursor_protocol" Fix for v.10.5	2024-09-18 18:39:26 +07:00
Sergei Petrunia	c630e23a18	MDEV-34894: Poor query plan, because range estimates are not reused for ref(const) (Variant 4, with @@optimizer_adjust_secondary_key_costs, reuse in two places, and conditions are replaced with equivalent simpler forms in two more) In best_access_path(), ReuseRangeEstimateForRef-3, the check for whether "all used key_part_i used key_part_i=const" was incorrect: it may produced a "NO" answer for cases when we had: key_part1= const // some key parts are usable key_part2= value_not_in_join_prefix //present but unusable key_part3= non_const_value // unusable due to gap in key parts. This caused the optimizer to fail to apply ReuseRangeEstimateForRef heuristics. The consequence is poor query plan choice when the index in question has very skewed data distribution. The fix is enabled if its @@optimizer_adjust_secondary_key_costs flag is set.	2024-09-08 16:26:13 +03:00
Sergei Petrunia	c8d040938a	MDEV-34720: Poor plan choice for large JOIN with ORDER BY and small LIMIT (Variant 2b: call greedy_search() twice, correct handling for limited search_depth) Modify the join optimizer to specifically try to produce join orders that can short-cut their execution for ORDER BY..LIMIT clause. The optimization is controlled by @@optimizer_join_limit_pref_ratio. Default value 0 means don't construct short-cutting join orders. Other value means construct short-cutting join order, and prefer it only if it promises speedup of more than #value times. In Optimizer Trace, look for these names: * join_limit_shortcut_is_applicable * join_limit_shortcut_plan_search * join_limit_shortcut_choice	2024-09-02 16:37:18 +03:00
Marko Mäkelä	bda40ccb85	MDEV-34803 innodb_lru_flush_size is no longer used In commit `fa8a46eb68` (MDEV-33613) the parameter innodb_lru_flush_size ceased to have any effect. Let us declare the parameter as deprecated and additionally as MARIADB_REMOVED_OPTION, so that there will be a warning written to the error log in case the option is specified in the command line. Let us also do the same for the parameter innodb_purge_rseg_truncate_frequency that was deprecated&ignored earlier in MDEV-32050. Reviewed by: Debarun Banerjee	2024-08-28 07:18:03 +03:00
Marko Mäkelä	b7b9f3ce82	MDEV-34515: Contention between purge and workload In a Sysbench oltp_update_index workload that involves 1 table, a serious contention between the workload and the purge of history was observed. This was the worst when the table contained only 1 record. This turned out to be fixed by setting innodb_purge_batch_size=128, which corresponds to the number of usable persistent rollback segments. When we go above that, there would be contention between row_purge_poss_sec() and the workload, typically on the clustered index page latch, sometimes also on a secondary index page latch. It might be that with smaller batches, trx_sys.history_size() will end up pausing all concurrent transaction start/commit frequently enough so that purge will be able to make some progress, so that there would be less contention on the index page latches between purge and SQL execution. In commit `aa719b5010` (part of MDEV-32050) the interpretation of the parameter innodb_purge_batch_size was slightly changed. It would correspond to the maximum desired size of the purge_sys.pages cache. Before that change, the parameter was referring to a number of undo log pages, but the accounting might have been inaccurate. To avoid a regression, we will reduce the default value to innodb_purge_batch_size=127, which will also be compatible with innodb_undo_tablespaces>1 (which will disable rollback segment 0). Additionally, some logic in the purge and MVCC checks is simplified. The purge tasks will make use of purge_sys.pages when accessing undo log pages to find out if a secondary index record can be removed. If an undo page needs to be looked up in buf_pool.page_hash, we will merely buffer-fix it. This is correct, because the undo pages are append-only in nature. Holding purge_sys.latch or purge_sys.end_latch or the fact that the current thread is executing as a part of an in-progress purge batch will prevent the contents of the undo page from being freed and subsequently reused. The buffer-fix will prevent the page from being evicted form the buffer pool. Thanks to this logic, we can refer to the undo log record directly in the buffer pool page and avoid copying the record. buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful for accessing undo log pages, which are append-only by nature. There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED in that case. purge_sys_t::view_guard::view_guard(): Allow the type of guard to be acquired: end_latch, latch, or no latch (in case we are a purge thread). purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages. purge_sys_t::get_page(): Invoke buf_pool_t::page_fix(). row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe() and row_undo_mod_sec_unsafe(). trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build(). row_purge_poss_sec(): Add the parameter mtr and remove redundant or unused parameters sec_pcur, sec_mtr, is_tree. We will use the caller's mtr object but release any acquired page latches before returning. btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align(). row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID to be checked against the page in row_purge_remove_sec_if_poss_tree(). If the secondary index page was not changed meanwhile, it will be unnecessary to invoke row_purge_poss_sec() again. trx_undo_prev_version_build(): Access any undo log pages using the caller's mini-transaction object. row_purge_vc_matches_cluster(): Moved to the only compilation unit that needs it. Reviewed by: Debarun Banerjee	2024-08-26 12:23:06 +03:00
Monty	4bf7c966b3	MDEV-34664: Add an option to fix InnoDB's doubling of secondary index cardinalities (With trivial fixes by sergey@mariadb.com) Added option fix_innodb_cardinality to optimizer_adjust_secondary_key_costs Using fix_innodb_cardinality disables the 'divide by 2' of rec_per_key_int in InnoDB that in effect doubles the Cardinality for secondary keys. This has the biggest effect for indexes where a few rows has the same key value. Using this may also cause table scans for very small tables (which in some cases may be better than an index scan). The user visible effect is that 'SHOW INDEX FROM table_name' will for InnoDB show the true Cardinality (and not 2x the real value). It will also allow the optimizer to chose a better index in some cases as the division by 2 could have a bad effect for tables with 2-5 identical values per key. A few notes about using fix_innodb_cardinality: - It has direct affect for SHOW INDEX FROM table_name. SHOW INDEX will also update the statistics in table share. - The effect of fix_innodb_cardinality for query plans or EXPLAIN is only visible after first open of the table. This is why one must do a flush tables or use SHOW INDEX for the option to take effect. - Using fix_innodb_cardinality can thus affect all user in their query plans if they are using the same tables. Because of this, it is strongly recommended that one uses optimizer_adjust_secondary_key_costs=fix_innodb_cardinality mainly in configuration files to not cause issues for other users.	2024-07-29 16:40:53 +03:00
Oleksandr Byelkin	dcd8a64892	Merge branch '10.5' into 10.6	2024-07-03 13:27:23 +02:00
Monty	2739b5f5f8	MDEV-34494 Add server_uid global variable and add it to error log at startup The feedback plugin server_uid variable and the calculate_server_uid() function is moved from feedback/utils.cc to sql/mysqld.cc server_uid is added as a global variable (shown in 'show variables') and is written to the error log on server startup together with server version and server commit id.	2024-07-02 11:26:13 +03:00
Monty	d8c9c5ead6	MDEV-34491 Setting log_slow_admin="" at startup should be converted to log_slow_admin=ALL We have an issue if a user have the following in a configuration file: log_slow_filter="" # Log everything to slow query log log_queries_not_using_indexes=ON This set log_slow_filter to 'not_using_index' which disables slow_query_logging of most queries. In effect, on should never use log_slow_filter="" in config files but instead use log_slow_filter=ALL. Fixed by changing log_slow_filter="" that comes either from a configuration file or from the command line, when starting to the server, to log_slow_filter=ALL. A warning will be printed when this happens. Other things: - One can now use =ALL for any 'set' variable to set all options at once. (backported from 10.6)	2024-07-02 11:26:13 +03:00
Daniel Black	e7b76f87c4	MDEV-34437 restrict port and extra-port to tcp valid values extra_port and port are 16 bit numbers and not 32 bit as they are tcp ports. Restrict their value.	2024-07-01 17:43:12 +10:00
Sergei Golubchik	41296a07c8	Merge branch '10.5' into 10.6	2024-04-11 13:58:22 +02:00
Alexander Barkov	9fb8881ef8	MDEV-28366 GLOBAL debug_dbug setting affected by collation_connection=utf16... When the system variables @@debug_dbug was assigned to some expression, Sys_debug_dbug::do_check() did not properly convert the value from the expression character set to utf8. So the value was erroneously re-interpretted as utf8 without conversion. In case of a tricky expression character set (e.g. utf16le), this led to unexpected results. Fix: Re-using Sys_var_charptr::do_string_check() in Sys_debug_dbug::do_check().	2024-04-10 06:09:45 +04:00
Sergei Golubchik	f50694c52b	remove pointless test	2024-03-27 16:14:55 +01:00
Marko Mäkelä	b8a6719889	MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation https://jepsen.io/analyses/mysql-8.0.34 highlights that the transaction isolation levels in the InnoDB storage engine do not correspond to any widely accepted definitions, such as "Generalized Isolation Level Definitions" https://pmg.csail.mit.edu/papers/icde00.pdf (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ, PL-3 = SERIALIZABLE). Only READ UNCOMMITTED in InnoDB seems to match the above definition. The issue is that InnoDB does not detect write/write conflicts (Section 4.4.3, Definition 6) in the above. It appears that as soon as we implement write/write conflict detection (SET SESSION innodb_snapshot_isolation=ON), the default isolation level (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf Locking reads inside InnoDB used to read the latest committed version, ignoring what should actually be visible to the transaction. The added test innodb.lock_isolation illustrates this. The statement UPDATE t SET a=3 WHERE b=2; is executed in a transaction that was started before a read view or a snapshot of the current transaction was created, and committed before the current transaction attempts to execute UPDATE t SET b=3; If SET innodb_snapshot_isolation=ON is in effect when the second transaction was started, the second transaction will be aborted with the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF), the second transaction would execute inconsistently, displaying an incorrect SELECT COUNT(*) FROM t in its read view. If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a record that does not exist in the current read view is made, an error DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will be raised. This error will be treated in the same way as a deadlock: the transaction will be rolled back. lock_clust_rec_read_check_and_lock(): If the current transaction has a read view where the record is not visible and innodb_snapshot_isolation=ON, fail before trying to acquire the lock. row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON, disable the "semi-consistent read" logic that had been implemented by myself on the directions of Heikki Tuuri in order to address https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer wanting UPDATE to skip locked rows that do not match the WHERE condition. It looks like my changes were included in the MySQL 5.1.5 commit ad126d90e019f223470e73e1b2b528f9007c4532; at that time, employees of Innobase Oy (a recent acquisition of Oracle) had lost write access to the repository. The only reason why we set innodb_snapshot_isolation=OFF by default is backward compatibility with applications, such as the one that motivated the implementation of "semi-consistent read" back in 2005. In a later major release, we can default to innodb_snapshot_isolation=ON. Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations and some testing of this fix. Thanks to Vladislav Lesin for the initial test for MDEV-26643, as well as reviewing these changes.	2024-03-20 09:48:03 +02:00
Monty	3907345e22	MDEV-33306 Optimizer choosing incorrect index in 10.6, 10.5 but not in 10.4 In MariaDB up to 10.11, the test_if_cheaper_ordering() code (that tries to optimizer how GROUP BY is executed) assumes that if a table scan is used then if there is any index usable by GROUP BY it will be used. The reason MySQL 10.4 provides a better plan is because of two differences: - Plans using 'ref' has a cost of 1/10 of what it should be (as a protection against table scans). This is why 'ref' is used in 10.4 and not in 10.5. - When 'ref' is used, then GROUP BY will not use an index for GROUP BY. In MariaDB 10.5 the chosen plan is a table scan (as it calculated to be faster) but as 'ref' is not used, the test_if_cheaper_ordering() optimizer phase decides (as ref is not usd) to use an index for GROUP BY, which has bad performance. Description of fix: - All new code is protected by the "optimizer_adjust_secondary_key_costs" variable, which is now a bit map, and is only executed if the option "disable_forced_index_in_group_by" set. - Corrects GROUP BY handling in test_if_cheaper_ordering() by making the choise of using and index with GROUP BY cost based instead of rule based. - Adds TIME_FOR_COMPARE to all costs, when using group by, to make read_time, index_scan_time and range_cost comparable. Other things: - Made optimizer_adjust_secondary_key_costs a bit map (compatible with old code). Notes: Current code ignores costs for the algorithm used when doing GROUP BY on the first table: - Create an in-memory temporary table for handling group by and doing a filesort of the result file We can probably in 10.6 continue to ignore this cost. This patch should NOT be merged to 11.0 series (not needed in 11.0).	2024-02-12 16:43:00 +02:00
Marko Mäkelä	91a2192bf2	Merge 10.5 into 10.6	2024-02-07 13:51:03 +02:00
Thirunarayanan Balathandayuthapani	c31b1ee26a	MDEV-33341 innodb.undo_space_dblwr test case fails with Unknown Storage Engine InnoDB - Failed to reset the innodb_fil_make_page_dirty_debug variable in innodb_saved_page_number_debug_basic test case.	2024-02-07 12:35:18 +02:00
Thirunarayanan Balathandayuthapani	21f18bd9d7	MDEV-33341 innodb.undo_space_dblwr test case fails with Unknown Storage Engine InnoDB Reason: ====== undo_space_dblwr test case fails if the first page of undo tablespace is not flushed before restart the server. While restarting the server, InnoDB fails to detect the first page of undo tablespace from doublewrite buffer. Fix: === Use "ib_log_checkpoint_avoid_hard" debug sync point to avoid checkpoint and make sure to flush the dirtied page before killing the server. innodb_make_page_dirty(): Fails to set srv_fil_make_page_dirty_debug variable.	2024-01-31 15:55:09 +05:30
Monty	ed76a2e8ac	Updated some 32 bit result files in sys_vars	2024-01-27 16:51:15 +02:00
Monty	26c86c39fc	Fixed some mtr tests that failed on windows Most things where wrong in the test suite. The one thing that was a bug was that table_map_id was in some places defined as ulong and in other places as ulonglong. On Linux 64 bit this is not a problem as ulong == ulonglong, but on windows this caused failures. Fixed by ensuring that all instances of table_map_id are ulonglong.	2024-01-23 13:03:12 +02:00
Monty	6f65e08277	MDEV-33118 optimizer_adjust_secondary_key_costs variable optimizer-adjust_secondary_key_costs is added to provide 2 small adjustments to the 10.x optimizer cost model. This can be used in the case where the optimizer wrongly uses a secondary key instead of a clustered primary key. The reason behind this change is that MariaDB 10.x does not take into account that for engines like InnoDB, that scanning a primary key can be up to 7x faster than scanning a secondary key + read the row data trough the primary key. The different values for optimizer_adjust_secondary_key_costs are: optimizer_adjust_secondary_key_costs=0 - No changes to current model optimizer_adjust_secondary_key_costs=1 - Ensure that the cost of of secondary indexes has a cost of at least 5x times the cost of a clustered primary key (if one exists). This disables part of the worst_seek optimization described below. optimizer_adjust_secondary_key_costs=2 - Disable "worst_seek optimization" and adjust filter cost slightly (add cost of 1 if filter is used). The idea behind 'worst_seek optimization' is that we limit the cost for all non clustered ref access to the least of: - best-rows-by-range (or all rows in no range found) / 10 - scan-time-table (roughly number of file blocks to scan table) * 3 In addition we also do not try to use rowid_filter if number of rows estimated for 'ref' access is less than the worst_seek limitation. The idea is that worst_seek is trying to take into account that if we do a lot of accesses through a key, this is likely to be cached. However it only does this for secondary keys, and not for clustered keys or index only reads. The effect of the worst_seek are: - In some cases 'ref' will have a much lower cost than range or using a clustered key. - Some possible rowid filters for secondary keys will be ignored. When implementing optimizer_adjust_secondary_key_costs=2, I noticed that there is a slightly different costs for how ref+filter and range+filter are calculated. This caused a lot of range and range+filter to change to ref+filter, which is not good as range+filter provides the optimizer a better estimate of how many accepted rows there will be in the result set. Adding a extra small cost (1 seek) when using filter mitigated the above problems in almost all cases. This patch should not be applied to MariaDB 11.0 as worst_seeks is removed in 11.0 and the cost calculation for clustered keys, secondary keys, index scan and filter is more exact. Test case changes for --optimizer-adjust_secondary_key_costs=1 (Fix secondary key costs to be 5x of primary key): - stat_tables_innodb: - Complex change (probably ok as number of rows are really small) - ref over 1 row changed to range over 10 rows with join buffer - ref over 5 rows changed to eq_ref - secondary ref over 1 row changed to ref of primary key over 4 rows - Change of key to use longer key with index pushdown (a little bit worse but not significant). - Change to use secondary (1 row) -> primary (4 rows) - rowid_filter_innodb: - index_merge (2 rows) & ref (1) -> all (23 rows) -> primary eq_ref. Test case changes for --optimizer-adjust_secondary_key_costs=2 (remove of worst_seeks & adjust filter cost): - stat_tables_innodb: - Join order change (probably ok as number of rows are really small) - ref (5 rows) & ref(1 row) changed to range (10 rows & join buffer) & eq_ref. - selectivity_innodb: - ref -> ref\|filter (ok) - rowid_filter_innodb: - ref -> ref\|filter (ok) - range\|filter (64 rows) changed to ref\|filter (128 rows). ok as ref\|filter outputs wrong number of rows in explain. - range, range_mrr_icp: -ref (500 rows -> ALL (1000 rows) (ok) - select_pkeycache, select, select_jcl6: - ref\|filter (2 rows) -> ref (2 rows) (ok) - selectivity: - ref -> ref_filter (ok) - range: - Change of 'filtered' but no stat or plan change (ok) - selectivity: - ref -> ref+filter (ok) - Change of filtered but no plan change (ok) - join_nested_jcl6: - range -> ref\|filter (ok as only 2 rows) - subselect3, subselect3_jcl6: - ref_or_null (4 rows) -> ALL (10 rows) (ok) - Index_subquery (4 rows) -> ALL (10 rows) (ok) - partition_mrr_myisam, partition_mrr_aria and partition_mrr_innodb: - Uses ALL instead of REF for a key value that is the same for > 50% of rows. (good) order_by_innodb: - range (200 rows) -> ref (20 rows)+filesort (ok) - subselect_sj2_mat: - One test changed. One ALL removed and replaced with eq_ref. Likely to be better. - join_cache: - Changed ref over 60% of the rows to use hash join (ok) - opt_tvc: - Changed to use eq_ref instead of ref with plan change (probably ok) - opt_trace: - No worst/max seeks clipping (good). - Almost double range_scan_time and index_scan_time (ok). - rowid_filter: - ref -> ref\|filtered (ok) - range\|filter (77 rows) changed to ref\|filter (151 rows). Proably ok as ref\|filter outputs wrong number of rows in explain. Reviewer: Sergei Petrunia <sergey@mariadb.com>	2024-01-23 13:03:11 +02:00
Michael Widenius	7af50e4df4	MDEV-32551: "Read semi-sync reply magic number error" warnings on master rpl_semi_sync_slave_enabled_consistent.test and the first part of the commit message comes from Brandon Nesterenko. A test to show how to induce the "Read semi-sync reply magic number error" message on a primary. In short, if semi-sync is turned on during the hand-shake process between a primary and replica, but later a user negates the rpl_semi_sync_slave_enabled variable while the replica's IO thread is running; if the io thread exits, the replica can skip a necessary call to kill_connection() in repl_semisync_slave.slave_stop() due to its reliance on a global variable. Then, the replica will send a COM_QUIT packet to the primary on an active semi-sync connection, causing the magic number error. The test in this patch exits the IO thread by forcing an error; though note a call to STOP SLAVE could also do this, but it ends up needing more synchronization. That is, the STOP SLAVE command also tries to kill the VIO of the replica, which makes a race with the IO thread to try and send the COM_QUIT before this happens (which would need more debug_sync to get around). See THD::awake_no_mutex for details as to the killing of the replica’s vio. Notes: - The MariaDB documentation does not make it clear that when one enables semi-sync replication it does not matter if one enables it first in the master or slave. Any order works. Changes done: - The rpl_semi_sync_slave_enabled variable is now a default value for when semisync is started. The variable does not anymore affect semisync if it is already running. This fixes the original reported bug. Internally we now use repl_semisync_slave.get_slave_enabled() instead of rpl_semi_sync_slave_enabled. To check if semisync is active on should check the @@rpl_semi_sync_slave_status variable (as before). - The semisync protocol conflicts in the way that the original MySQL/MariaDB client-server protocol was designed (client-server send and reply packets are strictly ordered and includes a packet number to allow one to check if a packet is lost). When using semi-sync the master and slave can send packets at 'any time', so packet numbering does not work. The 'solution' has been that each communication starts with packet number 1, but in some cases there is still a chance that the packet number check can fail. Fixed by adding a flag (pkt_nr_can_be_reset) in the NET struct that one can use to signal that packet number checking should not be done. This is flag is set when semi-sync is used. - Added Master_info::semi_sync_reply_enabled to allow one to configure some slaves with semisync and other other slaves without semisync. Removed global variable semi_sync_need_reply that would not work with multi-master. - Repl_semi_sync_master::report_reply_packet() can now recognize the COM_QUIT packet from semisync slave and not give a "Read semi-sync reply magic number error" error for this case. The slave will be removed from the Ack listener. - On Windows, don't stop semisync Ack listener just because one slave connection is using socket_id > FD_SETSIZE. - Removed busy loop in Ack_receiver::run() by using "Self-pipe trick" to signal new slave and stop Ack_receiver. - Changed some Repl_semi_sync_slave functions that always returns 0 from int to void. - Added Repl_semi_sync_slave::slave_reconnect(). - Removed dummy_function Repl_semi_sync_slave::reset_slave(). - Removed some duplicate semisync notes from the error log. - Add test of "if (get_slave_enabled() && semi_sync_need_reply)" before calling Repl_semi_sync_slave::slave_reply(). (Speeds up the code as we can skip all initializations). - If epl_semisync_slave.slave_reply() fails, we disable semisync for that connection. - We do not call semisync.switch_off() if there are no active slaves. Instead we check in Repl_semi_sync_master::commit_trx() if there are no active threads. This simplices the code. - Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is flushed in case of asserts. - Removed the internal rpl_semi_sync_slave_status as it is not needed anymore. The @@rpl_semi_sync_slave_status status variable is now mapped to rpl_semi_sync_enabled. - Removed rpl_semi_sync_slave_enabled as it is not needed anymore. Repl_semi_sync_slave::get_slave_enabled() contains the active status. - Added checking that we do not add a slave twice with Ack_receiver::add_slave(). This could happen with old code. - Removed Repl_semi_sync_master::check_and_switch() as it is not needed anymore. - Ensure that when we call Ack_receiver::remove_slave() that the slave is removed from the listener before function returns. - Call listener.listen_on_sockets() outside of mutex for better performance and less contested mutex. - Ensure that listening is ignoring newly added slaves when checking for responses. - Fixed the master ack_receiver listener is not killed if there are no connected slaves (and thus stop semisync handling of future connections). This could happen if all slaves sockets where would be marked as unreliable. - Added unlink() to base_ilist_iterator and remove() to I_List_iterator. This enables us to remove 'dead' slaves in Ack_recever::run(). - kill_zombie_dump_threads() now does killing of dump threads properly. - It can now kill several threads (should be impossible but could happen if IO slaves reconnects very fast). - We now wait until the dump thread is done before starting the dump. - Added an error if kill_zombie_dump_threads() fails. - Set thd->variables.server_id before calling kill_zombie_dump_threads(). This simplies the code. - Added a lot of comments both in code and tests. - Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used. Test changes: - rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with semisync enabled. - Some timings changed slight with startup of slave which caused rpl_binlog_dump_slave_gtid_state_info.text to fail as it checked the error log file before the slave had started properly. Fixed by adding wait_for_pattern_in_file.inc that allows waiting for the pattern to appear in the log file. - Tests have been updated so that we first set rpl_semi_sync_master_enabled on the master and then set rpl_semi_sync_slave_enabled on the slaves (this is according to how the MariaDB documentation document how to setup semi-sync). - Error text "Master server does not have semi-sync enabled" has been replaced with "Master server does not support semi-sync" for the case when the master supports semi-sync but semi-sync is not enabled. Other things: - Some trivial cleanups in Repl_semi_sync_master::update_sync_header(). - We should in 11.3 changed the default value for rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as the TRUE does not make much sense as default. The main difference with using FALSE is that we do not wait for semisync Ack if there are no slave threads. In the case of TRUE we wait once, which did not bring any notable benefits except slower startup of master configured for using semisync. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com> This solves the problem reported in MDEV-32960 where a new slave may not be registered in time and the master disables semi sync because of that.	2024-01-23 13:03:11 +02:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Sergei Golubchik	4231cf6d3f	MDEV-32617 deprecate secure_auth=0	2023-12-12 15:21:28 +01:00
Sergei Golubchik	98a39b0c91	Merge branch '10.4' into 10.5	2023-12-02 01:02:50 +01:00
Marko Mäkelä	b3a628c7d4	Merge 10.5 into 10.6	2023-11-30 10:45:01 +02:00
Monty	387b92df97	Remove deprication from mariadbd --debug --debug is supported by allmost all our other binaries and we should keep it also in the server to keep option names similar.	2023-11-28 16:33:22 +02:00
Kristian Nielsen	d415f600cd	MDEV-32844: THD::rli_fake/rgi_fake not cleared on new connection Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-11-20 17:33:02 +01:00
Marko Mäkelä	561093701b	MDEV-29180 fixup: 32-bit tests This fixes up commit `01031f43d8`	2023-11-13 09:27:01 +02:00
Oleksandr Byelkin	01031f43d8	MDEV-29180: Description of log_warnings incorrectly mentions "general log"	2023-11-02 16:46:34 +01:00
Marko Mäkelä	aa719b5010	MDEV-32050: Do not copy undo records in purge Also, default to innodb_purge_batch_size=1000, replacing the old default value of processing 300 undo log pages in a batch. Axel Schwenke found this value to help reduce purge lag without having a significant impact on workload throughput. In purge, we can simply acquire a shared latch on the undo log page (to avoid a race condition like the one that was fixed in commit `b102872ad5`) and retain a buffer-fix after releasing the latch. The buffer-fix will prevent the undo log page from being evicted from the buffer pool. Concurrent modification is prevented by design. Only the purge_coordinator_task (or its accomplice purge_truncation_task) may free the undo log pages, after any purge_worker_task have completed execution. Hence, we do not have to worry about any overwriting or reuse of the undo log records. trx_undo_rec_copy(): Remove. The only remaining caller would have been trx_undo_get_undo_rec_low(), which is where the logic was merged. purge_sys_t::m_initialized: Replaces heap. purge_sys_t::pages: A cache of buffer-fixed pages that have been looked up from buf_pool.page_hash. purge_sys_t::get_page(): Return a buffer-fixed undo page, using the pages cache. trx_purge_t::batch_cleanup(): Renamed from clone_end_view(). Clear the pages cache and clone the end_view at the end of a batch. purge_sys_t::n_pages_handled(): Return pages.size(). This determines if innodb_purge_batch_size was exceeded. purge_sys_t::rseg_get_next_history_log(): Replaces trx_purge_rseg_get_next_history_log(). purge_sys_t::choose_next_log(): Replaces trx_purge_choose_next_log() and trx_purge_read_undo_rec(). purge_sys_t::get_next_rec(): Replaces trx_purge_get_next_rec() and trx_undo_get_next_rec(). purge_sys_t::fetch_next_rec(): Replaces trx_purge_fetch_next_rec() and some use of trx_undo_get_first_rec(). trx_purge_attach_undo_recs(): Do not allow purge_sys.n_pages_handled() exceed the innodb_purge_batch_size or ¾ of the buffer pool, whichever is smaller. Reviewed by: Vladislav Lesin Tested by: Matthias Leich and Axel Schwenke	2023-10-25 10:19:17 +03:00
Marko Mäkelä	14685b10df	MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency The motivation of introducing the parameter innodb_purge_rseg_truncate_frequency in mysql/mysql-server@28bbd66ea5 and mysql/mysql-server@8fc2120fed seems to have been to avoid stalls due to freeing undo log pages or truncating undo log tablespaces. In MariaDB Server, innodb_undo_log_truncate=ON should be a much lighter operation than in MySQL, because it will not involve any log checkpoint. Another source of performance stalls should be trx_purge_truncate_rseg_history(), which is shrinking the history list by freeing the undo log pages whose undo records have been purged. To alleviate that, we will introduce a purge_truncation_task that will offload this from the purge_coordinator_task. In that way, the next innodb_purge_batch_size pages may be parsed and purged while the pages from the previous batch are being freed and the history list being shrunk. The processing of innodb_undo_log_truncate=ON will still remain the responsibility of the purge_coordinator_task. purge_coordinator_state::count: Remove. We will ignore innodb_purge_rseg_truncate_frequency, and act as if it had been set to 1 (the maximum shrinking frequency). purge_coordinator_state::do_purge(): Invoke an asynchronous task purge_truncation_callback() to free the undo log pages. purge_sys_t::iterator::free_history(): Free those undo log pages that have been processed. This used to be a part of trx_purge_truncate_history(). purge_sys_t::clone_end_view(): Take a new value of purge_sys.head as a parameter, so that it will be updated while holding exclusive purge_sys.latch. This is needed for race-free access to the field in purge_truncation_callback(). Reviewed by: Vladislav Lesin	2023-10-25 09:11:58 +03:00
Sergei Petrunia	4941ac9192	MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref (Variant#3: Allow cross-charset comparisons, use a special CHARSET_INFO to create lookup keys. Review input addressed.) Equalities that compare utf8mb{3,4}_general_ci strings, like: WHERE ... utf8mb3_key_col=utf8mb4_value (MB3-4-CMP) can now be used to construct ref[const] access and also participate in multiple-equalities. This means that utf8mb3_key_col can be used for key-lookups when compared with an utf8mb4 constant, field or expression using '=' or '<=>' comparison operators. This is controlled by optimizer_switch='cset_narrowing=on', which is OFF by default. IMPLEMENTATION Item value comparison in (MB3-4-CMP) is done using utf8mb4_general_ci. This is valid as any utf8mb3 value is also an utf8mb4 value. When making index lookup value for utf8mb3_key_col, we do "Charset Narrowing": characters that are in the Basic Multilingual Plane (=BMP) are copied as-is, as they can be represented in utf8mb3. Characters that are outside the BMP cannot be represented in utf8mb3 and are replaced with U+FFFD, the "Replacement Character". In utf8mb4_general_ci, the Replacement Character compares as equal to any character that's not in BMP. Because of this, the constructed lookup value will find all index records that would be considered equal by the original condition (MB3-4-CMP). Approved-by: Monty <monty@mariadb.org>	2023-10-19 17:24:30 +03:00
Monty	d4347177c7	Change SEL_ARG::MAX_SEL_ARGS to a user defined variable optimizer_max_sel_args This allows a user to to change the default value of MAX_SEL_ARGS (16000) in the rare case where they neeed more generated SEL_ARGS (as part of the range optimizer)	2023-10-03 08:25:31 +03:00
Monty	4e9322e2ff	MDEV-32203 Raise notes when an index cannot be used on data type mismatch Raise notes if indexes cannot be used: - in case of data type or collation mismatch (diferent error messages). - in case if a table field was replaced to something else (e.g. Item_func_conv_charset) during a condition rewrite. Added option to write warnings and notes to the slow query log for slow queries. New variables added/changed: - note_verbosity, with is a set of the following options: basic - All old notes unusable_keys - Print warnings about keys that cannot be used for select, delete or update. explain - Print unusable_keys warnings for EXPLAIN querys. The default is 'basic,explain'. This means that for old installations the only notable new behavior is that one will get notes about unusable keys when one does an EXPLAIN for a query. One can turn all of all notes by either setting note_verbosity to "" or setting sql_notes=0. - log_slow_verbosity has a new option 'warnings'. If this is set then warnings and notes generated are printed in the slow query log (up to log_slow_max_warnings times per statement). - log_slow_max_warnings - Max number of warnings written to slow query log. Other things: - One can now use =ALL for any 'set' variable to set all options at once. For example using "note_verbosity=ALL" in a config file or "SET @@note_verbosity=ALL' in SQL. - mysqldump will in the future use @@note_verbosity=""' instead of @sql_notes=0 to disable notes. - Added "enum class Data_type_compatibility" and changing the return type of all Field::can_optimize*() methods from "bool" to this new data type. Reviewer & Co-author: Alexander Barkov <bar@mariadb.com> - The code that prints out the notes comes mainly from Alexander	2023-10-03 08:25:31 +03:00
Marko Mäkelä	e9723c2cbb	MDEV-31473 Wrong information about innodb_checksum_algorithm in information_schema.SYSTEM_VARIABLES MYSQL_SYSVAR_ENUM(checksum_algorithm): Correct the documentation string. Fixes up commit `7a4fbb55b0` (MDEV-25105).	2023-08-14 13:36:17 +03:00
Oleksandr Byelkin	6b8310c27a	fix postmerge 32bit tests	2023-08-04 10:11:03 +02:00
Oleksandr Byelkin	6bf8483cac	Merge branch '10.5' into 10.6	2023-08-01 15:08:52 +02:00
Oleksandr Byelkin	4235c133ae	Merge branch '10.4' into 10.5	2023-07-31 10:14:46 +02:00
Kristian Nielsen	d632c85bb7	MDEV-31723: Crash on SET SESSION gtid_seq_no= DEFAULT A simple "SET SESSION gtid_seq_no= DEFAULT" did not work, it would straight up crash the server! Also, explicitly setting gtid_seq_no to 0 gave an error in --gtid-strict-mode=1. Setting to DEFAULT or 0 should disable any prior setting of gtid_seq_no, so that the next transaction is allocated the next GTID in sequence, as normal. Reviewed-by: Monty <monty@mariadb.org> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-07-30 22:00:43 +02:00
Oleksandr Byelkin	f291c3df2c	Merge branch '10.4' into 10.5	2023-07-27 15:43:21 +02:00
Lena Startseva	9854fb6fa7	MDEV-31003: Second execution for ps-protocol This patch adds for "--ps-protocol" second execution of queries "SELECT". Also in this patch it is added ability to disable/enable (--disable_ps2_protocol/--enable_ps2_protocol) second execution for "--ps-prototocol" in testcases.	2023-07-26 17:15:00 +07:00
Oleksandr Byelkin	f52954ef42	Merge commit '10.4' into 10.5	2023-07-20 11:54:52 +02:00
Monty	99bd226059	MDEV-31558 Add InnoDB engine information to the slow query log The new statistics is enabled by adding the "engine", "innodb" or "full" option to --log-slow-verbosity Example output: # Pages_accessed: 184 Pages_read: 95 Pages_updated: 0 Old_rows_read: 1 # Pages_read_time: 17.0204 Engine_time: 248.1297 Page_read_time is time doing physical reads inside a storage engine. (Writes cannot be tracked as these are usually done in the background). Engine_time is the time spent inside the storage engine for the full duration of the read/write/update calls. It uses the same code as 'analyze statement' for calculating the time spent. The engine statistics is done with a generic interface that should be easy for any engine to use. It can also easily be extended to provide even more statistics. Currently only InnoDB has counters for Pages_% and Undo_% status. Engine_time works for all engines. Implementation details: class ha_handler_stats holds all engine stats. This class is included in handler and THD classes. While a query is running, all statistics is updated in the handler. In close_thread_tables() the statistics is added to the THD. handler::handler_stats is a pointer to where statistics should be collected. This is set to point to handler::active_handler_stats if stats are requested. If not, it is set to 0. handler_stats has also an element, 'active' that is 1 if stats are requested. This is to allow engines to avoid doing any 'if's while updating the statistics. Cloned or partition tables have the pointer set to the base table if status are requested. There is a small performance impact when using --log-slow-verbosity=engine: - All engine calls in 'select' will be timed. - IO calls for InnoDB reads will be timed. - Incrementation of counters are done on local variables and accesses are inline, so these should have very little impact. - Statistics has to be reset for each statement for the THD and each used handler. This is only 40 bytes, which should be neglectable. - For partition tables we have to loop over all partitions to update the handler_status as part of table_init(). Can be optimized in the future to only do this is log-slow-verbosity changes. For this to work we have to update handler_status for all opened partitions and also for all partitions opened in the future. Other things: - Added options 'engine' and 'full' to log-slow-verbosity. - Some of the new files in the test suite comes from Percona server, which has similar status information. - buf_page_optimistic_get(): Do not increment any counter, since we are only validating a pointer, not performing any buf_pool.page_hash lookup. - Added THD argument to save_explain_data_intern(). - Switched arguments for save_explain_.*_data() to have always THD first (generates better code as other functions also have THD first).	2023-07-07 12:53:18 +03:00

1 2 3 4 5 ...

1998 commits