(Review input addressed)
(Added handling of UPDATE/DELETE and partitioning w/o index)
If the properties of the used collation allow, do the following
equivalent rewrites:
1. UPPER(key_col)=expr -> key_col=expr
expr=UPPER(key_col) -> expr=key_col
(also rewrite both sides of the equality at the same time)
2. UPPER(key_col) IN (constant-list) -> key_col IN (constant-list)
- Mark utf8mb{3,4}_general_ci as collations that allow this.
- Add optimizer_switch='sargable_casefold=ON' to control this.
(ON by default in this patch)
- Cover the rewrite in Optimizer Trace, rewrite name is
"sargable_casefold_removal".
AES_ENCRYPT(str, key, [, iv [, mode ]])
AES_DECRYPT(str, key, [, iv [, mode ]])
mode is aes-{128,192,256}-{ecb,cbc,ctr} e.g. "aes-128-cbc".
and a @@block_encryption_mode variable for the default value of mode
change in behavior: AES_ENCRYPT(str, key) can no longer
be used in persistent virtual columns (and alike)
A simple "SET SESSION gtid_seq_no= DEFAULT" did not work, it would straight
up crash the server! Also, explicitly setting gtid_seq_no to 0 gave an error
in --gtid-strict-mode=1.
Setting to DEFAULT or 0 should disable any prior setting of
gtid_seq_no, so that the next transaction is allocated the next GTID
in sequence, as normal.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This patch adds for "--ps-protocol" second execution
of queries "SELECT".
Also in this patch it is added ability to disable/enable
(--disable_ps2_protocol/--enable_ps2_protocol) second
execution for "--ps-prototocol" in testcases.
This patch adds a way to override default collations
(or "character set collations") for desired character sets.
The SQL standard says:
> Each collation known in an SQL-environment is applicable to one
> or more character sets, and for each character set, one or more
> collations are applicable to it, one of which is associated with
> it as its character set collation.
In MariaDB, character set collations has been hard-coded so far,
e.g. utf8mb4_general_ci has been a hard-coded character set collation
for utf8mb4.
This patch allows to override (globally per server, or per session)
character set collations, so for example, uca1400_ai_ci can be set as a
character set collation for Unicode character sets
(instead of compiled xxx_general_ci).
The array of overridden character set collations is stored in a new
(session and global) system variable @@character_set_collations and
can be set as a comma separated list of charset=collation pairs, e.g.:
SET @@character_set_collations='utf8mb3=uca1400_ai_ci,utf8mb4=uca1400_ai_ci';
The variable is empty by default, which mean use the hard-coded
character set collations (e.g. utf8mb4_general_ci for utf8mb4).
The variable can also be set globally by passing to the server startup command
line, and/or in my.cnf.
The new statistics is enabled by adding the "engine", "innodb" or "full"
option to --log-slow-verbosity
Example output:
# Pages_accessed: 184 Pages_read: 95 Pages_updated: 0 Old_rows_read: 1
# Pages_read_time: 17.0204 Engine_time: 248.1297
Page_read_time is time doing physical reads inside a storage engine.
(Writes cannot be tracked as these are usually done in the background).
Engine_time is the time spent inside the storage engine for the full
duration of the read/write/update calls. It uses the same code as
'analyze statement' for calculating the time spent.
The engine statistics is done with a generic interface that should be
easy for any engine to use. It can also easily be extended to provide
even more statistics.
Currently only InnoDB has counters for Pages_% and Undo_% status.
Engine_time works for all engines.
Implementation details:
class ha_handler_stats holds all engine stats. This class is included
in handler and THD classes.
While a query is running, all statistics is updated in the handler. In
close_thread_tables() the statistics is added to the THD.
handler::handler_stats is a pointer to where statistics should be
collected. This is set to point to handler::active_handler_stats if
stats are requested. If not, it is set to 0.
handler_stats has also an element, 'active' that is 1 if stats are
requested. This is to allow engines to avoid doing any 'if's while
updating the statistics.
Cloned or partition tables have the pointer set to the base table if
status are requested.
There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Incrementation of counters are done on local variables and accesses
are inline, so these should have very little impact.
- Statistics has to be reset for each statement for the THD and each
used handler. This is only 40 bytes, which should be neglectable.
- For partition tables we have to loop over all partitions to update
the handler_status as part of table_init(). Can be optimized in the
future to only do this is log-slow-verbosity changes. For this to work
we have to update handler_status for all opened partitions and
also for all partitions opened in the future.
Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite comes from Percona server, which
has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched arguments for save_explain_.*_data() to have
always THD first (generates better code as other functions also have THD
first).
Kenyan Swahili is enabled for error messages as well as datetime.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
Turn the remaining three `binlog*` options binlog_do_db, binlog_ignore_db,
binlog_rows_event_max_size into global variables so that they can be
visible from the SQL user level. This is for audit / secure
configuration check purposes.
Create new MTR tests to make sure that the newly created global
variables can be visible from the command line interface.
Behavior before the code change:
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE
-> Variable_name LIKE 'binlog_do_db' OR
-> Variable_name LIKE 'binlog_ignore_db' OR
-> Variable_name LIKE 'binlog_row_event_max_size';
Empty set (0.001 sec)
Behavior after the code change:
MariaDB [(none)]> SHOW GLOBAL VARIABLES WHERE
-> Variable_name LIKE 'binlog_do_db' OR
-> Variable_name LIKE 'binlog_ignore_db' OR
-> Variable_name LIKE 'binlog_row_event_max_size';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| binlog_do_db | |
| binlog_ignore_db | |
| binlog_row_event_max_size | 8192 |
+---------------------------+-------+
3 rows in set (0.001 sec)
Note:
For `binlog_do_db` and `binlog_ignore_db`, we add a new class
`Sys_var_binlog_filter` to handle the dynamically-composable command line
options for `binlog_do_db` and `binlog_ignore_db`. Below
is the motivation:
When the users start the server with the option
--binlog-do-db="database1" --binlog-do-db="database2"
The expected behavior is that the system should allow replication for
both `database1` and `database2`, which is the logic of the original
code.
However, when turning the variables into system variables, the
functionality does not exist any more, since system variables will only
handle the last occurrence of the option, and in this case, the system
will only be able to handle `database2`.
Copyright:
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services, Inc.
Do not allow setting wsrep_sst_donor as NULL as it is
incorrect value. User can use value '' (default) that represents
same as NULL. Setting wsrep_cluster_address to NULL is
already handled correctly.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Introduces @@optimizer_switch flag: hash_join_cardinality
When this option is on, use EITS statistics to produce tighter bounds
for hash join output cardinality.
This patch is an extension / replacement to a similar patch in 10.6
New features:
- optimizer_switch hash_join_cardinality is on by default
- records_out is set to fanout when HASH is used
- Fixed bug in is_eits_usable: The function did not work with views
The cause of the crash was that test was setting
aria_sort_buffer_size to MAX_LONG_LONG, which caused an overflow in
my_malloc() when trying to allocate the buffer + 8 bytes.
Fixed by reducing max size of sort_buffer for Aria and MyISAM
Other things:
- Added code in maria_repair_parallell() to not allocate a big sort buffer
for small files.
- Updated size of minumim sort buffer in Aria
Introduce @@optimizer_switch flag: hash_join_cardinality
When it is on, use EITS statistics to produce tighter bounds for
hash join output cardinality.
Amended by Monty.
Reviewed by: Monty <monty@mariadb.org>
MDEV-28769 earlier disabled the use if IDs with non-default collations
in statements like:
SET character_set_results=2/*latin2_czech_cs*/;
SET character_set_client=2/*latin2_czech_cs*/;
SET character_set_server=2/*latin2_czech_cs*/;
SET character_set_connection=2/*latin2_czech_cs*/;
MDEV-30824 later fixed "mysqlbinlog" to dump character set names
instead of IDs in these statements:
< SET @@session.character_set_client=33, ... /*!*/;
> SET @@session.character_set_client=utf8mb3, ... /*!*/;
However, mysqlbinlog from old (pre MDEV-30824) distributions can
still produce incorrect statements with numeric non-default
collation IDs.
New servers should still be able to load old dumps.
Allowing the use of "SET @@character_set_xxx=ID" with numeric
non-default collation IDs but only if:
- the current THD is a true slave thread or
- the current THD a pseudo slave thread
(loading a mysqlbinlog output).
In MariaDB, we have a confusing problem where:
* The transaction_isolation option can be set in a configuration file, but it cannot be set dynamically.
* The tx_isolation system variable can be set dynamically, but it cannot be set in a configuration file.
Therefore, we have two different names for the same thing in different contexts. This is needlessly confusing, and it complicates the documentation. The same thing applys for transaction_read_only.
MySQL 5.7 solved this problem by making them into system variables. https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-20.html
This commit takes a similar approach by adding new system variables and marking the original ones as deprecated. This commit also resolves some legacy problems related to SET STATEMENT and transaction_isolation.
Let us make innodb_buffer_pool_filename a read-only variable
so that a malicious user cannot cause an important file to be
deleted on InnoDB shutdown. An attempt to delete a directory
will fail because it is not a regular file, but what if the
variable pointed to (say) ibdata1, ib_logfile0 or some *.ibd file?
It does not seem to make much sense for this parameter to be
configurable in the first place, but we will not change that in order
to avoid breaking compatibility.
- agressively -> aggressively
- exising -> existing
- occured -> occurred
- releated -> related
- seperated -> separated
- sucess -> success
- use use -> use
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
- Description:
- Before 10.3.8 semisync was a plugin that is built into the server with
MDEV-13073,starting with commit cbc71485e2.
There are still some usage of `rpl_semi_sync_master` in mtr.
Note:
- To recognize the replica in the `dump_thread`, replica is creating
local variable `rpl_semi_sync_slave` (the keyword of plugin) in
function `request_transmit`, that is catched by primary in
`is_semi_sync_slave()`. This is the user variable and as such not
related to the obsolete plugin.
- Found in `sys_vars.all_vars` and `rpl_semi_sync_wait_point` tests,
usage of plugins `rpl_semi_sync_master`, `rpl_semi_sync_slave`.
The former test is disabled by default (`sys_vars/disabled.def`)
and marked as `obsolete`, however this patch will remove the queries.
- Add cosmetic fixes to semisync codebase
Reviewer: <brandon.nesterenko@mariadb.com>
Closes PR #2528, PR #2380
The deprecated parameters will be removed:
innodb_defragment
innodb_defragment_n_pages
innodb_defragment_stats_accuracy
innodb_defragment_fill_factor_n_recs
innodb_defragment_fill_factor
innodb_defragment_frequency
The mysql.innodb_index_stats.stat_name values 'n_page_split' and
'n_pages_freed' will lose their special meaning.
The related changes to OPTIMIZE TABLE in InnoDB will be removed as well.
The parameter innodb_optimize_fulltext_only will retain its special
meaning in OPTIMIZE TABLE.
Tested by: Matthias Leich
This patch also fixes some bugs detected by valgrind after this
patch:
- Not enough copy_func elements was allocated by Create_tmp_table() which
causes an memory overwrite in Create_tmp_table::add_fields()
I added an ASSERT() to be able to detect this also without valgrind.
The bug was that TMP_TABLE_PARAM::copy_fields was not correctly set
when calling create_tmp_table().
- Aria::empty_bits is not allocated if there is no varchar/char/blob
fields in the table. Fixed code to take this into account.
This cannot cause any issues as this is just a memory access
into other Aria memory and the content of the memory would not be used.
- Aria::last_key_buff was not allocated big enough. This may have caused
issues with rtrees and ma_extra(HA_EXTRA_REMEMBER_POS) as they
would use the same memory area.
- Aria and MyISAM didn't take extended key parts into account, which
caused problems when copying rec_per_key from engine to sql level.
- Mark asan builds with 'asan' in version strihng to detect these in
not_valgrind_build.inc.
This is needed to not have main.sp-no-valgrind fail with asan.
There is a little used option innodb_defragment that would make
OPTIMIZE TABLE not rebuild the table as usual for InnoDB, but
instead cause the index B-trees to be optimized in place.
This option uses excessive locking (exclusively locking index trees).
It never covered SPATIAL INDEX or FULLTEXT INDEX. Storage space
was never reclaimed.
Because this option is not particularly useful and causes a
maintenance burden (most recently in
commit de4030e4d4),
it is best to deprecate it, to prepare for its removal.
SUPER privilege used to allow various actions that were alternatively
allowed by one of BINLOG ADMIN, BINLOG MONITOR, BINLOG REPLAY,
CONNECTION ADMIN, FEDERATED ADMIN, REPL MASTER ADMIN, REPL SLAVE ADMIN,
SET USER, SLAVE MONITOR.
Now SUPER no longer does that, one has to grant one of the fine-grained
privileges above to be to perform corresponding actions.
On upgrade from MariaDB versions 10.11 and below all the privileges
above are granted automatically if the user has SUPER.
As a side-effect, such an upgrade will allow SUPER-user to run SHOW
BINLOG EVENTS, SHOW RELAYLOG EVENTS, SHOW SLAVE HOSTS, even if he wasn't
able to do it before the upgrade.
The patch is inspired from MySQL. Instead of using a single String to
hold the current active debug_sync signal, use a Hash_set to store
LEX_STRINGS. This patch ensures that a signal can not be lost, by being
overwritten by another thread via set DEBUG_SYNC = '... SIGNAL ...';
All signals are kepts "alive" until they are consumed by a wait event.
This requires updating test cases that assume the GLOBAL signal is never
consumed.
Follow-up work needed:
Port the additional syntax that allows one to set multiple signals
and also conditionally deactivate signals when waiting.
This was done after discussions with Igor, Sanja and Bar.
The main reason for removing the deprication was to ensure that MariaDB
is always backward compatible whenever possible.
Other things:
- Added statistics counters, mainly for the feedback plugin.
- INTO OUTFILE
- INTO variable
- If INTO is using the old syntax (end of query)
This includes all test changes from
"Changing all cost calculation to be given in milliseconds"
and forwards.
Some of the things that caused changes in the result files:
- As part of fixing tests, I added 'echo' to some comments to be able to
easier find out where things where wrong.
- MATERIALIZED has now a higher cost compared to X than before. Because
of this some MATERIALIZED types have changed to DEPENDEND SUBQUERY.
- Some test cases that required MATERIALIZED to repeat a bug was
changed by adding more rows to force MATERIALIZED to happen.
- 'Filtered' in SHOW EXPLAIN has in many case changed from 100.00 to
something smaller. This is because now filtered also takes into
account the smallest possible ref access and filters, even if they
where not used. Another reason for 'Filtered' being smaller is that
we now also take into account implicit filtering done for subqueries
using FIRSTMATCH.
(main.subselect_no_exists_to_in)
This is caluculated in best_access_path() and stored in records_out.
- Table orders has changed because more accurate costs.
- 'index' and 'ALL' for small tables has changed to use 'range' or
'ref' because of optimizer_scan_setup_cost.
- index can be changed to 'range' as 'range' optimizer assumes we don't
have to read the blocks from disk that range optimizer has already read.
This can be confusing in the case where there is no obvious where clause
but instead there is a hidden 'key_column > NULL' added by the optimizer.
(main.subselect_no_exists_to_in)
- Scan on primary clustered key does not report 'Using Index' anymore
(It's a table scan, not an index scan).
- For derived tables, the number of rows is now 100 instead of 2,
which can be seen in EXPLAIN.
- More tests have "Using index for group by" as the cost of this
optimization is now more correct (lower).
- A primary key could be preferred for a normal key, even if it would
access more rows, as it's faster to do 1 lokoup and 3 'index_next' on a
clustered primary key than one lookup trough a secondary.
(main.stat_tables_innodb)
Notes:
- There was a 4.7% more calls to best_extension_by_limited_search() in
the main.greedy_optimizer test. However examining the test results
it looked that the plans where slightly better (eq_ref where more
chained together) so I assume this is ok.
- I have verified a few test cases where there was notable/unexpected
changes in the plan and in all cases the new optimizer plans where
faster. (main.greedy_optimizer and some others)
Variables added:
- optimizer_index_block_copy_cost
- optimizer_key_copy_cost
- optimizer_key_next_find_cost
- optimizer_key_compare_cost
- optimizer_row_copy_cost
- optimizer_where_compare_cost
Some rename of defines was done to make the internal defines similar to
the visible ones:
TIME_FOR_COMPARE -> WHERE_COST; WHERE_COST was also "inverted" to be
a number between 0 and 1 that is multiply with accepted records
(similar to other optimizer variables).
TIME_FOR_COMPARE_IDX -> KEY_COMPARE_COST. This is also inverted,
similar to TIME_FOR_COMPARE.
TIME_FOR_COMPARE_ROWID -> ROWID_COMPARE_COST. This is also inverted,
similar to TIME_FOR_COMPARE.
All default costs are identical to what they where before this patch.
Other things:
- Compare factor in get_merge_buffers_cost() was inverted.
- Changed namespace to static in filesort_utils.cc
Before this patch, when calculating the cost of fetching and using a
row/key from the engine, we took into account the cost of finding a
row or key from the engine, but did not consistently take into account
index only accessed, clustered key or covered keys for all access
paths.
The cost of the WHERE clause (TIME_FOR_COMPARE) was not consistently
considered in best_access_path(). TIME_FOR_COMPARE was used in
calculation in other places, like greedy_search(), but was in some
cases (like scans) done an a different number of rows than was
accessed.
The cost calculation of row and index scans didn't take into account
the number of rows that where accessed, only the number of accepted
rows.
When using a filter, the cost of index_only_reads and cost of
accessing and disregarding 'filtered rows' where not taken into
account, which made filters cost less than there actually where.
To remedy the above, the following key & row fetch related costs
has been added:
- The cost of fetching and using a row is now split into different costs:
- key + Row fetch cost (as before) but multiplied with the variable
'optimizer_cache_cost' (default to 0.5). This allows the user to
tell the optimizer the likehood of finding the key and row in the
engine cache.
- ROW_COPY_COST, The cost copying a row from the engine to the
sql layer or creating a row from the join_cache to the record
buffer. Mostly affects table scan costs.
- ROW_LOOKUP_COST, the cost of fetching a row by rowid.
- KEY_COPY_COST the cost of finding the next key and copying it from
the engine to the SQL layer. This is used when we calculate the cost
index only reads. It makes index scans more expensive than before if
they cover a lot of rows. (main.index_merge_myisam)
- KEY_LOOKUP_COST, the cost of finding the first key in a range.
This replaces the old define IDX_LOOKUP_COST, but with a higher cost.
- KEY_NEXT_FIND_COST, the cost of finding the next key (and rowid).
when doing a index scan and comparing the rowid to the filter.
Before this cost was assumed to be 0.
All of the above constants/variables are now tuned to be somewhat in
proportion of executing complexity to each other. There is tuning
need for these in the future, but that can wait until the above are
made user variables as that will make tuning much easier.
To make the usage of the above easy, there are new (not virtual)
cost calclation functions in handler:
- ha_read_time(), like read_time(), but take optimizer_cache_cost into
account.
- ha_read_and_copy_time(), like ha_read_time() but take into account
ROW_COPY_TIME
- ha_read_and_compare_time(), like ha_read_and_copy_time() but take
TIME_FOR_COMPARE into account.
- ha_rnd_pos_time(). Read row with row id, taking ROW_COPY_COST
into account. This is used with filesort where we don't need
to execute the WHERE clause again.
- ha_keyread_time(), like keyread_time() but take
optimizer_cache_cost into account.
- ha_keyread_and_copy_time(), like ha_keyread_time(), but add
KEY_COPY_COST.
- ha_key_scan_time(), like key_scan_time() but take
optimizer_cache_cost nto account.
- ha_key_scan_and_compare_time(), like ha_key_scan_time(), but add
KEY_COPY_COST & TIME_FOR_COMPARE.
I also added some setup costs for doing different types of scans and
creating temporary tables (on disk and in memory). This encourages
the optimizer to not use these for simple 'a few row' lookups if
there are adequate key lookup strategies.
- TABLE_SCAN_SETUP_COST, cost of starting a table scan.
- INDEX_SCAN_SETUP_COST, cost of starting an index scan.
- HEAP_TEMPTABLE_CREATE_COST, cost of creating in memory
temporary table.
- DISK_TEMPTABLE_CREATE_COST, cost of creating an on disk temporary
table.
When calculating cost of fetching ranges, we had a cost of
IDX_LOOKUP_COST (0.125) for doing a key div for a new range. This is
now replaced with 'io_cost * KEY_LOOKUP_COST (1.0) *
optimizer_cache_cost', which matches the cost we use for 'ref' and
other key lookups. The effect is that the cost is now a bit higher
when we have many ranges for a key.
Allmost all calculation with TIME_FOR_COMPARE is now done in
best_access_path(). 'JOIN::read_time' now includes the full
cost for finding the rows in the table.
In the result files, many of the changes are now again close to what
they where before the "Update cost for hash and cached joins" commit,
as that commit didn't fix the filter cost (too complex to do
everything in one commit).
The above changes showed a lot of a lot of inconsistencies in
optimizer cost calculation. The main objective with the other changes
was to do calculation as similar (and accurate) as possible and to make
different plans more comparable.
Detailed list of changes:
- Calculate index_only_cost consistently and correctly for all scan
and ref accesses. The row fetch_cost and index_only_cost now
takes into account clustered keys, covered keys and index
only accesses.
- cost_for_index_read now returns both full cost and index_only_cost
- Fixed cost calculation of get_sweep_read_cost() to match other
similar costs. This is bases on the assumption that data is more
often stored on SSD than a hard disk.
- Replaced constant 2.0 with new define TABLE_SCAN_SETUP_COST.
- Some scan cost estimates did not take into account
TIME_FOR_COMPARE. Now all scan costs takes this into
account. (main.show_explain)
- Added session variable optimizer_cache_hit_ratio (default 50%). By
adjusting this on can reduce or increase the cost of index or direct
record lookups. The effect of the default is that key lookups is now
a bit cheaper than before. See usage of 'optimizer_cache_cost' in
handler.h.
- JOIN_TAB::scan_time() did not take into account index only scans,
which produced a wrong cost when index scan was used. Changed
JOIN_TAB:::scan_time() to take into consideration clustered and
covered keys. The values are now cached and we only have to call
this function once. Other calls are changed to use the cached
values. Function renamed to JOIN_TAB::estimate_scan_time().
- Fixed that most index cost calculations are done the same way and
more close to 'range' calculations. The cost is now lower than
before for small data sets and higher for large data sets as we take
into account how many keys are read (main.opt_trace_selectivity,
main.limit_rows_examined).
- Ensured that index_scan_cost() ==
range(scan_of_all_rows_in_table_using_one_range) +
MULTI_RANGE_READ_INFO_CONST. One effect of this is that if there
is choice of doing a full index scan and a range-index scan over
almost the whole table then index scan will be preferred (no
range-read setup cost). (innodb.innodb, main.show_explain,
main.range)
- Fixed the EQ_REF and REF takes into account clustered and covered
keys. This changes some plans to use covered or clustered indexes
as these are much cheaper. (main.subselect_mat_cost,
main.state_tables_innodb, main.limit_rows_examined)
- Rowid filter setup cost and filter compare cost now takes into
account fetching and checking the rowid (KEY_NEXT_FIND_COST).
(main.partition_pruning heap.heap_btree main.log_state)
- Added KEY_NEXT_FIND_COST to
Range_rowid_filter_cost_info::lookup_cost to account of the time
to find and check the next key value against the container
- Introduced ha_keyread_time(rows) that takes into account finding
the next row and copying the key value to 'record'
(KEY_COPY_COST).
- Introduced ha_key_scan_time() for calculating an index scan over
all rows.
- Added IDX_LOOKUP_COST to keyread_time() as a startup cost.
- Added index_only_fetch_cost() as a convenience function to
OPT_RANGE.
- keyread_time() cost is slightly reduced to prefer shorter keys.
(main.index_merge_myisam)
- All of the above caused some index_merge combinations to be
rejected because of cost (main.index_intersect). In some cases
'ref' where replaced with index_merge because of the low
cost calculation of get_sweep_read_cost().
- Some index usage moved from PRIMARY to a covering index.
(main.subselect_innodb)
- Changed cost calculation of filter to take KEY_LOOKUP_COST and
TIME_FOR_COMPARE into account. See sql_select.cc::apply_filter().
filter parameters and costs are now written to optimizer_trace.
- Don't use matchings_records_in_range() to try to estimate the number
of filtered rows for ranges. The reason is that we want to ensure
that 'range' is calculated similar to 'ref'. There is also more work
needed to calculate the selectivity when using ranges and ranges and
filtering. This causes filtering column in EXPLAIN EXTENDED to be
100.00 for some cases where range cannot use filtering.
(main.rowid_filter)
- Introduced ha_scan_time() that takes into account the CPU cost of
finding the next row and copying the row from the engine to
'record'. This causes costs of table scan to slightly increase and
some test to changed their plan from ALL to RANGE or ALL to ref.
(innodb.innodb_mysql, main.select_pkeycache)
In a few cases where scan time of very small tables have lower cost
than a ref or range, things changed from ref/range to ALL.
(main.myisam, main.func_group, main.limit_rows_examined,
main.subselect2)
- Introduced ha_scan_and_compare_time() which is like ha_scan_time()
but also adds the cost of the where clause (TIME_FOR_COMPARE).
- Added small cost for creating temporary table for
materialization. This causes some very small tables to use scan
instead of materialization.
- Added checking of the WHERE clause (TIME_FOR_COMPARE) of the
accepted rows to ROR costs in get_best_ror_intersect()
- Removed '- 0.001' from 'join->best_read' and optimize_straight_join()
to ensure that the 'Last_query_cost' status variable contains the
same value as the one that was calculated by the optimizer.
- Take avg_io_cost() into account in handler::keyread_time() and
handler::read_time(). This should have no effect as it's 1.0 by
default, except for heap that overrides these functions.
- Some 'ref_or_null' accesses changed to 'range' because of cost
adjustments (main.order_by)
- Added scan type "scan_with_join_cache" for optimizer_trace. This is
just to show in the trace what kind of scan was used.
- When using 'scan_with_join_cache' take into account number of
preceding tables (as have to restore all fields for all previous
table combination when checking the where clause)
The new cost added is:
(row_combinations * ROW_COPY_COST * number_of_cached_tables).
This increases the cost of join buffering in proportion of the
number of tables in the join buffer. One effect is that full scans
are now done earlier as the cost is then smaller.
(main.join_outer_innodb, main.greedy_optimizer)
- Removed the usage of 'worst_seeks' in cost_for_index_read as it
caused wrong plans to be created; It prefered JT_EQ_REF even if it
would be much more expensive than a full table scan. A related
issue was that worst_seeks only applied to full lookup, not to
clustered or index only lookups, which is not consistent. This
caused some plans to use index scan instead of eq_ref (main.union)
- Changed federated block size from 4096 to 1500, which is the
typical size of an IO packet.
- Added costs for reading rows to Federated. Needed as there is no
caching of rows in the federated engine.
- Added ha_innobase::rnd_pos_time() cost function.
- A lot of extra things added to optimizer trace
- More costs, especially for materialization and index_merge.
- Make lables more uniform
- Fixed a lot of minor bugs
- Added 'trace_started()' around a lot of trace blocks.
- When calculating ORDER BY with LIMIT cost for using an index
the cost did not take into account the number of row retrivals
that has to be done or the cost of comparing the rows with the
WHERE clause. The cost calculated would be just a fraction of
the real cost. Now we calculate the cost as we do for ranges
and 'ref'.
- 'Using index for group-by' is used a bit more than before as
now take into account the WHERE clause cost when comparing
with 'ref' and prefer the method with fewer row combinations.
(main.group_min_max).
Bugs fixed:
- Fixed that we don't calculate TIME_FOR_COMPARE twice for some plans,
like in optimize_straight_join() and greedy_search()
- Fixed bug in save_explain_data where we could test for the wrong
index when displaying 'Using index'. This caused some old plans to
show 'Using index'. (main.subselect_innodb, main.subselect2)
- Fixed bug in get_best_ror_intersect() where 'min_cost' was not
updated, and the cost we compared with was not the one that was
used.
- Fixed very wrong cost calculation for priority queues in
check_if_pq_applicable(). (main.order_by now correctly uses priority
queue)
- When calculating cost of EQ_REF or REF, we added the cost of
comparing the WHERE clause with the found rows, not all row
combinations. This made ref and eq_ref to be regarded way to cheap
compared to other access methods.
- FORCE INDEX cost calculation didn't take into account clustered or
covered indexes.
- JT_EQ_REF cost was estimated as avg_io_cost(), which is half the
cost of a JT_REF key. This may be true for InnoDB primary key, but
not for other unique keys or other engines. Now we use handler
function to calculate the cost, which allows us to handle
consistently clustered, covered keys and not covered keys.
- ha_start_keyread() didn't call extra_opt() if keyread was already
enabled but still changed the 'keyread' variable (which is wrong).
Fixed by not doing anything if keyread is already enabled.
- multi_range_read_info_cost() didn't take into account io_cost when
calculating the cost of ranges.
- fix_semijoin_strategies_for_picked_join_order() used the wrong
record_count when calling best_access_path() for SJ_OPT_FIRST_MATCH
and SJ_OPT_LOOSE_SCAN.
- Hash joins didn't provide correct best_cost to the upper level, which
means that the cost for hash_joins more expensive than calculated
in best_access_path (a difference of 10x * TIME_OF_COMPARE).
This is fixed in the new code thanks to that we now include
TIME_OF_COMPARE cost in 'read_time'.
Other things:
- Added some 'if (thd->trace_started())' to speed up code
- Removed not used function Cost_estimate::is_zero()
- Simplified testing of HA_POS_ERROR in get_best_ror_intersect().
(No cost changes)
- Moved ha_start_keyread() from join_read_const_table() to join_read_const()
to enable keyread for all types of JT_CONST tables.
- Made a few very short functions inline in handler.h
Notes:
- In main.rowid_filter the join order of order and lineitem is swapped.
This is because the cost of doing a range fetch of lineitem(98 rows) is
almost as big as the whole join of order,lineitem. The filtering will
also ensure that we only have to do very small key fetches of the rows
in lineitem.
- main.index_merge_myisam had a few changes where we are now using
less keys for index_merge. This is because index scans are now more
expensive than before.
- handler->optimizer_cache_cost is updated in ha_external_lock().
This ensures that it is up to date per statements.
Not an optimal solution (for locked tables), but should be ok for now.
- 'DELETE FROM t1 WHERE t1.a > 0 ORDER BY t1.a' does not take cost of
filesort into consideration when table scan is chosen.
(main.myisam_explain_non_select_all)
- perfschema.table_aggregate_global_* has changed because an update
on a table with 1 row will now use table scan instead of key lookup.
TODO in upcomming commits:
- Fix selectivity calculation for ranges with and without filtering and
when there is a ref access but scan is chosen.
For this we have to store the lowest known value for
'accepted_records' in the OPT_RANGE structure.
- Change that records_read does not include filtered rows.
- test_if_cheaper_ordering() needs to be updated to properly calculate
costs. This will fix tests like main.order_by_innodb,
main.single_delete_update
- Extend get_range_limit_read_cost() to take into considering
cost_for_index_read() if there where no quick keys. This will reduce
the computed cost for ORDER BY with LIMIT in some cases.
(main.innodb_ext_key)
- Fix that we take into account selectivity when counting the number
of rows we have to read when considering using a index table scan to
resolve ORDER BY.
- Add new calculation for rnd_pos_time() where we take into account the
benefit of reading multiple rows from the same page.
This commit contains fixes for error codes, which are needed
because OpenSSL 3.x and recent versions of GnuTLS have changed
the indication of error codes when the peer does not send
close_notify before closing the connection.
log_slow_filter=admin as been available for a long time.
Uses can migrate from log_slow_statements_statements=OFF by removing
'admin' from the default log_slow_filter variable setting.
This commit changes backup execution (namely the block ddl phase),
so that node is not paused from cluster. Instead, the following
backup execution is declared as vulnerable for possible cluster
level conflicts, especially with DDL statement applying.
With this, the mariabackup execution may be aborted, if DDL
statements happen during backup execution. This abortable
backup execution is optional feature and may be
enabled/disabled by wsrep_mode: BF_ABORT_MARIABACKUP.
Note that old style node desync and pause, despite of
WSREP_MODE_BF_MARIABACKUP is needed if node is operating as
SST donor.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Starting with commit baf276e6d4 (MDEV-19229)
the parameter innodb_undo_tablespaces can be increased from its
previous default value 0 while allowing an upgrade from old databases.
We will change the default setting to innodb_undo_tablespaces=3
so that the space occupied by possible bursts of undo log records
can be reclaimed after SET GLOBAL innodb_undo_log_truncate=ON.
We will not enable innodb_undo_log_truncate by default, because it
causes some observable performance degradation.
Special thanks to Thirunarayanan Balathandayuthapani for diagnosing
and fixing a number of bugs related to this new default setting.
Tested by: Matthias Leich, Axel Schwenke, Vladislav Vaintroub
(with both values of innodb_undo_log_truncate)
The purpose of the change buffer was to reduce random disk access,
which could be useful on rotational storage, but maybe less so on
solid-state storage.
When we wished to
(1) insert a record into a non-unique secondary index,
(2) delete-mark a secondary index record,
(3) delete a secondary index record as part of purge (but not ROLLBACK),
and the B-tree leaf page where the record belongs to is not in the buffer
pool, we inserted a record into the change buffer B-tree, indexed by
the page identifier. When the page was eventually read into the buffer
pool, we looked up the change buffer B-tree for any modifications to the
page, applied these upon the completion of the read operation. This
was called the insert buffer merge.
We remove the change buffer, because it has been the source of
various hard-to-reproduce corruption bugs, including those fixed in
commit 5b9ee8d819 and
commit 165564d3c3 but not limited to them.
A downgrade will fail with a clear message starting with
commit db14eb16f9 (MDEV-30106).
buf_page_t::state: Merge IBUF_EXIST to UNFIXED and
WRITE_FIX_IBUF to WRITE_FIX.
buf_pool_t::watch[]: Remove.
trx_t: Move isolation_level, check_foreigns, check_unique_secondary,
bulk_insert into the same bit-field. The only purpose of
trx_t::check_unique_secondary is to enable bulk insert into an
empty table. It no longer enables insert buffering for UNIQUE INDEX.
btr_cur_t::thr: Remove. This field was originally needed for change
buffering. Later, its use was extended to cover SPATIAL INDEX.
Much of the time, rtr_info::thr holds this field. When it does not,
we will add parameters to SPATIAL INDEX specific functions.
ibuf_upgrade_needed(): Check if the change buffer needs to be updated.
ibuf_upgrade(): Merge and upgrade the change buffer after all redo log
has been applied. Free any pages consumed by the change buffer, and
zero out the change buffer root page to mark the upgrade completed,
and to prevent a downgrade to an earlier version.
dict_load_tablespaces(): Renamed from
dict_check_tablespaces_and_store_max_id(). This needs to be invoked
before ibuf_upgrade().
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
btr_page_alloc(): Renamed from btr_page_alloc_low(). We no longer
allocate any change buffer pages.
btr_cur_open_at_rnd_pos(): Specialize for use in persistent statistics.
The change buffer merge does not need this function anymore.
row_search_index_entry(), btr_lift_page_up(): Add a parameter thr
for the SPATIAL INDEX case.
rtr_page_split_and_insert(): Specialized from btr_page_split_and_insert().
rtr_root_raise_and_insert(): Specialized from btr_root_raise_and_insert().
Note: The support for upgrading from the MySQL 3.23 or MySQL 4.0
change buffer format that predates the MySQL 4.1 introduction of
the option innodb_file_per_table was removed in MySQL 5.6.5
as part of mysql/mysql-server@69b6241a79
and MariaDB 10.0.11 as part of 1d0f70c2f8.
In the tests innodb.log_upgrade and innodb.log_corruption, we create
valid (upgraded) change buffer pages.
Tested by: Matthias Leich
We introduce the following settable Boolean global variables:
innodb_log_file_write_through: Whether writes to ib_logfile0 are
write-through (disabling any caching, as in O_SYNC or O_DSYNC).
innodb_data_file_write_through: Whether writes to any InnoDB data files
(including the temporary tablespace) are write-through.
innodb_data_file_buffering: Whether the file system cache is enabled
for InnoDB data files.
All these parameters are OFF by default, that is, the file system cache
will be disabled, but any hardware caching is enabled, that is,
explicit calls to fsync(), fdatasync() or similar functions are needed.
On systems that support FUA it may make sense to enable write-through,
to avoid extra system calls.
If the deprecated read-only start-up parameter is set to one of the
following values, then the values of the 4 Boolean flags (the above 3
plus innodb_log_file_buffering) will be set as follows:
O_DSYNC:
innodb_log_file_write_through=ON, innodb_data_file_write_through=ON,
innodb_data_file_buffering=OFF, and
(if supported) innodb_log_file_buffering=OFF.
fsync, littlesync, nosync, or (Microsoft Windows specific) normal:
innodb_log_file_write_through=OFF, innodb_data_file_write_through=OFF,
and innodb_data_file_buffering=ON.
Note: fsync() or fdatasync() will only be disabled if the separate
parameter debug_no_sync (in the code, my_disable_sync) is set.
In mariadb-backup, the parameter innodb_flush_method will be ignored.
The Boolean parameters can be modified by SET GLOBAL while the
server is running. This will require reopening the ib_logfile0
or all currently open InnoDB data files.
We will open files straight in O_DSYNC or O_SYNC mode when applicable.
Data files we will try to open straight in O_DIRECT mode when the
page size is at least 4096 bytes. For atomically creating data files,
we will invoke os_file_set_nocache() to enable O_DIRECT afterwards,
because O_DIRECT is not supported on some file systems. We will also
continue to invoke os_file_set_nocache() on ib_logfile0 when
innodb_log_file_buffering=OFF can be fulfilled.
For reopening the ib_logfile0, we use the same logic that was developed
for online log resizing and reused for updates of
innodb_log_file_buffering.
Reopening all data files is implemented in the new function
fil_space_t::reopen_all().
Reviewed by: Vladislav Vaintroub
Tested by: Matthias Leich
Before commit 6112853cda in MySQL 4.1.1
introduced the parameter innodb_file_per_table, all InnoDB data was
written to the InnoDB system tablespace (often named ibdata1).
A serious design problem is that once the system tablespace has grown to
some size, it cannot shrink even if the data inside it has been deleted.
There are also other design problems, such as the server hang MDEV-29930
that should only be possible when using innodb_file_per_table=0 and
innodb_undo_tablespaces=0 (storing both tables and undo logs in the
InnoDB system tablespace).
The parameter innodb_change_buffering was deprecated
in commit b5852ffbee.
Starting with commit baf276e6d4
(MDEV-19229) the number of innodb_undo_tablespaces can be increased,
so that the undo logs can be moved out of the system tablespace
of an existing installation.
If all these things (tables, undo logs, and the change buffer) are
removed from the InnoDB system tablespace, the only variable-size
data structure inside it is the InnoDB data dictionary.
DDL operations on .ibd files was optimized in
commit 86dc7b4d4c (MDEV-24626).
That should have removed any thinkable performance advantage of
using innodb_file_per_table=0.
Since there should be no benefit of setting innodb_file_per_table=0,
the parameter should be deprecated. Starting with MySQL 5.6 and
MariaDB Server 10.0, the default value is innodb_file_per_table=1.
* MDEV-24377: Accept comma separated addresses as --bind-address value
When bind address form the basis of wsrep based variables, the first
address in the comma separated list is used.
The test uses the IP address 192.168.0.1 as we need to include
multiple address. This will include failures without the following
commit.
The tests for bind_multiple_address_resolution could return
addresses that we cannot bind too. Windows and FreeBSD, and
probably other OSs will terminate the service if addresses are
unavailable.
We use the WSAEADDRNOTAVAIL / POSIX EADDRNOTAVAIL codes to
continue to bind to other interfaces. If at the end of the
bind list, if no binds are successful, the we terminate
but still leaving the error messages in the log.
Co-authored-by: Daniel Black <daniel@mariadb.org>
* clarify the help text for --system-versioning-insert-history
* move the vers_write=false check from Item_field::fix_fields()
next to other vers field checks in find_field_in_table()
* move row_start validation from handler::write_row() next to
vers_update_fields()
* make secure_timestamp check to happen in one place only,
extract it into a function is_set_timestamp_vorbidden().
* overwriting vers fields is an error, just like setting @@timestamp
* don't run vers_insert_history() for every row
1. system_versioning_insert_history session variable allows
pseudocolumns ROW_START and ROW_END be specified in INSERT,
INSERT..SELECT and LOAD DATA.
2. Cleaned up select_insert::send_data() from setting vers_write as
this parameter is now set on TABLE initialization.
4. Replication of system_versioning_insert_history via option_bits in
OPTIONS_WRITTEN_TO_BIN_LOG.
- Add `replicate_rewrite_db` status variable, that may accept comma
separated key-value pairs.
- Note that option `OPT_REPLICATE_REWRITE_DB` already existed in `mysqld.h`
from this commit 23d8586dbf
Reviewer:Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Let us disable Valgrind on tests that would fail because a
server shutdown or a STOP SLAVE command would take longer,
causing the test harness to forcibly and silently kill the server
due to an exceeded timeout.
post-merge fixes:
* remove log_slow_queries_not_using_indexes, no need to create variables
that are deprecated since the moment of creation
* rename log_slow_query_enable->log_slow_query
no other variable uses *_enable pattern
* MDEV-29626 Assertion `self == &Sys_slow_query_log' failed in fix_log_state
* tests
Closes#2137
Thus, all these variables will be
grouped together and more logically named.
Descriptions for the old variables were updated to indicate they are
now aliases for the newly introduced variables with prefix log_slow.
log_slow_queries_not_using_indexes_filter will not be addressed
in this merge request.
log_throttle_queries_not_using_indexes seems to no longer be in use.
MTR tests are also updated to include the new variable names.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
There are separate flags DBUG_OFF for disabling the DBUG facility
and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility.
Let us allow debug builds without DEBUG_SYNC.
Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to
define ENABLED_DEBUG_SYNC.
this test loads sql_errlog plugin. then in a second connection
it triggers an error, this locks the plugin in that thd.
then the plugin is uninstalled in the default connection.
but that doesn't unload the plugin, as it's still locked. it'll
auto-unload after the foo connection is closed. without an explicit
disconnect it is closed after mysqltest exits and the post-test check
might still see sql_errlog not fully unoaded.
New Feature:
============
This patch adds a new system variable, @@slave_max_statement_time,
which limits the execution time of s slave’s events that implements
an equivalent to @@max_statement_time for slave applier.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Test fixes:
Since fix for CONC-603 (wrong error handling in TLS read/write) in case
of a read/write error client doesn't return always error 2013 (server
has gone away), so in addition we need to check for error 2026
(TLS/SSL error) and 5014 (write error).
Part #2: Extend heuristic pruning to use multiple tables as the
"Model tables".
Before the patch, heuristic pruning uses only one "Model table":
The table which had the best cost AND record became the "Model table".
After that, if a table's cost and record were both worse than
those of the Model Table, the table would be pruned away.
This didn't work well when the first table (the optimizer sorts them
by record_count) had low record_count but relatively high cost: nothing
could be pruned afterwards.
The patch adds the two additional "Model tables": one with the least
cost and the other with the least record_count.
(In both cases, a table can be pruned away if BOTH its cost and
record_count are worse than those of a Model table)
The new pruning is active when the number of tables to consider for
the prefix is higher than @@optimizer_extra_pruning_depth.
One can see the new pruning in the Optimizer Trace as
- "pruned_by_heuristic":"min_record_count", or
- "pruned_by_heuristic":"min_read_time".
Old heuristic pruning shows as "pruned_by_heuristic":1.
MDEV-28073 Slow query performance in MariaDB when using many table
The idea is to prefer and chain EQ_REF tables (tables that uses an
unique key to find a row) when searching for the best table combination.
This significantly reduces row combinations that has to be examined.
This is optimization is enabled when setting optimizer_prune_level=2
(which is now default).
Implementation:
- optimizer_prune_level has a new level, 2, which enables EQ_REF
optimization in addition to the pruning done by level 1.
Level 2 is now default.
- Added JOIN::eq_ref_tables that contains bits of tables that could use
potentially use EQ_REF access in the query. This is calculated
in sort_and_filter_keyuse()
Under optimizer_prune_level=2:
- When the greedy_optimizer notices that the preceding table was an
EQ_REF table, it tries to add an EQ_REF table next. If an EQ_REF
table exists, only this one will be considered at this level.
We also collect all EQ_REF tables chained by the next levels and these
are ignored on the starting level as we have already examined these.
If no EQ_REF table exists, we continue as normal.
This optimization speeds up the greedy_optimizer combination test with
~25%
Other things:
- I ported the changes in MySQL 5.7 to greedy_optimizer.test to MariaDB
to be able to ensure we can handle all cases that MySQL can do.
- I have run all tests with --mysqld=--optimizer_prune_level=1 to verify that
there where no test changes.
... on semisync slave
To provide semisync master crash-recovery the same server-id transactions
were made to accept for execution on the semisync slave when the strict gtid
mode (see MDEV-27760).
That however caused out-of-order error on a master's transaction
server of the circular setup.
The error was fair in the sense of the gtid strict mode rule as indeed
under the condition of the circular setup the replicated transaction
already exists in the local binlog.
This is fixed by the commit to ignore on the gtid strict mode semisync
slave those gtids that exist in the slave's binlog that effectively restores
the default same-server-id ignore policy.
At the same time the fixes complies with MDEV-21117 semisync slave recovery
to accept the same server-id transactions that do not exist in local binlog.
These system variables:
@@character_set_client
@@character_set_connection
@@character_set_database
@@character_set_filesystem
@@character_set_results
@@character_set_server
can now be set in numeric format only to IDs of default collations, e.g.:
SET @@character_set_xxx=9; -- OK (latin2_general_ci is default)
SET @@character_set_xxx=2; -- ERROR (latin2_czech_cs is not default)
SET @@character_set_xxx=21; -- ERROR (latin2_hungarian_ci is not default)
Before this change the server accepted IDs of non-default collations
so all three examples above worked without errors,
but this could lead to unexpected behavior in later statements.
In commit c4c8830709 (MDEV-28111) we disabled
the file system cache on the InnoDB write-ahead log file (ib_logfile0)
by default on Linux.
It turns out that especially with innodb_flush_trx_log_at_commit=2,
writing to the log via the file system cache typically improves throughput,
especially on slow storage or at a small number of concurrent transactions.
For other values of innodb_flush_log_at_trx_commit, direct writes were
observed to be mostly but not always faster. Whether it pays off to
disable the file system cache on the log may depend on the type of storage,
the workload, and the operating system kernel version.
On Linux and Microsoft Windows, we will introduce the settable Boolean
global variable innodb_log_file_buffering that indicates whether the
file system cache on the redo log file is enabled. The default value is
innodb_log_file_buffering=OFF. If the server is started up with
innodb_flush_log_at_trx_commit=2, the value will be changed to
innodb_log_file_buffering=ON.
When a persistent memory interface is being used for the log,
the value cannot be changed from innodb_log_file_buffering=OFF.
On Linux, when the physical block size cannot be determined
to be a power of 2 between 64 and 4096 bytes, the file system cache
cannot be disabled, and innodb_log_file_buffering=ON cannot be changed.
Server log messages will indicate whether the file system cache is
enabled for the redo log:
[Note] InnoDB: Buffered log writes (block size=512 bytes)
[Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
After this change, the startup parameter innodb_flush_method will no
longer control whether O_DIRECT will be set on the redo log on Linux.
On other operating systems that support O_DIRECT, no interface has been
implemented for controlling the file system cache for the redo log.
The innodb_flush_method values O_DIRECT, O_DIRECT_NO_FSYNC, O_DSYNC
will enable O_DIRECT for data files, not the log.
Tested by: Matthias Leich, Axel Schwenke
The parameter innodb_prefix_index_cluster_optimization used to enable an
optimization that was added in cb37c55768
and was disabled by default.
We will unconditionally enable the extension and mark the parameter
as deprecated.
Related to this, the counters
Innodb_secondary_index_triggered_cluster_reads and
Innodb_secondary_index_triggered_cluster_reads_avoided
allowed to determine the usefulness of this optimization.
Now that the configuration parameter is disabled, the counters
do not serve any useful purpose and can be removed.
row_search_with_covering_prefix(): Fix a bug that caused an
incorrect result to be returned.
INNODB_VERSION_STR: Replaced with PACKAGE_VERSION (non-functional change).
INNODB_VERSION_SHORT: Replaced with direct use of
MYSQL_VERSION_MAJOR << 8 | MYSQL_VERSION_MINOR.
check_version(): Simplify the mariadb-backup version check,
and require the server version to be MariaDB 10.8 or later,
because that is when the InnoDB redo log format was last changed.
InnoDB buffer pool resize messages are more succinct from this change:
Before:
```
2022-05-07 17:10:33 0 [Note] InnoDB: Completed resizing buffer pool from 14745600 to 19660800 bytes.
2022-05-07 17:10:33 0 [Note] InnoDB: Completed resizing buffer pool.
2022-05-07 17:10:33 8 [Note] InnoDB: Completed resizing buffer pool. (New size: 19660800 bytes).
```
After:
```
2022-05-07 17:10:33 0 [Note] InnoDB: Completed resizing buffer pool from 14745600 to 19660800 bytes.
```
Additionally, the INNODB_BUFFER_POOL_RESIZE_STATUS has more complete
info: it contains both the old and new buffer pool size values.
Make two existing command line options "allow-suspicious-udfs" and
"skip-grant-tables" visible as global system variables.
Both options have security implications, but users were not able to check
their states in the server prior to this change. This was a security
issue, as the user may not be aware if the options are enabled. By adding
them into system variables, it increases users’ visibility into their
security configurations.
Create new MTR tests to verify that the system variables align with the
command line options. Minor adjustments to the existing MTR due to the new
members in system variables.
Before:
mysql> SHOW VARIABLES WHERE
Variable_Name LIKE 'allow_suspicious_udfs' OR
Variable_Name LIKE 'skip_grant_tables';
Empty set (0.000 sec)
After:
mysql> SHOW VARIABLES WHERE
Variable_Name LIKE 'allow_suspicious_udfs' OR
Variable_Name LIKE 'skip_grant_tables';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| allow_suspicious_udfs | OFF |
| skip_grant_tables | OFF |
+-----------------------+-------+
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
Arythmetic can overrun the uint type when possible group_concat_max_len
is multiplied to collation.mbmaxlen (can easily be like 4).
So use ulonglong there for calculations.
This is a backport of commit 4489a89c71
in order to remove the test innodb.redo_log_during_checkpoint
that would cause trouble in the DBUG subsystem invoked by
safe_mutex_lock() via log_checkpoint(). Before
commit 7cffb5f6e8
these mutexes were of different type.
The following options were introduced in
commit 2e814d4702 (mariadb-10.2.2)
and have little use:
innodb_disable_resize_buffer_pool_debug had no effect even in
MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in
mysql/mysql-server@5c4094cf49
to work around a problem that was fixed in
mysql/mysql-server@2957ae4f99
(but the parameter was not removed).
innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug
are only used by the test innodb.redo_log_during_checkpoint
that will be removed as part of this commit.
innodb_dict_stats_disabled_debug is only used by that test,
and it is redundant because one could simply use
innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute
of the table in the test to achieve the same effect.