The parameters innodb_thread_concurrency and innodb_commit_concurrency
were useful years ago when both computing resources and the implementation
of some shared data structures were limited. MySQL 5.0 or 5.1 had trouble
scaling beyond 8 concurrent connections. Most of the scalability bottlenecks
have been removed since then, and the transactions per second delivered
by MariaDB Server 10.5 should not dramatically drop upon exceeding the
'optimal' number of connections.
Hence, enabling any concurrency throttling for InnoDB actually makes
things worse. We have seen many customers mistakenly set these
parameters to a small value like 16 or 64 and then complain that the
server was slow.
Ignoring the parameters allows us to remove some normally unused code
and data structures, which could slightly improve performance.
innodb_thread_concurrency, innodb_commit_concurrency,
innodb_replication_delay, innodb_concurrency_tickets,
innodb_thread_sleep_delay, innodb_adaptive_max_sleep_delay:
Deprecate and ignore; hard-wire to 0.
The column INFORMATION_SCHEMA.INNODB_TRX.trx_concurrency_tickets
will always report 0.
MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple
rw-lock implementation that avoids a spinloop between non-contended
read-lock requests, simply using std::atomic::fetch_add() for the
lock acquisition.
Alas, in a write-heavy stress test on a 56-core system with 1,000
concurrent client connections, the server would stop processing
any transactions every now and then. The reason turned out to be
false sharing. Attaching a debugger to the server during one such
hang revealed that 22 of the 1,033 threads were polling in
page_hash_latch::read_lock_wait() on the same object, which appeared
to be in an unlocked state (no readers or writers). All 22 requests
were for accessing undo log pages, each with a distinct page number.
To eliminate such false sharing, we will make buf_pool.page_hash.array
contain one page_hash_latch per CPU data cache line. On AMD64, this
will grow the array by a factor of 8/7, or almost 15%. For a 50GiB
buffer pool of 16KiB pages, the buf_pool.page_hash.array would
grow from 25MiB to 28.6MiB. On other instruction set architectures,
the incurred memory overhead may be smaller.
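A minimal sketch of the layout (illustrative only, not the actual
MariaDB code), assuming 64-byte cache lines and pointer-sized cells:

  #include <atomic>
  #include <cstddef>
  #include <cstdint>

  static constexpr std::size_t CACHE_LINE= 64; // assumed line size
  // 8 cells per line on AMD64: 1 latch word + 7 hash buckets, hence the
  // 8/7 growth when storing the same number of buckets
  // (25MiB * 8/7 is about 28.6MiB).
  static constexpr std::size_t CELLS_PER_LINE= CACHE_LINE / sizeof(void*);

  // Every cell is a pointer-sized word. Cells at indices that are
  // multiples of CELLS_PER_LINE act as rw-lock words (acquired with
  // fetch_add(), as in MDEV-22871); all other cells hold bucket heads.
  typedef std::atomic<std::uintptr_t> page_hash_cell;

  // The latch covering cell i is the first cell of i's cache line.
  inline page_hash_cell &latch_for(page_hash_cell *array, std::size_t i)
  {
    return array[i & ~(CELLS_PER_LINE - 1)];
  }

This way, a latch only ever shares a cache line with the buckets it
protects, so accesses to pages hashed to different cache lines can no
longer interfere with each other.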
Thanks to Vladislav Vaintroub for noticing this anomaly.
On MariaDB 10.4 (commit 4db4b77365),
the query results would not be sorted, which created random result
differences. Let us explicitly sort the results already in 10.3
in order to avoid future merge trouble.
- Include a link to the relevant KB article for more info
- Use spaces around the equal sign for better readability and so that
the examples are more aligned with the general style in the KB
- Load plugins with just the base name; the .so suffix is optional and
redundant
Verified by running before and after:
mariadb --skip-column-names -e "select plugin_name, plugin_status,
plugin_type, plugin_library, plugin_license from
information_schema.all_plugins order by plugin_name, plugin_library"
Nothing but exactly this line changed, so there are no side effects:
-S3 NOT INSTALLED STORAGE ENGINE ha_s3.so GPL
+S3 ACTIVE STORAGE ENGINE ha_s3.so GPL
Also enrich the config file with a link to the KB, and unify the
option syntax and standard comments.
Fixing ROUND(date,0), TRUNCATE(date,x), FLOOR(date), CEILING(date)
to return the `int(8) unsigned` data type.
Details:
1. Cleanup: moving virtual implementations
- Type_handler_temporal_result::Item_func_int_val_fix_length_and_dec()
- Type_handler_temporal_result::Item_func_round_fix_length_and_dec()
to Type_handler_date_common. Other temporal data type handlers
override these methods anyway, so they were effectively DATE-specific.
This change makes the code clearer.
2. Backporting DTCollation_numeric from 10.5, to make the code easier
to reuse.
3. Adding the `preferred_attrs` argument to Item_func_round::fix_arg_int(). Now
Type_handler_xxx::Item_func_round_val_fix_length_and_dec() work as follows:
- The INT-alike and YEAR handlers copy preferred_attrs from args[0].
- The DATE handler passes explicit attributes, to get `int(8) unsigned`.
- The hex hybrid handler passes NULL, so fix_arg_int() calculates attributes.
4. Type_handler_date_common::Item_func_int_val_fix_length_and_dec()
now sets the type handler and attributes to get `int(8) unsigned`.
In case of NATURAL JOIN / USING, mark all fields (one of the tables
cannot be opened in any case, so the optimisation is not worth it).
IMHO, the table should be checked for used fields and filled after
prepare, when we will have complete information about the used fields,
but that is too big a change for a bugfix. It will be made later by
Serg's patch.
The purpose of the InnoDB doublewrite buffer is to make InnoDB
tolerant against cases where the server was killed in the middle
of a page write. (In Linux, killing a process may interrupt a
write system call, typically on a 4096-byte boundary.)
There may exist multiple copies of a page number in the doublewrite
buffer. Recovery should choose the latest valid copy of the page.
By design, the FIL_PAGE_LSN must not precede the latest checkpoint LSN
nor be later than the end of the recovered log.
For page_compressed and encrypted pages, we were missing proper
consistency checks. In the 10.4 data set generated for MDEV-23231,
the data file contained a valid page_compressed page, and an
identical copy of that page was also present in the doublewrite
buffer. But, recovery would incorrectly consider the page invalid
and restore an uncompressed copy of the same page that had been
written before the log checkpoint. (In fact, no redo log was to
be applied to that page.)
buf_dblwr_process(): Validate the FIL_PAGE_LSN in the doublewrite
buffer pages, and always skip page 0, because those pages should
have been recovered by Datafile::restore_from_doublewrite() if
necessary.
Datafile::restore_from_doublewrite(): Choose the latest applicable
page from the doublewrite buffer.
recv_dblwr_t::find_page(): Also validate encrypted or
page_compressed pages.
recv_dblwr_t::validate_page(): New function to validate a page,
either a copy in a data file or in the doublewrite buffer.
Also validate encrypted or page_compressed pages.
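A hedged sketch of the selection logic; page_lsn() and the copies
vector are hypothetical stand-ins for the real doublewrite iteration
and the FIL_PAGE_LSN read:

  #include <cstdint>
  #include <vector>

  typedef unsigned char byte;
  uint64_t page_lsn(const byte *page);  // hypothetical FIL_PAGE_LSN read
  bool validate_page(const byte *page); // checksum/encryption/compression

  // Among all doublewrite copies of one page number, pick the latest
  // valid copy whose FIL_PAGE_LSN lies in [checkpoint_lsn, end_lsn].
  const byte *find_latest_copy(const std::vector<const byte*> &copies,
                               uint64_t checkpoint_lsn, uint64_t end_lsn)
  {
    const byte *best= nullptr;
    uint64_t best_lsn= 0;
    for (const byte *page : copies)
    {
      const uint64_t lsn= page_lsn(page);
      if (lsn < checkpoint_lsn || lsn > end_lsn || lsn <= best_lsn)
        continue; // stale, beyond the recovered log, or not newer
      if (validate_page(page))
      {
        best= page;
        best_lsn= lsn;
      }
    }
    return best;
  }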
This is joint work with Thirunarayanan Balathandayuthapani.
row_vers_impl_x_locked_low(): clust_offsets may point to memory
that is allocated by mem_heap_alloc() and may have been freed.
For initializing clust_offsets, try to use the stack-allocated
buffer instead of a pointer that may point to freed memory.
This fixes a regression that was introduced in
commit f0aa073f2b (MDEV-20950).
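The fix follows the usual InnoDB idiom (a sketch; the exact offset
type and the rec_get_offsets() signature vary between versions): seed
the call with a stack-allocated array, so that the resulting offsets
cannot alias a heap block that may later be freed.

  // Stack buffer for the offsets; REC_OFFS_NORMAL_SIZE, rec_offs_init()
  // and ULINT_UNDEFINED are the existing InnoDB names.
  offset_t offsets_[REC_OFFS_NORMAL_SIZE];
  rec_offs_init(offsets_);
  clust_offsets= rec_get_offsets(clust_rec, clust_index, offsets_,
                                 true, ULINT_UNDEFINED, &heap);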
Call mark_columns_per_binlog_row_image() before find_row() to set up
table->vcol_set early, so that the virtual column values will be
updated after the record read (ha_rnd_pos/ha_index_next/etc.) by the
table->update_virtual_fields() call.
Classes that handle privilege tables (like Tables_priv_table)
could read some columns conditionally, but they expect a certain
minimal number of columns always to exist.
Add a check for the minimal required number of columns in privilege
tables; don't use a table that has fewer columns than required.
1. Fixing ROUND(x) and TRUNCATE(x,0) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
input to preserve the exact data type of the argument when it's possible.
2. Fixing FLOOR(x) and CEILING(x) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
to preserve the exact data type of the argument.
3. Adding a dedicated Type_handler_year::Item_func_round_fix_length_and_dec()
to handle ROUND(x) and TRUNCATE(x,y) for YEAR(2) and YEAR(4) input
more easily. They still return INT(2) UNSIGNED and INT(4) UNSIGNED
respectively, as before.
rec_get_nth_cfield(): Remove a bogus debug assertion.
The function may be invoked by innobase_rec_to_mysql()
for reporting a duplicate key error during CREATE UNIQUE INDEX
or ALTER TABLE...ADD UNIQUE KEY, and in that case the record
will be missing the 5-byte or 6-byte fixed header.
It turns out that in every other code path leading to
rec_get_nth_cfield() we either invoked rec_get_offsets()
ourselves or asserted rec_offs_validate(). So, we can
safely remove the assertion and make debug builds
smaller and faster.
MariaDB adopted a hardware-optimized crc32c implementation on ARM64
starting with 10.5. This implementation of crc32c needs support from
the target hardware for the crc32 and pmull instructions. The existing
logic checked only for crc32 support through a runtime check, so if
the target hardware did not support pmull, things would fail/crash.
Expanded the runtime check to verify pmull support on the target
hardware along with the existing crc32 check.
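On Linux/AArch64, such a check can read the CPU capability bits from
the ELF auxiliary vector; a minimal sketch (not necessarily the exact
MariaDB code):

  #include <sys/auxv.h>
  #include <asm/hwcap.h>

  // Use the hardware crc32c path only if the CPU advertises both the
  // crc32 and the pmull (polynomial multiply) capability bits.
  static bool have_hw_crc32c()
  {
    const unsigned long caps= getauxval(AT_HWCAP);
    return (caps & HWCAP_CRC32) && (caps & HWCAP_PMULL);
  }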
Thanks to Marko and Daniel for review.
Problem:
========
In row_merge_drop_indexes(), InnoDB drops only the index from the
dictionary and frees the index pages, but it maintains the index
object if the table is being used by other DML threads. It sets
the online status of the index to ONLINE_INDEX_ABORTED_DROPPED.
Removing the index from the dictionary does not remove the
corresponding adaptive hash index (AHI) entries of the index. When a
block is being reused, InnoDB tries to remove the AHI entries for the
block, and this fails if the index online status is
ONLINE_INDEX_ABORTED_DROPPED.
Fix:
====
MDEV-22456 allows the index AHI entries to be dropped lazily,
so checking the online status in btr_search_drop_page_hash_index()
is meaningless and should be removed.
Due to the restricted size of the thread pool, execution of client
queries can be delayed (queued) for a while. This delay was interpreted
as client inactivity, and the connection was closed if client idle time
+ queue time exceeded wait_timeout.
But users did not expect queue time to be included in wait_timeout.
This patch changes the behavior. We no longer close the connection
if there is some unread data present on it, even if wait_timeout is
exceeded. Unread data means that the client was not idle: it sent a
query that we did not have time to process yet.
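A minimal sketch of detecting such unread data without consuming it
(illustrative only, not the actual threadpool code):

  #include <poll.h>

  // Returns true if the client socket already has bytes queued, i.e.
  // the client was not idle and the connection must not be timed out.
  static bool connection_has_unread_data(int fd)
  {
    pollfd pfd= {fd, POLLIN, 0};
    return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
  }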
Implementing dedicated fixing methods:
- Type_handler_bit::Item_func_round_fix_length_and_dec()
- Type_handler_bit::Item_func_int_val_fix_length_and_dec()
- Type_handler_typelib::Item_func_round_fix_length_and_dec()
because the inherited methods did not work well.
Fixing:
- Type_handler_typelib::Item_func_int_val_fix_length_and_dec
It did not work well, because it used args[0]->max_length to
calculate the result data type. In the case of ENUM and SET this
was not correct, because in the FLOOR() and CEILING() context
ENUM and SET return no more than 5 digits (65535 is the largest
possible value).
Misc:
- Changing the API of
Type_handler_bit::Bit_decimal_notation_int_digits(const Item *item)
to a more generic form:
Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits(uint nbits)
- Fixing Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits() to
return the exact number of decimal digits for all nbits 1..64
(see the sketch below). The old implementation was approximate.
This change gives better (more precise) data types.
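For reference, the exact digit count is the number of decimal digits
in 2^nbits - 1; a self-contained sketch (not the actual
implementation):

  #include <cstdint>

  // Exact count of decimal digits needed for an nbits-wide value,
  // nbits in 1..64: count the digits of the largest value, 2^nbits - 1.
  static unsigned decimal_digits_by_nbits(unsigned nbits)
  {
    std::uint64_t v= nbits >= 64 ? ~std::uint64_t{0}
                                 : (std::uint64_t{1} << nbits) - 1;
    unsigned digits= 1;
    while (v/= 10)
      digits++;
    return digits;
  }

For example, decimal_digits_by_nbits(16) returns 5, matching the 65535
bound noted above for ENUM and SET.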
- DEB_BUILD_HARDENING is only used with hardening-wrapper which is
deprecated in Debian, so remove it
- The word 'terse' should be checked in DEB_BUILD_OPTIONS and verbosity
controlled by it
- Remove excess build flag from debian/rules that does nothing, introduced
wrongly in commit 7cbde2d0a2. Instead
implement the embedded server build skipping on Travis-CI correctly.
- Simplify structure by doing all Travis-CI slimdown in one step.
- Remove unnecessary -e from sed, as it does nothing. When a regex is
needed, use -r. Move -i last so that it is next to the file name it
takes as an argument.
- Remove backwards compat checks that are no longer relevant as neither
Debian Jessie (before Stretch) nor Ubuntu Trusty (before Xenial)
is supported or built for anymore. For example, the GCC 4.8 check
is not relevant anymore, since Debian Jessie already has 4.9,
Ubuntu Xenial has 5.3, and there is no GCC < 4.8 around anymore.
- Skip building ColumnStore on both Travis-CI and Gitlab-CI as it is
way too slow (time) and big (disk space) to pass.
- Remove unnecessary unused files
- Remove duplicate encryption configuration sample from sources and
re-use the identical file in RPM directory instead
- Clean away harmful "default-character-set = utf8mb4" from client config
as it is unnecassary (server enforces utf8mb4 anyway by default) and
could cause issues with mysqlbinlog and other tools (MDEV-22981).
- Update S3 plugin description to be long enough
- Remove trailing whitespace from support-files and Debian packaging.
- Clean away fixed Lintian issues
- Clean away temporary Salsa-CI fixes now that 10.5.4 is out and is fixed
- Apply wrap-and-sort -a -v
- Type_handler_hex_hybrid did not override
Type_handler_string_result::Item_func_round_fix_length_and_dec(),
so the result type of ROUND(0xFFFFFFFFFFFFFFFF) was erroneously
calculated as DOUBLE with a wrong length.
Overriding Item_func_round_fix_length_and_dec() to calculate
the result type as INT/BIGINT.
Also, fixing Item_func_round::fix_arg_int() to use
args[0]->decimal_precision() instead of args[0]->max_length
when calculating this->max_length, to get a correct result
for hex hybrids.
- Type_handler_hex_hybrid::Item_func_int_val_fix_length_and_dec()
called item->fix_length_and_dec_int_or_decimal(), which did not
produce a correct result data type for hex hybrid.
Implementing dedicated code instead, to return INT UNSIGNED or
BIGINT UNSIGNED depending on the number of digits in the argument.
I ran perf top during ./mtr testing and constantly saw the times()
function there. It is so slow that it makes no sense to call it
in a loop so many times.
This patch speeds up -suite=innodb for me from 218s to 208s.
That is 9s spent in the times() function!