According to https://discussions.apple.com/thread/8256853
an attempt to use AVX512 registers on macOS will result in #UD
(crash at runtime).
Also, starting with clang-18 and GCC 14, we must add "evex512" to the
target flags so that AVX and SSE instructions can use AVX512 specific
encodings. This flag was introduced together with the avx10.1-512 target.
Older compiler versions do not recognize "evex512". We do not want to
write "avx10.1-512" because it could enable some AVX512 subfeatures
that we do not have any CPUID check for.
Reviewed by: Vladislav Vaintroub
Tested on macOS by: Valerii Kravchuk
The accounting of the limit variable that represents the
amount of space left it the buffer was incorrect.
Also there was 1 or 2 bytes left to write that occured without
the buffer length being checked.
Review: Sanja Byelkin
- ZLIB_LIBRARIES, not ZLIB_LIBRARY
- ZLIB_INCLUDE_DIRS, not ZLIB_INCLUDE_DIR
For building libmariadb, ZLIB_LIBRARY/ZLIB_INCLUDE_DIR are still defined
This workaround will be removed later.
This is based on https://github.com/intel/intel-ipsec-mb/
and has been tested both on x86 and x86-64, with code that
was generated by several versions of GCC and clang.
GCC 11 or clang 8 or later should be able to compile this,
and so should recent versions of MSVC.
Thanks to Intel Corporation for providing access to hardware,
for answering my questions regarding the code, and for
providing the coefficients for the CRC-32C computation.
crc32_avx512(): Compute a reverse polynomial CRC-32 using
precomputed tables and carry-less product, for up to 256 bytes
of unaligned input per loop iteration.
Reviewed by: Vladislav Vaintroub
In our unit test, let us rely on our own reference
implementation using the reflected
CRC-32 ISO 3309 and CRC-32C polynomials. Let us also
test with various lengths.
Let us refactor the CRC-32 and CRC-32C implementations
so that no special compilation flags will be needed and
that some function call indirection will be avoided.
pmull_supported: Remove. We will have pointers to two separate
functions crc32c_aarch64_pmull() and crc32c_aarch64().
The problem was that the signal thread was not killed when using
unireg_abort().
The bug was introduced by:
MDEV-30260: Slave crashed:reload_acl_and_cache during shutdown
Other things fixed:
- Don't produce memory leaks with safemalloc if all threads was not
ended properly (not useful)
- Add additional MTRs for more coverage on invalid options
- Updating a few error messages to be more informative
- Use the exit code from handle_options() when there is an error processing
user options
All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.
When passing in an invalid value (e.g. incorrect data type) for a variable, the
server startup will fail with misleading error messages.
The behavior **before** this change:
For server options:
- The error message will indicate that the argument is being adjusted to a valid value
- Server startup still fails
For plugin options:
- The error message will indicate that the argument is being adjusted to a valid value
- The plugin is still disabled
- Server startup fails with a message that it does not recognize the plugin option
The behavior **after** this change:
For server options:
- Output that an invalid argument was provided
- Exit server startup
For plugin options:
- Output that an invalid argument was provided
- Disable the plugin
- Attempt to continue server startup
All new code of the whole pull request, including one or several files that are
either new files or modified ones, are contributed under the BSD-new license. I
am contributing on behalf of my employer Amazon Web Services, Inc.
Fixed that internal temporary tables are not waiting for freed disk space.
Other things:
- 'kill id' will now kill a query waiting for free disk space instantly.
Before it could take up to 60 seconds for the kill would be noticed.
- Fixed that sorting one index is not using MY_WAIT_IF_FULL for temp files.
- Fixed bug where share->write_flag set MY_WAIT_IF_FULL for temp files.
It is quite hard to do a test case for this. Instead I tested all
combinations interactively.
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
MDEV-33502 Slowdown when running nested statement with many partitions
caused this error as I failed to take into account bigendian architectures.
This patch also introduces bitmap_import() and bitmap_export() to be used
when one wants to store bitmaps in files/logs in a portable way.
Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Previously, the behavior was to error out on the first invalid option
encountered. With this change, a best effort approach is made so that
all invalid options processed will be printed before exiting.
There is a caveat. The options are processed many times at varying
stages of server startup because the server is not aware of all valid
options immediately (e.g. plugins have to be loaded first before the
server knows what are the available plugin options). So, there are some
options that the server can determine are invalid "early" on, and there
are some options that the server cannot determine are invalid until
"later" on. For example, the server can determine an option such as
`--a` is an ambiguous option very early on but an option such as
`--this-does-not-match-any-option` cannot be labelled as invalid until
the server is aware of all available options.
Thus, it is possible that the server will still fail before printing out
all "invalid" options. You can see this by passing `--a
--obvious-invalid-option`.
Test cases were added to `mysqld_option_err.test` to validate that
multiple invalid options will be displayed in the error message.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services.
MDEV-33502 Slowdown when running nested statement with many partitions
This change was triggered to help some MariaDB users with close to
10000 bits in their bitmaps.
- Change underlaying storage to be 64 bit instead of 32bit.
- This reduses number of loops to scan bitmaps.
- This can cause some bitmaps to be 4 byte large.
- Ensure that all not used top-bits are always 0 (simplifes code as
the last 64 bit storage is not a special case anymore).
- Use my_find_first_bit() to find the first set bit which is much faster
than scanning trough things byte by byte and then bit by bit.
Other things:
- Added a bool to remember if my_bitmap_init() did allocate the bitmap
array. my_bitmap_free() will only free arrays it did allocate.
This allowed me to remove setting 'bitmap=0' before calling
my_bitmap_free() for cases where the bitmap's where allocated externally.
- my_bitmap_init() sets bitmap to 0 in case of failure.
- Added 'universal' asserts to most bitmap functions.
- Change all remaining calls to bitmap_init() to my_bitmap_init().
- To finish the change from 2014.
- Changed all usage of uint32 in my_bitmap.h to my_bitmap_map.
- Updated bitmap_copy() to handle bitmaps of different size.
- Removed const from bitmap_exists_intersection() as this caused casts
on all usage.
- Removed not used function bitmap_set_above().
- Renamed create_last_word_mask() to create_last_bit_mask() (to match
name changes in my_bitmap.cc)
- Extended bitmap-t with test for more bitmap functions.
Apparently, invoking fcntl(fd, F_SETFL, O_DIRECT) will lead to
unexpected behaviour on Linux bcachefs and possibly other file systems,
depending on the operating system version. So, let us avoid doing that,
and instead just attempt to pass the O_DIRECT flag to open(). This should
make us compatible with NetBSD, IBM AIX, as well as Solaris and its
derivatives.
This fix does not change the fact that we had only implemented
innodb_log_file_buffering=OFF on systems where we can determine the
physical block size (typically 512 or 4096 bytes).
Currently, those operating systems are Linux and Microsoft Windows.
HAVE_FCNTL_DIRECT, os_file_set_nocache(): Remove.
OS_FILE_OVERWRITE, OS_FILE_CREATE_PATH: Remove (never used parameters).
os_file_log_buffered(), os_file_log_maybe_unbuffered(): Helper functions.
os_file_create_simple_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
os_file_create_func(): When applicable, initially attempt to
open files in O_DIRECT mode.
For type==OS_LOG_FILE && create_mode != OS_FILE_CREATE
we will first invoke stat(2) on the file name to find out if the size
is compatible with O_DIRECT. If create_mode == OS_FILE_CREATE, we will
invoke fstat(2) on the created log file afterwards, and may close and
reopen the file in O_DIRECT mode if applicable.
create_temp_file(): Support O_DIRECT. This is only used if O_TMPFILE is
available and innodb_disable_sort_file_cache=ON (non-default value).
Notably, that setting never worked on Microsoft Windows.
row_merge_file_create_mode(): Split from row_merge_file_create_low().
Create a temporary file in the specified mode.
Reviewed by: Vladislav Vaintroub
When testing MariaDB on Arm64, a stall issue will occur, jira link:
https://jira.mariadb.org/browse/MDEV-28430.
The stall occurs because of an unexpected circular reference in the
LF_PINS->purgatory list which is traversed in lf_pinbox_real_free().
We found that on Arm64, ABA problem in LF_ALLOCATOR->top list was not
solved, and various undefined problems will occur, including circular
reference in LF_PINS->purgatory list.
The following codes are used to solve ABA problem, code copied
from below link.
cb4c271355/mysys/lf_alloc-pin.c (L501-)#L505
do
{
503 node= allocator->top;
504 lf_pin(pins, 0, node);
505 } while (node != allocator->top && LF_BACKOFF());
1. ABA problem on Arm64
Combine the below steps to analyze how ABA problem occur on Arm64, the
relevant codes in steps are simplified, code line numbers below are in
MariaDB v10.4.
------------------------------------------------------------------------
Abnormal case.
Initial state: pin = 0, top = A, top list: A->B
T1 T2
step1. write top=B //seq-cst, #L517
step2. write A->next= "any"
step3. read pin==0 //relaxed, #L295
step1. write pin=A //seq-cst, #L504
step2. read old value of top==A //relaxed, #L505
step3. next=A->next="any" //#L517
step4. write A->next=B,top=A //#L420-435
step4. CAS(top,A,next) //#L517
step5. write pin=0 //#L521
------------------------------------------------------------------------
Above case is due to T1.step2 reading the old value of top, causing
"T1.step3, T1.step4" and "T2.step4" to occur at the same time, in other
words, they are not mutually exclusive.
It may happen that T2.step4 is sandwiched between T1.step3 and T1.step4,
which cause top to be updated to "any", which may be in-use or invalid
address.
2. Analyze above issue with Dekker's algorithm
Above problem can be mapped to Dekker's algorithm, link is as below
https://en.wikipedia.org/wiki/Dekker%27s_algorithm.
The following extracts the read and write operations on 'top' and 'pin',
and maps them to Dekker's algorithm to analyze the root cause.
------------------------------------------------------------------------
Initial state: top = A, pin = 0
T1 T2
store_seq_cst(pin, A) // write pin store_seq_cst(top, B) //write top
rt= load_relaxed(top) // read top rp= load_relaxed(pin) //read pin
if (rt == A && rp == 0) printf("oops\n"); // will "oops" be printed?
------------------------------------------------------------------------
How T1 and T2 enter their critical section:
(1) T1, write pin, if T1 reads that top has not been updated, T1 enter
its critical section(T1.step3 and T1.step4, try to obtain 'A', #L517),
otherwise just give up (T1 without priority).
(2) T2, write top, if T2 reads that pin has not been updated, T2 enter
critical section(T2.step4, try to add 'A' to top list again, #L420-435),
otherwise wait until pin!=A (T2 with priority).
In the previous code, due to load 'top' and 'pin' with relaxed semantic,
on arm and ppc, there is no guarantee that the above critical sections
are mutually exclusive, in other words, "oops" will be printed.
This bug only happens on arm and ppc, not x86. On current x86
implementation, load is always seq-cst (relaxed and seq-cst load
generates same machine code), as shown in https://godbolt.org/z/sEzMvnjd9
3. Fix method
Add sequential-consistency semantic to read 'top' in #L505(T1.step2),
Add sequential-consistency semantic to read "el->pin[i]" in #L295
and #L320.
4. Issue reproduce
Add "delay" after #L503 in lf_alloc-pin.c, When run unit.lf, can quickly
get segment fault because "top" point to an invalid address. For detail,
see comment area of below link.
https://jira.mariadb.org/browse/MDEV-28430.
5. Futher improvement
To make this code more robust and safe on all platforms, we recommend
replacing volatile with C11 atomics and to fix all data races. This will
also make the code easier to reason.
Signed-off-by: Xiaotong Niu <xiaotong.niu@arm.com>
The old code collected a list of THD's, locked the THD's from getting
deleted by locking two mutex and then later in a separate loop
sent a kill signal to each THD.
The problem with this approach is that, as THD's can be reused,
the second time the THD is killed, the mutex can be taken in
different order, which signals failures in safe_mutex.
Fixed by sending the kill signal directly and not collect the THD's
in a list to be signaled later. This is the same approach we are using
in kill_zombie_dump_threads().
Other things:
- Reset safe_mutex_t->locked_mutex when freed (Safety fix)
Other things:
- Added DBUG_EXECUTE_IF("print_allocated_thread_memory") at end of query
to easier find not freed memory allocated by THD
- Removed free_root() from plugin_init() that did nothing.
The leaks are all 40 bytes and happens in this call stack when running
mtr vcol.vcol_syntax:
alloc_root()
...
Virtual_column_info::fix_and_check_exp()
...
Delayed_insert::get_local_table()
The problem was that one copied a MEM_ROOT from THD to a TABLE without
taking into account that new blocks would be allocated through the
TABLE memroot (and would thus be leaked).
In general, one should NEVER copy MEM_ROOT from one object to another
without clearing the copied memroot!
Fixed by, at end of get_local_table(), copy all new allocated objects
to client_thd->mem_root.
Other things:
- Removed references to MEM_ROOT::total_alloc that was wrongly left
after a previous commit
Also fixes: MDEV-30050 Inconsistent results of DISTINCT with NOPAD
Problem:
Key segments for CHAR columns where compared using strnncollsp()
for engines MyISAM and Aria.
This did not work correct in case if the engine applyied trailing
space compression.
Fix:
Replacing ha_compare_text() calls to new functions:
- ha_compare_char_varying()
- ha_compare_char_fixed()
- ha_compare_word()
- ha_compare_word_prefix()
- ha_compare_word_or_prefix()
The code branch corresponding to comparison of CHAR column keys
(HA_KEYTYPE_TEXT segment type) now uses ha_compare_char_fixed()
which calls strnncollsp_nchars().
This patch does not change the behavior for the rest of the code:
- comparison of VARCHAR/TEXT column keys
(HA_KEYTYPE_VARTEXT1, HA_KEYTYPE_VARTEXT2 segments types)
- comparison in the fulltext code
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
Raise notes if indexes cannot be used:
- in case of data type or collation mismatch (diferent error messages).
- in case if a table field was replaced to something else
(e.g. Item_func_conv_charset) during a condition rewrite.
Added option to write warnings and notes to the slow query log for
slow queries.
New variables added/changed:
- note_verbosity, with is a set of the following options:
basic - All old notes
unusable_keys - Print warnings about keys that cannot be used
for select, delete or update.
explain - Print unusable_keys warnings for EXPLAIN querys.
The default is 'basic,explain'. This means that for old installations
the only notable new behavior is that one will get notes about
unusable keys when one does an EXPLAIN for a query. One can turn all
of all notes by either setting note_verbosity to "" or setting sql_notes=0.
- log_slow_verbosity has a new option 'warnings'. If this is set
then warnings and notes generated are printed in the slow query log
(up to log_slow_max_warnings times per statement).
- log_slow_max_warnings - Max number of warnings written to
slow query log.
Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
For example using "note_verbosity=ALL" in a config file or
"SET @@note_verbosity=ALL' in SQL.
- mysqldump will in the future use @@note_verbosity=""' instead of
@sql_notes=0 to disable notes.
- Added "enum class Data_type_compatibility" and changing the return type
of all Field::can_optimize*() methods from "bool" to this new data type.
Reviewer & Co-author: Alexander Barkov <bar@mariadb.com>
- The code that prints out the notes comes mainly from Alexander