Commit graph

10135 commits

Author SHA1 Message Date
Lawrin Novitsky
1ff476b415 MDEV-29490 Renaming internally used client API to avoid name conflicts
with C/C.
The patch introduces mariadb_capi_rename.h which is included into
mysql.h. The hew header contains macro definitions for the names being
renamed. In versions 10.6+(i.e. where sql service exists) the renaming
condition in the mariadb_capi_rename.h should be added with
&& !defined(MYSQL_DYNAMIC_PLUGIN)
and look like
The patch also contains removal of mysql.h from the api check.

Disabling false_duper-6543 test for embedded.

ha_federated.so uses C API. C API functions are being renamed in the server,
but not renamed in embedded, since embedded server library should have proper
C API, as expected by programs using it.
Thus the same ha_federated.so cannot work both for server and embedded
server library.

As all federated tests are already disabled for embedded,
federated isn't supposed to work for embedded anyway, and thus the test
is being disabled.
2022-10-25 14:00:21 +02:00
Marko Mäkelä
9a0b9e3360 Merge 10.4 into 10.5 2022-10-25 11:26:37 +03:00
Marko Mäkelä
667d3fbbb5 Merge 10.3 into 10.4 2022-10-25 10:04:37 +03:00
Alexander Barkov
2a57396e59 MDEV-29481 mariadb-upgrade prints confusing statement
This is a new version of the patch instead of the reverted:

  MDEV-28727 ALTER TABLE ALGORITHM=NOCOPY does not work after upgrade

Ignore the difference in key packing flags HA_BINARY_PACK_KEY and HA_PACK_KEY
during ALTER to allow ALGORITHM=INSTANT and ALGORITHM=NOCOPY in more cases.

If for some reasons (e.g. due to a bug fix such as MDEV-20704) these
cumulative (over all segments) flags in KEY::flags are different for
the old and new table inside compare_keys_but_name(), the difference
in HA_BINARY_PACK_KEY and HA_PACK_KEY in KEY::flags is not really important:

MyISAM and Aria can handle such cases well: per-segment flags are stored in
MYI and MAI files anyway and they are read during ha_myisam::open()
ha_maria::open() time. So indexes get opened with correct per-segment
flags that were calculated during the table CREATE time, no matter
what the old (CREATE time) and new (ALTER TIME) per-index compression
flags are, and no matter if they are equal or not.

All other engine ignore key compression flags, so this change
is safe for other engines as well.
2022-10-22 14:22:20 +04:00
Brad Smith
5f25a91140
Cleanup the alloca.h header handling to further reduce hardcoded OS lists (#2289) 2022-10-16 18:44:51 +01:00
Sergei Golubchik
3a2116241b Merge branch '10.4' into 10.5 2022-10-02 14:38:13 +02:00
Sergei Golubchik
d4f6d2f08f Merge branch '10.3' into 10.4 2022-10-01 23:07:26 +02:00
Oleksandr Byelkin
f65ba9aeb7 MDEV-17124: mariadb 10.1.34, views and prepared statements: ERROR 1615 (HY000): Prepared statement needs to be re-prepared
The problem is that if table definition cache (TDC) is full of real tables
which are in tables cache, view definition can not stay there so will be
removed by its own underlying tables.
In situation above old mechanism of detection matching definition in PS
and current version always require reprepare and so prevent executing
the PS.

One work around is to increase TDC, other - improve version check for
views/triggers (which is done here). Now in suspicious cases we check:
 - timestamp (microseconds) of the view to be sure that version really
   have changed;
 - time (microseconds) of creation of a trigger related to time
   (microseconds) of statement preparation.
2022-09-30 12:11:37 +02:00
Sergei Golubchik
6b685ea7b0 correctness assert
thd_get_ha_data() can be used without a lock, but only from the
current thd thread, when calling from anoher thread it *must*
be protected by thd->LOCK_thd_data

* fix group commit code to take thd->LOCK_thd_data
* remove innobase_close_connection() from the innodb background thread,
  it's not needed after 87775402cd and was failing the assert with
  current_thd==0
2022-09-29 10:44:39 +02:00
Marko Mäkelä
6286a05d80 Merge 10.4 into 10.5 2022-09-26 13:34:38 +03:00
Marko Mäkelä
a69cf6f07e MDEV-29613 Improve WITH_DBUG_TRACE=OFF
In commit 28325b0863
a compile-time option was introduced to disable the macros
DBUG_ENTER and DBUG_RETURN or DBUG_VOID_RETURN.

The parameter name WITH_DBUG_TRACE would hint that it also
covers DBUG_PRINT statements. Let us do that: WITH_DBUG_TRACE=OFF
shall disable DBUG_PRINT() as well.

A few InnoDB recovery tests used to check that some output from
DBUG_PRINT("ib_log", ...) is present. We can live without those checks.

Reviewed by: Vladislav Vaintroub
2022-09-23 13:40:42 +03:00
Marko Mäkelä
0792aff161 Merge 10.4 into 10.5 2022-09-20 13:17:02 +03:00
Marko Mäkelä
0c0a569028 Merge 10.3 into 10.4 2022-09-20 12:38:25 +03:00
Vladislav Vaintroub
32bab2ce05 MDEV-29543 Windows: Unreadable dlerror() message on localized OS
Force using english for error messages (i.e ASCII) to avoid encoding
mixup.
2022-09-15 09:39:05 +02:00
Jan Lindström
ba987a46c9 Merge 10.4 into 10.5 2022-09-05 13:28:56 +03:00
Daniele Sciascia
2917bd0d2c Reduce compilation dependencies on wsrep_mysqld.h
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even tough very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2022-08-31 11:05:23 +03:00
Oleksandr Byelkin
b043e1098e Merge branch 'merge-perfschema-5.7' into 10.5 2022-08-02 09:34:15 +02:00
Oleksandr Byelkin
61d08f7427 mysql-5.7.39 2022-07-29 14:48:01 +02:00
Vladislav Vaintroub
8a9c1e9ccf MDEV-25785 Add support for OpenSSL 3.0
Summary of changes

- MD_CTX_SIZE is increased

- EVP_CIPHER_CTX_buf_noconst(ctx) does not work anymore, points
  to nobody knows where. The assumption made previously was that
  (since the function does not seem to be documented)
  was that it points to the last partial source block.
  Add own partial block buffer for NOPAD encryption instead

- SECLEVEL in CipherString in openssl.cnf
  had been downgraded to 0, from 1, to make TLSv1.0 and TLSv1.1 possible
   (according to https://github.com/openssl/openssl/blob/openssl-3.0.0/NEWS.md
   even though the manual for SSL_CTX_get_security_level claims that it
   should not be necessary)

- Workaround Ssl_cipher_list issue, it now returns TLSv1.3 ciphers,
  in addition to what was set in --ssl-cipher

- ctx_buf buffer now must be aligned to 16 bytes with openssl(
  previously with WolfSSL only), ot crashes will happen

- updated aes-t , to be better debuggable
  using function, rather than a huge multiline macro
  added test that does "nopad" encryption piece-wise, to test
  replacement of EVP_CIPHER_CTX_buf_noconst

part of MDEV-29000
2022-07-04 12:49:11 +02:00
Marko Mäkelä
4faef6e240 Cleanup: Remove IF_VALGRIND
The purpose of the compress() wrapper my_compress_buffer() was twofold:
silence Valgrind warnings about uninitialized memory access before
zlib 1.2.4, and have PERFORMANCE_SCHEMA instrumentation of some zlib
related memory allocation. Because of PERFORMANCE_SCHEMA, we cannot
trivially replace my_compress_buffer() with compress().

az_open(): Remove a crc32() call. Any CRC of the empty string is 0.
2022-04-25 09:40:40 +03:00
Marko Mäkelä
cacb61b6be Merge 10.4 into 10.5 2022-04-06 10:06:39 +03:00
Marko Mäkelä
d6d66c6e90 Merge 10.3 into 10.4 2022-04-06 08:59:09 +03:00
Marko Mäkelä
7c584d8270 Merge 10.2 into 10.3 2022-04-06 08:06:35 +03:00
Daniel Black
75b9014fed MDEV-26136: Correct AIX/macOS cast warning (my_time.h)
tv_usec is a (suseconds_t) so we cast to it. Prevents the AIX(gcc-10) warning:

include/my_time.h: In function 'void my_timeval_trunc(timeval*, uint)':
include/my_time.h:249:65: warning: conversion from 'long int' to 'suseconds_t' {aka 'int'} may change value [-Wconversion]
  249 |   tv->tv_usec-= my_time_fraction_remainder(tv->tv_usec, decimals);
      |

macOS is: conversion from 'long int' to '__darwin_suseconds_t' {aka 'int'} may change value

On Windows suseconds_t isn't defined so we use the existing
long return type of my_time_fraction_remainder.

Reviewed by Marko Mäkelä

Closes: #2079
2022-04-04 08:31:40 +10:00
Marko Mäkelä
d62b0368ca Merge 10.4 into 10.5 2022-03-29 12:59:18 +03:00
Marko Mäkelä
cf483a7766 MDEV-17441 fixup: Remove unused my_atomic long macros 2022-03-24 09:53:52 +02:00
Marko Mäkelä
59359fb44a MDEV-24841 Build error with MSAN use-of-uninitialized-value in comp_err
The MemorySanitizer implementation in clang includes some built-in
instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the
interface to the stat() family of functions was changed. Until the
MemorySanitizer interceptors are adjusted, any MSAN code builds
will act as if that the stat() family of functions failed to initialize
the struct stat.

A fix was applied in
https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a
but it fails to cover the 64-bit variants of the calls.

For now, let us work around the MemorySanitizer bug by defining
and using the macro MSAN_STAT_WORKAROUND().
2022-03-14 09:28:55 +02:00
Vlad Lesin
a112a80b47 Merge 10.4 into 10.5 2022-02-22 10:35:16 +03:00
Vlad Lesin
f6f055a191 Merge 10.3 into 10.4 2022-02-21 14:10:27 +03:00
Nayuta Yanagisawa
66f55a018b MDEV-27730 Add PLUGIN_VAR_DEPRECATED flag to plugin variables
The sys_var class has the deprecation_substitute member to mark the
deprecated variables. As it's set, the server produces warnings when
these variables are used. However, the plugin has no means to utilize
that functionality.

So, the PLUGIN_VAR_DEPRECATED flag is introduced to set the
deprecation_substitute with the empty string. A non-empty string can
make the warning more informative, but there's no nice way seen to
specify it, and not that needed at the moment.
2022-02-18 13:10:20 +09:00
Oleksandr Byelkin
34c5019698 Merge branch '10.5' into bb-10.5-release 2022-02-09 08:57:41 +01:00
Monty
88fb89acb7 Fixes some compiler issues on AIX ( 2022-02-08 14:32:28 +02:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Oleksandr Byelkin
ca41fdba22 fix MDEV-27217 (4d5ae2b325)
Add error from later versions to avoid chaging HA_ERR_*
accross versions and in already released versions.
2022-01-30 17:52:15 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00
Oleksandr Byelkin
41a163ac5c Merge branch '10.2' into 10.3 2022-01-29 15:41:05 +01:00
Oleksandr Byelkin
880d543554 Merge branch 'merge-perfschema-5.7' into 10.5 2022-01-28 11:57:52 +01:00
Vladislav Vaintroub
be1d965384 MDEV-27373 wolfSSL 5.1.1
- compile wolfcrypt with kdf.c, to avoid undefined symbols in tls13.c
- define WOLFSSL_HAVE_ERROR_QUEUE to avoid endless loop SSL_get_error
- Do not use SSL_CTX_set_tmp_dh/get_dh2048, this would require additional
  compilation options in WolfSSL. Disable it for WolfSSL build, it works
  without it anyway.
- fix "macro already defined" Windows warning.
2022-01-25 11:19:00 +01:00
Sergei Golubchik
4504e6d14e test cases for MySQL bugs
also fix a comment, and update a macro just in case
2022-01-21 16:02:34 +01:00
Marko Mäkelä
c1d7b4575e MDEV-26870 --skip-symbolic-links does not disallow .isl file creation
The InnoDB DATA DIRECTORY attribute is not implemented via
symbolic links but something similar, *.isl files that contain
the names of data files.

InnoDB failed to ignore the DATA DIRECTORY attribute even though
the server was started with --skip-symbolic-links.

Native ALTER TABLE in InnoDB will retain the DATA DIRECTORY attribute
of the table, no matter if the table will be rebuilt or not.

Generic ALTER TABLE (with ALGORITHM=COPY) as well as TRUNCATE TABLE
will discard the DATA DIRECTORY attribute.

All tests have been run with and without the ./mtr option
--mysqld=--skip-symbolic-links
and some tests that use the InnoDB DATA DIRECTORY attribute
have been adjusted for this.
2022-01-21 14:43:59 +02:00
Alexander Barkov
b915f79e4e MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings 2022-01-21 12:16:07 +04:00
Sergei Golubchik
7b555ff2c5 MDEV-27341 Use SET PASSWORD to change PAM service
SET PASSWORD = PASSWORD('foo') would fail for pam plugin with

ERROR HY000: SET PASSWORD is ignored for users authenticating via pam plugin

but SET PASSWORD = 'foo' would not.

Now it will.
2022-01-17 18:19:29 +01:00
Aleksey Midenkov
4d5ae2b325 MDEV-27217 DELETE partition selection doesn't work for history partitions
LIMIT history switching requires the number of history partitions to
be marked for read: from first to last non-empty plus one empty. The
least we can do is to fail with error message if the needed partition
was not marked for read. As this is handler interface we require new
handler error code to display user-friendly error message.

Switching by INTERVAL works out-of-the-box with
ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET error.
2022-01-13 23:35:16 +03:00
Vladislav Vaintroub
555a53f10d Windows : fix broken build with OpenSSL 2022-01-09 12:04:22 +01:00
Julius Goryavsky
55bb933a88 Merge branch 10.4 into 10.5 2021-12-26 12:51:04 +01:00
sjaakola
49791cbc6f 10.4-MDEV-27275 CREATE TABLE with FK not safe for PA
This commit contains a fix, where the replication write set for a CREATE TABLE
will contain, as certification keys, table names for all FK references.
With this, all DML for the FK parent tables will conflict with the CREATE TABLE
statement.

There is also new test galera.MDEV-27276 to verify the fix.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-12-20 13:34:54 +02:00
sjaakola
c1846c4fcf MDEV-26803 PA unsafety with FK cascade delete operation
This commit has a mtr test where two two transactions delete a row from
two separate tables, which will cascade a FK delete for the same row in
a third table. Second replica node is configured with 2 applier threads,
and the test will fail if these two transactions are applied in parallel.

The actual fix, in this commit, is to mark a transaction as unsafe for
parallel applying when it traverses into cascade delete operation.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2021-12-17 09:38:23 +02:00
Marko Mäkelä
a8ded39557 Merge 10.4 into 10.5 2021-10-28 08:48:36 +03:00
Marko Mäkelä
3a79e5fd31 Merge 10.3 into 10.4 2021-10-28 08:28:39 +03:00
Marko Mäkelä
657bcf928e Merge 10.2 into 10.3 2021-10-28 07:50:05 +03:00
Marko Mäkelä
44f9736e0b Merge 10.4 into 10.5 2021-10-27 09:48:22 +03:00
Eugene Kosov
d74d95961a MDEV-18543 IMPORT TABLESPACE fails after instant DROP COLUMN
ALTER TABLE IMPORT doesn't properly handle instant alter metadata.
This patch makes IMPORT read, parse and apply instant alter metadata at the
very beginning of operation. So, cases when source table has some metadata
and destination table doesn't have it now works fine.

DISCARD already removes instant metadata so importing normal table into
instant table worked fine before this patch.

decrypt_decompress(): decrypts and decompresses page if needed

handle_instant_metadata(): this should be the first thing to read source
table. Basically, it applies instant metadata to a destination
dict_table_t object. This is the first thing to read FSP flags so
all possible checks of it were moved to this function.

PageConverter::update_index_page(): it doesn't now read instant metadata.
This logic were moved into handle_instant_metadata()

row_import::match_flags(): this is a first part row_import::match_schema().
As a separate function it's used by handle_instant_metadata().

fil_space_t::is_full_crc32_compressed(): added convenient function

ha_innobase::discard_or_import_tablespace(): do not reload table definition
to read instant metadata because handle_instant_metadata() does it better.
The reverted code was originally added in
4e7ee166a9

ANONYMOUS_VAR: this is a handy thing to use along with make_scope_exit()

full_crc32_import.test shows different results, because no
dict_table_close() and dict_table_open_on_id() happens.
Thus, SHOW CREATE TABLE shows a little bit older table definition.
2021-10-26 22:50:58 +06:00
Oleksandr Byelkin
1f70e4b00c pthread_yield() is depricated now, so use sched_yield() if possible. 2021-10-26 15:05:13 +02:00
Marko Mäkelä
99bb3fb656 Merge 10.4 into 10.5 2021-10-13 12:33:56 +03:00
Marko Mäkelä
a736a3174a Merge 10.3 into 10.4 2021-10-13 12:03:32 +03:00
Marko Mäkelä
4a7dfda373 Merge 10.2 into 10.3 2021-10-13 11:38:21 +03:00
Sergei Krivonos
6f32b28be5 Xcode compatibility update 2021-10-12 18:10:56 -04:00
Alexander Barkov
eadd878808 MDEV-23269 SIGSEGV in ft_boolean_check_syntax_string on setting ft_boolean_syntax
The crash happened because my_isalnum() does not support character
sets with mbminlen>1.

The value of "ft_boolean_syntax" is converted to utf8 in do_string_check().
So calling my_isalnum() is combination with "default_charset_info" was wrong.

Adding new parameters (size_t length, CHARSET_INFO *cs) to
ft_boolean_check_syntax_string() and passing self->charset(thd)
as the character set.
2021-10-11 17:43:23 +04:00
SergMariaDB
5e3e5ccbea
Apple Silicon is a 64-bit platform (#1922)
Co-authored-by: FX Coudert <fxcoudert@gmail.com>
2021-10-11 15:26:33 +03:00
Aleksey Midenkov
911c803db1 MDEV-22660 System versioning cleanups
- Cleaned up Vers_parse_info::check_sys_fields();
- Renamed VERS_SYS_START_FLAG, VERS_SYS_END_FLAG to VERS_ROW_START,
  VERS_ROW_END.
2021-10-11 13:36:06 +03:00
Marko Mäkelä
83c4523f03 Merge 10.4 into 10.5 2021-09-24 17:32:50 +03:00
Vladislav Vaintroub
25a5ce367b Merge branch '10.3' into 10.4 2021-09-24 12:29:57 +02:00
Vladislav Vaintroub
ca7046dc19 MDEV-11499 mysqltest, Windows : improve diagnostics if server fails to shutdown
Create minidump when server fails to shutdown. If process is being
debugged, cause a debug break.

Moves some code which is part of safe_kill into mysys, as both safe_kill,
and mysqltest produce minidumps on different timeouts.

Small cleanup in wait_until_dead() - replace inefficient loop with a single
wait.
2021-09-24 11:49:28 +02:00
Marko Mäkelä
699de65d5e Merge 10.4 into 10.5 2021-09-17 19:57:13 +03:00
Jan Lindström
d3b35598fc MDEV-26053 : TRUNCATE on table with Foreign Key Constraint no longer replicated to other nodes
Problem was that there was extra condition !thd->lex->no_write_to_binlog
before call to begin TOI. It seems that this variable is not initialized.
TRUNCATE does not support [NO_WRITE_TO_BINLOG | LOCAL] keywords, thus
we should not check this condition. All this was hidden in a macro,
so I decided to remove those macros that were used only a few places
with actual function calls.
2021-09-17 07:18:37 +03:00
Monty
0d47945b58 Fixed bug in aria_chk that overwrote sort_buffer_length
This bug happens when one runs aria_chk on multiple tables. It does not
affect REPAIR TABLE.
aria_chk tries to optimize the sort buffer size to minimize memory usage
when used with small tables. The bug was that the adjusted value was
used as a base for the next table, which could cause problems.
2021-09-15 21:21:03 +03:00
Monty
f03fee06b0 Improve error messages from Aria
- Error on commit now returns HA_ERR_COMMIT_ERROR instead of
  HA_ERR_INTERNAL_ERROR
- If checkpoint fails, it will now print out where it failed.
2021-09-15 19:27:34 +03:00
Marko Mäkelä
e62120cec7 Merge 10.4 into 10.5 2021-08-31 10:04:56 +03:00
Marko Mäkelä
0464761126 Merge 10.3 into 10.4 2021-08-31 09:22:21 +03:00
Marko Mäkelä
e835cc851e Merge 10.2 into 10.3 2021-08-31 08:36:59 +03:00
Marko Mäkelä
fda704c82c Fix GCC 11 -Wmaybe-uninitialized for PLUGIN_PERFSCHEMA
init_mutex_v1_t: Stop lying that the mutex parameter is const.
GCC 11.2.0 assumes that it is and could complain about any mysql_mutex_t
being uninitialized even after mysql_mutex_init() as long as
PLUGIN_PERFSCHEMA is enabled.

init_rwlock_v1_t, init_cond_v1_t: Remove untruthful const qualifiers.

Note: init_socket_v1_t is expecting that the socket fd has already
been created before PSI_SOCKET_CALL(init_socket), and therefore that
parameter really is being treated as a pointer to const.
2021-08-30 11:52:59 +03:00
Oleksandr Byelkin
ae6bdc6769 Merge branch '10.4' into 10.5 2021-07-31 23:19:51 +02:00
Oleksandr Byelkin
7841a7eb09 Merge branch '10.3' into 10.4 2021-07-31 22:59:58 +02:00
Marko Mäkelä
f50eb0d398 Merge 10.2 into 10.3 2021-07-27 10:47:17 +03:00
Marko Mäkelä
da094188f6 MDEV-24393 InnoDB disregards --skip-external-locking
On POSIX systems, InnoDB would unconditionally acquire advisory locks
on the files that it opens. On Linux, this would be observable by
a large number of entries in /proc/locks.

Other storage engines would only acquire advisory locks on files
based on the Boolean configuration parameter external_locking.

Let InnoDB do the same.

NOTE: The --skip-external-locking is activated by default. To have
InnoDB acquire advisory locks, --external-locking must be specified.

Reviewed by: Sergei Golubchik
2021-07-27 08:52:59 +03:00
Michael Okoko
6cd3588f0e Improve documentation of json parser functions
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2021-07-22 21:51:49 +03:00
Marko Mäkelä
759deaa0a2 MDEV-26010 fixup: Use acquire/release memory order
In commit 5f22511e35 we depend on
Total Store Ordering. For correct operation on ISAs that implement
weaker memory ordering, we must explicitly use release/acquire stores
and loads on buf_page_t::oldest_modification_ to prevent a race condition
when buf_page_t::list does not happen to be on the same cache line.

buf_page_t::clear_oldest_modification(): Assert that the block is
not in buf_pool.flush_list, and use std::memory_order_release.

buf_page_t::oldest_modification_acquire(): Read oldest_modification_
with std::memory_order_acquire. In this way, if the return value is 0,
the caller may safely assume that it will not observe the buf_page_t
as being in buf_pool.flush_list, even if it is not holding
buf_pool.flush_list_mutex.

buf_flush_relocate_on_flush_list(), buf_LRU_free_page():
Invoke buf_page_t::oldest_modification_acquire().
2021-06-26 11:16:40 +03:00
Marko Mäkelä
a42c80bd48 Merge 10.4 into 10.5 2021-06-21 14:22:22 +03:00
Vladislav Vaintroub
b81803f065 MDEV-22221: MariaDB with WolfSSL doesn't support AES-GCM cipher for SSL
Enable AES-GCM for SSL (only).

AES-GCM for encryption plugins remains disabled (aes-t fails, on some bug
in GCM or CTR padding)
2021-06-09 15:44:55 +02:00
Krunal Bauskar
2a4f72b7d2 MDEV-25807: ARM build failure due to missing ISB instruction on ARMv6
Debian has support for 3 different arm machine and probably one which is
failing is a pretty old version which doesn't support the said instruction.

So we can make it specific to _aarch64_ (as per the arm official reference
manual ISB should be defined by all ARMv8 processors). [as per the ARMv7 specs
even 32 bits processor should support it but not sure which exact version
Debian has under armel].

In either case, the said performance issue will have an impact mainly with a
high-end processors with extreme parallelism so it safe to limit it to _aarch64_.

Corrects: MDEV-24630
2021-06-01 13:51:39 +10:00
Sergei Golubchik
0b116d160a 5.7.34 2021-05-03 11:22:07 +02:00
Marko Mäkelä
80ed136e6d Merge 10.4 into 10.5 2021-04-21 09:01:01 +03:00
Monty
031f11717d Fix all warnings given by UBSAN
The easiest way to compile and test the server with UBSAN is to run:
./BUILD/compile-pentium64-ubsan
and then run mysql-test-run.
After this commit, one should be able to run this without any UBSAN
warnings. There is still a few compiler warnings that should be fixed
at some point, but these do not expose any real bugs.

The 'special' cases where we disable, suppress or circumvent UBSAN are:
- ref10 source (as here we intentionally do some shifts that UBSAN
  complains about.
- x86 version of optimized int#korr() methods. UBSAN do not like unaligned
  memory access of integers.  Fixed by using byte_order_generic.h when
  compiling with UBSAN
- We use smaller thread stack with ASAN and UBSAN, which forced me to
  disable a few tests that prints the thread stack size.
- Verifying class types does not work for shared libraries. I added
  suppression in mysql-test-run.pl for this case.
- Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is
  safe to have overflows (two cases, in item_func.cc).

Things fixed:
- Don't left shift signed values
  (byte_order_generic.h, mysqltest.c, item_sum.cc and many more)
- Don't assign not non existing values to enum variables.
- Ensure that bool and enum values are properly initialized in
  constructors.  This was needed as UBSAN checks that these types has
  correct values when one copies an object.
  (gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...)
- Ensure we do not called handler functions on unallocated objects or
  deleted objects.
  (events.cc, sql_acl.cc).
- Fixed bugs in Item_sp::Item_sp() where we did not call constructor
  on Query_arena object.
- Fixed several cast of objects to an incompatible class!
  (Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc,
   sql_select.cc ...)
- Ensure we do not do integer arithmetic that causes over or underflows.
  This includes also ++ and -- of integers.
  (Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...)
- Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that
  value_type is initialized to this instead of to -1, which is not a valid
  enum value for json_value_types.
- Ensure we do not call memcpy() when second argument could be null.
- Fixed that Item_func_str::make_empty_result() creates an empty string
  instead of a null string (safer as it ensures we do not do arithmetic
  on null strings).

Other things:

- Changed struct st_position to an OBJECT and added an initialization
  function to it to ensure that we do not copy or use uninitialized
  members. The change to a class was also motived that we used "struct
  st_position" and POSITION randomly trough the code which was
  confusing.
- Notably big rewrite in sql_acl.cc to avoid using deleted objects.
- Changed in sql_partition to use '^' instead of '-'. This is safe as
  the operator is either 0 or 0x8000000000000000ULL.
- Added check for select_nr < INT_MAX in JOIN::build_explain() to
  avoid bug when get_select() could return NULL.
- Reordered elements in POSITION for better alignment.
- Changed sql_test.cc::print_plan() to use pointers instead of objects.
- Fixed bug in find_set() where could could execute '1 << -1'.
- Added variable have_sanitizer, used by mtr.  (This variable was before
  only in 10.5 and up).  It can now have one of two values:
  ASAN or UBSAN.
- Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked
  it virtual. This was an effort to get UBSAN to work with loaded storage
  engines. I kept the change as the new place is better.
- Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast
  in tabutil.cpp.
- Added HAVE_REPLICATION around usage of rgi_slave, to get embedded
  server to compile with UBSAN. (Patch from Marko).
- Added #ifdef for powerpc64 to avoid a bug in old gcc versions related
  to integer arithmetic.

Changes that should not be needed but had to be done to suppress warnings
from UBSAN:

- Added static_cast<<uint16_t>> around shift to get rid of a LOT of
  compiler warnings when using UBSAN.
- Had to change some '/' of 2 base integers to shift to get rid of
  some compile time warnings.

Reviewed by:
- Json changes: Alexey Botchkov
- Charset changes in ctype-uca.c: Alexander Barkov
- InnoDB changes & Embedded server: Marko Mäkelä
- sql_acl.cc changes: Vicențiu Ciorbaru
- build_explain() changes: Sergey Petrunia
2021-04-20 12:30:09 +03:00
Marko Mäkelä
7b48da4d7e Merge 10.4 into 10.5 2021-04-08 07:47:49 +03:00
Daniel Black
f69c1c9dcb MDEV-19508: SI_KERNEL is not on all implementations
SI_USER is, however in FreeBSD there are a couple of non-kernel
user signal infomations above SI_KERNEL.

Put a fallback just in case there is nothing available.
2021-04-07 14:01:56 +10:00
Monty
81258f1432 MDEV-17913 Encrypted transactional Aria tables remain corrupt after crash recovery, automatic repairment does not work
This was because of a wrong test in encryption code that wrote random
numbers over the LSN for pages for transactional Aria tables during repair.
The effect was that after an ALTER TABLE ENABLE KEYS of a encrypted
recovery of the tables would not work.

Fixed by changing testing of !share->now_transactional to
!share->base.born_transactional.

Other things:
- Extended Aria check_table() to check for wrong (= too big) LSN numbers.
- If check_table() failed just because of wrong LSN or TRN numbers,
  a following repair table will just do a zerofill which is much faster.
- Limit number of LSN errors in one check table to MAX_LSN_ERROR (10).
- Removed old obsolete test of 'if (error_count & 2)'. Changed error_count
  and warning_count from bits to numbers of errors/warnings as this is
  more useful.
2021-04-06 14:57:22 +03:00
Krunal Bauskar
76d2846a71 MDEV-24630: MY_RELAX_CPU assembly instruction upgrade/research for
memory barrier on ARM

As suggested in the said JIRA ticket based on the contribution done by
the community (in an attempt to optimize the spin-loop) the said approach
was evaluated against MariaDB Server 10.5 and found to help improve
throughput in the range of 2-5%.

Note: 10.6 timing graph and model are different as home-brew
mutexes are replaced with pthread mutexes. Said patch has mixed
impact on 10.6 so not recommended for 10.6.
2021-03-30 13:26:19 +03:00
Marko Mäkelä
8c2e3259c1 MDEV-24302 follow-up: RESET MASTER hangs
As pointed out by Andrei Elkin, the previous fix did not fix one
race condition that may have caused the observed hang.

innodb_log_flush_request(): If we are enqueueing the very first
request at the same time the log write is being completed,
we must ensure that a near-concurrent call to log_flush_notify()
will not result in a missed notification. We guarantee this by
release-acquire operations on log_requests.start and
log_sys.flushed_to_disk_lsn.

log_flush_notify_and_unlock(): Cleanup: Always release the mutex.

log_sys_t::get_flushed_lsn(): Use acquire memory order.

log_sys_t::set_flushed_lsn(): Use release memory order.

log_sys_t::set_lsn(): Use release memory order.

log_sys_t::get_lsn(): Use relaxed memory order by default, and
allow the caller to specify acquire memory order explicitly.
Whenever the log_sys.mutex is being held or when log writes are
prohibited during startup, we can use a relaxed load. Likewise,
in some assertions where reading a stale value of log_sys.lsn
should not matter, we can use a relaxed load.

This will cause some additional instructions to be emitted on
architectures that do not implement Total Store Ordering (TSO),
such as POWER, ARM, and RISC-V Weak Memory Ordering (RVWMO).
2021-03-30 10:29:11 +03:00
Daniel Black
bcb9ca4105 MEM_CHECK_DEFINED: replace HAVE_valgrind
HAVE_valgrind_or_MSAN to HAVE_valgrind was incorrect in
af784385b4.

In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer),
and -DWITH_VALGRIND=1, but without memcheck.h, we end up with a MEM_CHECK_DEFINED
being empty.

If we are also doing a CMAKE_BUILD_TYPE=Debug this results a number of
[-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty.
With MEM_CHECK_DEFINED empty, there becomes no uses of this of the
fixed field and innodb variables in this patch.

So we stop using HAVE_valgrind as catchall and use the name
HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists.

Reviewer: Monty

Corrects: af784385b4
2021-03-26 07:58:49 +11:00
Marko Mäkelä
be881ec457 Merge 10.4 into 10.5 2021-03-19 13:09:21 +02:00
Marko Mäkelä
44d70c01f0 Merge 10.3 into 10.4 2021-03-19 11:42:44 +02:00
Marko Mäkelä
19052b6deb Merge 10.2 into 10.3 2021-03-18 12:34:48 +02:00
Vladislav Vaintroub
031b3dfc22 MDEV-25123 support MSVC ASAN 2021-03-12 08:44:55 +01:00
Sergei Golubchik
2c0b3141f3 update wsrep service version after 7345d37141 2021-03-08 14:54:05 +01:00
Julius Goryavsky
7345d37141 MDEV-24853: Duplicate key generated during cluster configuration change
Incorrect processing of an auto-incrementing field in the
WSREP-related code during applying transactions results in
a duplicate key being created. This is due to the fact that
at the beginning of the write_row() and update_row() functions,
the values of the auto-increment parameters are used, which
are read from the parameters of the current thread, but further
along the code other values are used, which are read from global
variables (when applying a transaction). This can happen when
the cluster configuration has changed while applying a transaction
(for example in the high_priority_service mode for Galera 4).
Further during IST processing duplicating key is detected, and
processing of the DB_DUPLICATE_KEY return code (inside innodb,
in the write_row() handler) results in a call to the
wsrep_thd_self_abort() function.
2021-03-08 11:15:08 +01:00
Marko Mäkelä
10d544aa7b Merge 10.4 into 10.5 2021-03-05 12:54:43 +02:00
Marko Mäkelä
fcc9f8b10c Remove unused HA_EXTRA_FAKE_START_STMT
This is fixup for commit f06a0b5338.
2021-03-05 10:40:16 +02:00
Daniel Black
e0ba68ba34 MDEV-23510: arm64 lf_hash alignment of pointers
Like the 10.2 version 1635686b50,
except C++ on internal functions for my_assume_aligned.

volatile != atomic.

volatile has no memory barrier schemantics, its for mmaped IO
so lets allow some optimizer gains and stop pretending it helps
with memory atomicity.

The MDEV lists a SEGV an assumption is made that an address was
partially read. As C packs structs strictly in order and on arm64 the
cache line size is 128 bits. A pointer (link - 64 bits), followed
by a hashnr (uint32 - 32 bits), leaves the following key (uchar *
64 bits), neither naturally aligned to any pointer and worse, split
across a cache line which is the processors view of an atomic
reservation of memory.

lf_dynarray_lvalue is assumed to return a 64 bit aligned address.

As a solution move the 32bit hashnr to the end so we don't get the
*key pointer split across two cache lines.

Tested by: Krunal Bauskar
Reviewer: Marko Mäkelä
2021-02-25 10:06:15 +11:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Monty
bd5ac03896 Make maria_data_root const char*
This allow one to remove some casts like:
maria_data_root= (char *)".";

It also removes warnings from icc.
2021-02-08 12:16:29 +02:00
Monty
5d6ad2ad66 Added 'const' to arguments in get_one_option and find_typeset()
One should not change the program arguments!
This change also reduces warnings from the icc compiler.

Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).

Other changes:
- Added a few cast of 'argument' from 'const char*' to 'char *'. This
  was mainly in calls to 'external' functions we don't have control of.
- Ensure that all reset of 'password command line argument' are similar.
  (In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there was a few cases that changed
  the command line argument. These places where changed to instead allocate
  the option in a MEM_ROOT to avoid changing the argument. Some of this
  code was changed to ensure that different programs did parsing the
  same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
  from 'char *' to 'const char *'.
2021-02-08 12:16:29 +02:00
Sergei Golubchik
2676c9aad7 galera fixes related to THD::LOCK_thd_kill
Since 2017 (c2118a08b1) THD::awake() no longer requires LOCK_thd_data.
It uses LOCK_thd_kill, and this latter mutex is used to prevent
a thread of dying, not LOCK_thd_data as before.
2021-02-02 10:02:17 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
David CARLIER
b1241585b2
Mac M1 build support proposal (minus rocksdb option) (#1743) 2021-01-30 17:04:27 +02:00
FX Coudert
52795c2f78 Apple Silicon is a 64-bit platform 2021-01-30 16:35:46 +02:00
Sergei Golubchik
a216672dab MDEV-16341 Wrong length for USER columns in performance_schema tables
use USERNAME_CHAR_LENGTH and HOSTNAME_LENGTH for perfschema
USER and HOST columns
2021-01-11 21:54:48 +01:00
Oleksandr Byelkin
02e7bff882 Merge commit '10.4' into 10.5 2021-01-06 10:53:00 +01:00
Marko Mäkelä
c1a7a82bca WolfSSL v4.6.0-stable 2021-01-02 11:56:41 +02:00
Marko Mäkelä
c1f0afb102 WolfSSL v4.6.0-stable 2021-01-01 19:15:46 +02:00
Oleksandr Byelkin
478b83032b Merge branch '10.3' into 10.4 2020-12-25 09:13:28 +01:00
Oleksandr Byelkin
25561435e0 Merge branch '10.2' into 10.3 2020-12-23 19:28:02 +01:00
Etienne Guesnet
2c7247622a AIX workaround for GCC TOC bug 2020-12-16 08:07:04 +11:00
Etienne Guesnet
2f5d372444 Add build on AIX 2020-12-16 08:07:04 +11:00
Sergei Golubchik
e189faf0b3 document that a fulltext parser plugin can replace mysql_add_word callback 2020-12-10 08:45:20 +01:00
Eugene Kosov
a50cb4867a MDEV-24334 make monitor_set_tbl global variable thread-safe
Atomic_relaxed<T>: add fetch_or() and fetch_and()

innodb_init(): rely on a zero-initialization of a global variable

monitor_set_tbl: make Atomic_relaxed<ulint> array and use proper operations
for setting bit, unsetting bit and reading bit

Reviewed by: Marko Mäkelä
2020-12-03 11:55:36 +03:00
Marko Mäkelä
f146969fb3 MDEV-22929 fixup: root_name() clash with clang++ <fstream>
The clang++ -stdlib=libc++ header file <fstream> depends on
<filesystem> that defines a member function path::root_name(),
which conflicts with the rather unused #define root_name()
that had been introduced in
commit 7c58e97bf6.

Because an instrumented -stdlib=libc++ (rather than the default
-stdlib=libstdc++) is easier to build for a working -fsanitize=memory
(cmake -DWITH_MSAN=ON), let us remove the conflicting #define for now.
2020-12-03 07:45:48 +02:00
Daniel Black
8cc5d2845c MDEV-24125: linux large pages - Revert "Fixed centos 6 build failure"
This reverts commit 6cf8f05fd9.

Original patch assumed that MAP_HUGETLB as consistent across
achitectures which isn't the case. Defining it unconditionally
broke large pages on every achitecutre where the value differed
from x86_64.

With the EOL for Centos/RHEL6 announced in 10.5.7, <3.8 linux
kernels are no longer supported.
2020-11-17 07:53:55 +11:00
Marko Mäkelä
d7a5824899 Merge 10.4 into 10.5 2020-11-13 21:54:21 +02:00
sjaakola
ad432ef4c0 MDEV-24119 MDL BF-BF Conflict caused by TRUNCATE TABLE
This PR fixes same issue as MDEV-21577 for TRUNCATE TABLE.
MDEV-21577 fixed TOI replication for OPTIMIZE, REPAIR and ALTER TABLE
operating on FK child table. It was later found out that also TRUNCATE
has similar problem and needs a fix.

The actual fix is to do FK parent table lookup before TRUNCATE TOI
isolation and append found FK parent table names in certification key
list for the write set.

PR contains also new test scenario in galera_ddl_fk_conflict test where
FK child has two FK parent tables and there are two DML transactions operating
on both parent tables.

For development convenience, new TO isolation macro was added:
WSREP_TO_ISOLATION_BEGIN_IF and WSREP_TO_ISOLATION_BEGIN_ALTER macro was changed
to skip the goto statement.

Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-11-11 17:46:50 +02:00
Marko Mäkelä
4cbfdeca84 MDEV-24109 InnoDB hangs with innodb_flush_sync=OFF
MDEV-23855 broke the handling of innodb_flush_sync=OFF.
That parameter is supposed to limit the page write rate
in case the log capacity is being exceeded and log checkpoints
are needed.

With this fix, the following should pass:
./mtr --mysqld=--loose-innodb-flush-sync=0

One of our best regression tests for page flushing is
encryption.innochecksum. With innodb_page_size=16k and
innodb_flush_sync=OFF it would likely hang without this fix.

log_sys.last_checkpoint_lsn: Declare as Atomic_relaxed<lsn_t>
so that we are allowed to read the value while not holding
log_sys.mutex.

buf_flush_wait_flushed(): Let the page cleaner perform the flushing
also if innodb_flush_sync=OFF. After the page cleaner has
completed, perform a checkpoint if it is needed, because
buf_flush_sync_for_checkpoint() will not be run if
innodb_flush_sync=OFF.

buf_flush_ahead(): Simplify the condition. We do not really care
whether buf_flush_page_cleaner() is running.

buf_flush_page_cleaner(): Evaluate innodb_flush_sync at the low
level. If innodb_flush_sync=OFF, rate-limit the batches to
innodb_io_capacity_max pages per second.

Reviewed by: Vladislav Vaintroub
2020-11-04 16:55:36 +02:00
sjaakola
4d6c661144 MDEV-21577 MDL BF-BF conflict
Some DDL statements appear to acquire MDL locks for a table referenced by
foreign key constraint from the actual affected table of the DDL statement.
OPTIMIZE, REPAIR and ALTER TABLE belong to this class of DDL statements.

Earlier MariaDB version did not take this in consideration, and appended
only affected table in the certification key list in write set.
Because of missing certification information, it could happen that e.g.
OPTIMIZE table for FK child table could be allowed to apply in parallel
with DML operating on the foreign key parent table, and this could lead to
unhandled MDL lock conflicts between two high priority appliers (BF).

The fix in this patch, changes the TOI replication for OPTIMIZE, REPAIR and
ALTER TABLE statements so that before the execution of respective DDL
statement, there is foreign key parent search round. This FK parent search
contains following steps:
* open and lock the affected table (with permissive shared locks)
* iterate over foreign key contstraints and collect and array of Fk parent
  table names
* close all tables open for the THD and release MDL locks
* do the actual TOI replication with the affected table and FK parent
  table names as key values

The patch contains also new mtr test for verifying that the above mentioned
DDL statements replicate without problems when operating on FK child table.
The mtr test scenario #1, which can be used to check if some other DDL
(on top of OPTIMIZE, REPAIR and ALTER) could cause similar excessive FK
parent table locking.

Reviewed-by: Aleksey Midenkov <aleksey.midenkov@mariadb.com>
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
2020-11-03 19:40:06 +02:00
Marko Mäkelä
133b4b46fe Merge 10.4 into 10.5 2020-11-03 16:24:47 +02:00
Marko Mäkelä
533a13af06 Merge 10.3 into 10.4 2020-11-03 14:49:17 +02:00
Marko Mäkelä
c7f322c91f Merge 10.2 into 10.3 2020-11-02 15:48:47 +02:00
Marko Mäkelä
8036d0a359 MDEV-22387: Do not violate __attribute__((nonnull))
This follows up commit
commit 94a520ddbe and
commit 7c5519c12d.

After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
2020-11-02 14:19:21 +02:00
Marko Mäkelä
898521e2dd Merge 10.4 into 10.5 2020-10-30 11:15:30 +02:00
Vicențiu Ciorbaru
76fabe816f Expose utf8mb4_bin charset for plugins
Cleanup other linker errors
2020-10-29 15:01:33 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Marko Mäkelä
a8de8f261d Merge 10.2 into 10.3 2020-10-28 10:01:50 +02:00
Varun Gupta
b94e8e4b25 MDEV-23867: insert... select crash in compute_window_func
There are 2 issues here:

Issue #1: memory allocation.
An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data,
decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc).

Issue #2: IO_CACHE::seek_not_done
When IO_CACHE objects are cloned, they still share the file descriptor.
This means, operation on one IO_CACHE may change the file read position
which will confuse other IO_CACHEs using it.

The fix of these issues would be:
Allocate the buffer to also include the extra size needed for encryption.
Perform seek again after one IO_CACHE reads the file.
2020-10-23 22:36:47 +05:30
Sergei Golubchik
e864e6b181 compilation failure with new C/C
define symbols as C/C does to avoid "macro redefined" warnings
2020-10-23 15:53:41 +02:00
Marko Mäkelä
7cffb5f6e8 MDEV-23399: Performance regression with write workloads
The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted
the performance bottleneck to the page flushing.

The configuration parameters will be changed as follows:

innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction)
innodb_lru_scan_depth=1536 (old: 1024)
innodb_max_dirty_pages_pct=90 (old: 75)
innodb_max_dirty_pages_pct_lwm=75 (old: 0)

Note: The parameter innodb_lru_scan_depth will only affect LRU
eviction of buffer pool pages when a new page is being allocated. The
page cleaner thread will no longer evict any pages. It used to
guarantee that some pages will remain free in the buffer pool. Now, we
perform that eviction 'on demand' in buf_LRU_get_free_block().
The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows:
 * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks()
 * As a buf_pool.free limit in buf_LRU_list_batch() for terminating
   the flushing that is initiated e.g., by buf_LRU_get_free_block()
The parameter also used to serve as an initial limit for unzip_LRU
eviction (evicting uncompressed page frames while retaining
ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit
of 100 or unlimited for invoking buf_LRU_scan_and_free_block().

The status variables will be changed as follows:

innodb_buffer_pool_pages_flushed: This includes also the count of
innodb_buffer_pool_pages_LRU_flushed and should work reliably,
updated one by one in buf_flush_page() to give more real-time
statistics. The function buf_flush_stats(), which we are removing,
was not called in every code path. For both counters, we will use
regular variables that are incremented in a critical section of
buf_pool.mutex. Note that show_innodb_vars() directly links to the
variables, and reads of the counters will *not* be protected by
buf_pool.mutex, so you cannot get a consistent snapshot of both variables.

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be
removed, because the page cleaner no longer deals with writing or
evicting least recently used pages, and because the single-page writes
have been removed:
* buffer_LRU_batch_flush_avg_time_slot
* buffer_LRU_batch_flush_avg_time_thread
* buffer_LRU_batch_flush_avg_time_est
* buffer_LRU_batch_flush_avg_pass
* buffer_LRU_single_flush_scanned
* buffer_LRU_single_flush_num_scan
* buffer_LRU_single_flush_scanned_per_call

When moving to a single buffer pool instance in MDEV-15058, we missed
some opportunity to simplify the buf_flush_page_cleaner thread. It was
unnecessarily using a mutex and some complex data structures, even
though we always have a single page cleaner thread.

Furthermore, the buf_flush_page_cleaner thread had separate 'recovery'
and 'shutdown' modes where it was waiting to be triggered by some
other thread, adding unnecessary latency and potential for hangs in
relatively rarely executed startup or shutdown code.

The page cleaner was also running two kinds of batches in an
interleaved fashion: "LRU flush" (writing out some least recently used
pages and evicting them on write completion) and the normal batches
that aim to increase the MIN(oldest_modification) in the buffer pool,
to help the log checkpoint advance.

The buf_pool.flush_list flushing was being blocked by
buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN
of a page is ahead of log_sys.get_flushed_lsn(), that is, what has
been persistently written to the redo log, we would trigger a log
flush and then resume the page flushing. This would unnecessarily
limit the performance of the page cleaner thread and trigger the
infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms.
The settings might not be optimal" that were suppressed in
commit d1ab89037a unless log_warnings>2.

Our revised algorithm will make log_sys.get_flushed_lsn() advance at
the start of buf_flush_lists(), and then execute a 'best effort' to
write out all pages. The flush batches will skip pages that were modified
since the log was written, or are are currently exclusively locked.
The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message
will be removed, because by design, the buf_flush_page_cleaner() should
not be blocked during a batch for extended periods of time.

We will remove the single-page flushing altogether. Related to this,
the debug parameter innodb_doublewrite_batch_size will be removed,
because all of the doublewrite buffer will be used for flushing
batches. If a page needs to be evicted from the buffer pool and all
100 least recently used pages in the buffer pool have unflushed
changes, buf_LRU_get_free_block() will execute buf_flush_lists() to
write out and evict innodb_lru_flush_size pages. At most one thread
will execute buf_flush_lists() in buf_LRU_get_free_block(); other
threads will wait for that LRU flushing batch to finish.

To improve concurrency, we will replace the InnoDB ib_mutex_t and
os_event_t native mutexes and condition variables in this area of code.
Most notably, this means that the buffer pool mutex (buf_pool.mutex)
is no longer instrumented via any InnoDB interfaces. It will continue
to be instrumented via PERFORMANCE_SCHEMA.

For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be
declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical
sections of buf_pool.flush_list_mutex should be shorter than those for
buf_pool.mutex, because in the worst case, they cover a linear scan of
buf_pool.flush_list, while the worst case of a critical section of
buf_pool.mutex covers a linear scan of the potentially much longer
buf_pool.LRU list.

mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable
with SAFE_MUTEX. Some InnoDB debug assertions need this predicate
instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner().

buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[].
The number of active flush operations.

buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t
instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA
and SAFE_MUTEX instrumentation.

buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU.

buf_pool_t::done_flush_list: Condition variable for !n_flush_list.

buf_pool_t::do_flush_list: Condition variable to wake up the
buf_flush_page_cleaner when a log checkpoint needs to be written
or the server is being shut down. Replaces buf_flush_event.
We will keep using timed waits (the page cleaner thread will wake
_at least_ once per second), because the calculations for
innodb_adaptive_flushing depend on fixed time intervals.

buf_dblwr: Allocate statically, and move all code to member functions.
Use a native mutex and condition variable. Remove code to deal with
single-page flushing.

buf_dblwr_check_block(): Make the check debug-only. We were spending
a significant amount of execution time in page_simple_validate_new().

flush_counters_t::unzip_LRU_evicted: Remove.

IORequest: Make more members const. FIXME: m_fil_node should be removed.

buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex
(which we are removing).

page_cleaner_slot_t, page_cleaner_t: Remove many redundant members.

pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot().

recv_writer_thread: Remove. Recovery works just fine without it, if we
simply invoke buf_flush_sync() at the end of each batch in
recv_sys_t::apply().

recv_recovery_from_checkpoint_finish(): Remove. We can simply call
recv_sys.debug_free() directly.

srv_started_redo: Replaces srv_start_state.

SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown()
can communicate with the normal page cleaner loop via the new function
flush_buffer_pool().

buf_flush_remove(): Assert that the calling thread is holding
buf_pool.flush_list_mutex. This removes unnecessary mutex operations
from buf_flush_remove_pages() and buf_flush_dirty_pages(),
which replace buf_LRU_flush_or_remove_pages().

buf_flush_lists(): Renamed from buf_flush_batch(), with simplified
interface. Return the number of flushed pages. Clarified comments and
renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions
buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this
function, which was their only caller, and remove 2 unnecessary
buf_pool.mutex release/re-acquisition that we used to perform around
the buf_flush_batch() call. At the start, if not all log has been
durably written, wait for a background task to do it, or start a new
task to do it. This allows the log write to run concurrently with our
page flushing batch. Any pages that were skipped due to too recent
FIL_PAGE_LSN or due to them being latched by a writer should be flushed
during the next batch, unless there are further modifications to those
pages. It is possible that a page that we must flush due to small
oldest_modification also carries a recent FIL_PAGE_LSN or is being
constantly modified. In the worst case, all writers would then end up
waiting in log_free_check() to allow the flushing and the checkpoint
to complete.

buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_flush_space(): Auxiliary function to look up a tablespace for
page flushing.

buf_flush_page(): Defer the computation of space->full_crc32(). Never
call log_write_up_to(), but instead skip persistent pages whose latest
modification (FIL_PAGE_LSN) is newer than the redo log. Also skip
pages on which we cannot acquire a shared latch without waiting.

buf_flush_try_neighbors(): Do not bother checking buf_fix_count
because buf_flush_page() will no longer wait for the page latch.
Take the tablespace as a parameter, and only execute this function
when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold().

buf_flush_relocate_on_flush_list(): Declare as cold, and push down
a condition from the callers.

buf_flush_check_neighbor(): Take id.fold() as a parameter.

buf_flush_sync(): Ensure that the buf_pool.flush_list is empty,
because the flushing batch will skip pages whose modifications have
not yet been written to the log or were latched for modification.

buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables.

buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize
the counters, and report n->evicted.
Cache the last looked up tablespace. If neighbor flushing is not applicable,
invoke buf_flush_page() directly, avoiding a page lookup in between.

buf_do_LRU_batch(): Return the number of pages flushed.

buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if
adaptive hash index entries are pointing to the block.

buf_LRU_get_free_block(): Do not wake up the page cleaner, because it
will no longer perform any useful work for us, and we do not want it
to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0)
writes out and evicts at most innodb_lru_flush_size pages. (The
function buf_do_LRU_batch() may complete after writing fewer pages if
more than innodb_lru_scan_depth pages end up in buf_pool.free list.)
Eliminate some mutex release-acquire cycles, and wait for the LRU
flush batch to complete before rescanning.

buf_LRU_check_size_of_non_data_objects(): Simplify the code.

buf_page_write_complete(): Remove the parameter evict, and always
evict pages that were part of an LRU flush.

buf_page_create(): Take a pre-allocated page as a parameter.

buf_pool_t::free_block(): Free a pre-allocated block.

recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block
while not holding recv_sys.mutex. During page allocation, we may
initiate a page flush, which in turn may initiate a log flush, which
would require acquiring log_sys.mutex, which should always be acquired
before recv_sys.mutex in order to avoid deadlocks. Therefore, we must
not be holding recv_sys.mutex while allocating a buffer pool block.

BtrBulk::logFreeCheck(): Skip a redundant condition.

row_undo_step(): Do not invoke srv_inc_activity_count() for every row
that is being rolled back. It should suffice to invoke the function in
trx_flush_log_if_needed() during trx_t::commit_in_memory() when the
rollback completes.

sync_check_enable(): Remove. We will enable innodb_sync_debug from the
very beginning.

Reviewed by: Vladislav Vaintroub
2020-10-15 17:04:56 +03:00
Marko Mäkelä
882ce206db Merge 10.4 into 10.5 2020-09-23 11:32:43 +03:00
Marko Mäkelä
3a423088ac Merge 10.3 into 10.4 2020-09-21 12:29:00 +03:00
Marko Mäkelä
cbcb4ecabb Merge 10.2 into 10.3 2020-09-21 11:04:04 +03:00
Vladislav Vaintroub
ccbe6bb6fc MDEV-19935 Create unified CRC-32 interface
Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction
after Intel whitepaper, and is ported from rocksdb code.

Optimized ARM and POWER CRC32 were already present in mysys.
2020-09-17 16:07:37 +02:00
Jan Lindström
224c950462 MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled
Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue
and move condition to correct callers. Add a function to report
BF lock waits and assert if incorrect BF-BF lock wait happens.

wsrep_report_bf_lock_wait
	Add a new function to report BF lock wait.

wsrep_assert_no_bf_bf_wait
	Add a new function to check do we have a
	BF-BF wait and if we have report this case
	and assert as it is a bug.

lock_rec_has_to_wait
	Use new wsrep_assert_bf_wait to check BF-BF wait.

lock_rec_create_low
lock_table_create
	Use new function to report BF lock waits.

lock_rec_insert_by_trx_age
lock_grant_and_move_on_page
lock_grant_and_move_on_rec
	Assert that trx is not Galera as VATS is not compatible
	with Galera.

lock_rec_add_to_queue
	If there is conflicting lock in a queue make sure that
	transaction is BF.

lock_rec_has_to_wait_in_queue
	Remove incorrect BF handling. If there is conflicting
	locks in a queue all transactions must wait.

lock_rec_dequeue_from_page
lock_rec_unlock
	If there is conflicting lock make sure it is not
	BF-BF case.

lock_rec_queue_validate
	Add Galera record locking rules comment and use
	new function to report BF lock waits.

All attempts to reproduce the original assertion have been
failed. Therefore, there is no test case on this commit.
2020-09-10 13:18:12 +03:00
Marko Mäkelä
5ff7e68c7e Merge 10.4 into 10.5 2020-09-04 18:44:44 +03:00
Marko Mäkelä
b0c194cab4 MDEV-23633 fixup: Add missing semicolon 2020-09-04 11:40:17 +03:00
Marko Mäkelä
24f510bba4 MDEV-23633 MY_RELAX_CPU performs unnecessary compare-and-swap on ARM
This follows up MDEV-14374, which was filed against MariaDB Server 10.3.
Back then, on a 48-core Qualcomm Centriq 2400, the performance of
delay loops for spinloops was tested both with and without the dummy
compare-and-swap operation, and it was decided to keep the dummy
operation.

On target architectures where nothing special is available (other than
x86 (IA-32, AMD64) or POWER), we perform a dummy compare-and-swap operation.
This is contrary to the idea of the x86 PAUSE instruction and the
__ppc_get_timebase(), which aim to keep the memory bus idle for a while,
to allow other cores to better execute code while a spinloop is waiting
for something to be changed.

On MariaDB Server 10.4 and another implementation of the ARMv8 ISA,
omitting the dummy compare-and-swap improved performance by up to 12%.
So, let us avoid the dummy compare-and-swap on ARM.

For now, we are retaining the dummy compare-and-swap on other ISAs
(such as SPARC, MIPS, S390x, RISC-V) because we do not have any
performance data for them.
2020-09-04 10:31:41 +03:00
Marko Mäkelä
1cda462f46 Merge 10.3 into 10.4 2020-09-03 16:55:14 +03:00
Yuqi Gu
151fc0ed88 MDEV-23495: Refine Arm64 PMULL runtime check in MariaDB
Raspberry Pi 4 supports crc32 but doesn't support pmull (MDEV-23030).

The PR #1645 offers a solution to fix this issue. But it does not consider
the condition that the target platform does support crc32 but not support PMULL.

In this condition, it should leverage the Arm64 crc32 instruction (__crc32c) and
just only skip parallel computation (pmull/vmull) rather than skip all hardware
crc32 instruction of computation.

The PR also removes unnecessary CRC32_ZERO branch in 'crc32c_aarch64' for MariaDB,
formats the indent and coding style.

Change-Id: I76371a6bd767b4985600e8cca10983d71b7e9459
Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
2020-08-21 20:41:35 +03:00
Monty
3ef65f2783 Added DBUG_PUSH_EMPTY and DBUG_POP_EMPTY to speed up DBUG 2020-08-20 19:34:11 +03:00
Marko Mäkelä
65c43bcfe2 After-merge fix of the Windows build 2020-08-20 13:35:26 +03:00
Marko Mäkelä
d5d8756de3 Merge 10.4 into 10.5 2020-08-20 12:52:44 +03:00
Marko Mäkelä
2fa9f8c53a Merge 10.3 into 10.4 2020-08-20 11:01:47 +03:00
Marko Mäkelä
de0e7cd72a Merge 10.2 into 10.3 2020-08-20 09:12:16 +03:00
Marko Mäkelä
bfba2bce6a Merge 10.1 into 10.2 2020-08-20 06:00:36 +03:00
Marko Mäkelä
cf87f3e08c Merge 10.4 into 10.5 2020-08-14 11:33:35 +03:00