- Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
The flag defines if strnncollsp_nchars() should emulate trailing spaces
which were possibly trimmed earlier (e.g. in InnoDB CHAR compression).
This is important for NOPAD collations.
For example, with this input:
- str1= 'a ' (Latin letter a followed by one space)
- str2= 'a ' (Latin letter a followed by two spaces)
- nchars= 3
if the flag is given, strnncollsp_nchars() will virtually restore
one trailing space to str1 up to nchars (3) characters and compare two
strings as equal:
- str1= 'a ' (one extra trailing space emulated)
- str2= 'a ' (as is)
If the flag is not given, strnncollsp_nchars() does not add trailing
virtual spaces, so in case of a NOPAD collation, str1 will be compared
as less than str2 because it is shorter.
- Field_string::cmp_prefix() now passes the new flag.
Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
not pass the new flag.
- The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
(which handles the CHAR data type) now also passed the new flag.
- Fixing UCA collations to respect the new flag.
Other collations are possibly also affected, however
I had no success in making an SQL script demonstrating the problem.
Other collations will be extended to respect this flags in a separate
patch later.
- Changing the meaning of the last parameter of Field::cmp_prefix()
from "number of bytes" (internal length)
to "number of characters" (user visible length).
The code calling cmp_prefix() from handler.cc was wrong.
After this change, the call in handler.cc became correct.
The code calling cmp_prefix() from key_rec_cmp() in key.cc
was adjusted according to this change.
- Old strnncollsp_nchar() related tests in unittest/strings/strings-t.c
now pass the new flag.
A few new tests also were added, without the flag.
This is allowed:
STRING_WITH_LEN("string literal")
This is not:
char *str = "pointer to string";
... STRING_WITH_LEN(str) ..
In C++ this is also allowed:
const char str[] = "string literal";
... STRING_WITH_LEN(str) ...
Let us make innodb_buffer_pool_filename a read-only variable
so that a malicious user cannot cause an important file to be
deleted on InnoDB shutdown. An attempt to delete a directory
will fail because it is not a regular file, but what if the
variable pointed to (say) ibdata1, ib_logfile0 or some *.ibd file?
It does not seem to make much sense for this parameter to be
configurable in the first place, but we will not change that in order
to avoid breaking compatibility.
The solution is to suppress error messages for missing tablespaces if
mariabackup is launched with "--prepare --export" options.
"mariabackup --prepare --export" invokes itself with --mysqld parameter.
If the parameter is set, then it starts server to feed "FLUSH TABLES ...
FOR EXPORT;" queries for exported tablespaces. This is "normal" server
start, that's why new srv_operation value is introduced.
Reviewed by Marko Makela.
row_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we
are not modifying any fields that are stored in compressed format.
btr_cur_update_in_place_zip_check(): New function to check if a
ROW_FORMAT=COMPRESSED record can actually be updated in place.
btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set
(we are in a ROLLBACK and cannot write any BLOBs), ignore the potential
overflow and let page_zip_reorganize() or page_zip_compress() handle it.
This avoids a failure when an attempted UPDATE of an NULL column to 0 is
rolled back. During the ROLLBACK, we would try to move a non-updated
long column to off-page storage in order to avoid a compression failure
of the ROW_FORMAT=COMPRESSED page.
page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail
in row_upd_rec_in_place() because the uncompressed page would already
have been modified there.
Thanks to Jean-François Gagné for providing a copy of a page that
triggered these bugs on the ROLLBACK of UPDATE and DELETE.
A 10.6 version of this was tested by Matthias Leich using
cmake -DWITH_INNODB_EXTRA_DEBUG=ON a.k.a. UNIV_ZIP_DEBUG.
mtr uses group suffix, but some existing inc and test files use
server_id for expect files. This patch aims to fix that.
For spider:
With this change we will not have to maintain a separate version of
restart_mysqld.inc for spider, that duplicates code, just because
spider tests use different names for expect files, and shutdown_mysqld
requires magical names for them.
With this change spider tests will also be able to use other features
provided by restart_mysqld.inc without code duplication, like the
parameter $restart_parameters (see e.g. the testcase mdev_29904.test
in commit ef1161e5d4f).
Tests run after this change: default, spider, rocksdb, galera, using
the following command
mtr --parallel=auto --force --max-test-fail=0 --skip-core-file
mtr --suite spider,spider/*,spider/*/* \
--skip-test="spider/oracle.*|.*/t\..*" --parallel=auto --big-test \
--force --max-test-fail=0 --skip-core-file
mtr --suite galera --parallel=auto
mtr --suite rocksdb --parallel=auto
buf_LRU_block_remove_hashed(): Ever since
commit 2e814d4702
we could get page_zip_validate() failures after an ALTER TABLE
operation was aborted and BtrBulk::pageCommit() had never been
executed on some blocks.
redundant table rebuild
- InnoDB alter fails to apply the online log during redundant table
rebuild. Problem is that InnoDB wrongly reads the length flags of the
record while applying the temporary log record.
rec_init_offsets_comp_ordinary(): For finding the n_core_null_bytes,
InnoDB should use the same logic as rec_convert_dtuple_to_rec_comp().
rec_init_offsets_comp_ordinary(), rec_init_offsets(),
rec_get_offsets_reverse(), rec_get_nth_field_offs_old():
Simplify some bitwise arithmetics to avoid conditional jumps,
and add branch prediction hints with the assumption that most
variable-length columns are short.
Tested by: Matthias Leich
When using the MySQL table type the CONNECT engine converted the YEAR
datatype to DATETIME for INSERT queries. This is incorrect, causing an
error on the INSERT. It should be SHORT instead.
- MY_I_S_MAYBE_NULL field attributes is added PAGE_NO and SPACE in
innodb_sys_index table. By doing this, InnoDB can set null for these
fields when it encounters discarded tablespace
When one session SELECT ... FOR UPDATE and holds the lock, subsequent
sessions that SELECT ... FOR UPDATE will wait to get the lock.
Currently, that event is labeled as `wait/io/table/sql/handler`, which
is incorrect. Instead, it should have been
`wait/lock/table/sql/handler`.
Two factors contribute to this bug:
1. Instrumentation interface and the heavy usage of `TABLE_IO_WAIT` in
`sql/handler.cc` file. See interface [^1] for better understanding;
2. The balancing act [^2] of doing instrumentation aggregration _AND_
having good performance. For example, EVENTS_WAITS_SUMMARY... is
aggregated using EVENTS_WAITS_CURRENT. Aggregration needs to be based
on the same wait class, and the code was overly aggressive in label a
LOCK operation as an IO operation in this case.
The proposed fix is pretty simple, but understanding the bug took a
while. Hence the footnotes below. For future improvement and
refactoring, we may want to consider renaming `TABLE_IO_WAIT` and making
it less coarse and more targeted.
Note that newly added test case, events_waits_current_MDEV-29091,
initially didn't pass Buildbot CI for embedded build tests. Further
research showed that other impacted tests all included not_embedded.inc.
This oversight was fixed later.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
[^1]: To understand `performance_schema` instrumentation interface, I
found this URL is the most helpful:
https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_PFS_PSI.html
[^2]: The best place to understand instrumentation projection,
composition, and aggregration is through the source file. Although I
prefer reading Doxygen produced html file, but for whatever reason, the
rendering is not ideal. Here is link to 10.6's pfs.cc:
https://github.com/MariaDB/server/blob/10.6/storage/perfschema/pfs.cc
- InnoDB tries to build the previous version of the record for
the virtual index, but the undo log record doesn't contain
virtual column information. This leads to assert failure while
building the tuple.
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .
Code style changes have been done on top. The result of this change
leads to the following improvements:
1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.
2. Compiler can better understand the intent of the code, thus it leads
to more optimization possibilities. Additionally it enabled detecting
unused variables that had an empty default constructor but not marked
so explicitly.
Particular change required following this patch in sql/opt_range.cc
result_keys, an unused template class Bitmap now correctly issues
unused variable warnings.
Setting Bitmap template class constructor to default allows the compiler
to identify that there are no side-effects when instantiating the class.
Previously the compiler could not issue the warning as it assumed Bitmap
class (being a template) would not be performing a NO-OP for its default
constructor. This prevented the "unused variable warning".
The existing storage/rocksdb/CMakeCache.txt defined
ATOMIC_EXTRA_LIBS when atomics where required. This was
determined by the toplevel configure.cmake test
(HAVE_GCC_C11_ATOMICS_WITH_LIBATOMIC).
As build_rocksdb.cmake is included after ATOMIC_EXTRA_LIBS
was set, we just need to use it. As such no riscv64
specific macro is needed in build_rocksdb.cmake.
As highlighted by Gianfranco Costamagna (@LocutusOfBorg)
in #2472 overwriting SYSTEM_LIBS was problematic.
This is corrected in case in future SYSTEM_LIBS is changed
elsewhere.
Closes#2472.
This is Kentoku's patch for MDEV-22979 (e6e41f04f4 + 22a0097727),
which fixes 30370.
It changes the wait to a timed wait for the first sts thread, which
waits on server start to execute the init queries for spider. It also
flips the flag init_command to false when the sts thread is being
freed. With these changes the sts thread can check the flag regularly
and abort the init_queries when it finds out the init_command is
false. This avoids the deadlock that causes the problem in MDEV-30370.
It also fixes MDEV-22979 for 10.4, but not 10.5. I have not tested
higher versions for MDEV-22979.
A test has also been done on MDEV-29904 to avoid regression, given
MDEV-27233 is a similar problem and its patch caused the
regression. The test passes for 10.4-11.0.
However, this adhoc test only works consistently when placed in the
main testsuite. We should not place spider tests in the main suite, so
we do not include it in this commit. A patch for MDEV-27912 should fix
this problem and allow a proper test for MDEV-29904. See comments in
the jira ticket MDEV-30370/29904 for the adhoc testcase used for this
commit.
The MariaDB code base uses strcat() and strcpy() in several
places. These are known to have memory safety issues and their usage is
discouraged. Common security scanners like Flawfinder flags them. In MariaDB we
should start using modern and safer variants on these functions.
This is similar to memory issues fixes in 19af1890b5
and 9de9f105b5 but now replace use of strcat()
and strcpy() with safer options strncat() and strncpy().
However, add '\0' forcefully to make sure the result string is correct since
for these two functions it is not guaranteed what new string will be null-terminated.
Example:
size_t dest_len = sizeof(g->Message);
strncpy(g->Message, "Null json tree", dest_len); strncat(g->Message, ":",
sizeof(g->Message) - strlen(g->Message)); size_t wrote_sz = strlen(g->Message);
size_t cur_len = wrote_sz >= dest_len ? dest_len - 1 : wrote_sz;
g->Message[cur_len] = '\0';
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the BSD-new
license. I am contributing on behalf of my employer Amazon Web Services
-- Reviewer and co-author Vicențiu Ciorbaru <vicentiu@mariadb.org>
-- Reviewer additions:
* The initial function implementation was flawed. Replaced with a simpler
and also correct version.
* Simplified code by making use of snprintf instead of chaining strcat.
* Simplified code by removing dynamic string construction in the first
place and using static strings if possible. See connect storage engine
changes.
When built with ubsan and trying to load the spider plugin, the hidden
visibility of mysqld compiling flag causes ha_spider.so to be missing
the symbol ha_partition. This commit fixes that, as well as some
memcpy null pointer issues when built with ubsan.
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
MySQL 5.7.41 includes one InnoDB change
mysql/mysql-server@d2d6b2dd00
that seems to be applicable to MariaDB Server 10.3 and 10.4.
Even though commit 5b9ee8d819
seems to have fixed sporadic failures on our CI systems, it is
theoretically possible that another race condition remained.
buf_flush_page_cleaner_coordinator(): In the final loop,
wait also for buf_get_n_pending_read_ios() to reach 0.
In this way, if a secondary index leaf page was read into the
buffer pool and ibuf_merge_or_delete_for_page() modified that
page or some change buffer pages, the flush loop would execute
until the buffer pool really is in a clean state.
This potential data corruption bug does not affect MariaDB Server 10.5
or later, thanks to commit b42294bc64
which removed change buffer merges that are not explicitly requested.
If two high priority threads have lock conflict, we look at the
order of these transactions and honor the earlier transaction.
for_locking parameter in lock_rec_has_to_wait() has become
obsolete and it is now removed from the code .
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Cluster conflict victim's THD is marked with wsrep_aborter.
THD::wsrep_aorter holds the thread ID of the hight priority tread,
which is currently carrying out BF aborting for this victim.
However, the BF abort operation is not always successful,
and in such case the wsrep_aborter mark should be removed.
In the old code, this wsrep_aborter resetting did not happen,
and this could lead to a situation where the sticky wsrep_aborter
mark prevents any further attempt to BF abort this transaction.
This commit fixes this issue, and resets wsrep_aborter after
unsuccesful BF abort attempt.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
node->is_delete was incorrectly set to NO_DELETE for a set of operations.
In general we shouldn't rely on sql_command and look for more abstract ways
to control the behavior.
trg_event_map seems to be a suitable way. To mind replica nodes, it is ORed
with slave_fk_event_map, which stores trg_event_map when replica has
triggers disabled.
clang15 finally errors on old prototype definations.
Its also a lot fussier about variables that aren't used
as is the case a number of time with loop counters that
aren't examined.
RocksDB was complaining that its get_range function was
declared without the array length in ha_rocksdb.h. While
a constant is used rather than trying to import the
Rdb_key_def::INDEX_NUMBER_SIZE header (was causing a lot of
errors on the defination of other orders). If the constant
does change can be assured that the same compile warnings will
tell us of the error.
The ha_rocksdb::index_read_map_impl DBUG_EXECUTE_IF was similar
to the existing endless functions used in replication tests.
Its rather moot point as the rocksdb.force_shutdown test that
uses myrocks_busy_loop_on_row_read is currently disabled.
1. In case of system-versioned table add row_end into FTS_DOC_ID index
in fts_create_common_tables() and innobase_create_key_defs().
fts_n_uniq() returns 1 or 2 depending on whether the table is
system-versioned.
After this patch recreate of FTS_DOC_ID index is required for
existing system-versioned tables. If you see this message in error
log or server warnings: "InnoDB: Table db/t1 contains 2 indexes
inside InnoDB, which is different from the number of indexes 1
defined in the MariaDB" use this command to fix the table:
ALTER TABLE db.t1 FORCE;
2. Fix duplicate history for secondary unique index like it was done
in MDEV-23644 for clustered index (932ec586aa). In case of
existing history row which conflicts with currently inseted row we
check in row_ins_scan_sec_index_for_duplicate() whether that row
was inserted as part of current transaction. In that case we
indicate with DB_FOREIGN_DUPLICATE_KEY that new history row is not
needed and should be silently skipped.
3. Some parts of MDEV-21138 (7410ff436e) reverted. Skipping of
FTS_DOC_ID index for history rows made problems with purge
system. Now this is fixed differently by p.2.
4. wait_all_purged.inc checks that we didn't affect non-history rows
so they are deleted and purged correctly.
Additional FTS fixes
fts_init_get_doc_id(): exclude history rows from max_doc_id
calculation. fts_init_get_doc_id() callback is used only for crash
recovery.
fts_add_doc_by_id(): set max value for row_end field.
fts_read_stopword(): stopwords table can be system-versioned too. We
now read stopwords only for current data.
row_insert_for_mysql(): exclude history rows from doc_id validation.
row_merge_read_clustered_index(): exclude history_rows from doc_id
processing.
fts_load_user_stopword(): for versioned table retrieve row_end field
and skip history rows. For non-versioned table we retrieve 'value'
field twice (just for uniformity).
FTS tests for System Versioning now include maybe_versioning.inc which
adds 3 combinations:
'vers' for debug build sets sysvers_force and
sysvers_hide. sysvers_force makes every created table
system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING
for SHOW CREATE.
Note: basic.test, stopword.test and versioning.test do not
require debug for 'vers' combination. This is controlled by
$modify_create_table in maybe_versioning.inc and these
tests run WITH SYSTEM VERSIONING explicitly which allows to
test 'vers' combination on non-debug builds.
'vers_trx' like 'vers' sets sysvers_force_trx and sysvers_hide. That
tests FTS with trx_id-based System Versioning.
'orig' works like before: no System Versioning is added, no debug is
required.
Upgrade/downgrade test for System Versioning is done by
innodb_fts.versioning. It has 2 combinations:
'prepare' makes binaries in std_data (requires old server and OLD_BINDIR).
It tests upgrade/downgrade against old server as well.
'upgrade' tests upgrade against binaries in std_data.
Cleanups:
Removed innodb-fts-stopword.test as it duplicates stopword.test
Works like vers_force but forces trx_id-based system-versioned tables
if the storage supports it (currently InnoDB-only). Otherwise creates
timestamp-based system-versioned table.
The incorrect type handler caused an incorrect result_type() for
Item_cache_row (STRING_RESULT rather than ROW_RESULT). By updating the
constructor of Item_cache_row with the correct type handler, it fixes
this problem.
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Sergei Golubchik <serg@mariadb.com>
Before the fix next-key lock was requested only if a record was
delete-marked for locking unique search in RR isolation level.
There can be several delete-marked records for the same unique key,
that's why InnoDB scans the records until eighter non-delete-marked record
is reached or all delete-marked records with the same unique key are
scanned.
For range scan next-key locks are used for RR to protect scanned range from
inserting new records by other transactions. And this is the reason of why
next-key locks are used for delete-marked records for unique searches.
If a record is not delete-marked, the requested lock type was "not-gap".
When a record is not delete-marked during lock request by trx 1, and
some other transaction holds conflicting lock, trx 1 creates waiting
not-gap lock on the record and suspends. During trx 1 suspending the
record can be delete-marked. And when the lock is granted on conflicting
transaction commit or rollback, its type is still "not-gap". So we have
"not-gap" lock on delete-marked record for RR. And this let some other
transaction to insert some record with the same unique key when trx 1 is
not committed, what can cause isolation level violation.
The fix is to set next-key locks for both delete-marked and
non-delete-marked records for unique search in RR.
When trying to create a spider table with banned charsets including
utf32, utf16, ucs2 and utf16le[1], spider should emit an error
immediately, rather than wait until a separate statement that
establishes a connection (e.g. SELECT). This also applies to ALTER
TABLE statement that changes charsets.
[1] https://mariadb.com/kb/en/server-system-variables/#character_set_client
Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
Hide the errors related to missing innodb stats tables in bootstrap mode
on the assumption that because we are in bootstrap mode they are going
to be created.
mysql_discard_or_import_tablespace(): On successful
ALTER TABLE...DISCARD TABLESPACE, evict the table handle from the
table definition cache, so that ha_innobase::close() will be invoked,
like InnoDB expects to be the case. This will avoid an assertion failure
ut_a(table->get_ref_count() == 0) during IMPORT TABLESPACE.
ha_innobase::open(): Do not issue any ER_TABLESPACE_DISCARDED warning.
Member functions for DML will do that.
ha_innobase::truncate(), ha_innobase::check_if_supported_inplace_alter():
Issue ER_TABLESPACE_DISCARDED warnings, to compensate for the removal of
the warning in ha_innobase::open().
row_quiesce_write_indexes(): Only write information about committed
indexes. The ALTER TABLE t NOWAIT ADD INDEX(c) in the nondeterministic
test case will most of the time fail due to a metadata lock (MDL) timeout
and leave behind an uncommitted index.
Reviewed by: Sergei Golubchik
The bug is caused by a similar mechanism as MDEV-21027.
The function, check_insert_or_replace_autoincrement, failed to open
all the partitions on REPLACE SELECT statements and it results in the
assertion error.