This is a non-functional change. It changes the way how case folding data
and weight data (for simple Unicode collations) are stored:
- Removing data types MY_UNICASE_CHARACTER, MY_UNICASE_INFO
- Using data types MY_CASEFOLD_CHARACTER, MY_CASEFOLD_INFO instead.
This patch changes simple Unicode collations in a similar way
how MDEV-30695 previously changed Asian collations.
No new MTR tests are needed. The underlying code is thoroughly
covered by a number of ctype_*_ws.test and ctype_*_casefold.test
files, which were added recently as a preparation
for this change.
Old and new Unicode data layout
-------------------------------
Case folding data is now stored in separate tables
consisting of MY_CASEFOLD_CHARACTER elements with two members:
typedef struct casefold_info_char_t
{
uint32 toupper;
uint32 tolower;
} MY_CASEFOLD_CHARACTER;
while weight data (for simple non-UCA collations xxx_general_ci
and xxx_general_mysql500_ci) is stored in separate arrays of
uint16 elements.
Before this change case folding data and simple weight data were
stored together, in tables of the following elements with three members:
typedef struct unicase_info_char_st
{
uint32 toupper;
uint32 tolower;
uint32 sort; /* weights for simple collations */
} MY_UNICASE_CHARACTER;
This data format was redundant, because weights (the "sort" member) were
needed only for these two simple Unicode collations:
- xxx_general_ci
- xxx_general_mysql500_ci
Adding case folding information for Unicode-14.0.0 using the old
format would waste memory without purpose.
Detailed changes
----------------
- Changing the underlying data types as described above
- Including unidata-dump.c into the sources.
This program was earlier used to dump UnicodeData.txt
(e.g. https://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt)
into MySQL / MariaDB source files.
It was originally written in 2002, but has not been distributed yet
together with MySQL / MariaDB sources.
- Removing the old format Unicode data earlier dumped from UnicodeData.txt
(versions 3.0.0 and 5.2.0) from ctype-utf8.c.
Adding Unicode data in the new format into separate header files,
to maintain the code easier:
- ctype-unicode300-casefold.h
- ctype-unicode300-casefold-tr.h
- ctype-unicode300-general_ci.h
- ctype-unicode300-general_mysql500_ci.h
- ctype-unicode520-casefold.h
- Adding a new file ctype-unidata.c as an aggregator for
the header files listed above.
Removing similar functions from ctype-utf8.c and ctype-ucs2.c
- my_tosort_utf16()
- my_tosort_utf32()
- my_tosort_ucs2()
- my_tosort_unicode()
Adding new shared functions into ctype-unidata.h:
- my_tosort_unicode_bmp() - reused for utf8mb3, ucs2
- my_tosort_unicode() - reused for utf8mb4, utf16, utf32
For simplicity, the new version of my_tosort_unicode*()
does not include the code handling the MY_CS_LOWER_SORT flag because:
- it affects performance negatively
- we don't have any collations with this flag yet anyway
(This code was most likely earlier erroneously merged from
MySQL's utf8_tolower_ci at some point.)
Currently autobake-debs.sh does not pass shellcheck
it fails making errors:
* SC1091 when using shellcheck -x it needs to know where to
find ./VERSION. As this is not needed we just specify it
as /dev/null as mentioned in shellcheck documentation:
https://www.shellcheck.net/wiki/SC1091
* SC2086 make sure that there is no globbing or word splitting
in dpkg-buidpackage string. This not big problem or about to happen
but now extra parameter parsing is more Bash compliant with
using array.
Change BUILDPACKAGE_PREPEND to BUILDPACKAGE_DPKGCMD which holds
'eatmydata' if it's available and needed 'dpkg-buildpackage'
https://www.shellcheck.net/wiki/SC2086
Fix small script indentation problem.
Github PR #2424 regressed Ubuntu 18.04 building
other than x86_64 machines. Architecture that are
impacted are PPC64 and ARM64.
This was because of changes in debian/rules file
which caused removing dependency to package 'libpmem-dev'
and CMake which '-DWITH_PMEM' removing not working
correctly. Package libpmem-dev was removed but
it still required to have PMEM with CMake which.
Commit make change that -DWITH_PMEM is correctly removed
if it's not wanted.
tpool::cache::m_mtx: Add PERFORMANCE_SCHEMA instrumentation
(wait/synch/mutex/innodb/tpool_cache_mutex). This covers the
InnoDB read_slots and write_slots for asynchronous data page I/O.
- This issue caused by race condition between drop thread
and fil_encrypt_thread. fil_encrypt_thread closes
the tablespace if the number of opened files
exceeds innodb_open_files. fil_node_open_file()
closes the tablespace which are open and it doesn't
have pending operations. At that time, InnoDB drop tries
to write the redo log for the file delete operation.
It throws the bad file descriptor error.
- When trying to close the file, InnoDB should check
whether the table is going to be dropped.
Query cache should be invalidated if we are not in applier. For some
reason this condition was incorrect starting from 10.5 but it is
correct in 10.4.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
After MDEV-30830 has added block-nl-join.r_unpack_time_ms, it became
apparent that there is some unaccounted-for time in BNL join operation,
namely the time that is spent after unpacking the join buffer record.
Fix this by adding a Gap_time_tracker to track the time that is spent
after unpacking the join buffer record and before any next time tracking.
The collected time is printed in block-nl-join.r_other_time_ms.
Reviewed by: Monty <monty@mariadb.org>
- Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
The flag defines if strnncollsp_nchars() should emulate trailing spaces
which were possibly trimmed earlier (e.g. in InnoDB CHAR compression).
This is important for NOPAD collations.
For example, with this input:
- str1= 'a ' (Latin letter a followed by one space)
- str2= 'a ' (Latin letter a followed by two spaces)
- nchars= 3
if the flag is given, strnncollsp_nchars() will virtually restore
one trailing space to str1 up to nchars (3) characters and compare two
strings as equal:
- str1= 'a ' (one extra trailing space emulated)
- str2= 'a ' (as is)
If the flag is not given, strnncollsp_nchars() does not add trailing
virtual spaces, so in case of a NOPAD collation, str1 will be compared
as less than str2 because it is shorter.
- Field_string::cmp_prefix() now passes the new flag.
Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
not pass the new flag.
- The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
(which handles the CHAR data type) now also passed the new flag.
- Fixing UCA collations to respect the new flag.
Other collations are possibly also affected, however
I had no success in making an SQL script demonstrating the problem.
Other collations will be extended to respect this flags in a separate
patch later.
- Changing the meaning of the last parameter of Field::cmp_prefix()
from "number of bytes" (internal length)
to "number of characters" (user visible length).
The code calling cmp_prefix() from handler.cc was wrong.
After this change, the call in handler.cc became correct.
The code calling cmp_prefix() from key_rec_cmp() in key.cc
was adjusted according to this change.
- Old strnncollsp_nchar() related tests in unittest/strings/strings-t.c
now pass the new flag.
A few new tests also were added, without the flag.
The tests innodb.import_tablespace_race, innodn.restart, and innodb.innodb-wl5522 move
the tablespace file between the data directory and the tmp directory specified by
global environment variables. However this is risky because it's not unusual that the
set tmp directory (often under /tmp) is mounted on another disk partition or device,
and 'move_file' command may fail with "Errcode: 18 'Invalid cross-device link.'"
For innodb.import_tablespace_race and innodb.innodb-wl5522, moving files
across directories is not necessary. Modify the tests so they rename
files under the same directory. For innodb.restart, instead of moving
between datadir and MYSQL_TMPDIR, move the files under MYSQLTEST_VARDIR.
All new code of the whole pull request, including one or several files that
are either new files or modified ones, are contributed under the BSD-new license.
I am contributing on behalf of my employer Amazon Web Services, Inc.
The tests innodb.import_tablespace_race, innodn.restart, and innodb.innodb-wl5522 move
the tablespace file between the data directory and the tmp directory specified by
global environment variables. However this is risky because it's not unusual that the
set tmp directory (often under /tmp) is mounted on another disk partition or device,
and 'move_file' command may fail with "Errcode: 18 'Invalid cross-device link.'"
To stabilize mysqltest in the described scenario, and prevent such
behavior in the future, let make_file() check both from file path and to
file path and make sure they are either both under MYSQLTEST_VARDIR or
MYSQL_TMP_DIR.
All new code of the whole pull request, including one or several files that
are either new files or modified ones, are contributed under the BSD-new license.
I am contributing on behalf of my employer Amazon Web Services, Inc.
Scripts of lsb-base package are moved to sysvinit-utils
in Debian 12.
Commit removes deprecated lsb-base as dependency.
It also make change autobake-debs.sh to
be sure that there is backward compatibility
with distro version that still use lsb-base:
* Debian 10 and 11
* Ubuntu 18.04, 20.04, 22.04 and 22.10
Add vital missing step to MariaDB 10.5 upgrade job to actually install
the new binary being built. Without this the test was happily passing all
the time but actually not testing the upgrade.
Also stop using oneliner syntax for the install step to make the debugging
of failing installs/upgrades from build logs easier.
NOTE TO MERGERS: This commit is made on 10.6 branch and can be merged to
all later branches (10.7, 10.8, ..., 11.0). If/when some jobs break, they
will be fixed per branch on follow-up commits.
Tests that try to upgrade MariaDB 10.6 in Debian Sid can no longer work
properly on versions < 10.11 as MariaDB 10.11 in now Debian Sid.
Change RELEASE to use Bullseye and refactor builds and upgrade tests to
use primarily Bullseye, as it has MariaDB 10.5 and thus testing upgrades
to MariaDB 10.6 and higher can work. Add on new 'build sid' job to continue
at least on build on Sid as well, even though it is not tested in other
jobs.
Due to this many Sid specific workarounds are can also be dropped, and
since Bullseye is now used for everything, the old bullseye-backports jobs
are obsolete and removed.
Tests that upgrade MySQL in Sid to MariaDB are also removed, as no test
in Debian Sid with MariaDB < 10.11 can reliably work anymore.
Also disable reprotest as unnecessary on old branches, refactor the naming
of autobake-deb.sh and native Debian build jobs to be more clear.
NOTE TO MERGERS: This commit is made on 10.6 branch and can be merged to
all later branches (10.7, 10.8, ..., 11.0). If/when some jobs break, they
will be fixed per branch on follow-up commits.
Compare to Debian packaging of MariaDB 1:10.6.11-2 release at commit
2934e8a795
and sync upstream everything that is relevant for upstream and safe
to import on a stable 10.6 release.
* Use OpenSSL 1.1 from Debian Snapshots
0040c272bf
Related: https://jira.mariadb.org/browse/MDEV-30322
* Prefer using bullseye-backports in mosts tests over buster-bpo
daa827ecde
* Add new upgrade test for MySQL Community Cluster 8.0
3c71bec9b7
* Enable automatic datadir move also on upgrades from MySQL.com packages
4cbbcb7e56
* Update Breaks/Replaces
2cab13d059
* Normalize apt-get and curl commands
8754ea2578
* Standardize on using capitalized 'ON' in CMake build options
938757a85a
* Apply wrap-and-sort -av and other minor tweaks and inline documentation
NOTE TO MERGERS: This commit is made on 10.6 branch and can be merged to
all later branches (10.7, 10.8, ..., 11.0).
This is allowed:
STRING_WITH_LEN("string literal")
This is not:
char *str = "pointer to string";
... STRING_WITH_LEN(str) ..
In C++ this is also allowed:
const char str[] = "string literal";
... STRING_WITH_LEN(str) ...
Test fails sporadically and very rarely on this:
```
let $org_queries= `SHOW STATUS LIKE 'Queries'`;
SELECT f1();
CALL p1();
let $new_queries= `SHOW STATUS LIKE 'Queries'`;
let $diff= `SELECT SUBSTRING('$new_queries',9)-SUBSTRING('$org_queries',9)`;
```
if COM_QUIT from one of the earlier (in the test) disconnect's
happens between the two SHOW STATUS commands.
Because COM_QUIT increments "Queries".
The directly previous test uses wait_condition to wait for
its disconnects to complete. But there are more disconnects earlier
in the test file and nothing waits for them.
Let's change wait_condition to wait for *all* disconnect to complete.
MariaDB server prints the stack information if a crash happens.
It traverses the stack frames in function `print_with_addr_resolve`.
For *EACH* frame, it tries to parse the file name and line number of the
frame using `addr2line`, or prints `backtrace_symbols_fd` if `addr2line`
fails.
1. Logic in `addr_resolve` function uses addr2line to get the file name
and line numbers. It has a timeout of 500ms to wait for the response
from addr2line. However, that's not enough on small instances
especially if the debug information is in a separate file or
compressed.
Increase the timeout to 5 seconds to support some edge cases, as
experiments showed addr2line may take 2-3 seconds on some frames.
2. While parsing a frame inside of a shared library using `addr2line`,
the file name and line numbers could be `??`, empty or `0` if the
debug info is not loaded.
It's easy to reproduce when glibc-debuginfo is not installed.
Instead of printing a meaningless frame like:
:0(__GI___poll)[0x1505e9197639]
...
??:0(__libc_start_main)[0x7ffff6c8913a]
We want to print the frame information using `backtrace_symbols_fd`,
with the shared library name and a hexadecimal offset.
Stacktrace example on a real instance with this commit:
/lib64/libc.so.6(__poll+0x49)[0x145cbf71a639]
...
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f4d0034d13a]
`addr_resolve` has considered the case of meaningless combination of
file name and line number returned by `addr2line`. e.g. `??:?`
However, conditions like `:0` and `??:0` are not handled. So now the
function will rollback to `backtrace_symbols_fd` in above cases.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
In block-nl-join, add:
- r_loops - this shows how many incoming record combinations this
query plan node had.
- r_effective_rows - this shows the average number of matching rows
that this table had for each incoming record combination. This is
comparable with r_rows in non-blocked access methods.
For BNL-joins, it is always equal to
$.table.r_rows * $.table.r_filtered
For BNL-H joins the value cannot be computed from other values
Reviewed by: Monty <monty@mariadb.org>
CREATE [TEMPORARY] SEQUENCE is internally CREATE+INSERT (initial value)
and it is replicated using statement based replication. In Galera
we use either TOI or RSU so we should skip commit time hooks
for it.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
With binlogs enabled, debug assertion ut_ad(xid_seqno > wsrep_seqno)
fired in trx_rseg_update_wsrep_checkpoint() when an applier thread
synced the seqno out of order for write set which had failed
certification. This was caused by releasing commit
order too early when binlogs were on, allowing group
commit to run in parallel and commit following transactions
too early.
Fixed by extending the commit order critical section to cover
call to wsrep_set_SE_checkpoint() also when binlogs are on.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
When using LEFT() function with a string that is without a charset,
the function crashes. This is because the function assumes that
the string has a charset, and tries to use it to calculate the
length of the string.
Two functions, UNHEX and WEIGHT_STRING, returned a string without
the charset being set to a not null value.
The fix is to set charset when calling val_str on these two functions.
Reviewed-by: Alexander Barkov <bar@mariadb.com>
Reviewed-by: Daniel Black <daniel@mariadb.org>
Sequence objects are implemented using special tables.
These tables do not have primary key and only one row.
NEXTVAL is basically update from existing value to new
value. In Galera this could mean that two write-sets
from different nodes do not conflict and this could
lead situation where write-sets are executed
concurrently and possibly in wrong seqno order.
This is fixed by using table-level exclusive key for
SEQUENCE updates. Note that this naturally works
correctly only if InnoDB storage engine is used for
sequence.
This fix does not contain a test case because while
it is possible to syncronize appliers using dbug_sync
it was too hard to syncronize MDL-lock requests to
exact objects. Testing done for this fix is documented
on MDEV.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Let us make innodb_buffer_pool_filename a read-only variable
so that a malicious user cannot cause an important file to be
deleted on InnoDB shutdown. An attempt to delete a directory
will fail because it is not a regular file, but what if the
variable pointed to (say) ibdata1, ib_logfile0 or some *.ibd file?
It does not seem to make much sense for this parameter to be
configurable in the first place, but we will not change that in order
to avoid breaking compatibility.
Problem:
UNIX_TIMESTAMP() called for a expression of the TIME data type
returned NULL.
Inside Type_handler_timestamp_common::Item_val_native_with_conversion
the call for item->get_date() did not convert TIME to DATETIME
automatically (because it does not have to, by design).
As a result, Type_handler_timestamp_common::TIME_to_native() received
a MYSQL_TIME value with zero date 0000-00-00 and therefore returned "true"
(indicating SQL NULL value).
Fix:
Removing the call for item->get_date().
Instantiating Datetime(item) instead.
This forces automatic TIME to DATETIME conversion
(unless @@old_mode is zero_date_time_cast).