Commit graph

28346 commits

Author SHA1 Message Date
Marko Mäkelä
ebefef658e Merge 10.11 into 11.2 2024-10-18 11:32:22 +03:00
Marko Mäkelä
eca552a1a4 MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-18 10:12:47 +03:00
Sergei Golubchik
3a1cf2c85b MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Marko Mäkelä
740519e15a MDEV-35125: Unnecessary buf_pool.page_hash lookups
dict_index_t::clear(), btr_drop_temporary_table(): Make use of the
root page guess if it is available.

btr_read_autoinc(): Invoke btr_root_block_get() to access the root page.

btr_blob_free(): Retain a buffer-fix on the page across mtr_t::commit()
in order to avoid a buf_pool.page_hash lookup.

dict_load_table_one(): Remove a redundant check for page id. It was
already validated in buf_page_t::read_complete().

trx_t::apply_log(): Make use of buf_pool.page_fix() to avoid some
mtr_t related overhead.

Reviewed by: Thirunarayanan Balathandayuthapani
2024-10-17 09:10:45 +03:00
Thirunarayanan Balathandayuthapani
4a1ded61a4 MDEV-34529 Shrink the system tablespace when system tablespace contains MDEV-30671 leaked undo pages
- InnoDB fails to shrink the system tablespace when it contains
the leaked undo log pages caused by MDEV-30671.

- InnoDB does free the unused segment in system tablespace
before shrinking the tablespace. InnoDB fails to free
the unused segment if XA PREPARE transaction exist or
if the previous shutdown was not with innodb_fast_shutdown=0

inode_info: Structure to store the inode page and offsets.

fil_space_t::garbage_collect(): Frees the system tablespace
unused segment

fsp_get_sys_used_segment(): Iterates through all default
file segment and index segment present in system tablespace.

trx_sys_t::is_xa_exist(): Returns true if the XA transaction
exist in the undo logs

fseg_inode_free(): Frees the extents, fragment pages for the
given index node and ignores any error similar to
trx_purge_free_segment()

trx_sys_t::reset_page(): Retain the TRX_SYS_FSEG_HEADER value
in trx_sys page while resetting the page.
2024-10-16 21:34:24 +05:30
Monty
0403313bdb Fixed connect to not call strlen() over and over again in a loop 2024-10-16 17:24:46 +03:00
Vladislav Vaintroub
c1fc59277a MDEV-34929 page-compressed tables do not work on Windows
Remove workaround for MDEV-13941, it served for 5 years,and all affected
pre-release 10.2 installation should have been already fixed in between.

Apparently Innodb is using is_sparse parameter in os_file_set_size()
inconsistently, and it passes is_sparse=false now during first file
extension. With MDEV-13941 workaround in place, it would unsparse
the file, which is makes compression not to work at all anymore.
2024-10-16 16:02:13 +02:00
Marko Mäkelä
a4d2cc931d MDEV-35174 Possible hang in trx_undo_prev_version()
In commit b7b9f3ce82 (MDEV-34515) we
accidentally made the InnoDB MVCC code acquire a shared
purge_sys.latch twice. Recursive shared latch acquisition may cause a
deadlock of InnoDB threads if another thread in between will start waiting
for an exclusive latch.

purge_sys_t::latch: In debug builds, use srw_lock_debug instead of
srw_spin_lock, so that bugs like this will result in debug assertion
failures.

trx_undo_report_row_operation(): Pass the view_guard to
trx_undo_prev_version() and the rest of the arguments in the same
order, so that the work to permute argument registers is minimized.
2024-10-16 14:37:44 +03:00
Thirunarayanan Balathandayuthapani
6aaae4c03b MDEV-35122 Incorrect NULL value handling for instantly dropped BLOB columns
Problem:
=======
- Redundant table fails to insert into the table after
instant drop blob column. Instant drop column only marking
the column as hidden and consecutive insert statement tries
to insert NULL value for the dropped BLOB column and returns
the fixed length of the blob type as 65535. This lead to
row size too large error.

Fix:
====
 For redundant table, if the non-fixed dropped column can be null
then set the length of the field type as 0.
2024-10-15 12:04:37 +05:30
Yuchen Pei
cd5577ba4a
Merge branch '10.5' into 10.6 2024-10-15 16:00:44 +11:00
Yuchen Pei
77ed235d50
MDEV-26345 Spider GBH should execute original queries on the data node
Stop skipping const items when selecting but skip them when storing
their results to spider row to avoid storing in mismatching temporary
table fields.

Skip auxiliary fields in SELECTing, and do not store
the (non-existing) results to the corresponding temporary table
accordingly.

When there are BOTH auxiliary fields AND const items in the auxiliary
field items, do not use the spider GBH. This is a rare occasion if it
happens at all and not worth the added complexity to cover it.

Use the original item (item_ptr) in constructing GROUP BY and ORDER
BY, which also means using item->name instead of field->field_name as
aliases in constructing SELECT items. This fixes spurious regressions
caused by the above changes in some tests using ORDER BY, such as
mdev_24517.test. As a by-product, this also fixes MDEV-29546.
Therefore we update mdev_29008.test to include the MDEV-29546 case.
2024-10-15 15:36:12 +11:00
Yuchen Pei
e6daff40e4
MDEV-32524 [fixup] Fixup of spider mem alloc enums missed in a previous merge
The merge was 10.4.34->10.5.25
2024-10-15 14:30:40 +11:00
Nayuta Yanagisawa
6080e3af19
MDEV-26912 Spider: Remove dead code related to Oracle OCI
Remove the dead-code, in Spider, which is related to the Spider's
Oracle OCI support. The code has been disabled for a long time and
it is unlikely that the code will be enabled.
2024-10-15 14:30:40 +11:00
Yuchen Pei
03a5c683f9
MDEV-27650 Spider: remove #ifdef SPIDER_HAS_GROUP_BY_HANDLER 2024-10-15 14:30:39 +11:00
Yuchen Pei
0a59aafc5f
MDEV-34659 Bound check in spider cast function query construction
During spider query construction of certain cast functions, it
locates the last occurrence of a keyword in the output of the
Item::print() function and append from there to the constructed query
so far. For example, consider the following query

SELECT * FROM t2 ORDER BY CAST(c AS INET6);

It constructs the following query and executes it at the data
node (assuming the data node table is called t0).

select cast(t0.`c` as inet6) ``,t0.`c` `c` from `test`.`t1` t0 order by ``

When the construction has completed the initial part

select cast(t0.`c`

It then attempts to construct the " as inet6" part. To that end, it
calls print() on the Item_typecast_fbt corresponding to the cast item,
and obtains

cast(`test`.`t2`.`c` as inet6)

It then looks for " as ", and places cursor there for appending:

cast(`test`.`t2`.`c` as inet6)
                    ^

In this patch, if the search fails, i.e. there's no " as ...", we
make sure that the cursor is not placed before the beginning of the
string (out of bound).

We also relax the search from " as char" to " as " in the case of
CHAR_TYPECAST_FUNC, since there is more than one Item type with this
func type. For example, "AS INET6" is an Item_typecast_fbt which has
this func type.
2024-10-15 14:30:30 +11:00
Yuchen Pei
98a9c75ea3
MDEV-34659 Use evalp in CREATE SERVER's in init_spider.inc
This was already fixed in higher versions.
2024-10-15 14:18:12 +11:00
Yuchen Pei
d3b84ff10d
MDEV-30067 Remove some overly enthusiastic asserts when deleting from a partitioned table
When an DDL statement results in a local partition table with
partitions not covering all values in the table, a failure is emitted.
However, when the table in question is a spider table, the issue does
not surface until some future statements (DELETE in the test examples
in this commit) are executed. This is consistent with the design of
spider which aims to minimise connections with the data node. The
resulting error is legitimate and should not result in an assertion
failure. Similarly, a partitioned spider table could have misplaced
rows, so we remove the other assertion as well.
2024-10-15 14:18:10 +11:00
Yuchen Pei
8a52639ede
MDEV-34716 spider: some trivial cleanups and documentation
- document tmp_share, which are temporary spider shares with only one
link (no ha)
- simplify spider_get_sys_tables_connect_info() where link_idx is
always 0
2024-10-15 11:04:27 +11:00
Yuchen Pei
2345407b8c
MDEV-34716 Fix mysql.servers socket max length too short
The limit of socket length on unix according to libc is 108, see
sockaddr_un::sun_path, but in the table it is a string of max length
64, which results in truncation of socket and failure to connect by
plugins using servers such as spider.
2024-10-15 10:50:22 +11:00
Yuchen Pei
84df8d7275
MDEV-34716 spider: some trivial cleanups and documentation
- document tmp_share, which are temporary spider shares with only one
link (no ha)
- simplify spider_get_sys_tables_connect_info() where link_idx is
always 0
2024-10-15 10:50:22 +11:00
Thirunarayanan Balathandayuthapani
5777d9f282 MDEV-35116 InnoDB fails to set error index for HA_ERR_NULL_IN_SPATIAL
- InnoDB fails to set the index information or index number
for the spatial index error HA_ERR_NULL_IN_SPATIAL.

row_build_spatial_index_key(): Initialize the tmp_mbr array completely.

check_if_supported_inplace_alter(): Fix the spelling mistake of alter
2024-10-14 14:28:24 +05:30
Marko Mäkelä
4e1e9ea6f3 MDEV-35124 Set innodb_snapshot_isolation=ON by default
From the very beginning, the default InnoDB transaction isolation level
REPEATABLE READ does not correspond to any well formed definition.
The main issue is the lack of write/write conflict detection.
To fix that and to make REPEATABLE READ correspond to Snapshot Isolation,
b8a6719889 introduced the Boolean
session variable innodb_snapshot_isolation. It was disabled by default
in order not to break any user applications.

In a new major version of MariaDB Server, we had better enable this
parameter by default.
2024-10-11 15:02:31 +03:00
Monty
875d8c909f Removed end '.' from variable comment 2024-10-09 17:10:30 +03:00
Oleksandr Byelkin
1d0e94c55f Merge branch '10.5' into 10.6 2024-10-09 08:38:48 +02:00
Thirunarayanan Balathandayuthapani
23820f1d79 MDEV-34392 Inplace algorithm violates the foreign key constraint
- Fixing the compilation issue for the compiler lesser than gcc-6

Reviewed-by : Marko Mäkelä <marko.makela@mariadb.com>
2024-10-09 10:14:29 +05:30
Thirunarayanan Balathandayuthapani
65418ca9ad MDEV-34392 Inplace algorithm violates the foreign key constraint
- Fix the compilation error in gcc-5
2024-10-08 16:43:57 +05:30
Aleksey Midenkov
4e4c7dd4f5 MDEV-28288 System versioning doesn't support correct work for
engine=connect and doesn't always give any warnings/errors

Disabled system versioning for connect due to unsupported
microseconds (MDEV-15967).
2024-10-08 13:08:10 +03:00
Aleksey Midenkov
5b940bdcfc MDEV-25060 Freeing overrun buffer, various crashes, ASAN
heap-buffer-overflow in _mi_put_key_in_record

Rec buffer size depends on vreclength like this:

  length= MY_MAX(length, info->s->vreclength);

The problem is rec buffer is allocated before vreclength is
calculated. The fix reallocates rec buffer if vreclength changed.

1. Rec buffer allocated

  f0  mi_alloc_rec_buff (...) at ../src/storage/myisam/mi_open.c:738
  f1  0x00005f4928244516 in mi_open (...) at ../src/storage/myisam/mi_open.c:671
  f2  0x00005f4928210b98 in ha_myisam::open (...)
      at ../src/storage/myisam/ha_myisam.cc:847
  f3  0x00005f49273aba41 in handler::ha_open (...) at ../src/sql/handler.cc:3105
  f4  0x00005f4927995a65 in open_table_from_share (...)
      at ../src/sql/table.cc:4320
  f5  0x00005f492769f084 in open_table (...) at ../src/sql/sql_base.cc:2024
  f6  0x00005f49276a3ea9 in open_and_process_table (...)
      at ../src/sql/sql_base.cc:3819
  f7  0x00005f49276a29b8 in open_tables (...) at ../src/sql/sql_base.cc:4303
  f8  0x00005f49276a6f3f in open_and_lock_tables (...)
      at ../src/sql/sql_base.cc:5250
  f9  0x00005f49275162de in open_and_lock_tables (...)
      at ../src/sql/sql_base.h:509
  f10 0x00005f4927a30d7a in open_only_one_table (...)
      at ../src/sql/sql_admin.cc:412
  f11 0x00005f4927a2c0c2 in mysql_admin_table (...)
      at ../src/sql/sql_admin.cc:603
  f12 0x00005f4927a2fda8 in Sql_cmd_optimize_table::execute (...)
      at ../src/sql/sql_admin.cc:1517
  f13 0x00005f49278102e3 in mysql_execute_command (...)
      at ../src/sql/sql_parse.cc:6180
  f14 0x00005f49278012d7 in mysql_parse (...) at ../src/sql/sql_parse.cc:8236

2. vreclength calculated

  f0  ha_myisam::setup_vcols_for_repair (...)
      at ../src/storage/myisam/ha_myisam.cc:1002
  f1  0x00005f49282138b4 in ha_myisam::optimize (...)
      at ../src/storage/myisam/ha_myisam.cc:1250
  f2  0x00005f49273b4961 in handler::ha_optimize (...)
      at ../src/sql/handler.cc:4896
  f3  0x00005f4927a2d254 in mysql_admin_table (...)
      at ../src/sql/sql_admin.cc:875
  f4  0x00005f4927a2fda8 in Sql_cmd_optimize_table::execute (...)
      at ../src/sql/sql_admin.cc:1517
  f5  0x00005f49278102e3 in mysql_execute_command (...)
      at ../src/sql/sql_parse.cc:6180
  f6  0x00005f49278012d7 in mysql_parse (...) at ../src/sql/sql_parse.cc:8236

FYI backtrace was done with

  set print frame-info location
  set print frame-arguments presence
  set width 80
2024-10-08 13:08:10 +03:00
Marko Mäkelä
8a6a4c947a Cleanup: Replace some is_predefined_tablespace()
In some places, there were redundant comparisons against TRX_SYS_SPACE
or SRV_TMP_SPACE_ID. The temporary tablespace is never the subject of
log-based recovery.

Also, consistently check for SRV_SPACE_ID_UPPER_BOUND.

Reviewed by: Debarun Barerjee
2024-10-04 13:41:12 +03:00
Marko Mäkelä
b249a059da MDEV-34850: Busy work while parsing FILE_ records
In mariadb-backup --backup, we only have to invoke the undo_space_trunc
and log_file_op callbacks as well as validate the mini-transaction
checksums. There is absolutely no need to access recv_sys.pages or
recv_spaces, or to allocate a decrypt_buf in case of innodb_encrypt_log=ON.
This is what the new mode recv_sys_t::store::BACKUP will do.

In the skip_the_rest: loop, the main thing is to process all FILE_ records
until the end of the log is reached.  Additionally, we must process
INIT_PAGE and FREE_PAGE records in the same way as they would be
during storing == YES.

This was measured to reduce the CPU time between the messages
"InnoDB: Multi-batch recovery needed at LSN" and
"InnoDB: End of log at LSN"
by some 20%.

recv_sys_t::store: A ternary enumeration that specifies how records
should be stored: NO, BACKUP, or YES.

recv_sys_t::parse(), recv_sys_t::parse_mtr(), recv_sys_t::parse_pmem():
Replace template<bool store> with template<store storing>.

store_freed_or_init_rec(): Simplify some logic. We can look up also
the system tablespace.

Reviewed by: Debarun Banerjee
2024-10-04 13:38:21 +03:00
Yuchen Pei
690f8a91f9
MDEV-35073 Fix -Wmaybe-uninitialized in spider_conn_first_link_idx
Default result to "no eligible servers".
2024-10-04 17:18:20 +10:00
Marko Mäkelä
f493e46494 Merge 11.6 into 11.7 2024-10-03 18:15:13 +03:00
Marko Mäkelä
43465352b9 Merge 11.4 into 11.6 2024-10-03 16:09:56 +03:00
Marko Mäkelä
b53b81e937 Merge 11.2 into 11.4 2024-10-03 14:32:14 +03:00
Marko Mäkelä
12a91b57e2 Merge 10.11 into 11.2 2024-10-03 13:24:43 +03:00
Marko Mäkelä
63913ce5af Merge 10.6 into 10.11 2024-10-03 10:55:08 +03:00
Marko Mäkelä
7e0afb1c73 Merge 10.5 into 10.6 2024-10-03 09:31:39 +03:00
Yuchen Pei
ba7088d462
Merge '11.4' into 11.6 2024-10-03 15:59:20 +10:00
Marko Mäkelä
23debf214f MDEV-28091 fixup: Fix another pfs_malloc() stub
In commit 0f56e21efa
only one pfs_malloc() stub was fixed to return aligned memory.
Also, the MSVC _aligned_malloc() pairs with _aligned_free().
2024-10-03 08:15:17 +03:00
Daniel Black
1f7898f686 mroonga: remove -Wunused-but-set-variable warnings
There where unused variable. They were not conditional
on defines, so removed them.

Added an error handing in proc_object if there was no db
as subsequent operations would have failed.
2024-10-03 15:05:09 +10:00
Daniel Black
3723fd1573 MDEV-35007 mroonga should modify source files during build
CMake rewriting the tests causes Mroonga to be un-buildable
on build environments where there source directory is read
only.

In the test results, the version wasn't particularly important.

Remove the version dependence of tests.
2024-10-03 15:05:09 +10:00
Marko Mäkelä
cc70ca7eab MDEV-35059 ALTER TABLE...IMPORT TABLESPACE with FULLTEXT SEARCH may corrupt the adaptive hash index
build_fts_hidden_table(): Correct a mistake that had been made in
commit 903ae30069 (MDEV-30655).
2024-10-02 11:09:31 +03:00
Sergei Golubchik
b1bbdbab9e cleanup: remove redundant if()
likely, a result of auto-merge of two fixes in different versions
2024-10-01 18:29:11 +02:00
Sergei Golubchik
813e592763 compilation failure in CONNECT
storage/connect/tabfmt.cpp:419:24: error: '%.3d' directive writing between 3 and 10 bytes into a region of size 5 [-Werror=format-overflow=]
  419 |       sprintf(buf, "COL%.3d", i+1);
2024-10-01 18:29:11 +02:00
Marko Mäkelä
464055fe65 MDEV-34078 Memory leak in InnoDB purge with 32-column PRIMARY KEY
row_purge_reset_trx_id(): Reserve large enough offsets for accomodating
the maximum width PRIMARY KEY followed by DB_TRX_ID,DB_ROLL_PTR.

Reviewed by: Thirunarayanan Balathandayuthapani
2024-10-01 18:35:39 +03:00
Marko Mäkelä
a298dfb84c MDEV-35053 Crash in purge_sys_t::iterator::free_history_rseg()
purge_sys_t::get_page(): Avoid accessing a freed reference to pages[id]
after pages.erase(id).  This heap-use-after-free would sometimes be
caught by AddressSanitizer.

purge_sys_t::iterator::free_history_rseg(): Do not crash if undo=nullptr
(the database is corrupted).

Reviewed by: Debarun Banerjee
2024-10-01 15:03:04 +03:00
Marko Mäkelä
2d031f4a71 MDEV-34973 fixup for POWER,s390x
xtest(): Correct the declaration.
2024-10-01 13:29:59 +03:00
Max Kellermann
6715e4dfe1 MDEV-34973: innobase/dict0dict: add noexcept to lock/unlock methods
Another chance for cutting back overhead due to C++ exceptions being
enabled; the `dict_sys_t` class is a good candidate because its
locking methods are called frequently.

Binary size reduction this time:

    text	  data	   bss	   dec	   hex	filename
 24448622	2436488	9473537	36358647	22ac9f7	build/release/sql/mariadbd
 24448474	2436488	9473601	36358563	22ac9a3	build/release/sql/mariadbd
2024-10-01 09:53:16 +03:00
Max Kellermann
813123e3e0 MDEV-34973: innobase/lock0lock: add noexcept
MariaDB is compiled with C++ exceptions enabled, and that disallows
some optimizations (e.g. the stack must always be unwinding-safe).  By
adding `noexcept` to functions that are guaranteed to never throw,
some of these optimizations can be regained.  Low-level locking
functions that are called often are a good candidate for this.

This shrinks the executable a bit (tested with GCC 14 on aarch64):

    text	  data	   bss	   dec	   hex	filename
 24448910	2436488	9473185	36358583	22ac9b7	build/release/sql/mariadbd
 24448622	2436488	9473537	36358647	22ac9f7	build/release/sql/mariadbd
2024-10-01 09:53:16 +03:00
Thirunarayanan Balathandayuthapani
cc810e64d4 MDEV-34392 Inplace algorithm violates the foreign key constraint
Don't allow the referencing key column from NULL TO NOT NULL
when

 1) Foreign key constraint type is ON UPDATE SET NULL
 2) Foreign key constraint type is ON DELETE SET NULL
 3) Foreign key constraint type is UPDATE CASCADE and referenced
 column declared as NULL

Don't allow the referenced key column from NOT NULL to NULL
when foreign key constraint type is UPDATE CASCADE
and referencing key columns doesn't allow NULL values

get_foreign_key_info(): InnoDB sends the information about
nullability of the foreign key fields and referenced key fields.

fk_check_column_changes(): Enforce the above rules for COPY
algorithm

innobase_check_foreign_drop_col(): Checks whether the dropped
column exists in existing foreign key relation

innobase_check_foreign_low() : Enforce the above rules for
INPLACE algorithm

dict_foreign_t::check_fk_constraint_valid(): This is used
by CREATE TABLE statement to check nullability for foreign
key relation.
2024-10-01 09:41:56 +05:30
Max Kellermann
45298b730b sql/handler: referenced_by_foreign_key() returns bool
The method was declared to return an unsigned integer, but it is
really a boolean (and used as such by all callers).

A secondary change is the addition of "const" and "noexcept" to this
method.

In ha_mroonga.cpp, I also added "inline" to the two helper methods of
referenced_by_foreign_key().  This allows the compiler to flatten the
method.
2024-09-30 16:33:25 +03:00
Marko Mäkelä
d28ac3f82d MDEV-34207: ALTER TABLE...STATS_PERSISTENT=0 fails to drop statistics
commit_try_norebuild(): If the STATS_PERSISTENT attribute of the table
is being changed to disabled, drop the persistent statistics of the table.
2024-09-30 15:27:38 +03:00
Sergei Golubchik
b88f1267e4 MDEV-33373 part 2: Unexpected ER_FILE_NOT_FOUND upon reading from logging table after crash recovery
CSV engine shoud set my_errno if use it.
2024-09-30 13:50:51 +02:00
Marko Mäkelä
dd5ce6b0c4 MDEV-34450 os_file_write_func() is an overkill for ib_logfile0
log_file_t::read(), log_file_t::write(): Invoke pread() or pwrite()
directly, so that we can give more accurate diagnostics in case of
a failure, and so that we will avoid the overhead of setting up 5(!)
stack frames and related objects.

tpool::pwrite(): Add a missing const qualifier.
2024-09-30 13:36:38 +03:00
Daniel Black
f199dffe3b MDEV-34937 s3 engine no longer functional on non-gcc builds
Last commit on libmarias3 broke non-gcc builds by excluding
the most important aspect, the snprintf being executed.

Reviewer: Andrew Hutchings <andrew@mariadb.org>
Ref: https://github.com/mariadb-corporation/libmarias3/pull/130
2024-09-30 09:23:30 +01:00
Yuchen Pei
282b92f0a2
MDEV-34589 Do not execute before queries in spider_db_mbase::rollback()
Rollback is not supposed to fail. This prevents false failures in
spider rollback.
2024-09-30 16:16:27 +10:00
Yuchen Pei
42735c557e
MDEV-34636 Spider: reset wide_handler->trx in two occasions
ha_spider::update_create_info()
ha_spider::append_lock_tables_list()
2024-09-30 15:52:08 +10:00
Yuchen Pei
f43ea935a1
MDEV-34636 Remove implementation of ha-spider::extra() with MERGE flags 2024-09-30 15:51:18 +10:00
Yuchen Pei
69874ee95c
MDEV-34828 Remove some obsolete cmake code related to the removed spider handlersocket support
A fixup of MDEV-26858
2024-09-30 15:12:00 +10:00
Kristian Nielsen
35c732cdde MDEV-25611: RESET MASTER causes the server to hang
RESET MASTER waits for storage engines to reply to a binlog checkpoint
requests. If this response is delayed for a long time for some reason, then
RESET MASTER can hang.

Fix this by forcing a log sync in all engines just before waiting for the
checkpoint reply.

(Waiting for old checkpoint responses is needed to preserve durability of
any commits that were synced to disk in the to-be-deleted binlog but not yet
synced in the engine.)

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-09-27 15:10:06 +02:00
Marko Mäkelä
2d3ddaef35 MDEV-34907 Bogus assertion failure and busy work while parsing FILE_ records
A server that was running with innodb_log_file_size=96M and
innodb_buffer_pool_size=6M had inserted some data into a table
that was subsequently dropped. When the server was killed and
restarted, an assertion failed in recv_sys_t::parse() while
a FSP_SIZE change was unnecessarily being processed during
the skip_the_rest: loop in recv_scan_log().

The ib_logfile0 contents was as follows:

1. The checkpoint start LSN points to the start of some mini-transaction.
2. There may be log records for modifying files for which a FILE_MODIFY
had been written before the checkpoint. These records were "purged"
by advancing the checkpoint.
3. At some point during the initial parsing with store=true the space
reserved for recv_sys.pages will run out and recv_scan_log() would switch
to the skip_the_rest: mode.
4. We encounter a log record for extending a tablespace that will be
deleted a bit later. This would trip the bogus debug assertion.
5. Later on, there would be a FILE_DELETE record for this tablespace.
6. The checkpoint end LSN points to a possibly empty sequence of
FILE_MODIFY records and a FILE_CHECKPOINT record. Recovery had parsed these
records first, before rewinding to the checkpoint start LSN.
7. There could be further records following the FILE_CHECKPOINT record.
Recovery will process all records until an inconsistency is found and
it is assumed that the end of the circular ib_logfile0 was reached.

recv_sys_t::parse(): For the template instantiation with store=false,
remove a debug assertion that could fail in a multi-batch recovery,
while recv_scan_log(false) would be in the skip_the_rest: loop.
It is very well possible that we have not encountered all FILE_ records
yet, and therefore we should not complain about unknown tablespaces.

Reviewed by: Debarun Banerjee
2024-09-27 12:31:37 +03:00
Marko Mäkelä
6acada713a MDEV-34062: Implement innodb_log_file_mmap on 64-bit systems
When using the default innodb_log_buffer_size=2m, mariadb-backup --backup
would spend a lot of time re-reading and re-parsing the log. For reads,
it would be beneficial to memory-map the entire ib_logfile0 to the
address space (typically 48 bits or 256 TiB) and read it from there,
both during --backup and --prepare.

We will introduce the Boolean read-only parameter innodb_log_file_mmap
that will be OFF by default on most platforms, to avoid aggressive
read-ahead of the entire ib_logfile0 in when only a tiny portion would be
accessed. On Linux and FreeBSD the default is innodb_log_file_mmap=ON,
because those platforms define a specific mmap(2) option for enabling
such read-ahead and therefore it can be assumed that the default would
be on-demand paging. This parameter will only have impact on the initial
InnoDB startup and recovery. Any writes to the log will use regular I/O,
except when the ib_logfile0 is stored in a specially configured file system
that is backed by persistent memory (Linux "mount -o dax").

We also experimented with allowing writes of the ib_logfile0 via a
memory mapping and decided against it. A fundamental problem would be
unnecessary read-before-write in case of a major page fault, that is,
when a new, not yet cached, virtual memory page in the circular
ib_logfile0 is being written to. There appears to be no way to tell
the operating system that we do not care about the previous contents of
the page, or that the page fault handler should just zero it out.

Many references to HAVE_PMEM have been replaced with references to
HAVE_INNODB_MMAP.

The predicate log_sys.is_pmem() has been replaced with
log_sys.is_mmap() && !log_sys.is_opened().

Memory-mapped regular files differ from MAP_SYNC (PMEM) mappings in the
way that an open file handle to ib_logfile0 will be retained. In both
code paths, log_sys.is_mmap() will hold. Holding a file handle open will
allow log_t::clear_mmap() to disable the interface with fewer operations.

It should be noted that ever since
commit 685d958e38 (MDEV-14425)
most 64-bit Linux platforms on our CI platforms
(s390x a.k.a. IBM System Z being a notable exception) read and write
/dev/shm/*/ib_logfile0 via a memory mapping, pretending that it is
persistent memory (mount -o dax). So, the memory mapping based log
parsing that this change is enabling by default on Linux and FreeBSD
has already been extensively tested on Linux.

::log_mmap(): If a log cannot be opened as PMEM and the desired access
is read-only, try to open a read-only memory mapping.

xtrabackup_copy_mmap_snippet(), xtrabackup_copy_mmap_logfile():
Copy the InnoDB log in mariadb-backup --backup from a memory
mapped file.
2024-09-26 18:47:12 +03:00
Denis Protivensky
231900e5bb MDEV-34836: TOI on parent table must BF abort SR in progress on a child
Applied SR transaction on the child table was not BF aborted by TOI running
on the parent table for several reasons:

Although SR correctly collected FK-referenced keys to parent, TOI in Galera
disregards common certification index and simply sets itself to depend on
the latest certified write set seqno.

Since this write set was the fragment of SR transaction, TOI was allowed to
run in parallel with SR presuming it would BF abort the latter.

At the same time, DML transactions in the server don't grab MDL locks on
FK-referenced tables, thus parent table wasn't protected by an MDL lock from
SR and it couldn't provoke MDL lock conflict for TOI to BF abort SR transaction.

In InnoDB, DDL transactions grab shared MDL locks on child tables, which is not
enough to trigger MDL conflict in Galera.

InnoDB-level Wsrep patch didn't contain correct conflict resolution logic due to
the fact that it was believed MDL locking should always produce conflicts correctly.

The fix brings conflict resolution rules similar to MDL-level checks to InnoDB,
thus accounting for the problematic case.

Apart from that, wsrep_thd_is_SR() is patched to return true only for executing
SR transactions. It should be safe as any other SR state is either the same as
for any single write set (thus making the two logically equivalent), or it reflects
an SR transaction as being aborting or prepared, which is handled separately in
BF-aborting logic, and for regular execution path it should not matter at all.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-09-24 11:14:01 +02:00
Marko Mäkelä
971cf59579 Merge 10.6 into 10.11 2024-09-24 08:49:20 +03:00
Marko Mäkelä
638c62acac MDEV-34983: Remove x86 asm from InnoDB
Starting with GCC 7 and clang 15, single-bit operations such as
fetch_or(1) & 1 are translated into 80386 instructions such as
LOCK BTS, instead of using the generic translation pattern
of emitting a loop around LOCK CMPXCHG.

Given that the oldest currently supported GNU/Linux distributions
ship GCC 7, and that older versions of GCC are out of support,
let us remove some work-arounds that are not strictly necessary.
If someone compiles the code using an older compiler, it will work
but possibly less efficiently.

srw_mutex_impl::HOLDER: Changed from 1U<<31 to 1 in order to
work around https://github.com/llvm/llvm-project/issues/37322
which is specific to setting the most significant bit.

srw_mutex_impl::WAITER: A multiplier of waiting requests.
This used to be 1, which would now collide with HOLDER.

fil_space_t::set_stopping(): Remove this unused function.

In MSVC we need _interlockedbittestandset() for LOCK BTS.
2024-09-23 12:51:27 +03:00
Daniel Black
ac5cbaff66 Aria - correct type
Aria transaction ids are uint16 rather than uint.

Change the type to be more accurate.
2024-09-21 09:11:02 +10:00
mariadb-DebarunBanerjee
35d477dd1d MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1
The issue is caused by a race between buf_page_create_low getting the
page from buffer pool hash and buf_LRU_free_page evicting it from LRU.

The issue is introduced in 10.6 by MDEV-27058
commit aaef2e1d8c
MDEV-27058: Reduce the size of buf_block_t and buf_page_t

The solution is buffer fix the page before releasing buffer pool mutex
in buf_page_create_low when x_lock_try fails to acquire the page latch.
2024-09-20 20:26:43 +05:30
Marko Mäkelä
9ea7f7129a MDEV-34909 DDL hang during SET GLOBAL innodb_log_file_size on PMEM
log_t::persist(): Add a parameter holding_latch to specify
whether the caller is already holding exclusive log_sys.latch,
like log_write_and_flush() always is.
2024-09-20 15:29:56 +03:00
Lena Startseva
0a5e4a0191 MDEV-31005: Make working cursor-protocol
Updated tests: cases with bugs or which cannot be run
with the cursor-protocol were excluded with
"--disable_cursor_protocol"/"--enable_cursor_protocol"

Fix for v.10.5
2024-09-18 18:39:26 +07:00
Marko Mäkelä
7ea9e1358f Merge 11.2 into 11.4 2024-09-18 08:07:22 +03:00
Marko Mäkelä
e782e416ac Merge 10.11 into 11.2 2024-09-18 07:38:49 +03:00
Marko Mäkelä
64b75865d5 MDEV-34823 after-merge fix
btr_cur_t::search_leaf(): Remove a redundant condition.
This fixes up the merge commit cfa9784edb
2024-09-18 07:06:35 +03:00
Yuchen Pei
1f7f406b7b
Merge branch '11.2' into 11.4 2024-09-18 11:27:53 +10:00
Yuchen Pei
ff88633b9c
Merge branch '10.11' into 11.2 2024-09-18 10:45:26 +10:00
Yuchen Pei
cfa9784edb
Merge branch '10.11' into 11.2 2024-09-18 10:25:16 +10:00
Sergei Petrunia
5cd3fa81ef Merge 10.11 -> 11.2 2024-09-17 12:34:33 +03:00
Julius Goryavsky
f176248d4b Merge branch '10.6' into '10.11' 2024-09-17 06:23:10 +02:00
Julius Goryavsky
80fff4c6b1 Merge branch '10.5' into '10.6' 2024-09-16 16:39:59 +02:00
Marko Mäkelä
b187414764 Merge 10.6 into 10.11 2024-09-16 10:58:40 +03:00
Marko Mäkelä
4010dff058 mtr_t::log_file_op(): Fix -Wnonnull
GCC 12.2.0 could issue -Wnonnull for an unreachable call to
strlen(new_path).  Let us prevent that by replacing the condition
(type == FILE_RENAME) with the equivalent (new_path).
This should also optimize the generated code, because the life time
of the parameter "type" will be reduced.
2024-09-14 11:05:44 +03:00
Marko Mäkelä
e3f653ca66 MDEV-34750 fixup: -Wconversion on 32-bit
log_t::resize_write_buf(): If d<0 and d>-length, d will fit in ssize_t,
which is a signed 32-bit or 64-bit integer. Cast from int64_t to ssize_t
to make this clear and to silence a compiler warning.
2024-09-14 10:35:28 +03:00
Marko Mäkelä
762ad01c7f Merge 11.2 into 11.4 2024-09-13 13:09:23 +03:00
Marko Mäkelä
a74bea7ba9 MDEV-34879 InnoDB fails to merge the change buffer to ROW_FORMAT=COMPRESSED tables
buf_page_t::read_complete(): Fix an incorrect condition that had been
added in commit aaef2e1d8c (MDEV-27058).
Also for compressed-only pages we must remember that buffered changes
may exist.

buf_read_page(): Correct the function comment; this is for a synchronous
and not asynchronous read. Pass the parameter unzip=true to
buf_read_page_low(), because each of our callers will be interested in
the uncompressed page frame. This will cause the test
encryption.innodb-compressed-blob to emit more errors when the
correct keys for decrypting the clustered index root page are unavailable.

Reviewed by: Debarun Banerjee
2024-09-12 10:52:55 +03:00
Marko Mäkelä
f168050e90 MDEV-34791 fixup: Avoid an infinite loop with ROW_FORMAT=COMPRESSED
buf_pool_t::page_fix(): If a change buffer merge may be needed on a
ROW_FORMAT=COMPRESSED page that exists in compressed-only format in
the buffer pool, go ahead to decompress the block. This fixes an
infinite loop.

Reviewed by: Debarun Banerjee
2024-09-12 10:52:12 +03:00
Yuchen Pei
a8c5717223
Merge branch '10.6' into 10.11 2024-09-12 10:44:13 +10:00
Yuchen Pei
09b1269e4a
Merge branch '10.5' into 10.6 2024-09-12 10:17:51 +10:00
Monty
3ae4ecbfc5 MDEV-34867 engine S3 cause 500 error for huawei buckets
Add support for removing the Content-Type header to the S3 engine. This
is required for compatibility with some S3 providers.

This also adds a provider option to the S3 engine which will turn on
relevant compatibility options for specific providers.

This was required for getting MariaDB S3 engine to work with "Huawei
Cloud S3".
To get Huawei S3 storage to work on has set one of the following
S3 options:
s3_provider=Huawei
s3_ssl_no_verify=1

Author: Andrew Hutchings <andrew@mariadb.org>
2024-09-11 16:15:37 +03:00
Yuchen Pei
b168859d1e
Merge branch '10.6' into 10.11 2024-09-11 16:10:53 +10:00
Yuchen Pei
4a09e74387
Merge branch '10.5' into 10.6 2024-09-11 15:49:16 +10:00
Marko Mäkelä
f0de610d0c Merge 10.11 into 11.2 2024-09-10 18:35:16 +03:00
Yuchen Pei
fe3432b3bd
MDEV-28009 Deprecate spider_table_crd_thread_count and spider_table_sts_thread_count
These variables/parameters have the default read-only value of 1, and
the only way to change them is through a command line flag together
with a command line flag loading spider. After this change, the flag
will have no effect.
2024-09-10 14:48:59 +10:00
Yuchen Pei
cc0faa1e3e
MDEV-31788 Factor functions to reduce duplication around spider_check_and_init_casual_read in ha_spider.cc
factored out static functions:
- spider_prep_loop
- spider_start_bg
- spider_send_queries
2024-09-10 11:52:26 +10:00
Yuchen Pei
0ba97e4dc6
MDEV-31788 Factor out calls to spider_ping_table_mon_from_table in ha_spider.cc 2024-09-10 11:52:26 +10:00
Yuchen Pei
9e1579788f
MDEV-31788 Factor spider locking and unlocking code around sending queries 2024-09-10 11:52:22 +10:00
Yuchen Pei
84067291b4
MDEV-28360 Spider: remove #ifdef SPIDER_use_LEX_CSTRING_for_KEY_Field_name 2024-09-10 11:19:19 +10:00
Yuchen Pei
f5b7c25e1e
MDEV-27643 Spider: remove #ifdef HA_CAN_BULK_ACCESS 2024-09-10 11:19:19 +10:00
Yuchen Pei
e7570c7759
MDEV-31788 Remove spider_file_pos
They are for unnecessary debugging purposes only.
2024-09-10 11:19:18 +10:00
Yuchen Pei
a81f419b06
MDEV-27648 remove #define HASH_UPDATE_WITH_HASH_VALUE
The functions called in blocks protected by this macro remain
undefined as of 11.5 c96b23f994
2024-09-10 11:19:14 +10:00
Yuchen Pei
5d54e86c22
MDEV-26178 spider: delete spd_environ.h
It's virtually empty now
2024-09-10 11:15:18 +10:00
Yuchen Pei
869c501ac3
MDEV-27644 Spider: remove HANDLER_HAS_DIRECT_AGGREGATE 2024-09-10 11:15:18 +10:00
Yuchen Pei
3a58291680
MDEV-27662 remove SPIDER_SUPPORT_CREATE_OR_REPLACE_TABLE 2024-09-10 11:15:17 +10:00
Yuchen Pei
84977868b1
MDEV-27809 remove SPIDER_I_S_USE_SHOW_FOR_COLUMN
Show::Column() was added in MDEV-19772
4156b1a260
2024-09-10 11:15:17 +10:00
Yuchen Pei
6287fb6e17
MDEV-27652 remove #ifdef HA_HAS_CHECKSUM_EXTENDED
handler::pre_calculate_checksum was added in MDEV-16249
be5c432a42
2024-09-10 11:15:17 +10:00
Yuchen Pei
e8a5553cef
MDEV-27808 remove SPIDER_LIKE_FUNC_HAS_GET_NEGATED
get_negated() was introduced in MDEV-16707
2024-09-10 11:15:16 +10:00
Yuchen Pei
ab49b46d01
MDEV-27664 remove SPIDER_SQL_CACHE_IS_IN_LEX
sql_cache was moved to lex in MDEV-11953 in
de745ecf29
2024-09-10 11:15:16 +10:00
Yuchen Pei
a1e5ee9111
MDEV-27663 remove SPIDER_USE_CONST_ITEM_FOR_STRING_INT_REAL_DECIMAL_DATE_ITEM
{STRING|INT|REAL|DECIMAL|DATE}_ITEM were replaced with CONST_ITEM in
MDEV-14630 c20cd68e60
2024-09-10 11:15:16 +10:00
Yuchen Pei
5e98471df1
MDEV-27811: remove SPIDER_MDEV_16246
MDEV-16246 was fixed long ago. And this macro was removed in other
versions too
2024-09-10 11:15:15 +10:00
Yuchen Pei
8c8684b17f
MDEV-28226 Remove HANDLER_HAS_NEED_INFO_FOR_AUTO_INC
handler::need_info_for_auto_inc() was added in MDEV-7720 / MDEV-7726
in commit dc17ac1638
2024-09-10 11:15:15 +10:00
Yuchen Pei
affcb0713d
MDEV-26178 spider: remove PARTITION_HAS_GET_PART_SPEC
This macro is unused, and not in 11.5 c96b23f994
2024-09-10 11:15:15 +10:00
Yuchen Pei
6d0d09ebc2
MDEV-26178 Spider: remove HANDLER_HAS_TOP_TABLE_FIELDS
This macro is unused
2024-09-10 11:15:14 +10:00
Yuchen Pei
1cb75d9a33
MDEV-27660 Remove #ifdef SPIDER_HANDLER_START_BULK_INSERT_HAS_FLAGS
The flag argument was added to handler::start_bulk_insert() in
the MDEV-539 commit
ca2cdaad86
2024-09-10 11:15:14 +10:00
Yuchen Pei
aaba68ac1e
MDEV-28896 Spider: remove #ifdef SPIDER_UPDATE_ROW_HAS_CONST_NEW_DATA
new_data is const since at least 2017: a05a610d60
2024-09-10 11:15:14 +10:00
Yuchen Pei
f16c037753
MDEV-28895 Spider: remove #ifdef HANDLER_HAS_CAN_USE_FOR_AUTO_INC_INIT
handler has can_use_for_auto_inc_init() since at latest 2017:
dc17ac1638
2024-09-10 11:15:14 +10:00
Yuchen Pei
0650c87d9b
MDEV-27647 Spider: remove HANDLER_HAS_DIRECT_UPDATE_ROWS 2024-09-10 11:15:13 +10:00
Yuchen Pei
d5d65b948b
MDEV-26178 Spider: remove HA_EXTRA_HAS_HA_EXTRA_USE_CMP_REF
HA_EXTRA_USE_CMP_REF is undefined, and remains so as of 11.5
c96b23f994
2024-09-10 11:15:13 +10:00
Yuchen Pei
de3dd942c0
MDEV-28894 Spider: remove #ifdef HA_EXTRA_HAS_STARTING_ORDERED_INDEX_SCAN
HA_EXTRA_STARTING_ORDERED_INDEX_SCAN was added latest 2018:
921c5e9314
2024-09-10 11:15:13 +10:00
Yuchen Pei
64581c83e8
MDEV-28893 Spider: remove #ifdef SPIDER_NET_HAS_THD
net has thd since 2015 in 56aa19989f for MDEV-6152
2024-09-10 11:15:12 +10:00
Yuchen Pei
ba9bebd719
MDEV-28892 remove #ifdef SPIDER_Item_args_arg_count_IS_PROTECTED
arg_count was protected since 2015 in commit
afa1773439
2024-09-10 11:15:12 +10:00
Yuchen Pei
05fafaf82d
MDEV-27646 remove SPIDER_HAS_HASH_VALUE_TYPE
unifdef -DSPIDER_HAS_HASH_VALUE_TYPE -m storage/spider/spd_* storage/spider/ha_spider.* storage/spider/hs_client/*
2024-09-10 11:15:12 +10:00
Daniel Black
ccb4bc7754 MDEV-33894: Resurrect innodb_log_write_ahead_size (postfix)
os_file_log_maybe_unbuffered is now Linux only.

Aso the stat st structure only used in linux.

This avoids unused function/structure errors on FreeBSD.
2024-09-10 09:13:49 +10:00
Sergei Petrunia
2c3b298337 Merge 11.2 into 11.4 2024-09-09 14:40:02 +03:00
Sergei Petrunia
abd98336d2 Merge 10.11 -> 11.2 2024-09-09 13:50:38 +03:00
Thirunarayanan Balathandayuthapani
06ad31d4b6 MDEV-34855 Bootstrap hangs while shrinking the system tablespace
Reason:
=======
- During bootstrap, InnoDB shrinks the system tablespace while purge is
still active on system tablespace. This could lead to deadlock.

Fix:
====
Avoid System tablespace shrinking during bootstrap.
Move the shrinking logic of system tablespace after purge gets shutdown.
2024-09-09 14:34:14 +05:30
Marko Mäkelä
b7b2d2bde4 Merge 10.5 into 10.6 2024-09-09 11:30:30 +03:00
Yuchen Pei
d002b1f503
Merge branch '10.6' into 10.11 2024-09-09 11:34:19 +10:00
Marko Mäkelä
f9f92b480e Merge 10.6 into 10.11 2024-09-06 16:17:42 +03:00
Marko Mäkelä
f06060f5ed Cleanup: Remove the function dict_remove_db_name() 2024-09-06 14:31:55 +03:00
Marko Mäkelä
024a18dbcb MDEV-34823 Invalid arguments in ib_push_warning()
In the bug report MDEV-32817 it occurred that the function
row_mysql_get_table_status() is outputting a fil_space_t*
as if it were a numeric tablespace identifier.

ib_push_warning(): Remove. Let us invoke push_warning_printf() directly.

innodb_decryption_failed(): Report a decryption failure and set the
dict_table_t::file_unreadable flag. This code was being duplicated in
very many places. We return the constant value DB_DECRYPTION_FAILED
in order to avoid code duplication in the callers and to allow tail calls.

innodb_fk_error(): Report a FOREIGN KEY error.

dict_foreign_def_get(), dict_foreign_def_get_fields(): Remove.
This code was being used in dict_create_add_foreign_to_dictionary()
in an apparently uncovered code path. That ib_push_warning() call
would pass the integer i+1 instead of a pointer to NUL terminated
string ("%s"), and therefore the call should have resulted in a crash.

dict_print_info_on_foreign_key_in_create_format(),
innobase_quote_identifier(): Add const qualifiers.

row_mysql_get_table_error(): Replaces row_mysql_get_table_status().
Display no message on DB_CORRUPTION; it should be properly reported at
the SQL layer anyway.
2024-09-06 14:29:09 +03:00
Yuchen Pei
60b93cdd30
Merge branch '10.5' into 10.6 2024-09-06 13:52:57 +10:00
Libing Song
5bbda97111 MDEV-33853 Async rollback prepared transactions during binlog
crash recovery

Summary
=======
When doing server recovery, the active transactions will be rolled
back by InnoDB background rollback thread automatically. The
prepared transactions will be committed or rolled back accordingly
by binlog recovery. Binlog recovery is done in main thread before
the server can provide service to users. If there is a big
transaction to rollback, the server will not available for a long
time.

This patch provides a way to rollback the prepared transactions
asynchronously. Thus the rollback will not block server startup.

Design
======
- Handler::recover_rollback_by_xid()
  This patch provides a new handler interface to rollback transactions
  in recover phase. InnoDB just set the transaction's state to active.
  Then the transaction will be rolled back by the background rollback
  thread.

- Handler::signal_tc_log_recover_done()
  This function is called after tc log is opened(typically binlog opened)
  has done. When this function is called, all transactions will be rolled
  back have been reverted to ACTIVE state. Thus it starts rollback thread
  to rollback the transactions.

- Background rollback thread
  With this patch, background rollback thread is defered to run until binlog
  recovery is finished. It is started by innobase_tc_log_recovery_done().
2024-09-05 21:19:25 +03:00
Daniel Black
8024b8e4c1 MDEV-33091 pcre2 headers - handle columnstore
From e735cf2ed7cefb2af36f10f3cb47dfc060789df3, the PCRE_INCLUDES
changed to PCRE_INCLUDE_DIRS for consistency.

The columnstore module depends on the old name.

Create a mapping for the columnstore submodule.

10.6+ fix for submodule is:
* https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3304
2024-09-05 12:14:06 +10:00
Sergei Golubchik
b2ebe1cb7b MDEV-33091 pcre2 headers aren't found on Solaris
use pkg-config to find pcre2, if possible

rename PCRE_INCLUDES to use PKG_CHECK_MODULES naming, PCRE_INCLUDE_DIRS
2024-09-05 12:14:06 +10:00
Marko Mäkelä
fe5829a121 MDEV-34446 SIGSEGV on SET GLOBAL innodb_log_file_size with memory-mapped log file
log_t::resize_write(): Advance log_sys.resize_lsn and reset
the resize_log offset to START_OFFSET whenever the memory-mapped buffer
would wrap around.

Previously, in case the initial target offset would be beyond the
requested innodb_log_file_size, we only adjusted the offset but
not the LSN. An incorrect LSN would cause log_sys.buf_free to be out
of bounds when the log resizing completes.

The log_sys.lsn_lock will cover the entire duration of replicating
memory-mapped log for resizing. We just need a mutex that is compatible
with the caller holding log_sys.latch. While the choice of mtr_t::finisher
(for normal log writes) depends on mtr_t::spin_wait_delay,
replicating the log during resizing is a rare operation where we can
afford possible additional context switching overhead.
2024-09-04 14:24:30 +03:00
Marko Mäkelä
a5b80531fb Merge 11.4 into 11.6 2024-09-04 10:38:25 +03:00
Marko Mäkelä
9f0b106631 MDEV-34845 buf_flush_buffer_pool(): Assertion `!os_aio_pending_reads()' failed
buf_flush_buffer_pool(): Wait for any pending asynchronous reads
to complete. This assertion failed in a run where buf_read_ahead_linear()
had been triggered in an SQL statement that was executed right
before shutdown.

Reviewed by: Debarun Banerjee
2024-09-03 18:22:10 +03:00
Marko Mäkelä
9878238f74 MDEV-34791: Redundant page lookups hurt performance
btr_cur_t::search_leaf(): When the index root page is also a leaf page,
we may need to upgrade our existing shared root page latch into an
exclusive latch. Even if we end up waiting, the root page won't be able
to go away while we hold an index()->lock. The index page may be split;
that is all.

btr_latch_prev(): Acquire the page latch while holding a buffer-fix
and an index tree latch. Merge the change buffer if needed. Use
buf_pool_t::page_fix() for this special case instead of complicating
buf_page_get_low() and buf_page_get_gen().

row_merge_read_clustered_index(): Remove some code that does not seem
to be useful. No difference was observed with regard to removing this
code when a CREATE INDEX or OPTIMIZE TABLE statement was run concurrently
with sysbench oltp_update_index --tables=1 --table_size=1000 --threads=16.

buf_pool_t::unzip(): Decompress a ROW_FORMAT=COMPRESSED page.

buf_pool_t::page_fix(): Handle also ROW_FORMAT=COMPRESSED pages
as well as change buffer merge. Optionally return an error.
Add a flag for suppressing a page latch wait and a special return
value -1 to indicate that the call would block.
This is the preferred way of buffer-fixing blocks.
The functions buf_page_get_gen() and buf_page_get_low() are only being
invoked with rw_latch=RW_NO_LATCH in operations on SPATIAL INDEX.

buf_page_t: Define some static functions for interpreting state().

buf_page_get_zip(), buf_read_page(),
buf_read_ahead_random(), buf_read_ahead_linear():
Remove the redundant parameter zip_size. We must look up the
tablespace and can invoke fil_space_t::zip_size() on it.

buf_page_get_low(): Require mtr!=nullptr.

buf_page_get_gen(): Implement some lock downgrading during recovery.

ibuf_page_low(): Use buf_pool_t::page_fix() in a debug check.
We do wait for a page read here, because otherwise a debug assertion in
buf_page_get_low() in the test innodb.ibuf_delete could occasionally fail.

PageConverter::operator(): Invoke buf_pool_t::page_fix() in order
to possibly evict a block. This allows us to remove some
special case code from buf_page_get_low().
2024-09-03 14:15:57 +03:00
Marko Mäkelä
44733aa8cf Merge 11.2 into 11.4 2024-08-29 19:10:38 +03:00
Marko Mäkelä
e91a799458 Merge 10.11 into 11.2 2024-08-29 16:02:57 +03:00
Marko Mäkelä
984606d747 MDEV-34750 SET GLOBAL innodb_log_file_size is not crash safe
The recent commit 4ca355d863 (MDEV-33894)
caused a serious regression for online InnoDB ib_logfile0 resizing,
breaking crash-safety unless the memory-mapped log file interface is
being used. However, the log resizing was broken also before this.

To prevent such regressions in the future, we extend the test
innodb.log_file_size_online with a kill and restart of the server
and with some writes running concurrently with the log size change.
When run enough many times, this test revealed all the bugs that
are being fixed by the code changes.

log_t::resize_start(): Do not allow the resized log to start before
the current log sequence number. In this way, there is no need to
copy anything to the first block of resize_buf. The previous logic
regarding that was incorrect in two ways. First, we would have to
copy from the last written buffer (buf or flush_buf). Second, we failed
to ensure that the mini-transaction end marker bytes would be 1
in the buffer. If the source ib_logfile0 had wrapped around an odd number
of times, the end marker would be 0. This was occasionally observed
when running the test innodb.log_file_size_online.

log_t::resize_write_buf(): To adjust for the resize_start() change,
do not write anything that would be before the resize_lsn.
Take the buffer (resize_buf or resize_flush_buf) as a parameter.
Starting with commit 4ca355d863
we no longer swap buffers when rewriting the last log block.

log_t::append(): Define as a static function; only some debug
assertions need to refer to the log_sys object.

innodb_log_file_size_update(): Wake up the buf_flush_page_cleaner()
if needed, and wait for it to complete a batch while waiting for
the log resizing to be completed. If the current LSN is behind the
resize target LSN, we will write redundant FILE_CHECKPOINT records to
ensure that the log resizing completes. If the buf_pool.flush_list is
empty or the buf_flush_page_cleaner() is stuck for some reason, our wait
will time out in 5 seconds, so that we can periodically check if the
execution of SET GLOBAL innodb_log_file_size was aborted. Previously,
we could get into a busy loop here while the buf_flush_page_cleaner()
would remain idle.
2024-08-29 14:53:08 +03:00
Marko Mäkelä
cfcf27c6fe Merge 10.6 into 10.11 2024-08-29 07:47:29 +03:00
Marko Mäkelä
0e76c1ba94 Merge 10.5 into 10.6 2024-08-28 15:51:36 +03:00
Marko Mäkelä
1ff6b6f0b4 MDEV-34802 Recovery fails to note some log corruption
recv_recovery_from_checkpoint_start(): Abort startup due to log
corruption if we were unable to parse the entire log between
the latest log checkpoint and the corresponding FILE_CHECKPOINT record.

Also, reduce some code bloat related to log output and log_sys.mutex.

Reviewed by: Debarun Banerjee
2024-08-28 15:44:42 +03:00
Yuchen Pei
18d3f63a4e
MDEV-32627 Spider: use CONNECTION string in SQLDriverConnect
This is the CS part of the implementation of MENT-2070.
2024-08-28 16:43:07 +10:00
Marko Mäkelä
bda40ccb85 MDEV-34803 innodb_lru_flush_size is no longer used
In commit fa8a46eb68 (MDEV-33613)
the parameter innodb_lru_flush_size ceased to have any effect.

Let us declare the parameter as deprecated and additionally as
MARIADB_REMOVED_OPTION, so that there will be a warning written
to the error log in case the option is specified in the command line.

Let us also do the same for the parameter
innodb_purge_rseg_truncate_frequency
that was deprecated&ignored earlier in MDEV-32050.

Reviewed by: Debarun Banerjee
2024-08-28 07:18:03 +03:00
Marko Mäkelä
e7bb9b7c55 MDEV-24923 fixup: Correct a function comment 2024-08-27 18:06:24 +03:00
Marko Mäkelä
48becffd07 Merge 10.5 into 10.6 2024-08-27 08:52:10 +03:00
Yuchen Pei
58bc83e1a7
[fixup] Spider: Restored lines accidentally deleted in MDEV-32157
Also restored a change that resulted in off-by-one, as well as
appending the correctly indexed key_hint.
2024-08-27 15:36:39 +10:00
Marko Mäkelä
36ab75a498 MDEV-34515: Fix a bogus debug assertion
purge_sys_t::stop_FTS(): Fix an incorrect debug assertion that
commit d58734d781 added.
The assertion would fail if there had been prior invocations of
purge_sys.stop_SYS() without purge_sys.resume_SYS().
The intention of the assertion is to check that number of pending
stop_FTS() stays below 65536.
2024-08-27 07:27:24 +03:00
Marko Mäkelä
76f6b6d818 MDEV-34515: Reduce context switching in purge
Before this patch, the InnoDB purge coordinator task submitted
innodb_purge_threads-1 tasks even if there was not sufficient amount
of work for all of them. For example, if there are undo log records
only for 1 table, only 1 task can be employed, and that task had better
be the purge coordinator.

srv_purge_worker_task_low(): Split from purge_worker_callback().

trx_purge_attach_undo_recs(): Remove the parameter n_purge_threads,
and add the parameter n_work_items, to keep track of the amount of
work.

trx_purge(): Launch purge worker tasks only if necessary. The work of
one thread will be executed by this purge coordinator thread.

que_fork_scheduler_round_robin(): Merged to trx_purge().

Thanks to Vladislav Vaintroub for supplying a prototype of this.

Reviewed by: Debarun Banerjee
2024-08-26 12:23:17 +03:00
Marko Mäkelä
b7b9f3ce82 MDEV-34515: Contention between purge and workload
In a Sysbench oltp_update_index workload that involves 1 table,
a serious contention between the workload and the purge of history
was observed. This was the worst when the table contained only 1 record.

This turned out to be fixed by setting innodb_purge_batch_size=128,
which corresponds to the number of usable persistent rollback segments.
When we go above that, there would be contention between row_purge_poss_sec()
and the workload, typically on the clustered index page latch, sometimes
also on a secondary index page latch. It might be that with smaller
batches, trx_sys.history_size() will end up pausing all concurrent
transaction start/commit frequently enough so that purge will be able
to make some progress, so that there would be less contention on the
index page latches between purge and SQL execution.

In commit aa719b5010 (part of MDEV-32050)
the interpretation of the parameter innodb_purge_batch_size was slightly
changed. It would correspond to the maximum desired size of the
purge_sys.pages cache. Before that change, the parameter was referring to
a number of undo log pages, but the accounting might have been inaccurate.

To avoid a regression, we will reduce the default value to
innodb_purge_batch_size=127, which will also be compatible with
innodb_undo_tablespaces>1 (which will disable rollback segment 0).

Additionally, some logic in the purge and MVCC checks is simplified.
The purge tasks will make use of purge_sys.pages when accessing undo
log pages to find out if a secondary index record can be removed.
If an undo page needs to be looked up in buf_pool.page_hash, we will
merely buffer-fix it. This is correct, because the undo pages are
append-only in nature. Holding purge_sys.latch or purge_sys.end_latch
or the fact that the current thread is executing as a part of an
in-progress purge batch will prevent the contents of the undo page from
being freed and subsequently reused. The buffer-fix will prevent the
page from being evicted form the buffer pool. Thanks to this logic,
we can refer to the undo log record directly in the buffer pool page
and avoid copying the record.

buf_pool_t::page_fix(): Look up and buffer-fix a page. This is useful
for accessing undo log pages, which are append-only by nature.
There will be no need to deal with change buffer or ROW_FORMAT=COMPRESSED
in that case.

purge_sys_t::view_guard::view_guard(): Allow the type of guard to be
acquired: end_latch, latch, or no latch (in case we are a purge thread).

purge_sys_t::view_guard::get(): Read-only accessor to purge_sys.pages.

purge_sys_t::get_page(): Invoke buf_pool_t::page_fix().

row_vers_old_has_index_entry(): Replaced with row_purge_is_unsafe()
and row_undo_mod_sec_unsafe().

trx_undo_get_undo_rec(): Merged to trx_undo_prev_version_build().

row_purge_poss_sec(): Add the parameter mtr and remove redundant
or unused parameters sec_pcur, sec_mtr, is_tree. We will use the
caller's mtr object but release any acquired page latches before
returning.

btr_cur_get_page(), page_cur_get_page(): Do not invoke page_align().

row_purge_remove_sec_if_poss_leaf(): Return the value of PAGE_MAX_TRX_ID
to be checked against the page in row_purge_remove_sec_if_poss_tree().
If the secondary index page was not changed meanwhile, it will be
unnecessary to invoke row_purge_poss_sec() again.

trx_undo_prev_version_build(): Access any undo log pages using
the caller's mini-transaction object.

row_purge_vc_matches_cluster(): Moved to the only compilation unit that
needs it.

Reviewed by: Debarun Banerjee
2024-08-26 12:23:06 +03:00
Marko Mäkelä
d58734d781 MDEV-34520 purge_sys_t::wait_FTS sleeps 10ms, even if it does not have to
There were two separate Atomic_counter<uint32_t>, purge_sys.m_SYS_paused
and purge_sys.m_FTS_paused. In purge_sys.wait_FTS() we have to read both
atomically. We used to use an overkill solution for this, acquiring
purge_sys.latch and waiting 10 milliseconds between samples. To make
matters worse, the 10-millisecond wait was unconditional, which would
unnecessarily suspend the purge_coordinator_task every now and then.

It turns out that we can fold both "reference counts" into a single
Atomic_relaxed<uint32_t> and avoid the purge_sys.latch.
To assess whether std::memory_order_relaxed is acceptable, we should
consider the operations that read these "reference counts", that is,
purge_sys_t::wait_FTS(bool) and purge_sys_t::must_wait_FTS().

Outside debug assertions, purge_sys.must_wait_FTS() is only invoked in
trx_purge_table_acquire(), which is covered by a shared dict_sys.latch.
We would increment the counter as part of a DDL operation, but before
acquiring an exclusive dict_sys.latch. So, a
purge_sys_t::close_and_reopen() loop could be triggered slightly
prematurely, before a problematic DDL operation is actually executed.
Decrementing the counter is less of an issue; purge_sys.resume_FTS()
or purge_sys.resume_SYS() would mostly be invoked while holding an
exclusive dict_sys.latch; ha_innobase::delete_table() does it outside
that critical section. Still, this would only cause some extra wait in
the purge_coordinator_task, just like at the start of a DDL operation.

There are two calls to purge_sys_t::wait_FTS(bool): in the above mentioned
purge_sys_t::close_and_reopen() and in purge_sys_t::clone_oldest_view(),
both invoked by the purge_coordinator_task. There is also a
purge_sys.clone_oldest_view<true>() call at startup when no DDL operation
can be in progress.

purge_sys_t::m_SYS_paused: Merged into m_FTS_paused, using a new
multiplier PAUSED_SYS = 65536.

purge_sys_t::wait_FTS(): Remove an unnecessary sleep as well as the
access to purge_sys.latch. It suffices to poll purge_sys.m_FTS_paused.

purge_sys_t::stop_FTS(): Do not acquire purge_sys.latch.

Reviewed by: Debarun Banerjee
2024-08-26 12:22:44 +03:00
Marko Mäkelä
9db2b327d4 MDEV-34759: buf_page_get_low() is unnecessarily acquiring exclusive latch
buf_page_ibuf_merge_try(): A new, separate function for invoking
ibuf_merge_or_delete_for_page() when needed. Use the already requested
page latch for determining if the call is necessary. If it is and
if we are currently holding rw_latch==RW_S_LATCH, upgrading to an exclusive
latch may involve waiting that another thread acquires and releases
a U or X latch on the page. If we have to wait, we must recheck if the
call to ibuf_merge_or_delete_for_page() is still needed. If the page
turns out to be corrupted, we will release and fail the operation.
Finally, the exclusive page latch will be downgraded to the originally
requested latch.

ssux_lock_impl::rd_u_upgrade_try(): Attempt to upgrade a shared lock to
an update lock.

sux_lock::s_x_upgrade_try(): Attempt to upgrade a shared lock to
exclusive.

sux_lock::s_x_upgrade(): Upgrade a shared lock to exclusive.
Return whether a wait was elided.

ssux_lock_impl::u_rd_downgrade(), sux_lock::u_s_downgrade():
Downgrade an update lock to shared.
2024-08-23 13:27:50 +03:00
Monty
1f040ae048 MDEV-34043 Drastically slower query performance between CentOS (2sec) and Rocky (48sec)
One cause of the slowdown is because the ftruncate call can be much
slower on some systems.  ftruncate() is called by Aria for internal
temporary tables, tables created by the optimizer, when the upper level
asks Aria to delete the previous result set. This is needed when some
content from previous tables changes.

I have now changed Aria so that for internal temporary tables we don't
call ftruncate() anymore for maria_delete_all_rows().

I also had to update the Aria repair code to use the logical datafile
size and not the on-disk datafile size, which may contain data from a
previous result set.  The repair code is called to create indexes for
the internal temporary table after it is filled.
I also replaced a call to mysql_file_size() with a pwrite() in
_ma_bitmap_create_first().

Reviewer: Sergei Petrunia <sergey@mariadb.com>
Tester: Dave Gosselin <dave.gosselin@mariadb.com>
2024-08-21 22:47:29 +03:00
Thirunarayanan Balathandayuthapani
22b48bb393 MDEV-34756 Validation of new foreign key skipped if innodb_alter_copy_bulk=ON
- During copy algorithm, InnoDB should disable bulk insert
operation if the table has foreign key relation and foreign key
check is enabled.
2024-08-21 18:58:20 +05:30
Oleksandr Byelkin
492a7c2430 Merge branch '11.5' into 11.6 2024-08-21 15:13:47 +02:00
Oleksandr Byelkin
342fa29615 Merge branch '11.4' into 11.5 2024-08-21 11:52:54 +02:00
Oleksandr Byelkin
eb70e0d6e2 Merge branch '11.2' into 11.4 2024-08-21 09:30:54 +02:00
Oleksandr Byelkin
6197e6abc4 Merge branch '10.11' into 11.2 2024-08-21 07:58:46 +02:00
Oleksandr Byelkin
70afc62750 Merge branch '10.6' into 10.11 2024-08-20 10:00:39 +02:00
Marko Mäkelä
267c0fce56 Merge 10.5 into 10.6 2024-08-15 10:16:46 +03:00
Marko Mäkelä
e40dfcdd89 Fix clang++-19 -Wunused-but-set-variable 2024-08-15 10:13:49 +03:00
Marko Mäkelä
62bfcfd8b2 Merge 10.6 into 10.11 2024-08-14 11:36:52 +03:00
Marko Mäkelä
757c368139 Merge 10.5 into 10.6 2024-08-14 10:56:11 +03:00
Marko Mäkelä
4f8803c036 MDEV-34678 pthread_mutex_init() without pthread_mutex_destroy()
When SUX_LOCK_GENERIC is defined, the srw_mutex, srw_lock, sux_lock are
implemented based on pthread_mutex_t and pthread_cond_t.  This is the
only option for systems that lack a futex-like system call.

In the SUX_LOCK_GENERIC mode, if pthread_mutex_init() is allocating
some resources that need to be freed by pthread_mutex_destroy(),
a memory leak could occur when we are repeatedly invoking
pthread_mutex_init() without a pthread_mutex_destroy() in between.

pthread_mutex_wrapper::initialized: A debug field to track whether
pthread_mutex_init() has been invoked.  This also helps find bugs
like the one that was fixed by
commit 1c8af2ae53 (MDEV-34422);
one simply needs to add -DSUX_LOCK_GENERIC to the CMAKE_CXX_FLAGS
to catch that particular bug on the initial server bootstrap.

buf_block_init(), buf_page_init_for_read(): Invoke block_lock::init()
because buf_page_t::init() will no longer do that.

buf_page_t::init(): Instead of invoking lock.init(), assert that it
has already been invoked (the lock is vacant).

add_fts_index(), build_fts_hidden_table(): Explicitly invoke
index_lock::init() in order to avoid a pthread_mutex_destroy()
invocation on an uninitialized object.

srw_lock_debug::destroy(): Invoke readers_lock.destroy().

trx_sys_t::create(): Invoke trx_rseg_t::init() on all rollback segments
in order to guarantee a deterministic state for shutdown, even if
InnoDB fails to start up.

trx_rseg_array_init(), trx_temp_rseg_create(), trx_rseg_create():
Invoke trx_rseg_t::destroy() before trx_rseg_t::init() in order to
balance pthread_mutex_init() and pthread_mutex_destroy() calls.
2024-08-14 07:54:15 +03:00
Marko Mäkelä
12b01d740b MDEV-34707: BUF_GET_RECOVER assertion failure on upgrade
buf_page_get_gen(): Relax the assertion once more.
The LSN may grow by invoking ibuf_upgrade(), that is,
when upgrading files where innodb_change_buffering!=none was used.
The LSN may also have been recovered from a log that needs
to be upgraded to the current format.
2024-08-13 08:20:18 +03:00
Jan Lindström
cd8b8bb964 MDEV-34594 : Assertion `client_state.transaction().active()' failed in
int wsrep_thd_append_key(THD*, const wsrep_key*, int, Wsrep_service_key_type)

CREATE TABLE [SELECT|REPLACE SELECT] is CTAS and idea was that
we force ROW format. However, it was not correctly enforced
and keys were appended before wsrep transaction was started.

At THD::decide_logging_format we should force used stmt binlog
format to ROW in CTAS case and produce a warning if used
binlog format was not ROW.

At ha_innodb::update_row we should not append keys similarly
as in ha_innodb::write_row if sql_command is SQLCOM_CREATE_TABLE.
Improved error logging on ::write_row, ::update_row and ::delete_row
if wsrep key append fails.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-08-12 23:54:30 +02:00
Oleksandr Byelkin
f8a735d6d8 Merge branch '10.11' into mariadb-10.11.9 2024-08-09 08:53:20 +02:00
Oleksandr Byelkin
2e580dc2a8 Merge branch '10.6' into mariadb-10.6.19 2024-08-09 08:51:21 +02:00
Nikita Malyavin
25e2d0a6bb MDEV-34632 Assertion failed in handler::assert_icp_limitations
Assertion `table->field[0]->ptr >= table->record[0] &&
table->field[0]->ptr <= table->record[0] + table->s->reclength' failed in
handler::assert_icp_limitations.

table->move_fields has some limitations:
1. It cannot be used in cascade
2. It should always have a restoring pair.

Rule 1 is covered by assertions in handler::assert_icp_limitations
and handler::ptr_in_record (commit 30894fe9a9).

Rule 2 should be manually maintained with care. Hopefully, the rule 1 assertions
may sometimes help as well.

In ha_myisam::repair, both rules are broken. table->move_fields is used
asymmetrically there: it is set on every param->fix_record call
(i.e. in compute_vcols) but is restored only once, in the end of repair.

The reason to updating field ptr's for every call is that compute_vcols can
(supposedly) be called in parallel, that is, with the same table, but different
records.

The condition to "unmove" the pointers in ha_myisam::restore_vcos_after_repair
is incorrect, when stored vcols are available, and myisam stores a VIRTUAL field
if it's the only field in the table (the record cannot be of zero length).

This patch solves the problem by "unmoving" the pointers symmetrically, in
compute_vcols. That is, both rules will be preserved maintained.
2024-08-07 14:50:19 +02:00
Yuchen Pei
337307a943 spider_string: use c_ptr_safe() instead of ptr() in error messages
Most other uses of ptr() are accompanied with a length, so we leave
them alone.
2024-08-07 12:23:46 +02:00
Sergei Petrunia
ba0d8aeffa Fix rocksdb.unique_check: do not have two threads waiting on the same name 2024-08-07 11:26:15 +03:00
Yuchen Pei
fa8ce92cc0
MDEV-34682 Return the return value of ddl recovery done in ha_initialize_handlerton
Otherwise it could cause false negative when ddl recovery done is part
of the plugin initialization
2024-08-07 15:13:08 +10:00
Yuchen Pei
bce3f3f628
MDEV-34682 Reset spider_hton_ptr in error mode of spider_db_init() 2024-08-07 15:10:56 +10:00
Oleksandr Byelkin
d6444022ca Merge branch 'bb-11.5-release' into bb-11.6-release 2024-08-06 17:28:38 +02:00
Oleksandr Byelkin
ea75a0b600 Merge branch '11.4' into 11.5 2024-08-05 17:50:18 +02:00
Sergei Golubchik
6f357feaf0 suppress rocksdb warning 2024-08-04 17:28:03 +02:00
Oleksandr Byelkin
1640c9b06e Merge branch '11.2' into 11.4 2024-08-04 17:27:48 +02:00
Oleksandr Byelkin
dced6cbdb6 Merge branch '11.1' into 11.2 2024-08-03 09:50:16 +02:00
mariadb-DebarunBanerjee
e515e80773 MDEV-34689 Redo log corruption at high load
Issue: During mtr_t:commit, if there is not enough space available in
redo log buffer, we flush the buffer. During flush, the LSN lock is
released allowing other concurrent mtr to commit. After flush we
reacquire the lock but use the old LSN obtained before check. It could
lead to redo log corruption. As the LSN moves backwards with the
possibility of data loss and unrecoverable server if the server aborts
for any reason or if server is shutdown with innodb_fast_shutdown=2.
With normal shutdown, recovery fails to map the checkpoint LSN to
correct offset.

In debug mode it hits log0log.cc:863: lsn_t log_t::write_buf()
Assertion `new_buf_free == ((lsn - first_lsn) & write_size_1)' failed.

In release mode, after normal shutdown, restart fails.
[ERROR] InnoDB: Missing FILE_CHECKPOINT(8416546) at 8416546
[ERROR] InnoDB: Log scan aborted at LSN 8416546

Backup fails reading the corrupt redo log.
[00] 2024-07-31 20:59:10 Retrying read of log at LSN=7334851
[00] FATAL ERROR: 2024-07-31 20:59:11 Was only able to copy log from
7334851 to 7334851, not 8416446; try increasing innodb_log_file_size

Unless a backup is tried or the server is shutdown or killed
immediately, the corrupt redo part is eventually truncated and there
may not be any visible issues seen in release mode.

This issue was introduced by the following commit.

commit a635c40648
    MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit()

Fix: If we need to release latch and flush redo before writing mtr
logs, make sure to get the latest system LSN after reacquiring the
redo system latch.
2024-08-03 13:11:35 +05:30
Oleksandr Byelkin
80abd847da Merge branch '10.11' into 11.1 2024-08-03 09:32:42 +02:00
Oleksandr Byelkin
0e8fb977b0 Merge branch '10.6' into 10.11 2024-08-03 09:15:40 +02:00
Oleksandr Byelkin
8f020508c8 Merge branch '10.5' into 10.6 2024-08-03 09:04:24 +02:00
Thirunarayanan Balathandayuthapani
37119cd256 MDEV-29010 Table cannot be loaded after instant ALTER
Reason:
======
- InnoDB fails to load the instant alter table metadata from
clustered index while loading the table definition.
The reason is that InnoDB metadata blob has the column length
exceeds maximum fixed length column size.

Fix:
===
- InnoDB should treat the long fixed length column as variable
length fields that needs external storage while initializing
the field map for instant alter operation
2024-08-01 18:58:43 +05:30
Thirunarayanan Balathandayuthapani
533e6d5d13 MDEV-34670 IMPORT TABLESPACE unnecessary traverses tablespace list
Problem:
========
- After the commit ada1074bb1 (MDEV-14398)
fil_crypt_set_encrypt_tables() iterates through all tablespaces to
fill the default_encrypt tables list. This was a trigger to
encrypt or decrypt when key rotation age is set to 0. But import
tablespace does call fil_crypt_set_encrypt_tables() unnecessarily.
The motivation for the call is to signal the encryption threads.

Fix:
====
ha_innobase::discard_or_import_tablespace: Remove the
fil_crypt_set_encrypt_tables() and add the import tablespace
to the default encrypt list if necessary
2024-07-31 14:13:38 +05:30
Thirunarayanan Balathandayuthapani
ee5f7692d7 MDEV-34357 InnoDB: Assertion failure in file ./storage/innobase/page/page0zip.cc line 4211
During InnoDB root page split, InnoDB does the following
1) First move the root records to the new page(p1)
2) Empty the root, insert the node pointer to the root page
3) Split the new page and make it as child nodes.
4) Finds the split record, allocate another new page(p2)
to the index
5) InnoDB stores the record(ret) predecessor to the supremum
record of the page (p2).
6) In page_copy_rec_list_start(), move the records from p1 to p2
upto the split record
6) Given table is a compressed row format page, InnoDB attempts to
compress the page p2 and failed (due to innodb_compression_level = 0)
7) Since the compression fails, InnoDB gets the number of preceding
records(ret_pos) of a record (ret) on the page (p2)
8) Page (p2) is a new page, ret points to infimum record.
ret_pos can be 0. InnoDB have wrong condition that ret_pos shouldn't
be 0 and returns corruption. InnoDB has similar wrong check in
page_copy_rec_list_end()
2024-07-30 14:36:34 +05:30
Marko Mäkelä
1c8af2ae53 MDEV-34422 Corrupted ib_logfile0 due to uninitialized log_sys.lsn_lock
In commit bf0b82d24b (MDEV-33515)
the function log_t::init_lsn_lock() was removed. This was fine on
those platforms where InnoDB uses futex-based mutexes (Linux, FreeBSD,
OpenBSD, NetBSD, DragonflyBSD).

Dave Gosselin debugged this on Apple macOS and submitted a fix where
pthread_mutex_wrapper::pthread_mutex_wrapper() would invoke init().
We do not really need that; we only need to invoke lsn_lock.init()
like we used to do before commit bf0b82d24b.
This should be a no-op for the futex based mutexes, which intentionally
rely on zero initialization.

The missing pthread_mutex_init() call would cause race conditions
and corruption of log_sys.buf because multiple threads could
apparently hold log_sys.lsn_lock concurrently in
log_t::append_prepare().  The error would be caught by a debug
assertion in log_t::write_buf(), or in non-debug builds by the
fact that the server cannot be restarted due to an apparently
missing FILE_CHECKPOINT record (because it had been written
to wrong offset in log_sys.buf).

The failure in log_t::append_prepare() was caught on Microsoft Windows
after enabling SUX_LOCK_GENERIC and therefore forcing the use of
pthread_mutex_wrapper for the log_sys.lsn_lock.  It appears to be fine
to omit the pthread_mutex_init() call on GNU/Linux.

log_t::create(): Invoke lsn_lock.init().

log_t::close(): Invoke lsn_lock.destroy().

To better catch this kind of issues in the future by simply defining
SUX_LOCK_GENERIC on any platform, a separate debug instrumentation patch
will be applied to the 10.6 branch later.

Reviewed by: Debarun Banerjee
2024-07-30 11:58:02 +03:00
Thirunarayanan Balathandayuthapani
c038b3c05e MDEV-34181 Instant table aborts after discard tablespace
- commit 85db534731 (MDEV-33400)
retains the instantness in the table definition after discard
tablespace. So there is no need to assign n_core_null_bytes
during instant table preparation unless they are not
initialized.
2024-07-30 13:31:43 +05:30
Thirunarayanan Balathandayuthapani
cc8eefb0dc MDEV-33087 ALTER TABLE...ALGORITHM=COPY should build indexes more efficiently
- During copy algorithm, InnoDB should use bulk insert operation
for row by row insert operation. By doing this, copy algorithm
can effectively build indexes. This optimization is disabled
for temporary table, versioning table and table which has
foreign key relation.

Introduced the variable innodb_alter_copy_bulk to allow
the bulk insert operation for copy alter operation
inside InnoDB. This is enabled by default

ha_innobase::extra(): HA_EXTRA_END_ALTER_COPY mode tries to apply
the buffered bulk insert operation, updates the non-persistent
table stats.

row_merge_bulk_t::write_to_index(): Update stat_n_rows after
applying the bulk insert operation

row_ins_clust_index_entry_low(): In case of copy algorithm,
switch to bulk insert operation.

copy_data_error_ignore(): Handles the error while copying
the data from source to target file.
2024-07-30 11:59:01 +05:30
Monty
4bf7c966b3 MDEV-34664: Add an option to fix InnoDB's doubling of secondary index cardinalities
(With trivial fixes by sergey@mariadb.com)
Added option fix_innodb_cardinality to optimizer_adjust_secondary_key_costs

Using fix_innodb_cardinality disables the 'divide by 2' of rec_per_key_int
in InnoDB that in effect doubles the Cardinality for secondary keys.
This has the biggest effect for indexes where a few rows has the same key
value. Using this may also cause table scans for very small tables (which
in some cases may be better than an index scan).

The user visible effect is that 'SHOW INDEX FROM table_name' will for
InnoDB show the true Cardinality (and not 2x the real value). It will
also allow the optimizer to chose a better index in some cases as the
division by 2 could have a bad effect for tables with 2-5 identical values
per key.

A few notes about using fix_innodb_cardinality:
- It has direct affect for SHOW INDEX FROM table_name. SHOW INDEX
  will also update the statistics in table share.
- The effect of fix_innodb_cardinality for query plans or EXPLAIN
  is only visible after first open of the table. This is why one must
  do a flush tables or use SHOW INDEX for the option to take effect.
- Using fix_innodb_cardinality can thus affect all user in their query
  plans if they are using the same tables.

Because of this, it is strongly recommended that one uses
optimizer_adjust_secondary_key_costs=fix_innodb_cardinality mainly
in configuration files to not cause issues for other users.
2024-07-29 16:40:53 +03:00
Marko Mäkelä
7e5c9ccda5 MDEV-34502 fixup: Do not cripple MSAN
We need to work around deficiencies of Valgrind, and apparently
the previous work-around attempts
(such as d247d64988) do not work
anymore, definitely not on recent clang-based compilers.

MemorySanitizer should be fine; unfortunately we set HAVE_valgrind for it
as well.
2024-07-29 15:04:16 +03:00
Marko Mäkelä
7ead48a72b MDEV-34458: Remove more traces of BTR_MODIFY_PREV
In commit 2f6df93748
we fixed an observed case of the bug by removing
some code related to the no longer needed
BTR_MODIFY_PREV mode.

In commit 73ad436e16
an alternative fix was applied that also fixes the
BTR_SEARCH_PREV case.

Let us clean up some implicit references to BTR_MODIFY_PREV
that were missed in 2f6df93748.

btr_pcur_move_backward_from_page(): Assume that the latch mode was
BTR_SEARCH_LEAF.

btr_pcur_move_to_prev(): Assert that the latch mode is BTR_SEARCH_LEAF.
This function is mostly invoked in row0sel.cc for read operations,
as well as in row0merge.cc for reading from the clustered index.
All callers indeed use a cursor in the BTR_SEARCH_LEAF mode.
2024-07-29 14:13:30 +03:00
Thirunarayanan Balathandayuthapani
3359ac09a4 MDEV-34066 Output of SHOW ENGINE INNODB STATUS uses the nanoseconds suffix for microseconds
- This issue is caused by commit e71e613353
(MDEV-24671). Change the output of transaction lock wait
time in microseconds suffix.
2024-07-23 21:36:13 +05:30
Oleksandr Byelkin
0fe39d368a Merge branch '10.6' into 10.11 2024-07-22 15:14:50 +02:00
Oleksandr Byelkin
88711ee509 New columnstore 23.10.2 2024-07-19 14:17:08 +02:00
Oleksandr Byelkin
9af2caca33 Merge branch '10.5' into 10.6 2024-07-18 16:25:33 +02:00
Sergei Golubchik
d20518168a also protect the /*!999999 sandbox comment 2024-07-17 21:25:40 +02:00
Sutou Kouhei
383d53edbc MDEV-21166 Add Mroonga initialized check to Mroonga UDFs
Mroonga UDFs can't be used without loading Mroonga.
2024-07-17 15:52:58 +10:00
Yuchen Pei
03a350378a
MDEV-30408 Reset explicit_limit in exists2in
Item_exists_subselect::fix_length_and_dec() sets explicit_limit to 1.
In the exists2in transformation it resets select_limit to NULL. For
consistency we should reset explicity_limit too.

This fixes a bug where spider table returns wrong results for queries
that gets through exists2in transformation when semijoin is off.
2024-07-17 08:54:40 +08:00
Yuchen Pei
8416fd323c
MDEV-32627 [fixup] Spider: Fix conn key length
To avoid off-by-one in spider_get_share.

And document conn key and conn key length.
2024-07-17 08:52:35 +08:00