Commit graph

28346 commits

Author SHA1 Message Date
Sergei Golubchik
9ddb94f60e MDEV-35104 Invalid (old?) table or database name upon DDL on table with vector key and unique key
InnoDB rename needs the same workaround for hlindexes
as it has for partitions
2024-11-05 14:00:51 -08:00
Sergei Golubchik
cdc7253787 make MyISAM and Aria report correct reflength to the server
MyISAM and Aria used to lie to the server about the reflength value.
One value was used internally, it was stored on disk, e.g. in indexes,
and couldn't be changed without full table rebuild. A differently
calculated value was reported to the server - that value was sometimes
larger than the true reflength.

That caused the server to allocate more memory per position than
necessary - affecting filesort, join buffer usage, optimizer cost
calculations, and may be more.
2024-11-05 14:00:51 -08:00
Sergei Golubchik
b3afd9f640 MDEV-35042 Vector indexes are allowed for MERGE tables, but do not
disallow hlindexes in MERGE - because we cannot create the secondary
table *in the same engine*
2024-11-05 14:00:51 -08:00
Sergei Golubchik
824a63852b MDEV-35043 Unsuitable error upon an attempt to create MEMORY table with vector key
MEMORY engine doesn't support blobs
2024-11-05 14:00:51 -08:00
Sergei Golubchik
9f80e3fbb7 MDEV-35032 streaming mode for mhnsw search
support SQL semantics for SELECT ... WHERE ... ORDER BY ... LIMIT

* switch from returning k nearest neighbors to returning
  as many as needed, in k-neighbor chunks, with increasing distance
* make search_layer() skips nodes that are closer than a threshold
* read_next keeps a search context - list of k found nodes,
  threshold, ctx, etc.
* when the list of found nodes is exhausted, it repeats the search
  starting from last found nodes and a threshold
* search context kepts ctx->refcount incremented, so ctx won't go away
* but commit_lock is unlocked between calls, so InnoDB can modify the table
* use ctx version to detect that, switch to MHNSW_Trx when it happens

bugfix:
* use the correct lock in ha_external_lock() for the graph table
* InnoDB didn't reset locks on ha_external_lock(F_UNLCK) and previous
  LOCK_X leaked into the next statement
2024-11-05 14:00:51 -08:00
Sergei Golubchik
ec2ff9f2a0 MDEV-35035 Assertion failure in ha_blackhole::position upon INSERT into blackhole table with vector index
let's allow ::position() and ::rnd_pos() in blackhole.
::position() can be called directly after insert, it doesn't need
a search to happen, so it's possible.

::rnd_pos() can be called with a value that ::position() produced,
so, possible too.
2024-11-05 14:00:50 -08:00
Sergei Golubchik
b44cde16cb MDEV-35037 Invalid (old?) table or database name 't#i#00' upon creating RocksDB table with vector index
disallow it, for now

also fixes

MDEV-35036 Assertion failure in myrocks::ha_rocksdb::position upon INSERT into RocksDB table with vector index
2024-11-05 14:00:50 -08:00
Sergey Vojtovich
f867c2a21e Disabled high-level indexes with Aria
... until a few bugs that cause server crash are fixed.
2024-11-05 14:00:50 -08:00
Sergei Golubchik
aed5928207 cleanup: extract transaction-related part of handlerton
into a separate transaction_participant structure

handlerton inherits it, so handlerton itself doesn't change.
but entities that only need to participate in a transaction,
like binlog or online alter log, use a transaction_participant
and no longer need to pretend to be a full-blown but invisible
storage engine which doesn't support create table.
2024-11-05 14:00:50 -08:00
Sergei Golubchik
126d6d787c cleanup: handlerton
remove unused methods, reorder methods, add comments
2024-11-05 14:00:50 -08:00
Sergei Golubchik
445198c10e pos-fixes for rename 2024-11-05 14:00:50 -08:00
Sergei Golubchik
25b4000290 InnoDB support for hlindexes and mhnsw
* mhnsw:
  * use primary key, innodb loves and (and the index cannot have dupes anyway)
    * MyISAM is ok with that, performance-wise
  * must be ha_rnd_init(0) because we aren't going to scan
    * MyISAM resets the position on ha_rnd_init(0) so query it before
    * oh, and use the correct handler, just in case
  * HA_ERR_RECORD_IS_THE_SAME is no error
* innodb:
  * return ref_length on create
  * don't assume table->pos_in_table_list is set
  * ok, assume away, but only for system versioned tables
* set alter_info on create (InnoDB needs to check for FKs)
* pair external_lock/external_unlock correctly
2024-11-05 14:00:49 -08:00
Sergei Golubchik
613542dceb mhnsw: build indexes with the columns of exactly right size 2024-11-05 14:00:49 -08:00
Sergei Golubchik
d6add9a03d initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
08a7f18b19 cleanup: init_tmp_table_share(bool thread_specific)
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.

In particular, internal tmp tables created in the slave thread
are perfectly thread specific
2024-11-05 14:00:48 -08:00
Sergei Golubchik
44c6328cbb cleanup: thd->alloc<>() and thd->calloc<>()
create templates

  thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)

and the same for thd->calloc(). By the default the type is char,
so old usage of thd->alloc(size) works too.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
eff16d7593 Revert "MDEV-15458 Segfault in heap_scan() upon UPDATE after ADD SYSTEM VERSIONING"
This partially reverts 43623f04a9

Engines have to set ::position() after ::write_row(), otherwise
the server won't be able to refer to the row just inserted.
This is important for high-level indexes.

heap part isn't reverted, so heap doesn't support high-level indexes.
to fix this, it'll need info->lastpos in addition to info->current_ptr
2024-11-05 14:00:48 -08:00
Sergei Golubchik
07ec1a9e37 cleanup: unused function argument 2024-11-05 14:00:48 -08:00
Sergei Golubchik
1fe8a1bb76 cleanup: generalize ER_INNODB_NO_FT_TEMP_TABLE 2024-11-05 14:00:48 -08:00
Sergei Golubchik
062f8eb37d cleanup: key algorithm vs key flags
the information about index algorithm was stored in two
places inconsistently split between both.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user
explicitly specified USING BTREE or HA_KEY_ALG_UNDEF, if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had  key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set, HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage to keep frms backward compatible).

As a side effect ALTER TABLE now detects FULLTEXT index renames correctly
2024-11-05 14:00:47 -08:00
Sergei Golubchik
32e6f8ff2e cleanup: remove unconditional #ifdef's 2024-11-05 14:00:47 -08:00
Sergei Golubchik
0cc01bde45 cleanup: pass TABLE_SHARE to store_key_options()
preparation for indexes that can be in TABLE_SHARE but not in TABLE
2024-11-05 14:00:47 -08:00
Sergei Golubchik
949fed514a cleanup: get_float convenience helper
more helpers like that can be added as needed
2024-11-05 14:00:47 -08:00
Sergei Golubchik
d046aca0c7 cleanup: CREATE_TYPELIB_FOR() helper 2024-11-05 14:00:47 -08:00
Sergei Golubchik
9fa31c1bd9 cleanup: spaces, casts, comments 2024-11-05 14:00:47 -08:00
Vladislav Vaintroub
7a62b029b3 post-merge cleanup - remove copy&paste code in fil_node_t::find_metadata 2024-11-05 21:44:35 +01:00
Oleksandr Byelkin
26514346bd Merge branch '11.4' into mariadb-11.4.4 2024-11-04 08:58:46 +01:00
Oleksandr Byelkin
f2bb2ab58c Merge branch '10.6' into mariadb-10.6.20 2024-11-04 07:40:45 +01:00
Vlad Lesin
3734ff7c7e MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
Post-push fix: row_vers_impl_x_locked() must be invoked under unlatched
lock_sys, the corresponding assertion was removed in MDEV-34466 and
was not restored in MDEV-34690. This fix restores it.
2024-10-31 12:16:21 +03:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Marko Mäkelä
67c0fd2a41 MDEV-35289 innodb_fast_shutdown=0 might corrupt the system tablespace on 32-bit systems
inode_info::get_unused(): Do not try to assign 64 bits
into a potentially narrower variable.
2024-10-30 09:36:12 +02:00
Yuchen Pei
2d4551eef1
MDEV-34272 create server with odbc results in connection string
We expand the tgt_odbc_str fields in SPIDER_SHARE for ha support, and
add the corresponding field in spider_direct_sql.

We also update the messages in monitoring, as odbc SERVER will cause
the usual connection fields (specifically, the one for database) not
to be populated with corresponding SERVER fields.
2024-10-30 13:24:45 +11:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Teemu Ollakka
47dd617c7f MDEV-35265 wsrep.wsrep-recover, wsrep.wsrep-recover-v25 fail on assertion
The tests fail on assertion

    ut_ad(!wsrep_is_wsrep_xid(&trx->xid));

in `innobase_recover_rollback_by_xid()`.

The fix is to avoid async rollback for prepared transactions
when wsrep is ON or wsrep recovery is in progress. The rationale
is that the rollback of prepared transactions must complete
before the node starts applying write sets after SST, or in
case of wsrep recovery, the recovery must complete before the
process exists.

Change the assertion into stronger one

    ut_ad(!(WSREP_ON || wsrep_recovery));

to catch if the async rollback codepath is taken when wsrep is
enabled.
2024-10-29 12:15:53 +02:00
Thirunarayanan Balathandayuthapani
db3be9b434 MDEV-35237 Bulk insert fails to apply buffered operation during CREATE..SELECT statement
Problem:
=======
- InnoDB fails to write the buffered insert operation during
create..select operation. This happens when bulk_insert
in transaction is reset to false while unlocking a source table.

Fix:
===
- InnoDB should apply the previous buffered changes to
all tables if we encounter any statement other than
pure INSERT or INSERT..SELECT statement in
ha_innobase::external_lock() and start_stmt().

- Remove the function bulk_insert_apply_for_table()

start_stmt(), external_lock(): Assert that trx->duplicates
should be enabled during bulk insert operation
2024-10-29 15:03:23 +05:30
Marko Mäkelä
decdd4bf49 MDEV-29015/MDEV-29260/MDEV-34938: os_file_get_size() WSL work-around
When MariaDB Server is run in a container under
Windows Subsystem for Linux, the fstat(2) system calls that InnoDB
invokes in os_file_set_size() or os_file_get_size() are causing a
failure in case the file had been renamed in the past while the file
handle was open. This affects at least ALTER TABLE and OPTIMIZE TABLE.

os_file_get_size(): Invoke lseek(2) instead of fstat(2). We do not mind
if the file pointer is moving to the end of the file, because InnoDB
exclusively invokes positioned reads and writes, or in some rare cases,
appends to an existing file.

os_file_set_size(): Invoke os_file_get_size() instead of fstat(2).
Define the POSIX and Windows versions separately. Formerly, the
Windows version was called os_file_change_size_win32().

fil_node_t::read_page0(): Use os_file_get_size() to determine the
size, and do not crash on error.

fil_node_t::read_metadata(): Remove the non-Windows stat* parameter
and always invoke fstat(2) outside Windows, but do tolerate errors.
Because fstat(2) is more likely to fail than lseek(2), and this is
not time critical code, we can afford the extra lseek(2) system call.

Reviewed by: Vladislav Vaintroub
2024-10-24 16:08:56 +03:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, what causes deadlocks. One of the possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held, can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during pages merging, what causes assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page
get mode to it. That's why page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for id of trx, which modified secondary index record, is
quite expensive operation, restrict its usage for master. System variable
was added to remove the restriction for testing simplifying. The
variable exists only either for debug build or for build with
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the
probability of catching bugs for release build with RQG.

Note that the code, which does primary index lookup to find out what
transaction modified secondary index record, is necessary only when
there is no primary key and no unique secondary key on replica with row
based replication, because only in this case extra X locks on unmodified
records can be set during scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare don't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aa (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks, let
lock_rec_unlock_unmodified() to process them.

lock_sec_rec_some_has_impl(): add template parameter for not acquiring
trx_t::mutex for the case if a caller already holds the mutex, don't
crash if lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
Marko Mäkelä
1cad1dbde6 MDEV-35235 innodb_snapshot_isolation=ON fails to signal transaction rollback
convert_error_code_to_mysql(): Treat DB_DEADLOCK and DB_RECORD_CHANGED
in the same way, that is, signal to the SQL layer that the transaction
had been rolled back.
2024-10-23 07:55:22 +03:00
Jan Lindström
b3be3c2157 MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated
Replication of non-transactional engines is experimental and
uses TOI. This naturally means that if there is open transaction
with transactional engine it's changes will be rolled back.

Fixed by adding error message if non-transactional engine
is part of multi-engine transaction with warning.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-10-23 04:00:52 +02:00
Alexander Barkov
0d17c540a5 MDEV-27277 Add a warning when max_sort_length is reached
Step#1: fixing the return type of strnxfrm() from size_t to this structure:

typedef struct
{
  size_t m_output_length;
  size_t m_source_length_used;
  uint m_warnings;
} my_strnxfrm_ret_t;
2024-10-22 21:42:53 +07:00
Marko Mäkelä
b38edd09ff MDEV-34830 fixup: Relax an assertion
This follows up 1067046b7f
2024-10-22 11:35:33 +03:00
Marko Mäkelä
1067046b7f MDEV-34830 fixup: Relax an assertion
It is possible that recv_sys.scanned_lsn is ahead of recv_sys.recovered_lsn
by a few 512-byte log blocks in case the last mini-transaction in the log
had not been written out completely before the server was killed.
This is occasionally the case when running the test
innodb.innodb-32k-crash.
2024-10-22 09:09:11 +03:00
Marko Mäkelä
bea4adcb5a MDEV-35225 Bogus debug assertion failures in innodb.innodb-32k-crash
log_sort_flush_list(): Correct some debug assertions that had been added in
commit 0d175968d1 (MDEV-31354).
The writes of some blocks may be completed and the oldest_modification()
set to 1 at any time.

The bogus assertion failures led to occasional failures of the test
innodb.innodb-32k-crash.
2024-10-22 09:07:57 +03:00
Vladislav Vaintroub
e8db5c8760 MDEV-35171 OS_FILE_NORMAL and OS_FILE_AIO are misleading
Removed 'purpose' parameter from os_file_create() and related functions.
Always use FILE_FLAG_OVERLAPPED when opening Windows files.

No performance regression was measured, nor there is any measurable
improvement.
2024-10-21 15:31:32 +02:00
Marko Mäkelä
7701ccb72d MDEV-35149 Race condition around SET GLOBAL innodb_lru_scan_depth
A debug assertion in buf_LRU_get_free_block() could fail if
SET GLOBAL innodb_lru_scan_depth is being executed during a workload
that involves allocating buffer pool pages.

buf_pool_t::LRU_scan_depth: Replaces srv_LRU_scan_depth.

buf_pool_t::flush_neighbors: Replaces srv_flush_neighbors.

innodb_buf_pool_update<T>(): Update a parameter of buf_pool
while holding buf_pool.mutex.
2024-10-21 10:08:58 +03:00
Thirunarayanan Balathandayuthapani
7f7d78bc18 MDEV-35183 ADD FULLTEXT INDEX unnecessarily DROPS FTS COMMON TABLES
- InnoDB fulltext rebuilds the FTS COMMON table while adding the
new fulltext index. This can be optimized by avoiding rebuilding
the FTS COMMON table in case of FTS COMMON TABLE already exists.

Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
2024-10-21 12:27:09 +05:30
Daniel Black
eb29190398 MDEV-34753 memory pressure - erroneous termination condition
The 'if (!m_abort) break' condition was inverted by accident.

Constrain the test case to environments where there is cgroupv2
runtime environment which is the same case that will pass a memory
pressure initialization.

Remove the explicit garbage_collection trigger as it hides the abnormal
termination error on the event loop for memory pressure. This
also means there is no support in non-cgroupv2 environments
(possibly some container environments).

As the trigger to memory pressure is via a different thread we
need to wait until a "[mM]emory pressure" log message is there to
know it has succeeded or failed.

Thanks Kristian Nielsen for noticing and review.
2024-10-19 17:20:27 +11:00