Commit graph

79728 commits

Author SHA1 Message Date
Sergei Golubchik
7aa28a2a54 MDEV-35354 InnoDB: Failing assertion: node->pcur->rel_pos == BTR_PCUR_ON upon LOAD DATA REPLACE with unique blob
restore erroneously changed line

followup for f2512c0fa8
2024-11-07 06:16:03 -08:00
Sergei Golubchik
f24ebbaa5c cleanup: main.loaddata_autocom_innodb 2024-11-07 06:12:54 -08:00
Alexander Barkov
b9f9d804f2 MDEV-28686 Assertion `0' in Type_handler_string_result::make_sort_key or unexpected result
The code in the can_eval_in_optimize() branch in
Item_func_pad::fix_length_and_dec() did not take into account
that the constant can be negative. So the function will return NULL.

This later crashed on DBUG_ASSERT() because a NOT NULL function returned NULL.

Adding set_maybe_null() into this branch if the constant is negative.
2024-11-06 15:45:59 +04:00
Alexander Barkov
4ded2cbe13 MDEV-31910 ASAN memcpy-param-overlap upon CONCAT in ORACLE mode
Fixing Item_func_concat_operator_oracle::val_str() to use
String::copy_or_move(), like Item_func_oracle::val_str() does.
2024-11-06 11:39:50 +04:00
Julius Goryavsky
db68eb69f9 MDEV-35344: post-fix correction for other galera tests 2024-11-06 04:59:10 +01:00
Jan Lindström
e4a3a11dcc MDEV-35344 : Galera test failure on galera_sync_wait_upto
Ignoring configured server_id should not be a warning because
correct configuration is documented. Changed message to info
level with more detailed message what was configured and
what will be actually used.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-11-06 04:59:10 +01:00
Jan Lindström
eb891b6a95 MDEV-35345 : Galera test failure on MW-402
Add missing have_debug[_sync].inc include.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-11-06 04:59:09 +01:00
Denis Protivensky
6d5fe9ed0d MDEV-28378: Don't hang trying to peek log event past the end of log
While applying CTAS log event, we peek the relay log to see if CTAS
contains inserted rows or if it's empty.
The peek function didn't check for end-of-file condition when tried to
get the next event from the log, and thus it hanged.

The fix includes checking for end-of-file while peeking for log events
and considering returned XID_EVENT value as a sign of an empty CTAS.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-11-06 04:59:09 +01:00
Sergei Golubchik
ebbbe9d960 MDEV-35319 ER_LOCK_DEADLOCK not detected upon DML on table with vector key, server crashes
cannot ignore the error in MHNSW_Share::acquire() - it could be a deadlock
signal, after which no further operations are allowed
2024-11-05 14:00:52 -08:00
Sergei Golubchik
574e18f80d MDEV-35308 NO_KEY_OPTIONS SQL mode has no effect on engine key options
hide INVISIBLE and engine field options under sql_mode=no_field_options
hide PARSER and engine key options under sql_mode=no_key_options
2024-11-05 14:00:52 -08:00
Sergei Golubchik
e5a5d2b78d MDEV-35214 Server crashes in FVectorNode::gref_len with insufficient mhnsw_max_cache_size
now with streaming (MDEV-35032) we cannot longer free MHNSW_Trx
at the end of the search. Cannot even free it at the end of the
mhnsw_insert, because there can be a search running (INSERT ... SELECT).

Let's do reference counting, even though it's a thread-local object.
2024-11-05 14:00:52 -08:00
Sergei Golubchik
cbc2812f80 MDEV-35287 ER_KEY_NOT_FOUND upon INSERT into InnoDB table with vector key under READ COMMITTED
InnoDB cannot enable internal bulk insert for hlindex tables
2024-11-05 14:00:52 -08:00
Sergei Golubchik
ad33ffc0b5 MDEV-35296 DESC does not work in ORDER BY with vector key
only user vector indexes for ORDER BY ... ASC
2024-11-05 14:00:52 -08:00
Sergei Golubchik
7feec30939 relax the XA recovery error
it's just a suggestion anyway, not a bullet-proof check,
let's not act as if it is
2024-11-05 14:00:52 -08:00
Sergei Golubchik
b09c8b03d7 MDEV-35244 Vector-related system variables could use better names
considering that users don't interact with MariaDB vector search directly,
but primarily use AI frameworks, we should use names familiar
to vector store connector writers and for AI framework users.
That is industry standard M and ef.

mhnsw_cache_size -> mhnsw_max_cache_size
mhnsw_distance_function -> mhnsw_default_distance
mhnsw_max_edges_per_node -> mhnsw_default_m
mhnsw_min_limit -> mhnsw_ef_search

inside CREATE TABLE:
max_edges_per_node -> m
distance_function -> distance
2024-11-05 14:00:52 -08:00
Sergei Golubchik
784becf3e1 MDEV-35267 Server crashes in _ma_reset_history upon altering on Aria table with vector key under lock
ALTER TABLE needs to open hlindex tables early enough, right after they
were created, so that cleanup after an error would see and delete them.

But they need to be external_lock-ed only in copy_data_between_tables,
after mysql_trans_prepare_alter_copy_data().

Let's move locking out of hlindex_open() into hlindex_lock()
2024-11-05 14:00:52 -08:00
Sergei Golubchik
5d9ebef41e MDEV-35258 Mariabackup does not work with MyISAM tables with vector keys
recognize *#i#* files in mariadb-backup
2024-11-05 14:00:52 -08:00
Sergei Golubchik
0b9bc6c3cd MDEV-35246 Vector search skips a row in the table
stronger condition in select_neighbors() to reject exact matches too
2024-11-05 14:00:52 -08:00
Sergey Vojtovich
d50663198c DDL recovery for high-level indexes 2024-11-05 14:00:52 -08:00
Sergey Vojtovich
883fb66cd4 MDEV-35130 Assertion fails in trx_t::check_bulk_buffer upon CREATE.. SELECT with vector key
Similarly to "ALTER TABLE fixes for high-level indexes", don't enable bulk
insert when issuing create ... insert into a table containing vector
index. InnoDB can't handle situation when bulk insert is enabled for
one table but disabled for another. We can't do bulk insert on vector
index as it does table updates currently.
2024-11-05 14:00:52 -08:00
Sergei Golubchik
f6de9a379a MDEV-34919 post-fix
* add Aria truncate checks
* do store_lock() with a correct TL_xxx level
* remove InnoDB workaround for missing store_lock (from MDEV-35032)
* don't start transaction in temp tables (for Aria, with a test case)
2024-11-05 14:00:52 -08:00
Sergey Vojtovich
1cc7ef52e3 MDEV-34919 Aria crashes with high-level (vector) indexes
Since high-level index tables do not participate in thr_multi_lock(), added
explicit call to THR_LOCK::start_trans(). This is needed mostly for Aria to
handle transaction logging.
2024-11-05 14:00:52 -08:00
Sergei Golubchik
72839c1435 MDEV-35245 SHOW CREATE TABLE produces unusable statement for vector fields with constant default value
print default values for binary types as binary strings
2024-11-05 14:00:52 -08:00
Sergei Golubchik
053bd80d43 MDEV-35230 ASAN errors upon reading from joined temptable views with vector type
fix Field_vector::get_copy_func() for the case when length_bytes differ

fix do_copy_vec() to not guess length_bytes but take it from the field
(for keys length_bytes is always 2 for any length)
2024-11-05 14:00:52 -08:00
Sergei Golubchik
7d081c1b83 MDEV-35223 REPAIR does not fix MyISAM table with vector key after crash recovery
resort to alter for repair too
2024-11-05 14:00:52 -08:00
Sergei Golubchik
e8cff8e829 MDEV-35219 Unexpected ER_DUP_KEY after OPTIMIZE on MyISAM table with vector key
in-engine optimize can break hlindexes. let's fallback to ALTER
2024-11-05 14:00:52 -08:00
Sergei Golubchik
8988decbfe MDEV-35220 Assertion `!item->null_value' failed upon VEC_TOTEXT call
don't forget to reset null_value for each row
2024-11-05 14:00:52 -08:00
Sergei Golubchik
14364b09b9 MDEV-35236 Assertion `(mem_root->flags & 4) == 0' failed in safe_lexcstrdup_root
followup for MDEV-35092
2024-11-05 14:00:52 -08:00
Sergei Golubchik
1a53048299 MDEV-35215 ASAN errors in Item_func_vec_fromtext::val_str upon VEC_FROMTEXT with an invalid argument 2024-11-05 14:00:52 -08:00
Sergei Golubchik
96eb66e5b3 MDEV-35205 Server crash in online alter upon concurrent ALTER and DML on table with vector field
test case
2024-11-05 14:00:52 -08:00
Sergei Golubchik
e020a3a2ce MDEV-35210 Vector type cannot store values which VEC_FromText produces and VEC_ToText accepts
let VEC_FromText validate that the vector l2squared isn't NaN.
VEC_ToText still prints everything.
2024-11-05 14:00:52 -08:00
Sergei Golubchik
f336b10bb1 MDEV-35212 Server crashes in Item_func_vec_fromtext::val_str upon query from empty table 2024-11-05 14:00:52 -08:00
Sergei Golubchik
2bec721316 MDEV-35203 ASAN errors or assertion failures in row_sel_convert_mysql_key_to_innobase upon query from table with usual key on vector field
add test
2024-11-05 14:00:52 -08:00
Sergei Golubchik
2e74a00d9d MDEV-35195 Assertion `tab->join->order' fails upon vector search with DISTINCT #2
MDEV-35337 Server crash or assertion failure in join_read_first upon using vector distance in group by

allow Item_func_distance to be not only in tab->join->order,
but alternatively in tab->join->group_list
2024-11-05 14:00:52 -08:00
Sergei Golubchik
926b339b93 MDEV-35194 non-BNL join fails on assertion
with streaming implemened mhnsw no longer needs to know
the LIMIT in advance. let's just cap it to avoid allocating
too much memory for the one step result set
2024-11-05 14:00:52 -08:00
Sergei Golubchik
597e34d000 MDEV-35213 Server crash or assertion failure upon query with high value of mhnsw_min_limit
mhnsw_min_limit must not be larger than candidates queue size
2024-11-05 14:00:52 -08:00
Sergei Golubchik
dd9a5dd5b5 MDEV-35204 mysqlbinlog --verbose fails on row events with vector type
test case
2024-11-05 14:00:52 -08:00
Sergei Golubchik
ed9fec0266 MDEV-35177 Unexpected ER_TRUNCATED_WRONG_VALUE_FOR_FIELD, diagnostics area assertion failures upon EITS collection with vector type 2024-11-05 14:00:52 -08:00
Sergei Golubchik
db10e5cf6c MDEV-35160 RBR does not work with vector type, ER_SLAVE_CONVERSION_FAILED 2024-11-05 14:00:52 -08:00
Sergei Golubchik
8f49fb8cc3 MDEV-35191 Assertion failure in Create_tmp_table::finalize upon DISTINCT with vector type
test only
2024-11-05 14:00:51 -08:00
Sergei Golubchik
cfbf065893 MDEV-35176 ASAN errors in Field_vector::store with optimizer_trace enabled 2024-11-05 14:00:51 -08:00
Sergei Golubchik
425aa95655 MDEV-35178 Assertion failure in Field_vector::store upon INSERT IGNORE with a wrong data 2024-11-05 14:00:51 -08:00
Sergei Golubchik
88adcbf35a MDEV-35182 crash in online_alter_end_trans with XA over vector indexes
ONLINE ALTER didn't expect XA PREPARE to fail.
Mark rollback on failed prepare with the XA_ROLLBACK_ONLY state,
detect that in ONLINE ALTER
2024-11-05 14:00:51 -08:00
Sergei Golubchik
5bde23990b MDEV-35159 Assertion `tab->join->select_limit < (~ (ha_rows) 0)' fails upon forcing vector key
init_from_binary_frm_image() wrongly assumed that
* if a table has primary key
* and it has the HA_PRIMARY_KEY_IN_READ_INDEX flag
* than ORDER BY any index automatically implies ORDER BY pk at the end,
   that is for an index (a,b,c) ORDER BY a,b,c means ORDER BY a,b,c,pk

which is wrong, it holds not for _any index_ but only for indexes
that can be used for ORDER BY.

So, don't do `field->part_of_sortkey= share->keys_in_use`
but introduce `sort_keys_in_use` and use that.
2024-11-05 14:00:51 -08:00
Sergei Golubchik
88119addff Vec_ToText was underestimating max_length of the result
switch to a more predictable, shorter, and more correct output
that is, print as many significant digits as necessary.
but not more (they'd be just zeros) and not less (it'd lose precision)
2024-11-05 14:00:51 -08:00
Sergei Golubchik
91720da9be MDEV-35158 Assertion `res->length() > 0 && res->length() % 4 == 0' fails upon increasing length of vector column 2024-11-05 14:00:51 -08:00
Sergei Golubchik
6634c88480 MDEV-35150 Column containing non-vector tables can be modified to VECTOR type without warnings 2024-11-05 14:00:51 -08:00
Sergei Golubchik
ca28761066 MDEV-35147 Inconsistent NULL handling in vector type 2024-11-05 14:00:51 -08:00
Sergei Golubchik
f274cf1c25 MDEV-35141 Server crashes in Field_vector::report_wrong_value upon statistic collection 2024-11-05 14:00:51 -08:00
Sergei Golubchik
78119d1ae5 MDEV-33410 VECTOR data type 2024-11-05 14:00:51 -08:00
Sergei Golubchik
b56ca29f89 MDEV-35105 Assertion `tab->join->order' fails upon vector search with DISTINCT
don't apply distinct optimization to order by a vector index
2024-11-05 14:00:51 -08:00
Sergei Golubchik
9ddb94f60e MDEV-35104 Invalid (old?) table or database name upon DDL on table with vector key and unique key
InnoDB rename needs the same workaround for hlindexes
as it has for partitions
2024-11-05 14:00:51 -08:00
Sergei Golubchik
7d9c0e4f62 MDEV-35092 Server crash, hang or ASAN errors in mysql_create_frm_image upon using non-default table options and system variables
extend the option_list expicitly on CREATE/ALTER, not implicitly
on parsing.
2024-11-05 14:00:51 -08:00
Sergei Golubchik
cdc7253787 make MyISAM and Aria report correct reflength to the server
MyISAM and Aria used to lie to the server about the reflength value.
One value was used internally, it was stored on disk, e.g. in indexes,
and couldn't be changed without full table rebuild. A differently
calculated value was reported to the server - that value was sometimes
larger than the true reflength.

That caused the server to allocate more memory per position than
necessary - affecting filesort, join buffer usage, optimizer cost
calculations, and may be more.
2024-11-05 14:00:51 -08:00
Sergei Golubchik
ea1e720391 MDEV-35078 Server crash or ASAN errors in mhnsw_insert
when adding a column or index that uses plugin-defined
sysvar-based options with ALTER TABLE ... ADD, the server
was using the default value of the sysvar, not the current one.

CREATE TABLE was correctly using the current sysvar value.

Fix it so that new columns/indexes added in ALTER TABLE ... ADD
would use a current sysvar value. Existing columns/indexes
in ALTER TABLE would keep using the default sysvar value
(unless they had an explicit value in frm).
2024-11-05 14:00:51 -08:00
Sergei Golubchik
855aefb7b5 mysqldump and mariadb-backup tests of vector indexes 2024-11-05 14:00:51 -08:00
Sergei Golubchik
eb4ab2ce8f MDEV-35061 XA PREPARE "not supported by the engine" from storage engine mhnsw, memory leak
disallow explicit XA PREPARE over mhnsw indexes
2024-11-05 14:00:51 -08:00
Sergei Golubchik
09cd817f5d MDEV-35060 Assertion failure upon DML on table with vector under lock 2024-11-05 14:00:51 -08:00
Sergei Golubchik
09889d417b MDEV-35055 ASAN errors in TABLE_SHARE::lock_share upon committing transaction after FLUSH on table with vector key
MHNSW_Trx cannot store a pointer to the TABLE_SHARE for the duration
of a transaction, because the share can be evicted from the cache any
time.

Use a pointer to the MDL_ticket instead, it is stored in the THD
and has a duration of MDL_TRANSACTION, so won't go away.

When we need a TABLE_SHARE - on commit - get a new one from tdc.
Normally, it'll be already in the cache, so it'd be fast.
We don't optimize for the edge case when TABLE_SHARE was evicted.
2024-11-05 14:00:51 -08:00
Sergei Golubchik
d396fb9226 MDEV-35021 Behavior for RTREE indexes changed, assertion fails
disallow USING RTREE for not SPATIAL index
2024-11-05 14:00:51 -08:00
Sergei Golubchik
b3afd9f640 MDEV-35042 Vector indexes are allowed for MERGE tables, but do not
disallow hlindexes in MERGE - because we cannot create the secondary
table *in the same engine*
2024-11-05 14:00:51 -08:00
Sergei Golubchik
0932c3a27e MDEV-35044 ALTER on a table with vector index attempts to bypass unsupported locking limitation, server crashes in THD::free_tmp_table_share
open secondary tables early enough for the cleanup on error to see
them and remove their underlying files
2024-11-05 14:00:51 -08:00
Sergei Golubchik
824a63852b MDEV-35043 Unsuitable error upon an attempt to create MEMORY table with vector key
MEMORY engine doesn't support blobs
2024-11-05 14:00:51 -08:00
Sergei Golubchik
9f80e3fbb7 MDEV-35032 streaming mode for mhnsw search
support SQL semantics for SELECT ... WHERE ... ORDER BY ... LIMIT

* switch from returning k nearest neighbors to returning
  as many as needed, in k-neighbor chunks, with increasing distance
* make search_layer() skips nodes that are closer than a threshold
* read_next keeps a search context - list of k found nodes,
  threshold, ctx, etc.
* when the list of found nodes is exhausted, it repeats the search
  starting from last found nodes and a threshold
* search context kepts ctx->refcount incremented, so ctx won't go away
* but commit_lock is unlocked between calls, so InnoDB can modify the table
* use ctx version to detect that, switch to MHNSW_Trx when it happens

bugfix:
* use the correct lock in ha_external_lock() for the graph table
* InnoDB didn't reset locks on ha_external_lock(F_UNLCK) and previous
  LOCK_X leaked into the next statement
2024-11-05 14:00:51 -08:00
Sergei Golubchik
be69716287 MDEV-35029 ASAN errors in Lex_ident<Compare_ident_ci>::is_valid_ident upon DDL on table with vector index
in ALTER TABLE or CREATE TABLE LIKE, a create a copy of key->option_list,
because it can be extended later on the thd->mem_root, so it has
to be a copy
2024-11-05 14:00:51 -08:00
Sergei Golubchik
a6499049af MDEV-35033 LeakSanitizer errors in my_malloc / safe_mutex_lazy_init_deadlock_detection / MHNSW_Context::alloc_node and alike
ctx wasn't released on errors
2024-11-05 14:00:51 -08:00
Sergei Golubchik
6837bb54f4 MDEV-35020 After a failed attempt to create vector index temporary file remains and prevents further operation 2024-11-05 14:00:50 -08:00
Sergei Golubchik
ec2ff9f2a0 MDEV-35035 Assertion failure in ha_blackhole::position upon INSERT into blackhole table with vector index
let's allow ::position() and ::rnd_pos() in blackhole.
::position() can be called directly after insert, it doesn't need
a search to happen, so it's possible.

::rnd_pos() can be called with a value that ::position() produced,
so, possible too.
2024-11-05 14:00:50 -08:00
Sergei Golubchik
8ac3f0b1d4 MDEV-35038 Server crash in Index_statistics::get_avg_frequency upon EITS collection for vector index
don't collect eits for vector indexes
2024-11-05 14:00:50 -08:00
Sergei Golubchik
0bd01f4a95 MDEV-35039 Number of indexes inside InnoDB differs from that defined in MariaDB after altering table with vector key
don't show table->s->total_keys to engine in inplace alter
2024-11-05 14:00:50 -08:00
Sergei Golubchik
8253650aaa MDEV-35006 Using varbinary as vector-storing column results in assertion failures
* hlindexes cannot be extended with pk
* hlindexes cannot be covering
2024-11-05 14:00:50 -08:00
Sergei Golubchik
fb04cad37e trying to stabilize floating-point tests 2024-11-05 14:00:50 -08:00
Sergey Vojtovich
f867c2a21e Disabled high-level indexes with Aria
... until a few bugs that cause server crash are fixed.
2024-11-05 14:00:50 -08:00
Sergey Vojtovich
ca17b68bb6 ALTER TABLE fixes for high-level indexes (iii)
quick_rm_table() expects .frm to exist when it removes high-level indexes.
For cases like ALTER TABLE t1 RENAME TO t2, ENGINE=other_engine .frm was
removed earlier.

Another option would be removing high-level indexes explicitly before the
first quick_rm_table() and skipping high-level indexes for subsequent
quick_rm_table(NO_FRM_RENAME).

But this suggested order may also help with ddl log recovery. That is
if we crash before high-level indexes are removed, .frm is going to
exist.
2024-11-05 14:00:50 -08:00
Sergey Vojtovich
7aa6bb3aa3 ALTER TABLE fixes for high-level indexes (ii)
Disable non-copy ALTER algorithms when VECTOR index is affected. Engines
are not supposed to handle high-level indexes anyway.

Also fixed misbehaving IF [NOT] EXISTS variants.
2024-11-05 14:00:50 -08:00
Sergey Vojtovich
a90fa3f397 ALTER TABLE fixes for high-level indexes (i)
Fixes for ALTER TABLE ... ADD/DROP COLUMN, ALGORITHM=COPY.

Let quick_rm_table() remove high-level indexes along with original table.

Avoid locking uninitialized LOCK_share for INTERNAL_TMP_TABLEs.

Don't enable bulk insert when altering a table containing vector index.
InnoDB can't handle situation when bulk insert is enabled for one table
but disabled for another. We can't do bulk insert on vector index as it
does table updates currently.
2024-11-05 14:00:50 -08:00
Sergei Golubchik
2ad9df8c9b VEC_Distance_Cosine() 2024-11-05 14:00:50 -08:00
Sergei Golubchik
2e1fcc6a80 rename VEC_Distance to VEC_Distance_Euclidean
and create a parent Item_func_vec_distance_common class
2024-11-05 14:00:50 -08:00
Sergei Golubchik
0da820cb12 mhnsw: use plugin index options and transaction_participant API 2024-11-05 14:00:50 -08:00
Sergei Golubchik
126d6d787c cleanup: handlerton
remove unused methods, reorder methods, add comments
2024-11-05 14:00:50 -08:00
Sergei Golubchik
8087cefc07 make rename test to work with InnoDB too 2024-11-05 14:00:50 -08:00
Sergei Golubchik
445198c10e pos-fixes for rename 2024-11-05 14:00:50 -08:00
Sergey Vojtovich
97e112fb82 VECTOR indexes support for RENAME TABLE
Rename high-level indexes along with a table.
2024-11-05 14:00:49 -08:00
Sergei Golubchik
ebcbed6d74 post-fixes for TRUNCATE
* fix the truncate-by-handler variant, used by InnoDB
* test that insert works after truncate, meaning graph table was emptied
* test that the vector index size is zero after truncate in MyISAM
2024-11-05 14:00:49 -08:00
Sergey Vojtovich
70575defb7 Fixed TRUNCATE TABLE against VECTOR indexes
This patch fixes only TRUNCATE by recreate variant, there seem to be no
reasonable engine that uses TRUNCATE by handler method for testing.

Reset index_cinfo so that mi_create is not confused by garbage passed via
index_file_name and sets MY_DELETE_OLD flag.

Review question: can we add a test case to make sure VECTOR index is empty
indeed?
2024-11-05 14:00:49 -08:00
Sergey Vojtovich
91a24ddc5d Test insert ... select with vector index 2024-11-05 14:00:49 -08:00
Sergey Vojtovich
4aa1968b89 Disable VECTOR indexes with partitioned tables 2024-11-05 14:00:49 -08:00
Sergey Vojtovich
7c16bba71d CREATE TABLE ... LIKE loses VECTOR index 2024-11-05 14:00:49 -08:00
Vicențiu Ciorbaru
eec1339f5d MDEV-32886 Vec_FromText and Vec_ToText
This commit introduces two utility functions meant to make working with
vectors simpler.

Vec_ToText converts a binary vector into a json array of numbers
(floats).
Vec_FromText takes in a json array of numbers and converts it into a
little-endian IEEE float sequence of bytes (4 bytes per float).
2024-11-05 14:00:49 -08:00
Sergei Golubchik
f44989ff0f UPDATE/DELETE post-fixes 2024-11-05 14:00:49 -08:00
Hugo Wen
0e2b9e7621 MDEV-33408 Initial support for vector DELETE and UPDATE
When the source row is deleted, mark the corresponding node in HNSW
index by setting `tref` to null. An index is added for the `tref` in
secondary table for faster searching of the to-be-marked nodes.

The nodes marked as deleted will still be used for search, but will not
be included in the final query results.

As skipping deleted nodes and not adding deleted nodes for new-inserted
nodes' neighbor list could impact the performance, we now only skip
these nodes in search results.

- for some reason the bitmap is not set for hlindex during the delete so
  I had to temporarily comment out one line

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
2024-11-05 14:00:49 -08:00
Sergei Golubchik
049d839350 mhnsw: inter-statement shared cache
* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
  (to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
  are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
  nodes are partitioned into 8 mutexes (8 is chosen arbitrarily, might
  need tuning)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
  graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
  INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list in
  in thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
  from the secondary cache are invalidated in the shared cache
  while it is exclusively locked
* on savepoint rollback both caches are flushed. this can be improved
  in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
  when it reaches the threshold
2024-11-05 14:00:49 -08:00
Sergei Golubchik
5c2b7c6e7f mhnsw: configurable parameters
1. introduce alpha. the value of 1.1 is optimal, so hard-code it.

2. hard-code ef_construction=10, best by test

3. rename hnsw_max_connection_per_layer to mhnsw_max_edges_per_node
   (max_connection is rather ambiguous in MariaDB) and add a help text

4. rename hnsw_ef_search to mhnsw_min_limit and add a help text
2024-11-05 14:00:49 -08:00
Sergei Golubchik
25b4000290 InnoDB support for hlindexes and mhnsw
* mhnsw:
  * use primary key, innodb loves and (and the index cannot have dupes anyway)
    * MyISAM is ok with that, performance-wise
  * must be ha_rnd_init(0) because we aren't going to scan
    * MyISAM resets the position on ha_rnd_init(0) so query it before
    * oh, and use the correct handler, just in case
  * HA_ERR_RECORD_IS_THE_SAME is no error
* innodb:
  * return ref_length on create
  * don't assume table->pos_in_table_list is set
  * ok, assume away, but only for system versioned tables
* set alter_info on create (InnoDB needs to check for FKs)
* pair external_lock/external_unlock correctly
2024-11-05 14:00:49 -08:00
Sergei Golubchik
3ff7f04fd4 misc changes
* sysvars should be REQUIRED_ARG
* fix a mix of US and UK spelling (use US)
* use consistent naming
* work if VEC_DISTANCE arguments are in the swapped order (const, col)
* work if VEC_DISTANCE argument is NULL/invalid or wrong length
* abort INSERT if the value is invalid or wrong length
* store the "number of neighbors" in a blob in endianness-independent way
* use field->store(longlong, bool) not field->store(double)
* a lot more error checking everywhere
* cleanup after errors
* simplify calling conventions, remove reinterpret_cast's
* todo/XXX comments
* whitespaces
* use float consistently

memory management is still totally PoC quality
2024-11-05 14:00:48 -08:00
Vicențiu Ciorbaru
88839e71a3 Initial HNSW implementation
This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Comparing with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in previous code, vector objects were not released after
    use, resulting in excessive memory consumption (over 20GB for building
    the index with 90,000 records), preventing tests with large datasets.
    Now ensure that vectors are released appropriately during the insert and
    search functions. Note there are still some vectors that need to be
    cleaned up after search query completion. Needs to be addressed in a
    future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    https://github.com/HugoWenTD/ann-benchmarks/commit/e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

Co-authored-by: Hugo Wen <wenhug@amazon.com>
2024-11-05 14:00:48 -08:00
Sergei Golubchik
d6add9a03d initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
aa09cb3b11 open frm for DROP TABLE
needed to get partitioning and information about
secondary objects
2024-11-05 14:00:48 -08:00
Sergei Golubchik
1fe8a1bb76 cleanup: generalize ER_INNODB_NO_FT_TEMP_TABLE 2024-11-05 14:00:48 -08:00
Sergei Golubchik
fd69abe44f cleanup: generalize ER_SPATIAL_CANT_HAVE_NULL 2024-11-05 14:00:48 -08:00
Sergei Golubchik
062f8eb37d cleanup: key algorithm vs key flags
the information about index algorithm was stored in two
places inconsistently split between both.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user
explicitly specified USING BTREE or HA_KEY_ALG_UNDEF, if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had  key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set, HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage to keep frms backward compatible).

As a side effect ALTER TABLE now detects FULLTEXT index renames correctly
2024-11-05 14:00:47 -08:00
Sergei Golubchik
44ff2f7831 reject invalid spatial key declarations in the parser 2024-11-05 14:00:47 -08:00
Sergei Golubchik
9fa31c1bd9 cleanup: spaces, casts, comments 2024-11-05 14:00:47 -08:00
Sergei Golubchik
4f4c5a2ba9 fix a typo and an old bug in prefschema.transaction test 2024-11-05 14:00:47 -08:00
Sergei Golubchik
70f000f1dc fix main.plugin_vars test to cleanup after itself 2024-11-05 14:00:46 -08:00
Sergei Golubchik
9ddac64188 make INFORMATION_SCHEMA.STATISTICS.COMMENT not nullable
as it can never be null (only "" or "disabled")
2024-11-05 14:00:46 -08:00
Sergei Golubchik
680bdb76a6 fix for 32bit 2024-11-05 14:00:46 -08:00
Vladislav Vaintroub
faf9e755ba MDEV-35109 fix test case
rpl_semi_sync_after_sync_coord_consistency fails on release compilation
2024-11-05 22:38:55 +01:00
Vladislav Vaintroub
37b7986467 Merge branch '10.5' into 10.6 2024-11-05 21:02:22 +01:00
Alexander Barkov
7741065936 MDEV-23895 Server crash, ASAN heap-buffer-overflow or Valgrind Invalid write in Item_func_rpad::val_str
Item_cache_int::val_str() and Item_cache_real::val_str() erroneously
used default_charset(). Fixing to return my_charset_numeric instead.
2024-11-05 12:36:08 +04:00
Oleg Smirnov
a914087fab MDEV-35307 Unexpected error WARN_SORTING_ON_TRUNCATED_LENGTH or assertion failure in diagnostics area #2
When strict mode is enabled, all warnings during `INSERT` are
converted to errors regardless of their actual severity.
`WARN_SORTING_ON_TRUNCATED_LENGTH` is not considered severe enough
to be elevated to the ERROR level, and this commit fixes that
2024-11-05 14:52:20 +07:00
Alexander Barkov
eb41c1171e MDEV-33942 View cuts off the end of string with the utf8 character set in INSERT function
Item_func_insert::fix_length_and_dec() incorrectly calculated max_length
when its collation.collation evaluated to my_charset_bin.

Fixing the code to calculate max_length in terms of octets rather
than in terms of characters when collation.collation is my_charset_bin.
2024-11-05 11:16:10 +04:00
Alexander Barkov
c2bf1d4781 MDEV-29552 LEFT and RIGHT with big value for parameter 'len' >0 return empty value in view
The code in max_length_for_string() erroneously returned 0
for huge numbers like 4294967295.

Rewriting the code in a more straightforward way.
2024-11-05 09:19:05 +04:00
Brandon Nesterenko
b07258a0d5 MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC
For a primary configured with wait_point=AFTER_SYNC, if two threads
T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were
binlogging at the same time, T1 could accidentally wait for its
semi-sync ACK using the binlog coordinates of T2. Prior to
MDEV-33551, this only resulted in delayed transactions, because all
transactions shared the same condition variable for ACK signaling.
However, with the MDEV-33551 changes, each thread has its own
condition variable to signal. So T1 could wait indefinitely when
either:
  1) T1's ACK is received but not T2's when T1 goes into
wait_after_sync(), because the ACK receiver thread has already
notified about the T1 ACK, but T1 was _actually_ waiting on T2's
ACK, and therefore tries to wait (in vain).

  2) T1 goes to wait_after_sync() before any ACKs have arrived. When
T1's ACK comes in, T1 is woken up; however, sees it needs to wait
more (because it was actually waiting on T2's ACK), and goes to wait
again (this time, in vain).

Note that the actual cause of T1 waiting on T2's binlog coordinates
is when MYSQL_BIN_LOG::write() would call
Repl_semisync_master::wait_after_sync(), the binlog offset parameter
was read as the end of MYSQL_BIN_LOG::log_file, which is shared
among transactions. So if T2 had updated the binary log _after_ T1
had released LOCK_log, but not yet invoked wait_after_sync(), it
would use the end of the binary log file as the binlog offset, which
was that of T2 (or any future transaction).

The fix in this patch ensures consistency between the binary log
coordinates a transaction uses between report_binlog_update() and
wait_after_sync().

Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-11-04 10:45:58 -07:00
Brandon Nesterenko
5290fa043b MDEV-35109 PREP: simulate_delay_semisync_slave_reply use debug_sync
This is a preparatory commit for MDEV-35109 to make its
testing code cleaner (and harden other tests too).

The DEBUG_DBUG point simulate_delay_semisync_slave_reply
up to this patch used my_sleep() to delay an ACK response,
but sleeps are prone to test failures on machines that
run tests when already having a heavy load (e.g. on
buildbot).

This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC
to coordinate exactly when a slave should send its reply,
which is safer and faster.

As DEBUG_SYNC can't be used while a server is shutting
down, to synchronize threads with SHUTDOWN WAIT FOR SLAVES
logic, we use and extend wait_for_pattern_in_file.inc to
wait for an informational error message in the logic to
indicate that the shutdown process has reached the
intended state (i.e. indicating that the shutdown has
been delayed to await semi-sync ACKs). Specifically, the
extensions are as follows:

 1. wait_for_pattern_in_file.inc is extended with parameter
    wait_for_pattern_count as a number that indicates the
    number of times a pattern should occur in the file before
    return control back to the calling script.

 2. search_for_pattern_in_file.inc is extended with parameter
    SEARCH_ABORT_IS_SUCCESS to inverse the error/success
    logic, so the SEARCH_ABORT condition can be used to
    indicate success, rather than error.
2024-11-04 10:45:58 -07:00
Oleksandr Byelkin
a37f71bd10 Merge branch '10.11' into mariadb-10.11.10 2024-11-04 07:42:26 +01:00
Oleksandr Byelkin
f2bb2ab58c Merge branch '10.6' into mariadb-10.6.20 2024-11-04 07:40:45 +01:00
Oleksandr Byelkin
ecdccddaae Merge branch '10.5' into mariadb-10.5.27 2024-11-04 07:35:28 +01:00
Alexander Barkov
e60fd6c204 MDEV-28767 Collation "binary" is not accepted for databases, tables, columns
MariaDB in a COLLATE clause supported 'binary' only as an identifier:
  COLLATE `binary`

Fixing the parser to understand 'binary' as a keyword:
  COLLATE binary

This is for MySQL compatibility.
2024-11-02 12:47:28 +04:00
Alexander Barkov
5d4a4d2091 Fixing main.type_timestamp failure with --view
The patch for
  MDEV-35250 Assertion `dec <= 6' failed in my_timestamp_binary_length
added a test which depends on
  MDEV-29534 In view FROM_UNIXTIME adds .000000 in the result

Adding --disable_view_protocol around the affected statements.
2024-11-02 12:46:27 +04:00
Sergei Golubchik
ac7fe8b214 fix main.selectivity_notembedded --view 2024-11-01 21:01:31 +01:00
Sergei Golubchik
0a3452cf83 MDEV-35229 fix the test for --view
also, enable it for --ps
2024-11-01 20:52:58 +01:00
Alexander Barkov
d661bc1552 MDEV-20944 Wrong result of LEAST() and ASAN heap-use-after-free in my_strnncollsp_simple / Item::temporal_precision on TIME()
The code tried to avoid String::copy() but did it in a wrong way,
so asan detected heap-use-after-free errors.
Removing the wrong optimization, using copy() instead.
2024-11-01 15:55:09 +04:00
Alexander Barkov
dd41be2a51 MDEV-29184 Assertion `0' in Item_row::illegal_method_call, Type_handler_row::Item_update_null_value, Item::update_null_value
- Moving the check_cols(1) test from fix_fields() to fix_length_and_dec().
  So the test is now done before the code calling val_decimal()
  in fix_length_and_dec().

- Removing Item_func_interval::fix_fields(), as it become equal
  to the inherited one.
2024-11-01 12:40:43 +04:00
Sergei Golubchik
947de4b1db print more digits for floating point options in in mariadbd --help 2024-11-01 08:58:43 +01:00
Monty
40810baffe MDEV-33144 Implement the Percona variable slow_query_log_always_write_time
This task is inspired by the Percona implementation of
slow_query_log_always_write_time.

This task implements the variable log_slow_always_query_time (name
matching other MariaDB variables using the slow query log). The
default value for the variable is 31536000, which makes MariaDB
compatible with older installations.

For queries with execution time longer than log_slow_always_query_time
the variables log_slow_rate_limit and log_slow_min_examined_row_limit
will be ignored and the query will be written to the slow query log
if there is no other limitations (like log_slow_filter etc).

Other things:
- long_query_time internal variable renamed to log_slow_query_time.
- More descriptive information for "log_slow_query_time".
2024-11-01 08:58:37 +01:00
Brandon Nesterenko
e9a502df08 Testing fix for rpl_semi_sync_cond_var_per_thd failure 2024-10-30 08:32:19 -06:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleg Smirnov
bf9662f6fa MDEV-35275 Unexpected WARN_SORTING_ON_TRUNCATED_LENGTH or assertion failure in diagnostics area
MDEV-27277 added warnings on truncation during sorting for SELECTs
but did not for DML operations. However, UPDATEs and DELETEs may also
perform sorting and thus produce warnings. This commit fixes that
2024-10-30 18:47:11 +07:00
Alexander Barkov
556a40dce0 MDEV-35229 NOCOPY has become reserved word bringing wide incompatibility
This patch was suggested by Sergei Golubchik.

It reverts the second patch from the PR:

  commit fa5eeb4931
    Fixed ALTER TABLE NOCOPY keyword failure

and adds NOCOPY_SYM into keyword_func_sp_var_and_label.

The price is one extra shift/recuce conflict in yy_oracle.yy.
This should to tolerable.
2024-10-30 13:58:20 +04:00
Oleg Smirnov
c3a7a3c7a2 MDEV-34665 Simplify IN predicate processing for NULL-aware materialization involving only one column
It was found that unnecessary work of building Ordered_key structures
is being done when processing NULL-aware materialization for IN predicates
having only one column. In fact, the logic for that simplified case can be
expressed as follows.
Say we have predicate left_expr IN (SELECT <subq1>), where left_expr is
scalar(not a tuple).
Then
    if (left_expr is NULL) {
      if (subq1 produced any rows) {
        // note that we don't care if subq1 has produced
        // NULLs or not.
        NULL IN (<some values>) -> UNKNOWN, i.e. NULL.
      } else {
        NULL IN ({empty-set}) -> FALSE.
      }
    } else {
      // left_expr is a non-NULL value
      if (subq1 output has a match for left_expr) {
        left_expr IN (..., left_expr ...) -> TRUE
      } else {
        // no "known" matches.
        if (subq1 output has a NULL) {
          left_expr IN ( ... NULL ...) ->
           (NULL could have been a match or not)
           -> NULL.
        } else {
          // subq1 didn't produce any "UNKNOWNs" so
          // we're positive there weren't any matches
          -> FALSE.
        }
      }
    }

This commit introduces subselect_single_column_partial_engine class
implementing the logic described.

Reviewer: Sergei Petrunia <sergey@mariadb.com>
2024-10-30 16:48:36 +07:00
Alexander Barkov
4c7cfd2624 MDEV-34817 perfschema.lowercase_fs_off fails on buildbot
I pushed this patch in a mistake to 11.7 instead of 11.6.
Backporting from 11.7.

This is a workaround patch to make buildbot green.

Renaming databases from db1/DB2 to m33020_db1/m33020_DB1
to make them unique. So the garbage left by other tests
does not show up any more.

The real problem will be fixed under terms of:
  MDEV-35282 Performance schema does not clear package routines
2024-10-30 13:47:19 +04:00
Alexander Barkov
8c0a260a5b MDEV-35250 Assertion `dec <= 6' failed in my_timestamp_binary_length
The TIMESTAMP related code did not handle AUTO_SEC_PART_DIGITS.
FROM_UNIXTIME() sets its member 'decimals' to AUTO_SEC_PART_DIGITS.
So some scripts involving FROM_UNIXTIME() crashed on assert in debug
builds and returned unexpected results in release builds.
2024-10-30 11:09:21 +04:00
Alexander Barkov
a79f314f1b MDEV-34817 perfschema.lowercase_fs_off fails on buildbot
This is a workaround patch to make buildbot green.

Renaming databases from db1/DB2 to m33020_db1/m33020_DB1
to make them unique. So the garbage left by other tests
does not show up any more.

The real problem will be fixed under terms of:
  MDEV-35282 Performance schema does not clear package routines
2024-10-30 10:21:29 +04:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Aleksey Midenkov
cc183489da MDEV-27293 Allow converting a versioned table from implicit
to explicit row_start/row_end columns

In case of adding both system fields of same type (length, unsigned
flag) as old implicit system fields do the rename of implicit system
fields to the ones specified in ALTER, remove SYSTEM_INVISIBLE flag in
that case. Correct PERIOD clause must be specified in ALTER as well.

MDEV-34904 Inplace alter for implicit to explicit versioning is broken

Whether ALTER goes inplace and how it goes inplace depends on
handler_flags which goes from alter_info->flags by this logic:

  ha_alter_info->handler_flags|= (alter_info->flags & ~flags_to_remove);

ALTER_VERS_EXPLICIT was not in flags_to_remove and its value (1ULL <<
35) clashed with ALTER_ADD_NON_UNIQUE_NON_PRIM_INDEX.

ALTER_VERS_EXPLICIT must not affect inplace, it is SQL-only so we
remove it from handler_flags.
2024-10-29 17:46:40 +03:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Sergei Golubchik
5e5c3c7cb6 post-merge changes
* remove duplicate test file
* move all uuidv7 tests into plugin/type_uuid/mysql-test/type_uuid/
* remove mysys/ changes
* auto my_random_bytes() fallback - removes duplicate code from uuid,
  and fixes all other users of my_random_bytes() that don't check
  the return value (because, perhaps, they don't need crypto-strong
  random bytes)
* End of 11.6 -> 11.7 in tests
* clarify the warning text
* UUID_VERSION_MASK()/UUID_VARIANT_MASK() must not depend on the version
* allow 4x more monotonic uuidv7 per millisecond - instead of stretching
  1000 microseconds over 12 bits, let's use extra 2 bits as a counter
* rename for compatibility with Percona Server (uuid_v4, uuid_v7)
2024-10-29 14:47:32 +01:00
StefanoPetrilli
2fe269fdcb MDEV-32637 Implement native UUID7 function 2024-10-29 14:47:32 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Oleksandr Byelkin
6aa47fae30 MDEV-35276 Assertion failure in find_producing_item upon a query from a view
Two problem solved:

1) Item_default_value makes a shallow copy so the copy
 should not delete field belong to the Item

2) Item_default_value should not inherit
 derived_field_transformer_for_having and
 derived_field_transformer_for_where (in this variant
 pushing DEFAULT(f) is prohibited (return NULL) but
 if return "this" it will be allowed (should go with
 a lot of tests))
2024-10-29 11:44:43 +01:00
Thirunarayanan Balathandayuthapani
db3be9b434 MDEV-35237 Bulk insert fails to apply buffered operation during CREATE..SELECT statement
Problem:
=======
- InnoDB fails to write the buffered insert operation during
create..select operation. This happens when bulk_insert
in transaction is reset to false while unlocking a source table.

Fix:
===
- InnoDB should apply the previous buffered changes to
all tables if we encounter any statement other than
pure INSERT or INSERT..SELECT statement in
ha_innobase::external_lock() and start_stmt().

- Remove the function bulk_insert_apply_for_table()

start_stmt(), external_lock(): Assert that trx->duplicates
should be enabled during bulk insert operation
2024-10-29 15:03:23 +05:30
Alexander Barkov
a88c71b294 MDEV-35041 Simple comparison causes "Illegal mix of collations" even with default server settings
The task "MDEV-25829 Change default Unicode collation to uca1400_ai_ci"
previously changed collation derivation for string user variables
from DERIVATION_EXPLICIT to DERIVATION_COERCIBLE, to resolve illegal
collation mix conflicts between table columns and user variables
when they have different collations.

However, DERIVATION_COERCIBLE was a wrong choice because it caused
conflicts between string literals and user variables when they have
different collations.

Adding a new collation derivation level DERIVATION_USERVAR.
This makes the collation of a user variable:
- weaker than a table column (like it was intended by MDEV-25829)
- but stronger than a literal (like it was in pre-MDEV-25829)

Cleanup in sql_type.h:
  Removing the line "- BINARY(expr)" from the before-DERIVATION_CAST
  comment, as it was on a wrong place. It's also listed on the correct
  place before DERIVATION_IMPLICIT.
2024-10-28 16:30:49 +04:00
Monty
066f920484 MDEV-35110 Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions
This is an extension of MDEV-30423 "Deadlock on Replica during BACKUP
STAGE BLOCK_COMMIT on XA transactions"

The original commit in MDEV-30423 was not complete as some usage in XA of
MDL_BACKUP_COMMIT locks did not set thd->backup_commit_lock.
This is required to be set when using parallel replication.

Fixed by ensuring that all usage of BACKUP_COMMIT lock i XA is uniform and
all sets thd->backup_commit_lock. I also changed all locks to be
MDL_EXPLICIT to keep also that part uniform.

A regression test is added.
2024-10-28 13:29:21 +02:00
Vladislav Vaintroub
4b068b7fcb MDEV-32387 - prevent output during temporary changes to STDOUT/STDERR
create_process() temporarily changes STDOUT/STDERR output to error log file

This might redirect mtr output on Windows, so avoid it by holding
flush_lock.
2024-10-28 10:45:50 +01:00
Marko Mäkelä
63a7e4c96b MDEV-35257 Backup fails during an ALTER TABLE with FULLTEXT INDEX
In commit 1c55b845e0 (MDEV-32932) the
test mariabackup.innodb_ddl_on_intermediate_table was introduced but
disabled.

xb_load_single_table_tablespace(): Properly handle missing FTS_ tables.

backup_file_op_fail(): Properly handle FILE_DELETE records.
2024-10-28 07:44:18 +02:00
Sergei Petrunia
284593413f MDEV-35253: xa_prepare_unlock_unmodified fails: shift exponent 32 is too large
The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).

The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?

Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
2024-10-25 18:02:14 +03:00
Yuchen Pei
b8c2bd9f69
MDEV-35249 Fix regression caused by MDEV-34447
MDEV-34447 Removed setting first_cond_optimization to 0 in update and
delete when leaf_tables_saved. This can cause problems when two ps
executions of an update go through different paths, where the first ps
execution goes through single table update only and the second ps
execution also goes through multi table update. When this happens, the
first_cond_optimization of the outer query is not set to false during
the first ps execution because optimize() is not called for the outer
query. But then the second ps execution will call optimize() on the
outer query, which with first_cond_optimization==true trips the 2nd ps
mem leak detection.

This is not a problem in higher version as both executions go through
multi table updates, possibly due to MDEV-28883.

We fix this problem by restoring the FALSE assignments to
first_cond_optimization.
2024-10-25 18:03:40 +11:00
Sergei Golubchik
3cd706b107 MDEV-35236 Assertion `(mem_root->flags & 4) == 0' failed in safe_lexcstrdup_root
Post-fix for MDEV-35144.

Cannot allocate options values on the statement arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was initially empty, it will be reset for every execution
and any values allocated on the  statement arena will be lost.

Cannot allocate option values on the execution arena, because
HA_CREATE_INFO is shallow-copied for every execution, so if the
option_list was  initially NOT empty, any values appended to the
end will be preserved and if they're on the execution arena their
content will be destroyed.

Let's use thd->change_item_tree() to save and restore necessary pointers
for every execution.

followup for 3da565c41d
2024-10-23 14:58:57 +02:00
Sergei Golubchik
eac33a23da MDEV-32022 ERROR 1054 (42S22): Unknown column 'X' in 'NEW' in trigger
add missing do_get_copy/do_build_clone
2024-10-23 14:58:57 +02:00
Yuchen Pei
4b6922a315
MDEV-25008: UPDATE/DELETE: Cost-based choice IN->EXISTS vs Materialization
Single-table UPDATE/DELETE didn't provide outer_lookup_keys value for
subqueries. This didn't allow to make a meaningful choice between
IN->EXISTS and Materialization strategies for subqueries.

Fix this:
* Make UPDATE/DELETE save Sql_cmd_dml::scanned_rows,
* Then, subquery's JOIN::choose_subquery_plan() can fetch it from
there for outer_lookup_keys

Details:
UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries()
twice, like SELECT does (first call optimize_constant_subquries() in
JOIN::optimize_inner(), then call optimize_unflattened_subqueries() in
JOIN::optimize_stage2()):
1. Call with const_only=true before any optimizations. This allows
range optimizer and others to use the values of cheap const
subqueries.
2. Call it with const_only=false after range optimizer, partition
pruning, etc. outer_lookup_keys value is provided, so it's possible to
pick a good subquery strategy.

Note: PROTECT_STATEMENT_MEMROOT requires that first SP execution
performs subquery optimization for all subqueries, even for degenerate
query plans like "Impossible WHERE". Due to that, we ensure that the
call to optimize_unflattened_subqueries (with const_only=false) even
for degenerate query plans still happens, as was the case before this
change.
2024-10-23 23:51:24 +11:00
Oleg Smirnov
6bd1cb0ea0 MDEV-34880 Incorrect result for query with derived table having TEXT field
When a derived table which has distinct values and BLOB fields is
materialized, an index is created over all columns to ensure only
unique values are placed to the result.
This index is created in a special mode HA_UNIQUE_HASH to support BLOBs.
Later the optimizer may incorrectly choose this index to retrieve values
from the derived table, although such type of index cannot be used
for data retrieval.

This commit excludes HA_UNIQUE_HASH indexes from adding to
`JOIN::keyuse` array thus preventing their subsequent usage for
data retrieval
2024-10-23 17:55:00 +07:00
Vlad Lesin
8c7786e7d5 MDEV-34690 lock_rec_unlock_unmodified() causes deadlock
lock_rec_unlock_unmodified() is executed either under lock_sys.wr_lock()
or under a combination of lock_sys.rd_lock() + record locks hash table
cell latch. It also requests page latch to check if locked records were
changed by the current transaction or not.

Usually InnoDB requests page latch to find the certain record on the
page, and then requests lock_sys and/or record lock hash cell latch to
request record lock. lock_rec_unlock_unmodified() requests the latches
in the opposite order, what causes deadlocks. One of the possible
scenario for the deadlock is the following:

thread 1 - lock_rec_unlock_unmodified() is invoked under locks hash table
           cell latch, the latch is acquired;
thread 2 - purge thread acquires page latch and tries to remove
           delete-marked record, it invokes lock_update_delete(), which
           requests locks hash table cell latch, held by thread 1;
thread 1 - requests page latch, held by thread 2.

To fix it we need to release lock_sys.latch and/or lock hash cell latch,
acquire page latch and re-acquire lock_sys related latches.

When lock_sys.latch and/or lock hash cell latch are released in
lock_release_on_prepare() and lock_release_on_prepare_try(), the page on
which the current lock is held, can be merged. In this case the bitmap
of the current lock must be cleared, and the new lock must be added to
the end of trx->lock.trx_locks list, or bitmap of already existing lock
must be changed.

The new field trx_lock_t::set_nth_bit_calls indicates if new locks
(bits in existing lock bitmaps or new lock objects) were created during
the period when lock_sys was released in trx->lock.trx_locks list
iteration loop in lock_release_on_prepare() or
lock_release_on_prepare_try(). And, if so, we traverse the list again.

The block can be freed during pages merging, what causes assertion
failure in buf_page_get_gen(), as btr_block_get() passes BUF_GET as page
get mode to it. That's why page_get_mode parameter was added to
btr_block_get() to pass BUF_GET_POSSIBLY_FREED from
lock_release_on_prepare() and lock_release_on_prepare_try() to
buf_page_get_gen().

As searching for id of trx, which modified secondary index record, is
quite expensive operation, restrict its usage for master. System variable
was added to remove the restriction for testing simplifying. The
variable exists only either for debug build or for build with
-DINNODB_ENABLE_XAP_UNLOCK_UNMODIFIED_FOR_PRIMARY option to increase the
probability of catching bugs for release build with RQG.

Note that the code, which does primary index lookup to find out what
transaction modified secondary index record, is necessary only when
there is no primary key and no unique secondary key on replica with row
based replication, because only in this case extra X locks on unmodified
records can be set during scan phase.

Reviewed by Marko Mäkelä.
2024-10-23 12:36:17 +03:00
Vlad Lesin
92180ad513 MDEV-34466 XA prepare don't release unmodified records for some cases
There is no need to exclude exclusive non-gap locks from the procedure
of locks releasing on XA PREPARE execution in
lock_release_on_prepare_try() after commit
17e59ed3aa (MDEV-33454), because
lock_rec_unlock_unmodified() should check if the record was modified
with the XA, and release the lock if it was not.

lock_release_on_prepare_try(): don't skip X-locks, let
lock_rec_unlock_unmodified() to process them.

lock_sec_rec_some_has_impl(): add template parameter for not acquiring
trx_t::mutex for the case if a caller already holds the mutex, don't
crash if lock's bitmap is clean.

row_vers_impl_x_locked(), row_vers_impl_x_locked_low(): add new argument
to skip trx_t::mutex acquiring.

rw_trx_hash_t::validate_element(): don't acquire trx_t::mutex if the
current thread already holds it.

Thanks to Andrei Elkin for finding the bug.
Reviewed by Marko Mäkelä, Debarun Banerjee.
2024-10-23 12:36:17 +03:00
Marko Mäkelä
1cad1dbde6 MDEV-35235 innodb_snapshot_isolation=ON fails to signal transaction rollback
convert_error_code_to_mysql(): Treat DB_DEADLOCK and DB_RECORD_CHANGED
in the same way, that is, signal to the SQL layer that the transaction
had been rolled back.
2024-10-23 07:55:22 +03:00
Jan Lindström
b3be3c2157 MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated
Replication of non-transactional engines is experimental and
uses TOI. This naturally means that if there is open transaction
with transactional engine it's changes will be rolled back.

Fixed by adding error message if non-transactional engine
is part of multi-engine transaction with warning.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-10-23 04:00:52 +02:00
Jan Lindström
7ffa7b6b01 MDEV-31888 : galera.galera_wan, galera.galera_vote_rejoin_* fail
Clean up configuration and tests. Add wait conditions to make
sure test continues from clean state.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2024-10-23 03:47:08 +02:00
Oleg Smirnov
fd87e01f38 MDEV-27277 Add a warning when max_sort_length is reached
During a query execution some sorting and grouping operations
on strings may be involved. System variable max_sort_length defines
the maximum number of bytes to use when comparing strings during
sorting/grouping. Thus, the comparable parts of strings may be less
than their actual size, so the results of the query may be not
sorted/grouped properly.
To indicate that some comparisons were done on a truncated lengths,
a new warning has been introduced with this commit.
2024-10-22 22:39:36 +07:00
Alexander Barkov
0d17c540a5 MDEV-27277 Add a warning when max_sort_length is reached
Step#1: fixing the return type of strnxfrm() from size_t to this structure:

typedef struct
{
  size_t m_output_length;
  size_t m_source_length_used;
  uint m_warnings;
} my_strnxfrm_ret_t;
2024-10-22 21:42:53 +07:00
Oleksandr Byelkin
9b3413c71f MDEV-8578: fix galera test 2024-10-22 09:23:56 +02:00
Oleksandr Byelkin
d29611afa1 MDEV-15497 fixed outdated syntax 2024-10-22 09:12:23 +02:00
Brandon Nesterenko
1ed30e08af MDEV-34122: Assertion `entry' failed in Active_tranx::assert_thd_is_waiter
If semi-sync is switched off then on while a transaction is
in-between binlogging and waiting for an ACK, the semi-sync state of
the transaction is removed, leading to a debug assertion that
indicates the transaction tried to wait, but cannot receive an ACK
signal. More specifically, when semi-sync is switched off, the
Active_tranx list is cleared (where a transaction adds an entry to
this list during binlogging), and each entry in this list saves the
thread which will wait for an ACK, and the thread has the COND
variable to signal to wake itself. So if the entry is lost, the
Ack_receiver thread won’t be able to find the thread to wake up when
an ACK comes in

The fix is to ensure that the entry exists before awaiting the ACK,
and if there is no entry, skip the wait. In debug builds, an
informative message is written explaining that the transaction is
skipping its wait. Additional debug-build only logic is added to
ensure that the cause of the missing entry is due to semi-sync being
turned off and on

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-21 15:35:54 -06:00
Alexander Barkov
855c21eb99 Recording ctype_gbk_export_import.result according to MDEV-34883 2024-10-21 14:08:00 +04:00
Thirunarayanan Balathandayuthapani
7f7d78bc18 MDEV-35183 ADD FULLTEXT INDEX unnecessarily DROPS FTS COMMON TABLES
- InnoDB fulltext rebuilds the FTS COMMON table while adding the
new fulltext index. This can be optimized by avoiding rebuilding
the FTS COMMON table in case of FTS COMMON TABLE already exists.

Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
2024-10-21 12:27:09 +05:30
Alexander Barkov
e1cd3c4033 MDEV-12252 ROW data type for stored function return values
Adding support for the ROW data type in the stored function RETURNS clause:

- explicit ROW(..members...) for both sql_mode=DEFAULT and sql_mode=ORACLE

  CREATE FUNCTION f1() RETURNS ROW(a INT, b VARCHAR(32)) ...

- anchored "ROW TYPE OF [db1.]table1" declarations for sql_mode=DEFAULT

  CREATE FUNCTION f1() RETURNS ROW TYPE OF test.t1 ...

- anchored "[db1.]table1%ROWTYPE" declarations for sql_mode=ORACLE

  CREATE FUNCTION f1() RETURN test.t1%ROWTYPE ...

Adding support for anchored scalar data types in RETURNS clause:

- "TYPE OF [db1.]table1.column1" for sql_mode=DEFAULT

  CREATE FUNCTION f1() RETURNS TYPE OF test.t1.column1;

- "[db1.]table1.column1" for sql_mode=ORACLE

  CREATE FUNCTION f1() RETURN test.t1.column1%TYPE;

Details:

- Adding a new sql_mode_t parameter to
    sp_head::create()
    sp_head::sp_head()
    sp_package::create()
    sp_package::sp_package()
  to guarantee early initialization of sp_head::m_sql_mode.
  Before this change, this member was not initialized at all during
  CREATE FUNCTION/PROCEDURE/PACKAGE statements, and was not used.
  Now it needs to be initialized to write properly the
  mysql.proc.returns column, according to the create time sql_mode.

- Code refactoring to make the things simpler and functions smaller:

  * Adding a new method
    Field_row::row_create_fields(THD *thd, List<Spvar_definition> *list)
    to make a Virtual_tmp_table with Fields for ROW members
    from an explicit definition.

  * Adding a new method
    Field_row::row_create_fields(THD *thd, const Spvar_definition &def)
    to make a Virtual_tmp_table with Fields for ROW members
    from an explicit or a table anchored definition.

  * Adding a new method
    Item_args::add_array_of_item_field(THD *thd, const Virtual_tmp_table &vtable)
    to create and array of Item_field corresponding to all Field instances
    in a Virtual_tmp_table

  * Removing Item_field_row::row_create_items(). It was decomposed
    into the new methods described above.

  * Moving the code from the loop body in sp_rcontext::init_var_items()
    into a separate method Spvar_definition::make_item_field_row(),
    to make the code clearer (smaller functions).
    make_item_field_row() itself uses the new methods described above.

- Changing the data type of sp_head::m_return_field_def
  from Column_definition to Spvar_definition.
  So now it supports not only SQL column field types,
  but also explicit ROW and anchored ROW data types,
  as well as anchored column types.

- Adding a new Column_definition parameter to sp_head::create_result_field().
  Before this patch, create_result_field() took the definition only
  from m_return_field_def. Now it's also called with a local Column_definition
  variable which contains the explicit definition resolved from an
  anchored defition.

- Modifying sql_yacc.yy to support the new grammar.
  Adding new helper methods:
    * sf_return_fill_definition_row()
    * sf_return_fill_definition_rowtype_of()
    * sf_return_fill_definition_type_of()

- Fixing tests in:
  * Virtual_tmp_table::setup_field_pointers() in sql_select.cc
  * Send_field::normalize() in field.h
  * store_column_type()
  to prevent calling Type_handler_row::field_type(),
  which is implemented a DBUG_ASSERT(0).
  Before this patch the affected methods and functions were called only
  for scalar data types. Now ROW is also possible.

- Adding a new virtual method Field::cols()

- Overriding methods:
   Item_func_sp::cols()
   Item_func_sp::element_index()
   Item_func_sp::check_cols()
   Item_func_sp::bring_value()
  to support the ROW data type.

- Extending the rule sp_return_type to support
  * explicit ROW and anchored ROW data types
  * anchored scalar data types

- Overriding Field_row::sql_type() to print
  the data type of an explicit ROW.
2024-10-21 07:59:29 +04:00
Alexander Barkov
dfaf7e2eb4 MDEV-15751 CURRENT_TIMESTAMP should return a TIMESTAMP [WITH TIME ZONE?]
Changing the return type of the following functions:
  - CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(), NOW()
  - SYSDATE()
  - FROM_UNIXTIME()
from DATETIME to TIMESTAMP.

Note, the old function NOW() returning DATETIME is still available
as LOCALTIMESTAMP or LOCALTIMESTAMP(), e.g.:

  SELECT
    LOCALTIMESTAMP,     -- DATETIME
    CURRENT_TIMESTAMP;  -- TIMESTAMP

The change in the functions return data type fixes some problems
that occurred near a DST change:

- Problem #1

INSERT INTO t1 (timestamp_field) VALUES (CURRENT_TIMESTAMP);
INSERT INTO t1 (timestamp_field) VALUES (COALESCE(CURRENT_TIMESTAMP));

could result into two different values inserted.

- Problem #2

INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526));
INSERT INTO t1 (timestamp_field) VALUES (FROM_UNIXTIME(1288477526+3600));

could result into two equal TIMESTAMP values near a DST change.

Additional changes:

- FROM_UNIXTIME(0) now returns SQL NULL instead of '1970-01-01 00:00:00'
  (assuming time_zone='+00:00')

- UNIX_TIMESTAMP('1970-01-01 00:00:00') now returns SQL NULL instead of 0
  (assuming time_zone='+00:00'

These additional changes are needed for consistency with TIMESTAMP fields,
which cannot store '1970-01-01 00:00:00 +00:00'
2024-10-19 22:48:23 +02:00
Sergei Golubchik
128fc34990 fix rdiff files in sys_var suite 2024-10-19 16:54:48 +02:00
Sergei Golubchik
15a291e4e0 MDEV-14978 fix client.client-env-variable test
* fix paths to work when installed and not only from the source dir
* don't use a cnf file (no need to restart the server for this)
* set MYSQL_HOST to a valid hostname when testing an invalid MARIADB_HOST
* use invalid ip to have clients fail quickly and not waste time
  on resolving the invalid hostname

followup for eedbb901e5
2024-10-19 16:53:16 +02:00
Kristian Nielsen
abc46259c6 MDEV-34753 memory pressure - erroneous termination condition
Fix race condition in test case by waiting for the expected state to occur.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-19 17:20:27 +11:00
Daniel Black
eb29190398 MDEV-34753 memory pressure - erroneous termination condition
The 'if (!m_abort) break' condition was inverted by accident.

Constrain the test case to environments where there is cgroupv2
runtime environment which is the same case that will pass a memory
pressure initialization.

Remove the explicit garbage_collection trigger as it hides the abnormal
termination error on the event loop for memory pressure. This
also means there is no support in non-cgroupv2 environments
(possibly some container environments).

As the trigger to memory pressure is via a different thread we
need to wait until a "[mM]emory pressure" log message is there to
know it has succeeded or failed.

Thanks Kristian Nielsen for noticing and review.
2024-10-19 17:20:27 +11:00
Sergei Petrunia
a68e74b5a4 MDEV-35164: optimizer_join_limit_pref_ratio: assertion when the ORDER BY table becomes constant
Assertion failure has happened due to this scenario:

A query was ran with optimizer_join_limit_pref_ratio=1.
The query had "ORDER BY t1.col LIMIT N".
The optimizer set join->limit_shortcut_applicable=1.
Then, table t1 was marked as constant.
The code in choose_query_plan() still set join->limit_optimization_mode=1
which caused the optimizer to only consider t1 as the first non-const table.
But t1 was already put into the join prefix as the constant table.
The optimizer couldn't produce any join order at all and crashed.

Fixed by not searching for shortcut plan if ORDER BY table is a constant.
We will not try to do sorting anyway in this case (and LIMIT short-cutting
will be done for any join order).
2024-10-18 15:42:05 +03:00
Rucha Deodhar
e14d2b7974 MDEV-8578: Wrong error code/message with enforce_storage_engine and
NO_ENGINE_SUBSTITUTION

Analysis:
When the error is hit, wrong error code is passed in my_error
Fix:
Pass a better error code.
2024-10-18 16:42:52 +05:30
Sergei Petrunia
0540eac05c MDEV-35180: ref_to_range rewrite causes poor query plan
(Variant 2: only allow rewrite for ref(const))

make_join_select() has a "ref_to_range" rewrite: it would rewrite
any ref access to a range access on the same index if the latter uses
more keyparts.
It seems, he initial intent of this was to fix poor query plan choice
in cases like

  t.keypart1=const AND t.keypart2 < 'foo'

Due to deficiency in cost model, ref access could be picked while range
would enumerate fewer rows and be cheaper.
However, the condition also forces a rewrite in cases like:

  t.keypart1=prev_table.col AND t.keypart1<='foo' AND t.keypart2<'bar'

Here, it can be that
* keypart1=prev_table.col is highly selective
* (keypart1, keypart2) <= ('foo', 'bar') is not at all selective.

Still, the rewrite would be made and poor query plan chosen.
Fixed this by only doing the rewrite if ref access was ref(const)
so we can be certain that quick select also used these restrictions
and will scan a subset of rows that ref access would scan.
2024-10-18 13:37:04 +03:00
Marko Mäkelä
ebefef658e Merge 10.11 into 11.2 2024-10-18 11:32:22 +03:00
Marko Mäkelä
eca552a1a4 MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

In crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-18 10:12:47 +03:00
Brandon Nesterenko
e213e916ad MDEV-32014: Fix mysqld--help,win.rdiff 2024-10-17 15:53:00 -06:00
Sergei Golubchik
3a1cf2c85b MDEV-34679 ER_BAD_FIELD uses non-localizable substrings 2024-10-17 21:37:37 +02:00
Sergei Golubchik
3693fb9581 MDEV-25199 cluster fails to start up
if you need innodb in your test - enable it yourself
2024-10-17 21:37:37 +02:00
Sergei Golubchik
e1e836fc76 update results after the merge 2024-10-17 21:37:37 +02:00
Sergei Golubchik
3da565c41d MDEV-35144 CREATE TABLE ... LIKE uses current innodb_compression_default instead of the create value
When adding a column or index that uses plugin-defined
sysvar-based options with CREATE ... LIKE the server
was using the current value of the sysvar, not the default one.

Because parse_option_list() function was used both in create
and open and it tried to guess when it's create (need to use
current sysvar value and add a new name=value pair to the list)
or open (need to use default, without extending the list).

Let's move the list extending functionality into a separate
function and call it explicitly when needed. Operations that
add new objects (CREATE, ALTER ... ADD) will extend the list,
other operations (ALTER, CREATE ... LIKE, open) will not.
2024-10-17 16:28:39 +02:00
Marko Mäkelä
bb47e575de MDEV-34830: LSN in the future is not being treated as serious corruption
The invariant of write-ahead logging is that before any change to a
page is written to the data file, the corresponding log record must
must first have been durably written.

On crash recovery, there were some sloppy checks for this. Let us
implement accurate checks and flag an inconsistency as a hard error,
so that we can avoid further corruption of a corrupted database.
For data extraction from the corrupted database, innodb_force_recovery
can be used.

Before recovery is reading any data pages or invoking
buf_dblwr_t::recover() to recover torn pages from the
doublewrite buffer, InnoDB will have parsed the log until the
final LSN and updated log_sys.lsn to that. So, we can rely on
log_sys.lsn at all times. The doublewrite buffer recovery has been
refactored in such a way that the recv_sys.dblwr.pages may be consulted
while discovering files and their page sizes, but nothing will be
written back to data files before buf_dblwr_t::recover() is invoked.

A section of the test mariabackup.innodb_redo_overwrite
that is parsing some mariadb-backup --backup output has
been removed, because that output "redo log block is overwritten"
would often be missing in a Microsoft Windows environment
as a result of these changes.

recv_max_page_lsn, recv_lsn_checks_on: Remove.

recv_sys_t::validate_checkpoint(): Validate the write-ahead-logging
condition at the end of the recovery.

recv_dblwr_t::validate_page(): Keep track of the maximum LSN
(if we are checking a non-doublewrite copy of a page) but
do not complain LSN being in the future. The doublewrite buffer
is a special case, because it will be read early during recovery.
Besides, starting with commit 762bcb81b5
the dblwr=true copies of pages may legitimately be "too new".

recv_dblwr_t::find_page(): Find a valid page with the smallest
FIL_PAGE_LSN that is in the valid range for recovery.

recv_dblwr_t::restore_first_page(): Replaced by find_page().
Only buf_dblwr_t::recover() will write to data files.

buf_dblwr_t::recover(): Simplify the message output. Do attempt
doublewrite recovery on user page read error. Ignore doublewrite
pages whose FIL_PAGE_LSN is outside the usable bounds. Previously,
we could wrongly recover a too new page from the doublewrite buffer.
It is unlikely that this could have lead to an actual error.
Write back all recovered pages from the doublewrite buffer here,
including for the first page of any tablespace.

buf_page_is_corrupted(): Distinguish the return values
CORRUPTED_FUTURE_LSN and CORRUPTED_OTHER.

buf_page_check_corrupt(): Return the error code DB_CORRUPTION
in case the LSN is in the future.

Datafile::read_first_page(): Handle FSP_SPACE_FLAGS=0xffffffff
in the same way on both 32-bit and 64-bit architectures.

Datafile::read_first_page_flags(): Split from read_first_page().
Take a copy of the first page as a parameter.

recv_sys_t::free_corrupted_page(): Take the file as a parameter
and return whether a message was displayed. This avoids some duplicated
and incomplete error messages.

buf_page_t::read_complete(): Remove some redundant output and always
display the name of the corrupted file. Never return DB_FAIL;
use it only in internal error handling.

IORequest::read_complete(): Assume that buf_page_t::read_complete()
will have reported any error.

fil_space_t::set_corrupted(): Return whether this is the first time
the tablespace had been flagged as corrupted.

Datafile::validate_first_page(), fil_node_open_file_low(),
fil_node_open_file(), fil_space_t::read_page0(),
fil_node_t::read_page0(): Add a parameter for a copy of the
first page, and a parameter to indicate whether the FIL_PAGE_LSN
check should be suppressed. Before buf_dblwr_t::recover() is
invoked, we cannot validate the FIL_PAGE_LSN, but we can trust the
FSP_SPACE_FLAGS and the tablespace ID that may be present in a
potentially too new copy of a page.

Reviewed by: Debarun Banerjee
2024-10-17 17:24:20 +03:00
Brandon Nesterenko
39cce39ae1 MDEV-32014: typo fix in test 2024-10-17 07:54:09 -06:00
Sergei Golubchik
70aa713f58 MDEV-32014 test fix 2024-10-17 07:53:59 -06:00
Libing Song
72cc58bb71 MDEV-32014 Rename binlog cache temporary file to binlog file
for large transaction

Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(eg. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.

The solution in this patch is to rename the binlog cache file to
a binlog file instead of copy, if the commiting transaction has
large binlog cache. Rename is a very fast operation, it doesn't
block other transactions a long time.

Design
======
* binlog_large_commit_threshold
  type: ulonglong
  scope: global
  dynamic: yes
  default: 128MB

  Only the binlog cache temporary files large than 128MB are
  renamed to binlog file.

* #binlog_cache_files directory
  To support rename, all binlog cache temporary files are managed
  as normal files now. `#binlog_cache_files` directory is in the same
  directory with binlog files. It is created at server startup if it doesn't
  exist. Otherwise, all files in the directory is deleted at startup.

  The temporary files are named with ML_ prefix and the memorary address
  of the binlog_cache_data object which guarantees it is unique.

* Reserve space
  To supprot rename feature, It must reserve enough space at the
  begin of the binlog cache file. The space is required for
  Format description, Gtid list, checkpoint and Gtid events when
  renaming it to a binlog file.

  Since binlog_cache_data's cache_log is directly accessed by binlog log,
  online alter and wsrep. It is not easy to update all the code. Thus
  binlog cache will not reserve space if it is not session binlog cache or
  wsrep session is enabled.

  - m_file_reserved_bytes
    Stores the bytes reserved at the begin of the cache file.
    It is initialized in write_prepare() and cleared by reset().

    The reserved file header is hide to callers. Thus there is no
    change for callers. E.g.
    - get_byte_position() still get the length of binlog data
      written to the cache, but not the file length.
    - truncate(0) will truncate the file to m_file_reserved_bytes but not 0.

  - write_prepare()
    write_prepare() is called everytime when anything is being written
    into the cache. It will call init_file_reserved_bytes() to  create
    the cache file (if it doesn't exist) and reserve suitable space if
    the data written exceeds buffer's size.

* Binlog_commit_by_rotate
  It is used to encapsulate the code for remaing a binlog cache
  tempoary file to binlog file.
  - should_commit_by_rotate()
    it is called by write_transaction_to_binlog_events() to check if
    a binlog cache should be rename to a binlog file.
  - commit()
    That is the entry to rename a binlog cache and commit the
    transaction. Both rename and commit are protected by LOCK_log,
    Thus not other transactions can write anything into the renamed
    binlog before it.

    Rename happens in a rotation. After the new binlog file is generated,
    replace_binlog_file() is called to:
    - copy data from the new binlog file to its binlog cache file.
    - write gtid event.
    - rename the binlog cache file to binlog file.

    After that the rotation will continue to succeed. Then the transaction
    is committed in a seperated group itself. Its cache file will be
    detached and cache log will be reset before calling
    trx_group_commit_with_engines(). Thus only Xid event be written.
2024-10-17 07:53:59 -06:00
Oleksandr Byelkin
600c42ea86 MDEV-34883 LOAD DATA INFILE with geometry data fails
We write field using field data charset, so we should read it
using the field charset.
2024-10-17 10:33:36 +02:00
Sergei Golubchik
3b58c6b93f MDEV-35079 Migrate MySQL5.7 to MariaDB 10.4, then to MariaDB 10.11 Failed
correctly detect when partitioning is disabled
2024-10-17 10:08:24 +02:00
Sergei Golubchik
7842cab8c0 MDEV-34318 post-merge fix 2024-10-17 10:08:24 +02:00
Sergei Golubchik
6b436cba01 Revert "Fixes buildbot issue with plugin.fulltext_plugin"
This reverts commit a8010e7689.

The test doesn't require embedded after ab15628bbc
2024-10-17 09:11:47 +02:00
Thirunarayanan Balathandayuthapani
4a1ded61a4 MDEV-34529 Shrink the system tablespace when system tablespace contains MDEV-30671 leaked undo pages
- InnoDB fails to shrink the system tablespace when it contains
the leaked undo log pages caused by MDEV-30671.

- InnoDB does free the unused segment in system tablespace
before shrinking the tablespace. InnoDB fails to free
the unused segment if XA PREPARE transaction exist or
if the previous shutdown was not with innodb_fast_shutdown=0

inode_info: Structure to store the inode page and offsets.

fil_space_t::garbage_collect(): Frees the system tablespace
unused segment

fsp_get_sys_used_segment(): Iterates through all default
file segment and index segment present in system tablespace.

trx_sys_t::is_xa_exist(): Returns true if the XA transaction
exist in the undo logs

fseg_inode_free(): Frees the extents, fragment pages for the
given index node and ignores any error similar to
trx_purge_free_segment()

trx_sys_t::reset_page(): Retain the TRX_SYS_FSEG_HEADER value
in trx_sys page while resetting the page.
2024-10-16 21:34:24 +05:30
Monty
4955f6018a MDEV-29351 SIGSEGV when doing forward reference of item in select list
The reason for the crash was the code assumed that
SELECT_LEX.ref_pointer_array would be initialized with zero, which was
not the case. This cause the test of
if (!select->ref_pointer_array[counter]) in item.cc to be unpredictable
and causes crashes.

Fixed by zero-filling ref_pointer_array on allocation.
2024-10-16 17:24:46 +03:00
Monty
0de2613e7a Fixed that SHOW CREATE TABLE for sequences shows used table options 2024-10-16 17:24:46 +03:00
Monty
2c52fdd28a MDEV-32350 Can't selectively restore sequences using innodb tables from backup
Added support for sequences to do  discard and import tablespace
2024-10-16 17:24:46 +03:00
Monty
ee908140ac Fixed bug in main.connect test where Connection_errors showed wrong value 2024-10-16 17:24:46 +03:00
Monty
a8010e7689 Fixes buildbot issue with plugin.fulltext_plugin
The test is using features not in the embedded server.
Fixed by including not_embedded.inc
2024-10-16 17:24:46 +03:00
Vladislav Vaintroub
c1fc59277a MDEV-34929 page-compressed tables do not work on Windows
Remove workaround for MDEV-13941, it served for 5 years,and all affected
pre-release 10.2 installation should have been already fixed in between.

Apparently Innodb is using is_sparse parameter in os_file_set_size()
inconsistently, and it passes is_sparse=false now during first file
extension. With MDEV-13941 workaround in place, it would unsparse
the file, which is makes compression not to work at all anymore.
2024-10-16 16:02:13 +02:00
Sergei Petrunia
89493a9980 MDEV-34993: fix merge into 10.6: OPTIMIZER_ADJ_FIX_CARD_MULT should be ON by default 2024-10-16 16:46:38 +03:00
Sergei Golubchik
5ebda30ccc Revert "MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2"
This reverts commit 8ae462a220.
2024-10-16 13:23:47 +02:00
Kristian Nielsen
8ae462a220 MDEV-35019 Provide a way to enable "rollback XA on disconnect" behavior we had before 10.5.2
Implement variable legacy_xa_rollback_at_disconnect to support
backwards compatibility for applications that rely on the pre-10.5
behavior for connection disconnect, which is to rollback the
transaction (in violation of the XA specification).

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-10-16 10:18:36 +02:00
Sergei Petrunia
9849e3f948 MDEV-35072: Assertion with optimizer_join_limit_pref_ratio and 1-table select
Pre-11.0 variant:

1. In recompute_join_cost_with_limit(), add an assertion that that
partial_join_cost >= 0.0.

2. best_extension_by_limited_search() subtracts COST_EPS from
   join->best_read. But it is not subtracted from
   join->positions[0].read_time, add it back.

2. We could get very small negative partial_join_cost due to rounding
  errors. For fraction=1.0, we were computing essentially this (denote
  as EXPR-1):

  $row_read_cost + $where_cost - ($row_read_cost + $where_cost)

which should compute to 0.
But the computation was done in the following order (left-to-right):
EXPR-2:

  ($row_read_cost + $where_cost) - $row_read_cost - $where_cost

this produced a value of -1.1102230246251565e-16 due to a rounding
error. Change the computation use EXPR-1 instead of EXPR-2.
2024-10-15 15:56:41 +03:00
Sergei Petrunia
66b8d32b75 MDEV-35072: Assertion with optimizer_join_limit_pref_ratio and 1-table select
Variant for 11.2+:

In recompute_join_cost_with_limit(), do not subtract the cost of checking
the WHERE:
   pos->records_read* WHERE_COST_THD(join->thd)

It is already included in pos->read_time.

Also added comments about difference between this fix and the pre-11.2
variant.
2024-10-15 15:01:29 +03:00