Commit graph

10682 commits

Author SHA1 Message Date
Marko Mäkelä
15700f54c2 Merge 11.4 into 11.7 2025-01-09 09:41:38 +02:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Marko Mäkelä
420d9eb27f Merge 10.6 into 10.11 2025-01-08 12:51:26 +02:00
Monty
a2d37705ca Only print "InnoDB: Transaction was aborted..." if log_warnings >= 4
This is a minor fixup for
MDEV-24035 Failing assertion UT_LIST_GET_LEN(lock.trx_locks) == 0
causing disruption and replication failure
2025-01-05 16:40:12 +02:00
Monty
e600f9aebb MDEV-35750 Change MEM_ROOT allocation sizes to reduse calls to malloc() and avoid memory fragmentation
This commit updates default memory allocations size used with MEM_ROOT
objects to minimize the number of calls to malloc().

Changes:
- Updated MEM_ROOT block sizes in sql_const.h
- Updated MALLOC_OVERHEAD to also take into account the extra memory
  allocated by my_malloc()
- Updated init_alloc_root() to only take MALLOC_OVERHEAD into account as
  buffer size, not MALLOC_OVERHEAD + sizeof(USED_MEM).
- Reset mem_root->first_block_usage if and only if first block was used.
- Increase MEM_ROOT buffers sized used by my_load_defaults, plugin_init,
  Create_tmp_table, allocate_table_share, TABLE and TABLE_SHARE.
  This decreases number of malloc calls during queries.
- Use a small buffer for THD->main_mem_root in THD::THD. This avoids
  multiple malloc() call for new connections.

I tried the above changes on a complex select query with 12 tables.
The following shows the number of extra allocations that where used
to increase the size of the MEM_ROOT buffers.

Original code:
- Connection to MariaDB:   9 allocations
- First query run:       146 allocations
- Second query run:       24 allocations

Max memory allocated for thd when using with heap table:  61,262,408
Max memory allocated for thd when using Aria tmp table:      419,464

After changes:
Connection to MariaDB:     0 allocations
- First run:              25 allocations
- Second run:              7 allocations

Max memory allocated for thd when using with heap table:  61,347,424
Max memory allocated for thd when using Aria table:          529,168

The new code uses slightly more memory, but avoids memory fragmentation
and is slightly faster thanks to much fewer calls to malloc().

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2025-01-05 16:40:11 +02:00
Monty
52c29f3bdc MDEV-35469 Heap tables are calling mallocs to often
Heap tables are allocated blocks to store rows according to
my_default_record_cache (mapped to the server global variable
 read_buffer_size).
This causes performance issues when the record length is big
(> 1000 bytes) and the my_default_record_cache is small.

Changed to instead split the default heap allocation to 1/16 of the
allowed space and not use my_default_record_cache anymore when creating
the heap. The allocation is also aligned to be just under a power of 2.

For some test that I have been running, which was using record length=633,
the speed of the query doubled thanks to this change.

Other things:
- Fixed calculation of max_records passed to hp_create() to take
  into account padding between records.
- Updated calculation of memory needed by heap tables. Before we
  did not take into account internal structures needed to access rows.
- Changed block sized for memory_table from 1 to 16384 to get less
  fragmentation. This also avoids a problem where we need 1K
  to manage index and row storage which was not counted for before.
- Moved heap memory usage to a separate test for 32 bit.
- Allocate all data blocks in heap in powers of 2. Change reported
  memory usage for heap to reflect this.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
2025-01-05 16:40:11 +02:00
Monty
7fcaab7aaa MDEV-20912 Add support for utf8mb4_0900_* collations in MariaDB Server
This is done by mapping most of the existing MySQL unicode 0900 collations
to MariadB 1400 unicode collations. The assumption is that 1400 is a super
set of 0900 for all practical purposes.

I also added a new function 'compare_collations()' and changed most code
to use this instead of comparing character sets directly.
This enables one to seamlessly mix-and-match the corresponding 0900 and
1400 sets. Field comparision and alter table treats the character sets
as identical.

All MySQL 8.0 0900 collations are supported except:
- utf8mb4_ja_0900_as_cs
- utf8mb4_ja_0900_as_cs_ks
- utf8mb4_ru_0900_as_cs
- utf8mb4_zh_0900_as_cs

These do not have corresponding entries in the MariadB 01400 collations.

Other things:
- Added COMMENT colum to information_schema.collations. For utf8mb4_0900
  colletions it contains the corresponding alias collation.
2024-12-28 10:23:49 +02:00
Marko Mäkelä
a54d151fc1 Merge 10.6 into 10.11 2024-12-19 15:38:53 +02:00
Marko Mäkelä
ddd7d5d8e3 MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure
Under unknown circumstances, the SQL layer may wrongly disregard an
invocation of thd_mark_transaction_to_rollback() when an InnoDB
transaction had been aborted (rolled back) due to one of the following errors:
* HA_ERR_LOCK_DEADLOCK
* HA_ERR_RECORD_CHANGED (if innodb_snapshot_isolation=ON)
* HA_ERR_LOCK_WAIT_TIMEOUT (if innodb_rollback_on_timeout=ON)

Such an error used to cause a crash of InnoDB during transaction commit.
These changes aim to catch and report the error earlier, so that not only
this crash can be avoided but also the original root cause be found and
fixed more easily later.

The idea of this fix is from Michael 'Monty' Widenius.

HA_ERR_ROLLBACK: A new error code that will be translated into
ER_ROLLBACK_ONLY, signalling that the current transaction
has been aborted and the only allowed action is ROLLBACK.

trx_t::state: Add TRX_STATE_ABORTED that is like
TRX_STATE_NOT_STARTED, but noting that the transaction had been
rolled back and aborted.

trx_t::is_started(): Replaces trx_is_started().

ha_innobase: Check the transaction state in various places.
Simplify the logic around SAVEPOINT.

ha_innobase::is_valid_trx(): Replaces ha_innobase::is_read_only().

The InnoDB logic around transaction savepoints, commit, and rollback
was unnecessarily complex and might have contributed to this
inconsistency. So, we are simplifying that logic as well.

trx_savept_t: Replace with const undo_no_t*. When we rollback to
a savepoint, all we need to know is the number of undo log records
that must survive.

trx_named_savept_t, DB_NO_SAVEPOINT: Remove. We can store undo_no_t
directly in the space allocated at innobase_hton->savepoint_offset.

fts_trx_create(): Do not copy previous savepoints.

fts_savepoint_rollback(): If a savepoint was not found, roll back
everything after the default savepoint of fts_trx_create().
The test innodb_fts.savepoint is extended to cover this code.

Reviewed by: Vladislav Lesin
Tested by: Matthias Leich
2024-12-12 18:02:00 +02:00
Marko Mäkelä
69e20cab28 Merge 10.5 into 10.6 2024-12-11 14:46:43 +02:00
Daniel Black
807e4f320f Change my_umask{,_dir} to mode_t and remove os_innodb_umask
os_innodb_umask was of the incorrect type resulting in warnings
in clang-19. The correct type is mode_t.

As os_innodb_umask was set during innnodb_init from my_umask,
corrected the type there along with its companion my_umask_dir.
Because of this, the defaults mask values in innodb never
had an effect.

The resulting change allow found signed differences in
my_create{,_nosymlink}, open_nosymlinks:

mysys/my_create.c:47:20: error: operand of ?: changes signedness from ‘int’ to ‘mode_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Werror=sign-compare]
   47 |      CreateFlags ? CreateFlags : my_umask);

Ref: clang-19 warnings:

[55/123] Building CXX object storage/innobase/CMakeFiles/innobase.dir/os/os0file.cc.o
storage/innobase/os/os0file.cc:1075:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
 1075 |                 file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
      |                        ~~~~                                ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1249:46: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
 1249 |                 file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
      |                        ~~~~                                ^~~~~~~~~~~~~~~
storage/innobase/os/os0file.cc:1381:45: warning: implicit conversion loses integer precision: 'ulint' (aka 'unsigned long') to 'mode_t' (aka 'unsigned int') [-Wshorten-64-to-32]
 1381 |         file = open(name, create_flag | O_CLOEXEC, os_innodb_umask);
      |                ~~~~                                ^~~~~~~~~~~~~~~
2024-12-11 17:21:01 +11:00
Kristian Nielsen
0f47db8525 Merge 10.11 -> 11.4
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 11:01:42 +01:00
Kristian Nielsen
e7c6cdd842 Merge 10.6 -> 10.11
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 10:11:58 +01:00
Kristian Nielsen
0166c89e02 Merge 10.5 -> 10.6
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 09:20:36 +01:00
Daniel Black
aca72b326a MDEV-34815 SIGILL error when executing mariadbd compiled for RISC-V with Clang
RISC-V and Clang produce rdcycle for __builtin_readcyclecounter.

Since Linux kernel 6.6 this is a privileged instruction not available
to userspace programs.

The use of __builtin_readcyclecounter is excluded from RISCV falling
back to the rdtime/rdtimeh instructions provided in MDEV-33435.

Thanks Alexander Richardson for noting it should be linux only in the
code and noting FreeBSD RISC-V permits rdcycle.

Author: BINSZ on JIRA
2024-12-05 02:36:25 +11:00
Aleksey Midenkov
aa9d5aea48 MDEV-34770 GCC warning fix
kvm-bintar-trusty-x86: warning: ‘no_sanitize’ attribute directive
ignored [-Wattributes]

The attribute is supported since gcc 8.
2024-12-04 18:22:31 +03:00
Monty
8e9aa9c6b0 Fix MariadDB to compile with gcc 7.5.0
gcc 7.5.0 does not understand __attribute__((no_sanitize("undefined"))
I moved the usage of this attribute from sql/set_var.h to
include/my_attribute.h and created a macro for it depending on
compiler used.
2024-12-04 12:58:00 +02:00
Marko Mäkelä
33907f9ec6 Merge 11.4 into 11.7 2024-12-02 17:51:17 +02:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
4d9548876e MDEV-31340 fixup: clang++-20 -Wdeprecated-literal-operator 2024-12-02 10:44:06 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
7d4077cc11 Merge 10.5 into 10.6 2024-11-29 12:37:46 +02:00
Daniel Black
27c7e73f9a MDEV-35513 fails to compile on riscv32
Misplaced brace in my_rdtsc.h results in RV32 failing to compile.

ref: https://github.com/MariaDB/server/pull/1981/files#r1859762957

Thanks Jessica Clarke for the note.
2024-11-28 03:13:15 +02:00
Brandon Nesterenko
3c785499da MDEV-34348: Fix casts relating to tree_walk_action
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:23 -07:00
Brandon Nesterenko
840fe316d4 MDEV-34348: my_hash_get_key fixes
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

Change the type of my_hash_get_key to:
 1) Return const
 2) Change the context parameter to be const void*

Also fix casting in hash adjacent areas.

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:22 -07:00
Brandon Nesterenko
dbfee9fc2b MDEV-34348: Consolidate cmp function declarations
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangable
and unsafely cast to one another.

This patch consolidates the functions all into the
qsort_cmp2 interface.

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:22 -07:00
Oleksandr Byelkin
b12ff287ec Merge branch '11.6' into 11.7 2024-11-10 19:22:21 +01:00
Oleksandr Byelkin
9e1fb104a3 MariaDB 11.4.4 release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEF39AEP5WyjM2MAMF8WVvJMdM0dgFAmck77AACgkQ8WVvJMdM
 0dgccQ/+Lls8fWt4D+gMPP7x+drJSO/IE/gZFt3ugbWF+/p3B2xXAs5AAE83wxEh
 QSbp4DCkb/9PnuakhLmzg0lFbxMUlh4rsJ1YyiuLB2J+YgKbAc36eQQf+rtYSipd
 DT5uRk36c9wOcOXo/mMv4APEvpPXBIBdIL4VvpKFbIOE7xT24Sp767zWXdXqrB1f
 JgOQdM2ct+bvSPC55oZ5p1kqyxwvd6K6+3RB3CIpwW9zrVSLg7enT3maLjj/761s
 jvlRae+Cv+r+Hit9XpmEH6n2FYVgIJ3o3WhdAHwN0kxKabXYTg7OCB7QxDZiUHI9
 C/5goKmKaPB1PCQyuTQyLSyyK9a8nPfgn6tqw/p/ZKDQhKT9sWJv/5bSWecrVndx
 LLYifSTrFC/eXLzgPvCnNv/U8SjsZaAdMIKS681+qDJ0P5abghUIlGnMYTjYXuX1
 1B6Vrr0bdrQ3V1CLB3tpkRjpUvicrsabtuAUAP65QnEG2G9UJXklOer+DE291Gsl
 f1I0o6C1zVGAOkUUD3QEYaHD8w7hlvyfKme5oXKUm3DOjaAar5UUKLdr6prxRZL4
 ebhmGEy42Mf8fBYoeohIxmxgvv6h2Xd9xCukgPp8hFpqJGw8abg7JNZTTKH4h2IY
 J51RpD10h4eoi6WRn3opEcjexTGvZ+xNR7yYO5WxWw6VIre9IUA=
 =s+WW
 -----END PGP SIGNATURE-----

Merge tag '11.4' into 11.6

MariaDB 11.4.4 release
2024-11-08 07:17:00 +01:00
Sergei Golubchik
a471389d07 MDEV-34970 Vector search fails to compile on s390x
add missing casts to float4store/float8store for bigendian.
fix a typo in float4store() usage
remove unnecessary one-byte-at-time appends
2024-11-05 14:00:50 -08:00
Sergei Golubchik
aed5928207 cleanup: extract transaction-related part of handlerton
into a separate transaction_participant structure

handlerton inherits it, so handlerton itself doesn't change.
but entities that only need to participate in a transaction,
like binlog or online alter log, use a transaction_participant
and no longer need to pretend to be a full-blown but invisible
storage engine which doesn't support create table.
2024-11-05 14:00:50 -08:00
Sergei Golubchik
049d839350 mhnsw: inter-statement shared cache
* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
  (to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
  are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
  nodes are partitioned into 8 mutexes (8 is chosen arbitrarily, might
  need tuning)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
  graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
  INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list in
  in thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
  from the secondary cache are invalidated in the shared cache
  while it is exclusively locked
* on savepoint rollback both caches are flushed. this can be improved
  in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
  when it reaches the threshold
2024-11-05 14:00:49 -08:00
Sergei Golubchik
613542dceb mhnsw: build indexes with the columns of exactly right size 2024-11-05 14:00:49 -08:00
Sergei Golubchik
553815ea24 cleanup: C++11 range-based for loop for Hash_set<> 2024-11-05 14:00:48 -08:00
Sergei Golubchik
d6add9a03d initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
2024-11-05 14:00:48 -08:00
Sergei Golubchik
1fe8a1bb76 cleanup: generalize ER_INNODB_NO_FT_TEMP_TABLE 2024-11-05 14:00:48 -08:00
Sergei Golubchik
fd69abe44f cleanup: generalize ER_SPATIAL_CANT_HAVE_NULL 2024-11-05 14:00:48 -08:00
Sergei Golubchik
062f8eb37d cleanup: key algorithm vs key flags
the information about index algorithm was stored in two
places inconsistently split between both.

BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user
explicitly specified USING BTREE or HA_KEY_ALG_UNDEF, if not.

RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL

FULLTEXT index had  key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT

HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF

long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH

In this commit:

All indexes except BTREE and HASH always have key->algorithm
set, HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage to keep frms backward compatible).

As a side effect ALTER TABLE now detects FULLTEXT index renames correctly
2024-11-05 14:00:47 -08:00
Sergei Golubchik
f5e9c4e9ef cleanup: Queue and Bounded_queue
Bounded_queue<> pretended to be a typesafe C++ wrapper
on top of pure C queues.h.

But it wasn't, it was tightly bounded to filesort and only useful there.

* implement Queue<> - a typesafe C++ wrapper on top of QUEUE
* move Bounded_queue to filesort.cc, remove pointless "generalizations"
  change it to use Queue.
* remove bounded_queue.h
* change subselect_rowid_merge_engine to use Queue, not QUEUE
2024-11-05 14:00:47 -08:00
Sergei Golubchik
ae59127158 cleanup: lex_string_set3() 2024-11-05 14:00:47 -08:00
Sergei Golubchik
32e6f8ff2e cleanup: remove unconditional #ifdef's 2024-11-05 14:00:47 -08:00
Sergei Golubchik
949fed514a cleanup: get_float convenience helper
more helpers like that can be added as needed
2024-11-05 14:00:47 -08:00
Sergei Golubchik
d046aca0c7 cleanup: CREATE_TYPELIB_FOR() helper 2024-11-05 14:00:47 -08:00
Sergei Golubchik
9fa31c1bd9 cleanup: spaces, casts, comments 2024-11-05 14:00:47 -08:00
Sergei Golubchik
9f2adffcca memroot improvement: fix savepoint support
savepoint support was added a year ago as get_last_memroot_block()
and free_all_new_blocks(), but it didn't work and was disabled.

* fix it to work
* instead of freeing the memory, only mark blocks free - this feature
  is supposed to be used inside a loop (otherwise there is no need
  to free anything, end of statement will do it anyway). And freeing
  blocks inside a loop is a bad idea, as they'll be all malloc-ed
  on the next iteration again. So, don't.
* fix a bug in mark_blocks_free() - it doesn't change the number of blocks
2024-11-05 14:00:47 -08:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Sergei Petrunia
284593413f MDEV-35253: xa_prepare_unlock_unmodified fails: shift exponent 32 is too large
The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).

The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?

Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
2024-10-25 18:02:14 +03:00
Oleg Smirnov
fd87e01f38 MDEV-27277 Add a warning when max_sort_length is reached
During a query execution some sorting and grouping operations
on strings may be involved. System variable max_sort_length defines
the maximum number of bytes to use when comparing strings during
sorting/grouping. Thus, the comparable parts of strings may be less
than their actual size, so the results of the query may be not
sorted/grouped properly.
To indicate that some comparisons were done on a truncated lengths,
a new warning has been introduced with this commit.
2024-10-22 22:39:36 +07:00