Commit graph

25442 commits

Author SHA1 Message Date
Marko Mäkelä
92f79a22e6 Merge 10.5 into 10.6 2022-02-22 12:12:49 +02:00
Vlad Lesin
a112a80b47 Merge 10.4 into 10.5 2022-02-22 10:35:16 +03:00
Vlad Lesin
f6f055a191 Merge 10.3 into 10.4 2022-02-21 14:10:27 +03:00
Vlad Lesin
a6f258e47f MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration
Backported from 10.5 commits 20e9e804c1 and 5948d7602e.

sel_restore_position_for_mysql() moves the persistent cursor forward
after the btr_pcur_restore_position() call if the cursor's relative position
is BTR_PCUR_ON and the cursor points to a record whose field values differ
from those of the stored record (along with some other conditions that are
not important for this case).

This was done because btr_pcur_restore_position() sets
page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON
before opening the cursor, so we search for a record less than or equal
to the stored one. If the found record is not equal to the stored one, then
it is less, and we need to move the cursor forward.

But the stored record may have been purged, and a new record with the
same key but a different value inserted, while row_search_mvcc() was
suspended. In this case, when the thread is awakened, it will invoke
sel_restore_position_for_mysql(), which in turn invokes
btr_pcur_restore_position(), which will return false because the found
record does not match the stored record, and
sel_restore_position_for_mysql() will move the cursor forward.

As a result, an awakened row_search_mvcc() may not see records that
other transactions inserted while it slept. The mtr test case shows an
example of how this can happen.

The fix is to return a special value from the persistent cursor restoring
function to notify its caller that the unique fields of the restored
record and the stored record are the same; in this case
sel_restore_position_for_mysql() does not move the cursor forward.

Delete-marked records are correctly processed in row_search_mvcc().
Non-unique secondary indexes are "uniquified" by adding the PK, so
index->n_uniq should then be index->n_fields. Therefore no additional
checks are needed in the fix.

If a transaction's read view cannot see the changes made in a secondary
index record, it requests the clustered index record in row_search_mvcc()
to check its transaction id and get the corresponding record version.
After this, row_search_mvcc() commits the mtr to preserve the clustered
index latching order, and starts a new mtr. Between that mtr commit and
start, the secondary index pages are unlatched, and purge can remove the
record stored in the cursor, which causes duplicate rows in the result set
for non-locking reads, as the cursor position is restored to the previously
visited record.

To solve this, the changes are simply switched off for non-locking reads.
This is quite a simple solution; besides, the changes do not make sense for
non-locking reads.

A more complex solution, and a more efficient one from a performance
perspective, is to create an mtr savepoint before requesting the clustered
record and to roll back to that savepoint afterwards. See MDEV-27557.

One more solution is to have a per-record transaction id for secondary
indexes. See MDEV-17598.

If either of those is implemented, just remove the select_lock_type
argument of sel_restore_position_for_mysql().
2022-02-21 12:49:54 +03:00
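
The cursor-restoration decision described in the MDEV-20605 message above can be sketched as follows. This is a minimal illustration and not the InnoDB code: the type `rec`, the enum `restore_status`, and the functions `restore_position()` and `must_move_forward()` are hypothetical names, and the sketch only covers the BTR_PCUR_ON case of a forward scan. Only the idea comes from the commit message: a distinct return value when the unique fields match, and no change in behavior for non-locking reads.

```cpp
// Hypothetical two-field index record: `key` stands for the first n_uniq
// fields of the index, `value` for the remaining fields.
struct rec { int key; int value; };

// Hypothetical outcome of restoring a persistent cursor with PAGE_CUR_LE.
enum class restore_status {
  SAME_ALL,   // the found record equals the stored record in all fields
  SAME_UNIQ,  // only the unique fields match (stored record was purged and
              // a record with the same key was inserted meanwhile)
  NOT_SAME    // the found record is less than the stored record
};

// Classify the record found on restore against the stored record.
constexpr restore_status restore_position(rec stored, rec found)
{
  return stored.key != found.key ? restore_status::NOT_SAME
       : stored.value == found.value ? restore_status::SAME_ALL
       : restore_status::SAME_UNIQ;
}

// Decide whether the caller has to step the cursor forward after restoring.
// For non-locking reads the old rule is kept: move unless fully equal.
constexpr bool must_move_forward(restore_status s, bool locking_read)
{
  return s == restore_status::NOT_SAME
      || (s == restore_status::SAME_UNIQ && !locking_read);
}
```

With this rule, a locking read that was suspended on key 1 and finds a re-inserted record with key 1 stays on that record instead of skipping it, while a non-locking read behaves as it did before the fix.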
Vlad Lesin
5f001bd7b8 MDEV-27025 insert-intention lock conflicts with waiting ORDINARY lock
The code was backported from 10.5 commit be8113861c.
See that commit message for details.
2022-02-21 12:49:54 +03:00
Marko Mäkelä
4030a9fb2e MDEV-26476: Implement futex for FreeBSD, DragonFly BSD 2022-02-18 15:13:56 +02:00
Nayuta Yanagisawa
66f55a018b MDEV-27730 Add PLUGIN_VAR_DEPRECATED flag to plugin variables
The sys_var class has the deprecation_substitute member to mark
deprecated variables. When it is set, the server produces warnings when
these variables are used. However, a plugin has no means to utilize
that functionality.

So, the PLUGIN_VAR_DEPRECATED flag is introduced to set
deprecation_substitute to the empty string. A non-empty string could
make the warning more informative, but there is no obvious nice way to
specify it, and it is not really needed at the moment.
2022-02-18 13:10:20 +09:00
Vladislav Vaintroub
fa557986ac MDEV-24175 Windows - fix detection of whether file is on SSD
Fix detection. SSD is when storage does *not* incur a seek penalty.
2022-02-17 22:55:08 +01:00
Marko Mäkelä
f04b459fb7 Merge 10.5 into 10.6 2022-02-17 14:37:17 +02:00
Marko Mäkelä
cac995ec6f Merge 10.4 into 10.5 2022-02-17 11:58:25 +02:00
Marko Mäkelä
f921db7aa5 Merge 10.3 into 10.4 2022-02-17 11:33:08 +02:00
Vladislav Vaintroub
8bc5bf2cb6 MDEV-26789 Fix stall of group commit waiters
Fixed a condition where the designated group commit lead was woken in
release, but returned early without trying to take over the lock.

So, the group commit lock would be unowned, and the waiters could
be stalled until another thread came along and, for whatever reason,
flushed the redo log.

Also, use better criteria for choosing the potential next group commit lead.
2022-02-17 10:24:14 +01:00
Marko Mäkelä
5b237e5965 Merge 10.2 into 10.3 2022-02-17 10:53:58 +02:00
Marko Mäkelä
73c391afc5 MDEV-27583 InnoDB uses different constants for FK cascade error message in SQL vs error log
convert_error_code_to_mysql(): Use the correct limit FK_MAX_CASCADE_DEL
in the error message. The DICT_FK_MAX_RECURSIVE_LOAD applies to
the number of foreign key constraints in table definitions,
not to the number of rows that are visited while processing
a foreign key constraint.
2022-02-17 10:48:24 +02:00
Monty
0a92ef458b MDEV-17223 Assertion `thd->killed != 0' failed in ha_maria::enable_indexes
MDEV-22500 Assertion `thd->killed != 0' failed in ha_maria::enable_indexes

For MDEV-17223 the issue was an assert that didn't take into account that
we could get duplicate key errors when enabling unique indexes.
Fixed by not retrying repair in case of a duplicate key error for this
case, which avoids the assert.

For MDEV-22500 I removed the assert, as it's not critical (just a way to
find potentially wrong code) and we will anyway get things logged in the
error log if this happens. This case cannot trigger an assert in 10.3,
but I verified that it would trigger in 10.5 and that this patch fixes
it.
2022-02-16 17:16:10 +02:00
Marko Mäkelä
cf574cf53b MDEV-27634 innodb_zip tests failing on s390x
Some GNU/Linux distributions ship a zlib that is modified to use
the s390x DFLTCC instruction. That modification would essentially
redefine compressBound(sourceLen) as (sourceLen * 16 + 2308) / 8 + 6.

Let us relax the tests for InnoDB ROW_FORMAT=COMPRESSED to cope with
such a weaker compression guarantee.

create_table_info_t::row_size_is_acceptable(): Remove a bogus debug-only
assertion that would fail to hold for the test innodb_zip.bug36169.
The function page_zip_empty_size() may indeed return 0.
2022-02-16 17:03:02 +02:00
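
To see how much weaker the guarantee quoted in the MDEV-27634 message is, the two bounds can be compared directly. A minimal sketch: `dfltcc_compress_bound()` encodes the formula from the message, while `stock_compress_bound()` is the formula used by unmodified zlib 1.2.x, which is stated here as an assumption. Both function names are made up for illustration and are not zlib or InnoDB identifiers.

```cpp
#include <cstdint>

// Assumption: upper bound computed by compressBound() in unmodified zlib 1.2.x.
constexpr std::uint64_t stock_compress_bound(std::uint64_t source_len)
{
  return source_len + (source_len >> 12) + (source_len >> 14)
       + (source_len >> 25) + 13;
}

// Bound quoted in the commit message for zlib builds patched to use the
// s390x DFLTCC instruction (integer division, as in the message).
constexpr std::uint64_t dfltcc_compress_bound(std::uint64_t source_len)
{
  return (source_len * 16 + 2308) / 8 + 6;
}

// For a 16 KiB page the stock bound is 16402 bytes and the DFLTCC bound is
// 33062 bytes, that is, roughly twice the input size.
static_assert(stock_compress_bound(16384) == 16402, "stock bound for 16 KiB");
static_assert(dfltcc_compress_bound(16384) == 33062, "DFLTCC bound for 16 KiB");
```

Since the patched bound is about twice the source length, a test that assumes compressed output never grows much beyond the input has to be relaxed, which is what the commit describes.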
Vlad Lesin
497809d26d Merge 10.5 into 10.6 2022-02-15 11:32:15 +03:00
Vlad Lesin
5948d7602e MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration
Post-push fix: remove unstable test.

The test was developed to find the cause of the duplicated rows caused by
the MDEV-20605 fix. The test is no longer necessary, as the cause was
found and the bug was fixed.
2022-02-15 10:04:05 +03:00
Vlad Lesin
f2f22c382b Merge 10.5 into 10.6 2022-02-14 18:30:51 +03:00
Vlad Lesin
20e9e804c1 MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration
sel_restore_position_for_mysql() moves the persistent cursor forward
after the btr_pcur_restore_position() call if the cursor's relative position
is BTR_PCUR_ON and the cursor points to a record whose field values differ
from those of the stored record (along with some other conditions that are
not important for this case).

This was done because btr_pcur_restore_position() sets
page_cur_mode_t mode to PAGE_CUR_LE for cursor->rel_pos == BTR_PCUR_ON
before opening the cursor, so we search for a record less than or equal
to the stored one. If the found record is not equal to the stored one, then
it is less, and we need to move the cursor forward.

But the stored record may have been purged, and a new record with the
same key but a different value inserted, while row_search_mvcc() was
suspended. In this case, when the thread is awakened, it will invoke
sel_restore_position_for_mysql(), which in turn invokes
btr_pcur_restore_position(), which will return false because the found
record does not match the stored record, and
sel_restore_position_for_mysql() will move the cursor forward.

As a result, an awakened row_search_mvcc() may not see records that
other transactions inserted while it slept. The mtr test case shows an
example of how this can happen.

The fix is to return a special value from the persistent cursor restoring
function to notify its caller that the unique fields of the restored
record and the stored record are the same; in this case
sel_restore_position_for_mysql() does not move the cursor forward.

Delete-marked records are correctly processed in row_search_mvcc().
Non-unique secondary indexes are "uniquified" by adding the PK, so
index->n_uniq should then be index->n_fields. Therefore no additional
checks are needed in the fix.

If a transaction's read view cannot see the changes made in a secondary
index record, it requests the clustered index record in row_search_mvcc()
to check its transaction id and get the corresponding record version.
After this, row_search_mvcc() commits the mtr to preserve the clustered
index latching order, and starts a new mtr. Between that mtr commit and
start, the secondary index pages are unlatched, and purge can remove the
record stored in the cursor, which causes duplicate rows in the result set
for non-locking reads, as the cursor position is restored to the previously
visited record.

To solve this, the changes are simply switched off for non-locking reads.
This is quite a simple solution; besides, the changes do not make sense for
non-locking reads.

A more complex solution, and a more efficient one from a performance
perspective, is to create an mtr savepoint before requesting the clustered
record and to roll back to that savepoint afterwards. See MDEV-27557.

One more solution is to have a per-record transaction id for secondary
indexes. See MDEV-17598.

If either of those is implemented, just remove the select_lock_type
argument of sel_restore_position_for_mysql().
2022-02-14 17:35:04 +03:00
Marko Mäkelä
feb8004b58 Merge 10.5 into 10.6 2022-02-14 09:16:41 +02:00
Marko Mäkelä
52b32c60c2 Merge 10.4 into 10.5 2022-02-14 08:59:33 +02:00
Marko Mäkelä
6405ed63e1 MariaDB 10.5.15 release

Merge mariadb-10.5.15 into 10.5
2022-02-14 08:59:13 +02:00
Marko Mäkelä
c9bc10e6e8 Merge 10.3 into 10.4 2022-02-14 08:56:50 +02:00
Marko Mäkelä
e928fdbff1 Merge 10.2 into 10.3 2022-02-14 08:49:11 +02:00
Marko Mäkelä
7b891008ce MDEV-27817 InnoDB recovery of recently created files is not crash-safe
Before commit 86dc7b4d4c (MDEV-24626)
all tablespace IDs that needed recovery were already known in
recv_init_crash_recovery_spaces().

recv_sys_t::recover_deferred(): Invoke fil_names_dirty(space) on
the newly initialized tablespace. In this way, if the next log
checkpoint occurs at some LSN that is after the initialization of
the tablespace and before the last recovered LSN, a FILE_MODIFY
record will be written, so that a subsequent recovery will succeed.

The recovery was broken when
commit 0261eac57f merged the 10.5
commit f443cd1100 (MDEV-27022).
2022-02-13 17:33:40 +02:00
Marko Mäkelä
f1e08eaa5d MariaDB 10.6.7 release

Merge mariadb-10.6.7 into 10.6
2022-02-13 17:10:15 +02:00
Krunal Bauskar
fb875055c6 MDEV-27805: tpcc workload shows regression with MDB-10.6
- The regression was revealed while running a tpcc workload.

- As part of the MDEV-25919 changes, the logic for statistics computation
  was revamped.

- If the table has changed beyond a certain threshold, the table is added
  to the statistics recomputation queue (dict_stats_recalc_pool_add).

- After the table is added to the queue, the background statistics thread
  is notified.

- During the revamp, the condition for notifying the background statistics
  thread was wrongly updated to check whether the queue/vector is empty,
  when it should check whether the queue/vector has entries to process.

- vec.begin() == vec.end() holds only when the vector is empty.

- Also, accessing these iterators without synchronization while the vector
  is changing in parallel is not safe.

- The fix now notifies the background statistics thread whenever the logic
  adds an entry to the queue/vector.
2022-02-11 12:32:11 +02:00
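
The notification rule from the MDEV-27805 message above can be sketched as follows. This is a simplified model and not the InnoDB implementation: the class `recalc_queue`, its `add()` method, and the callback-based wakeup are hypothetical, and skipping an id that is already queued is an assumption made here to illustrate "if the logic adds an entry". The points taken from the message are that the decision is made while the vector is protected, and that the consumer is notified exactly when an entry was added.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical queue of table ids that wait for statistics recalculation.
class recalc_queue {
  std::mutex mutex_;
  std::vector<std::uint64_t> ids_;
  std::function<void()> notify_;  // stands in for waking the background thread

public:
  explicit recalc_queue(std::function<void()> notify)
    : notify_(std::move(notify))
  {
    assert(notify_ && "a notification callback is required");
  }

  // Queue table_id unless it is already queued (assumption).
  // Returns true if an entry was added, in which case the consumer was
  // notified. The vector is only inspected while holding the mutex.
  bool add(std::uint64_t table_id)
  {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      if (std::find(ids_.begin(), ids_.end(), table_id) != ids_.end())
        return false;
      ids_.push_back(table_id);
    }
    // Notify because an entry was added, not because the vector looks
    // empty or non-empty when examined outside the lock.
    notify_();
    return true;
  }
};
```

The faulty condition described in the message would correspond to notifying only when `ids_.begin() == ids_.end()`, which can never be true right after an entry has been added.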
Vlad Lesin
3b10e8f80c MDEV-27746 Wrong comparison of BLOB's empty prefix with non-prefixed BLOB causes rows count mismatch for clustered and secondary indexes during non-locking read
row_sel_sec_rec_is_for_clust_rec() treats an empty BLOB prefix field in
a secondary index as equal to any external BLOB field in the clustered
index. Row_sel_get_clust_rec_for_mysql::operator() does not zero out the
clustered record pointer in row_search_mvcc(), so row_search_mvcc()
concludes that the delete-marked secondary index record has an
old-versioned clustered index record that is visible to the
"CHECK TABLE" read view, and row_scan_index_for_mysql() counts it as a row.

The fix is to execute row_sel_sec_rec_is_for_blob() in
row_sel_sec_rec_is_for_clust_rec() if the clustered field contains a BLOB
reference.
2022-02-11 12:26:27 +03:00
Samuel Thibault
7c6ec0a53b MDEV-27804 Fails to build - perf schema - thread id of type uintptr_t requires header
Seen while building on GNU/Hurd and kFreeBSD.

In standard C++, uintptr_t is declared in <cstdint>
ref: https://www.cplusplus.com/reference/cstdint/

Fixes: 0d44792a83
2022-02-11 14:40:46 +11:00
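
The kind of build failure described in the MDEV-27804 message above is typically avoided by including the header directly instead of relying on another header to pull it in. A minimal sketch, in which the helper `thread_id_from_pointer()` is a hypothetical name and not the performance schema code:

```cpp
#include <cstdint>  // declares std::uintptr_t; do not rely on transitive includes

// Hypothetical helper: represent a thread handle pointer as an integer id.
inline std::uintptr_t thread_id_from_pointer(const void* handle)
{
  return reinterpret_cast<std::uintptr_t>(handle);
}

// std::uintptr_t must be able to hold any object pointer value.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be wide enough for a pointer");
```

On platforms where some other header happens to include <cstdint>, the code builds without the explicit include, which is presumably why the problem only surfaced on less common targets.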
Sergei Golubchik
e3894f5d39 Merge branch '10.5' into 10.6 2022-02-10 21:07:03 +01:00
Sergei Golubchik
9aa3564e8a Merge branch '10.4' into 10.5 2022-02-10 21:04:51 +01:00
Sergei Golubchik
b4477ae73c Merge branch '10.3' into 10.4 2022-02-10 20:39:13 +01:00
Sergei Golubchik
a36fc80aeb Merge branch '10.2' into 10.3 2022-02-10 20:23:56 +01:00
Sergei Golubchik
9e2c26b0f6 MDEV-26351 segfault - (MARIA_HA *) 0x0 in ha_maria::extra
don't let Aria create a table that it cannot open
2022-02-10 15:48:06 +01:00
Sergei Golubchik
9e39d0ae44 MDEV-25787 Bug report: crash on SELECT DISTINCT thousands_blob_fields
fix a debug assert to account for not opened temp tables
2022-02-10 13:45:11 +01:00
Marko Mäkelä
cce994057b Merge 10.5 into 10.6 2022-02-09 15:49:50 +02:00
Marko Mäkelä
fd101daa84 MDEV-27716 mtr_t::commit() acquires log_sys.mutex when writing no log
mtr_t::is_block_dirtied(), mtr_t::memo_push(): Never set m_made_dirty
for pages of the temporary tablespace. Ever since
commit 5eb539555b
we never add those pages to buf_pool.flush_list.

mtr_t::commit(): Implement part of mtr_t::prepare_write() here,
and avoid acquiring log_sys.mutex if no log is written.
During IMPORT TABLESPACE fixup, we do not write log, but we must
add pages to buf_pool.flush_list and for that, be prepared
to acquire log_sys.flush_order_mutex.

mtr_t::do_write(): Replaces mtr_t::prepare_write().
2022-02-09 15:10:10 +02:00
Oleksandr Byelkin
34c5019698 Merge branch '10.5' into bb-10.5-release 2022-02-09 08:57:41 +01:00
Marko Mäkelä
5c46751f23 MDEV-27734 Set innodb_change_buffering=none by default
The aim of the InnoDB change buffer is to avoid delays when a leaf page
of a secondary index is not present in the buffer pool, and a record needs
to be inserted, delete-marked, or purged. Instead of reading the page into
the buffer pool for making such a modification, we may insert a record to
the change buffer (a special index tree in the InnoDB system tablespace).
The buffered changes are guaranteed to be merged if the index page
actually needs to be read later.

The change buffer could be useful when the database is stored on a
rotational medium (hard disk) where random seeks are slower than
sequential reads or writes.

Obviously, the change buffer will cause write amplification, due to
the potentially large amount of metadata that is being written to the
change buffer. We will have to write redo log records for modifying
the change buffer tree as well as the user tablespace. Furthermore,
in the user tablespace, we must maintain a change buffer bitmap page
that uses 2 bits for estimating the amount of free space in pages,
and 1 bit to specify whether buffered changes exist. This bitmap needs
to be updated on every operation, which could reduce performance.

Even if the change buffer were free of bugs such as MDEV-24449
(potentially causing the corruption of any page in the system tablespace)
or MDEV-26977 (corruption of secondary indexes due to a currently
unknown reason), it will make diagnosis of other data corruption harder.

Because of all this, it is best to disable the change buffer by default.
2022-02-09 08:36:41 +02:00
Vladislav Vaintroub
881918bf77 MDEV-27754 : Assertion with innodb_flush_method=O_DSYNC
If innodb_flush_method=O_DSYNC, log_sys.flushed_to_disk_lsn is changed
without 'flush_lock' protection inside log_write().

This leads to a race condition if two threads run in parallel,
calling log_write_up_to() with different values of 'flush_to_disk'.

In this case, log_write() and log_write_flush_to_disk_low() can execute at
the same time, and both would change flushed_lsn.

The fix is to remove the special treatment of durable writes from
log_write(). There is no apparent reason for this special treatment;
log_write_flush_to_disk_low() is already optimized for durable writes.

Nor is there an apparent reason to call log_flush_notify() more often
for O_DSYNC.
2022-02-07 09:14:00 +01:00
Sergei Golubchik
4ffffd98a5 update columnstore 2022-02-05 14:50:25 +01:00
Marko Mäkelä
82f5981e72 MDEV-27058 fixup: Crash in innodb.leaf_page_corrupted_during_recovery
buf_page_get_low(): If the page was read-fixed, validate the page ID
because the page could have been marked as corrupted. We should retry
the page read in this case, instead of returning a soon-to-be-evicted
corrupted page to the caller.

This was initially only observed on Microsoft Windows.
On Linux, this was repeated after adding a sleep
to buf_pool_t::corrupted_evict() between
bpage->zip.fix.fetch_sub() and bpage->lock.x_unlock().
2022-02-03 17:02:27 +01:00
Marko Mäkelä
05c33d6216 MDEV-27736 Allow seamless upgrade despite ROW_FORMAT=COMPRESSED
In commit 9bc874a594 (MDEV-23497)
the configuration option innodb_read_only_compressed was introduced
to give users advance notice of a plan to remove ROW_FORMAT=COMPRESSED
support for InnoDB.

Based on user feedback, this plan has been scrapped.
Even though ROW_FORMAT=COMPRESSED is a dead end and causes some
overhead for InnoDB data structures, we can live with that.

Now that we know that some users really want to keep using
ROW_FORMAT=COMPRESSED, the previous default value of the parameter
innodb_read_only_compressed=ON should be changed to OFF, to allow
smooth upgrades to 10.6 and later versions, without requiring users
to update any configuration file.
2022-02-03 17:02:14 +01:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Thirunarayanan Balathandayuthapani
8d742fe4ac MDEV-26326 mariabackup skip valid ibd file
- Store the deferred tablespace name while loading the tablespace
for the backup process.

- Mariabackup stores the list of space ids that have page0 INIT_PAGE
records. backup_first_page_op() and first_page_init() were introduced
to track the page0 INIT_PAGE records.

- backup_file_op() and log_file_op() were changed to handle
FILE_MODIFY redo log records. These are used to identify the
deferred tablespace space id.

- Whenever a file operation redo log record is processed by backup,
backup_file_op() should check whether the space name exists
in the deferred tablespace list. If it does, then it needs to store the
space id and name when a FILE_MODIFY or FILE_RENAME redo log record is
processed, and it should delete the tablespace name from the defer list
in other cases.

- backup_fix_ddl() should check whether the deferred tablespace has
any page0 init records. If it does, then the tablespace is considered
a newly created tablespace. If not, then backup should try
to reload the tablespace in SRV_BACKUP_NO_DEFER mode to
avoid deferring the tablespace.
2022-02-01 19:50:08 +05:30
Oleksandr Byelkin
c04a203a10 Rocksdb result fix after merge 2022-01-31 08:37:33 +01:00
Sergei Golubchik
77b3777bab update columnstore to 6.2.3-1 2022-01-30 16:37:12 +01:00
Oleksandr Byelkin
a576a1cea5 Merge branch '10.3' into 10.4 2022-01-30 09:46:52 +01:00