Commit graph

1352 commits

Author SHA1 Message Date
Daniel Black
058484687a Add TL_FIRST_WRITE in SQL layer for determining R/W
Use < TL_FIRST_WRITE for determining a READ transaction.

Use TL_FIRST_WRITE as the relative operator replacing TL_WRITE_ALLOW_WRITE
as the minimium WRITE lock type.
2021-04-08 16:51:36 +10:00
Marko Mäkelä
03ff588d15 Merge 10.5 into 10.6 2021-03-05 16:05:47 +02:00
Marko Mäkelä
10d544aa7b Merge 10.4 into 10.5 2021-03-05 12:54:43 +02:00
Marko Mäkelä
fcc9f8b10c Remove unused HA_EXTRA_FAKE_START_STMT
This is fixup for commit f06a0b5338.
2021-03-05 10:40:16 +02:00
Marko Mäkelä
94b4578704 Merge 10.5 into 10.6 2021-02-17 19:39:05 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Marko Mäkelä
1110beccd4 Merge 10.5 into 10.6 2021-02-02 15:15:53 +02:00
Marko Mäkelä
6d1f1b61b5 MDEV-24564 Statistics are lost after ALTER TABLE
Ever since commit 007f68c37f,
ALTER TABLE no longer invokes handler::open() after
handler::commit_inplace_alter_table().

ha_innobase::reload_statistics(): Reload or recompute statistics
after ALTER TABLE.

innodb_notify_tabledef_changed(): A new function to invoke
ha_innobase::reload_statistics().

handlerton::notify_tabledef_changed(): Add the parameter handler*
so that ha_innobase::reload_statistics() can be invoked.

ha_partition::notify_tabledef_changed(),
partition_notify_tabledef_changed(): Pass through the call
to any partitions or subpartitions.

This is based on code that was supplied by Monty.
2021-01-28 14:15:01 +02:00
Nikita Malyavin
21809f9a45 MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.

This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
 bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
 sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
 directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.

The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.

To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
 orig_*, all_set bitmaps are never substituted already.

This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
 to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
 MY_BITMAP* instead of my_bitmap_map*

These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
2021-01-27 00:50:55 +10:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Alexey Botchkov
75e7132fca MDEV-21842 auto_increment does not increment with compound primary key on partitioned table.
The idea of this fix is that it's enough to prevent the
next_auto_inc_val from incrementing if an error, to fix this problem
and also the MDEV-17333.
So this patch basically reverts the existing fix to the MDEV-17333.
2020-11-23 14:12:30 +04:00
Marko Mäkelä
46957a6a77 Merge 10.3 into 10.4 2020-10-22 13:27:18 +03:00
Kentoku SHIBA
b30ad01d40 MDEV-20100 MariaDB 13.3.9 Crash "[ERROR] mysqld got signal 11 ;"
Some functions on ha_partition call functions on all partitions, but handler->reset() is only called that pruned by m_partitions_to_reset. So Spider didn't clear pointer on unpruned partitions, if the unpruned partitions are used by next query, Spider reference the pointer that is already freed.
2020-10-22 05:25:53 +09:00
Kentoku SHIBA
ac8d205795 MDEV-20100 MariaDB 13.3.9 Crash "[ERROR] mysqld got signal 11 ;"
Some functions on ha_partition call functions on all partitions, but handler->reset() is only called that pruned by m_partitions_to_reset. So Spider didn't clear pointer on unpruned partitions, if the unpruned partitions are used by next query, Spider reference the pointer that is already freed.
2020-10-22 05:21:35 +09:00
Monty
2c8c15483d MDEV-23730 s3.replication_partition 'innodb,mix' segv
This failure was caused because of several bugs:
- Someone had removed s3-slave-ignore-updates=1 from slave.cnf, which
  caused the slave to remove files that the master was working on.
- Bug in ha_partition::change_partitions() that didn't reset m_new_file
  in case of errors. This caused crashes in ha_maria::extra() as the
  maria handler was called on files that was already closed.
- In ma_pagecache there was a bug that when one got a read error one a
  big block (s3 block), it left the flag PCBLOCK_BIG_READ on for the page
  which cased an assert when the page where flushed.
- Flush all cached tables in case of ignored ALTER TABLE

Note that when merging code from 10.3, that fixes the partition bug, use
the code from this patch instead.

Changes to ma_pagecache.cc written or reviewed by Sanja
2020-10-21 03:09:29 +03:00
Kentoku SHIBA
88d22f0e65 MDEV-20100 MariaDB 13.3.9 Crash "[ERROR] mysqld got signal 11 ;"
Some functions on ha_partition call functions on all partitions, but handler->reset() is only called that pruned by m_partitions_to_reset. So Spider didn't clear pointer on unpruned partitions, if the unpruned partitions are used by next query, Spider reference the pointer that is already freed.
2020-10-20 22:32:12 +09:00
Monty
311b7f94e6 MDEV-23248 Server crashes in mi_extra / ha_partition::loop_extra_alter upon REORGANIZE
This also fixes some issues with
MDEV-23730 s3.replication_partition 'innodb,mix' segv

The problem was that mysql_change_partitions() closes all handler files
in case of error, which was not properly reflected in
fast_alter_partition_table(). This caused handle_alter_part_error() to
try to close already closed tables, which caused the crash.

Fixed fast_alter_partion_table() to reflect when tables are opened.
I also fixed that ha_partition::change_partitions() resets m_new_file in
case of errors.
Either of the above changes fixes the issue, but both are needed to ensure
that the code works as expected.
2020-10-16 19:48:36 +03:00
Marko Mäkelä
cf87f3e08c Merge 10.4 into 10.5 2020-08-14 11:33:35 +03:00
Marko Mäkelä
2f7b37b021 Merge 10.3 into 10.4, except MDEV-22543
Also, fix GCC -Og -Wmaybe-uninitialized in run_backup_stage()
2020-08-13 18:48:41 +03:00
Marko Mäkelä
b811c6ecc7 Fix GCC 10.2.0 -Og -Wmaybe-uninitialized
Fix some more cases after merging
commit 31aef3ae99.
Some warnings look possibly genuine, others are clearly bogus.
2020-08-13 18:21:30 +03:00
Oleksandr Byelkin
48b5777ebd Merge branch '10.4' into 10.5 2020-08-04 17:24:15 +02:00
Oleksandr Byelkin
57325e4706 Merge branch '10.3' into 10.4 2020-08-03 14:44:06 +02:00
Oleksandr Byelkin
c32f71af7e Merge branch '10.2' into 10.3 2020-08-03 13:41:29 +02:00
Oleksandr Byelkin
ef7cb0a0b5 Merge branch '10.1' into 10.2 2020-08-02 11:05:29 +02:00
Ian Gilfillan
d2982331a6 Code comment spellfixes 2020-07-22 23:18:12 +02:00
Marko Mäkelä
4d4865de6f Merge 10.4 into 10.5 2020-07-20 15:55:59 +03:00
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Alexey Botchkov
2cae58f891 MDEV-18371 Server crashes in ha_innobase::cmp_ref upon UPDATE with PARTITION clause.
m_file[0] not always is a good sample.
2020-07-17 12:20:23 +04:00
Sergei Golubchik
c55c292832 introduce hton->drop_table() method
first step in moving drop table out of the handler.
todo: other methods that don't need an open table

for now hton->drop_table is optional, for backward compatibility
reasons
2020-07-04 01:44:46 +02:00
Monty
5211af1c16 Merge remote-tracking branch 'origin/10.3' into 10.4 2020-07-03 00:35:28 +03:00
Monty
65f831d17c Fixed bugs found by valgrind
- Some of the bug fixes are backports from 10.5!
- The fix in innobase/fil/fil0fil.cc is just a backport to get less
  error messages in mysqld.1.err when running with valgrind.
- Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind
2020-07-02 17:57:34 +03:00
Monty
d35616aab3 Fixed crash in failing instant alter table with partitioned table
MDEV-22649 SIGSEGV in ha_partition::create_partitioning_metadata on ALTER
MDEV-22804 SIGSEGV in ha_partition::create_partitioning_metadata
2020-06-14 19:39:42 +03:00
Sergei Petrunia
d7d80689b3 MDEV-15101: Stop ANALYZE TABLE from flushing table definition cache
Apply this patch from Percona Server (amended for 10.5):

commit cd7201514fee78aaf7d3eb2b28d2573c76f53b84
Author: Laurynas Biveinis <laurynas.biveinis@gmail.com>
Date:   Tue Nov 14 06:34:19 2017 +0200

    Fix bug 1704195 / 87065 / TDB-83 (Stop ANALYZE TABLE from flushing table definition cache)

    Make ANALYZE TABLE stop flushing affected tables from the table
    definition cache, which has the effect of not blocking any subsequent
    new queries involving the table if there's a parallel long-running
    query:

    - new table flag HA_ONLINE_ANALYZE, return it for InnoDB and TokuDB
      tables;
    - in mysql_admin_table, if we are performing ANALYZE TABLE, and the
      table flag is set, do not remove the table from the table
      definition cache, do not invalidate query cache;
    - in partitioning handler, refresh the query optimizer statistics
      after ANALYZE if the underlying handler supports HA_ONLINE_ANALYZE;
    - new testcases main.percona_nonflushing_analyze_debug,
      parts.percona_nonflushing_abalyze_debug and a supporting debug sync
      point.

    For TokuDB, this change exposes bug TDB-83 (Index cardinality stats
    updated for handler::info(HA_STATUS_CONST), not often enough for
    tokudb_cardinality_scale_percent). TokuDB may return different
    rec_per_key values depending on dynamic variable
    tokudb_cardinality_scale_percent value. The server does not have a way
    of knowing that changing this variable invalidates the previous
    rec_per_key values in any opened table shares, and so does not call
    info(HA_STATUS_CONST) again. Fix by updating rec_per_key for both
    HA_STATUS_CONST and HA_STATUS_VARIABLE. This also forces a re-record
    of tokudb.bugs.db756_card_part_hash_1_pick, with the new output
    seeming to be more correct.
2020-06-12 20:29:05 +03:00
Sergei Golubchik
89a33303c4 remove dead code
reduce the amount of engine-specific code in the server,
particularly as it does not serve any purpose now.

may be needed for VP engine,
to be reconsidered in MDEV-7795
2020-06-09 14:32:43 +02:00
Marko Mäkelä
4a0b56f604 Merge 10.4 into 10.5 2020-05-31 10:28:59 +03:00
Marko Mäkelä
6da14d7b4a Merge 10.3 into 10.4 2020-05-30 11:04:27 +03:00
Marko Mäkelä
e9aaa10c11 Merge 10.2 into 10.3 2020-05-29 22:21:19 +03:00
Aleksey Midenkov
4783494a5e MDEV-22283 Server crashes in key_copy or unexpected error 156
(The table already existed in the storage engine)

Wrong algorithm of closing partitions on error doesn't close last
partition.
2020-05-29 16:19:15 +03:00
Sergei Golubchik
e64dc07125 assert(a && b); -> assert(a); assert(b); 2020-05-27 15:56:40 +02:00
Monty
4102f1589c Aria will now register it's transactions
MDEV-22531 Remove maria::implicit_commit()
MDEV-22607 Assertion `ha_info->ht() != binlog_hton' failed in
           MYSQL_BIN_LOG::unlog_xa_prepare

From the handler point of view, Aria now looks like a transactional
engine. One effect of this is that we don't need to call
maria::implicit_commit() anymore.

This change also forces the server to call trans_commit_stmt() after doing
any read or writes to system tables.  This work will also make it easier
to later allow users to have system tables in other engines than Aria.

To handle the case that Aria doesn't support rollback, a new
handlerton flag, HTON_NO_ROLLBACK, was added to engines that has
transactions without rollback (for the moment only binlog and Aria).

Other things
- Moved freeing of MARIA_SHARE to a separate function as the MARIA_SHARE
  can be still part of a transaction even if the table has closed.
- Changed Aria checkpoint to use the new MARIA_SHARE free function. This
  fixes a possible memory leak when using S3 tables
- Changed testing of binlog_hton to instead test for HTON_NO_ROLLBACK
- Removed checking of has_transaction_manager() in handler.cc as we can
  assume that as the transaction was started by the engine, it does
  support transactions.
- Added new class 'start_new_trans' that can be used to start indepdendent
  sub transactions, for example while reading mysql.proc, using help or
  status tables etc.
- open_system_tables...() and open_proc_table_for_Read() doesn't anymore
  take a Open_tables_backup list. This is now handled by 'start_new_trans'.
- Split thd::has_transactions() to thd::has_transactions() and
  thd::has_transactions_and_rollback()
- Added handlerton code to free cached transactions objects.
  Needed by InnoDB.

squash! 2ed35999f2a2d84f1c786a21ade5db716b6f1bbc
2020-05-23 12:29:10 +03:00
Sergei Golubchik
67aaf51cf9 cleanup: ha_external_unlock() helper
as mentioned in f9f33b85be and generally to make it
easier to talk about
2020-05-05 19:41:12 +02:00
Monty
eca5c2c67f Added support for more functions when using partitioned S3 tables
MDEV-22088 S3 partitioning support

All ALTER PARTITION commands should now work on S3 tables except

REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION

In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.

The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file

Things in more detail

- The .frm and .par files of partitioned tables are stored in S3 and kept
  in sync.
- Added hton callback create_partitioning_metadata to inform handler
  that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
  a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
  table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
  don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
  for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
  critical tracing.
2020-04-19 17:33:51 +03:00
Sergei Golubchik
0515577d12 cleanup: prepare "update_handler" for WITHOUT OVERLAPS
* rename to a generic name
* move remaning initializations from query exec to prepare time
* simplify/unify key handling in open_table_from_share and delayed
* remove dead code
* move tests where they belong
2020-03-31 17:42:34 +02:00
Nikita Malyavin
b9df4d2a35 Fix real keyread count for partitions
Sergei's commit ac6b3c4430 implemented handler status counters
compensation for underlying handlers like ha_partition.
`index_read_idx_map` is missing there, but it should have been fixed as
well (proof: ha_partition::index_read_idx_map never calls
ha_partition::index_read_map).

Note: all this compensation logic could be broken for subpartitions! (We
can experience double decrement)
2020-03-31 17:42:34 +02:00
Nikita Malyavin
e6af62189e unify "partitioning cannot do X" error messages 2020-03-31 17:42:34 +02:00
Marko Mäkelä
37c14690fc Merge 10.4 into 10.5 2020-03-30 19:07:25 +03:00
Marko Mäkelä
e2f1f88fa6 Merge 10.3 into 10.4 2020-03-30 14:50:23 +03:00