Commit graph

282 commits

Author SHA1 Message Date
Marko Mäkelä
5c76e1e693 Merge 10.5 into 10.6 2021-04-15 20:21:11 +03:00
Justin Jagieniak
1715fef107 Fix cross-compile to consider CMAKE_CROSSCOMPILING_EMULATOR
When CMAKE_CROSSCOMPILING_EMULATOR is defined, a cross-compile
can be made, however with native (emulated) execution possible.

This commit takes those points in the build system that
execute built targets natively and allow these to be executed
in a crosscompile if CMAKE_CROSSCOMPILING_EMULATOR is defined.

Closes #1805
2021-04-15 10:07:50 +10:00
Marko Mäkelä
c071cc3455 Merge 10.5 into 10.6 2021-04-14 15:48:05 +03:00
Marko Mäkelä
d1f2001ee6 fixup 6c3e860cbf 2021-04-14 15:23:00 +03:00
Daniel Black
553ef1a78b MDEV-13115: Implement SELECT SKIP LOCKED
Adds an implementation for SELECT ... FOR UPDATE SKIP LOCKED /
SELECT ... LOCK IN SHARED MODE SKIP LOCKED

This is implemented only InnoDB at the moment, not in RockDB yet.

This adds a new hander flag HA_CAN_SKIP_LOCKED than
will be used when the storage engine advertises the flag.

When a storage engine indicates this flag it will get
TL_WRITE_SKIP_LOCKED and TL_READ_SKIP_LOCKED transaction types.

The Lex structure has been updated to store both the FOR UPDATE/LOCK IN
SHARE as well as the SKIP LOCKED so the SHOW CREATE VIEW
implementation is simplier.

"SELECT FOR UPDATE ... SKIP LOCKED" combined with CREATE TABLE AS or
INSERT.. SELECT on the result set is not safe for STATEMENT based
replication. MIXED replication will replicate this as row based events."

Thanks to guidance from Facebook commit
193896c466
This helped verify basic test case, and components that need implementing
(even though every part was implemented differently).

Thanks Marko for guidance on simplier InnoDB implementation.

Reviewers: Marko, Monty
2021-04-08 16:51:36 +10:00
Marko Mäkelä
03ff588d15 Merge 10.5 into 10.6 2021-03-05 16:05:47 +02:00
Marko Mäkelä
10d544aa7b Merge 10.4 into 10.5 2021-03-05 12:54:43 +02:00
Marko Mäkelä
fcc9f8b10c Remove unused HA_EXTRA_FAKE_START_STMT
This is fixup for commit f06a0b5338.
2021-03-05 10:40:16 +02:00
Marko Mäkelä
94b4578704 Merge 10.5 into 10.6 2021-02-17 19:39:05 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Nikita Malyavin
c88fcf07d9 fixup MDEV-17556: fix mroonga 2021-02-01 18:05:47 +10:00
Nikita Malyavin
21809f9a45 MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.

This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
 bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
 sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
 directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.

The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.

To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
 orig_*, all_set bitmaps are never substituted already.

This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
 to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
 MY_BITMAP* instead of my_bitmap_map*

These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
2021-01-27 00:50:55 +10:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
Marko Mäkelä
a13fac9eee Merge 10.5 into 10.6 2020-12-03 08:12:47 +02:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Alexey Botchkov
75e7132fca MDEV-21842 auto_increment does not increment with compound primary key on partitioned table.
The idea of this fix is that it's enough to prevent the
next_auto_inc_val from incrementing if an error, to fix this problem
and also the MDEV-17333.
So this patch basically reverts the existing fix to the MDEV-17333.
2020-11-23 14:12:30 +04:00
Marko Mäkelä
09a1f0075a Merge 10.5 into 10.6 2020-11-02 12:49:19 +02:00
Marko Mäkelä
898521e2dd Merge 10.4 into 10.5 2020-10-30 11:15:30 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Sergei Golubchik
6cefe7d31e cleanup: use predefined CMAKE_DL_LIBS
instead of, say, MY_SEARCH_LIBS(dlopen dl LIBDL)
2020-10-23 13:37:26 +02:00
Marko Mäkelä
d61c800308 MDEV-23024 Remove Cassandra Storage Engine
Cassandra has long been non-functional, doesn't compile, etc.,
and development on it has halted. Since MariaDB Server 10.5,
the build is disabled by default.

It makes sense to remove it as it's just taking up space.
2020-07-14 16:24:50 +03:00
Vladislav Vaintroub
d46576b35a merge 10.5 2020-07-04 18:14:41 +02:00
Vladislav Vaintroub
272828a171 Merge branch '10.5' into 10.6 2020-07-04 11:53:26 +02:00
Sergei Golubchik
c55c292832 introduce hton->drop_table() method
first step in moving drop table out of the handler.
todo: other methods that don't need an open table

for now hton->drop_table is optional, for backward compatibility
reasons
2020-07-04 01:44:46 +02:00
Monty
5bcb1d6532 MDEV-11412 Ensure that table is truly dropped when using DROP TABLE
The used code is largely based on code from Tencent

The problem is that in some rare cases there may be a conflict between .frm
files and the files in the storage engine. In this case the DROP TABLE
was not able to properly drop the table.

Some MariaDB/MySQL forks has solved this by adding a FORCE option to
DROP TABLE. After some discussion among MariaDB developers, we concluded
that users expects that DROP TABLE should always work, even if the
table would not be consistent. There should not be a need to use a
separate keyword to ensure that the table is really deleted.

The used solution is:
- If a .frm table doesn't exists, try dropping the table from all storage
  engines.
- If the .frm table exists but the table does not exist in the engine
  try dropping the table from all storage engines.
- Update storage engines using many table files (.CVS, MyISAM, Aria) to
  succeed with the drop even if some of the files are missing.
- Add HTON_AUTOMATIC_DELETE_TABLE to handlerton's where delete_table()
  is not needed and always succeed. This is used by ha_delete_table_force()
  to know which handlers to ignore when trying to drop a table without
  a .frm file.

The disadvantage of this solution is that a DROP TABLE on a non existing
table will be a bit slower as we have to ask all active storage engines
if they know anything about the table.

Other things:
- Added a new flag MY_IGNORE_ENOENT to my_delete() to not give an error
  if the file doesn't exist. This simplifies some of the code.
- Don't clear thd->error in ha_delete_table() if there was an active
  error. This is a bug fix.
- handler::delete_table() will not abort if first file doesn't exists.
  This is bug fix to handle the case when a drop table was aborted in
  the middle.
- Cleaned up mysql_rm_table_no_locks() to ensure that if_exists uses
  same code path as when it's not used.
- Use non_existing_Table_error() to detect if table didn't exists.
  Old code used different errors tests in different position.
- Table_triggers_list::drop_all_triggers() now drops trigger file if
  it can't be parsed instead of leaving it hanging around (bug fix)
- InnoDB doesn't anymore print error about .frm file out of sync with
  InnoDB directory if .frm file does not exists. This change was required
  to be able to try to drop an InnoDB file when .frm doesn't exists.
- Fixed bug in mi_delete_table() where the .MYD file would not be dropped
  if the .MYI file didn't exists.
- Fixed memory leak in Mroonga when deleting non existing table
- Fixed memory leak in Connect when deleting non existing table

Bugs fixed introduced by the original version of this commit:
MDEV-22826 Presence of Spider prevents tables from being force-deleted from
           other engines
2020-06-14 19:39:42 +03:00
Marko Mäkelä
701efbb25b Merge 10.4 into 10.5 2020-06-03 09:45:39 +03:00
Marko Mäkelä
8059148154 Merge 10.3 into 10.4 2020-06-03 07:32:09 +03:00
Marko Mäkelä
8300f639a1 Merge 10.2 into 10.3 2020-06-02 10:25:11 +03:00
Marko Mäkelä
d72eebaa3d Merge 10.1 into 10.2 2020-06-01 09:33:03 +03:00
Marko Mäkelä
4a0b56f604 Merge 10.4 into 10.5 2020-05-31 10:28:59 +03:00
Marko Mäkelä
6da14d7b4a Merge 10.3 into 10.4 2020-05-30 11:04:27 +03:00
Marko Mäkelä
e9aaa10c11 Merge 10.2 into 10.3 2020-05-29 22:21:19 +03:00
Sergey Vojtovich
49854811fa Attempt fixing mroonga gcc 8 build failure
Part of MDEV-19061 - table_share used for reading statistical tables is
                     not protected
2020-05-29 22:51:45 +04:00
Kentoku SHIBA
38ea795bb6 Add a counter to avoid multiple initialization of Groonga mecab tokenizer 2020-05-29 21:48:47 +09:00
Kentoku SHIBA
6e6a4227c0 Add grn_db_fin_mecab_tokenizer to finalyze mecab tokenizer 2020-05-29 21:48:30 +09:00
Monty
4ea171ffab Fixed compiler warnings from gcc and clang 5.0.1 2020-05-23 12:29:10 +03:00
Marko Mäkelä
7924158496 MDEV-19780 Remove the TokuDB storage engine
The TokuDB storage engine has been deprecated by upstream
Percona Server 8.0 in favor of MyRocks and will not be available
in subsequent major upstream releases.

Let us remove it from MariaDB Server as well.
MyRocks is actively maintained, and it can be used instead.
2020-05-14 10:11:47 +03:00
Sergei Golubchik
67aaf51cf9 cleanup: ha_external_unlock() helper
as mentioned in f9f33b85be and generally to make it
easier to talk about
2020-05-05 19:41:12 +02:00
Marko Mäkelä
fbe2712705 Merge 10.4 into 10.5
The functional changes of commit 5836191c8f
(MDEV-21168) are omitted due to MDEV-742 having addressed the issue.
2020-04-25 21:57:52 +03:00
Monty
eca5c2c67f Added support for more functions when using partitioned S3 tables
MDEV-22088 S3 partitioning support

All ALTER PARTITION commands should now work on S3 tables except

REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION

In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.

The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file

Things in more detail

- The .frm and .par files of partitioned tables are stored in S3 and kept
  in sync.
- Added hton callback create_partitioning_metadata to inform handler
  that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
  a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
  table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
  don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
  for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
  critical tracing.
2020-04-19 17:33:51 +03:00
Marko Mäkelä
af91266498 Merge 10.3 into 10.4
In main.index_merge_myisam we remove the test that was added in
commit a2d24def8c because
it duplicates the test case that was added in
commit 5af12e4635.
2020-04-16 12:12:26 +03:00
Marko Mäkelä
84db10f27b Merge 10.2 into 10.3 2020-04-15 09:56:03 +03:00
Oleksandr Byelkin
cb4da5da74 MDEV-20604: Duplicate key value is silently truncated to 64 characters in print_keydup_error
Added indication of truncated string for "s" and "M" formats
2020-04-01 11:34:32 +02:00
Nikita Malyavin
045510cb92 fix mroonga: change field's table as well as ptr for ad-hoc // fixes ptr_in_record usage 2020-03-31 17:42:34 +02:00
Monty
eb483c5181 Updated optimizer costs in multi_range_read_info_const() and sql_select.cc
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
  not 1.0.  In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
  TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
  keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
  The default keyread_time() was optimized for blocks and not suitable for
  HEAP. The effect was the HEAP prefered table scans over ranges for btree
  indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
  Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
  we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
  as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
  HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
  This was to ensure that handler::keyread_time() doesn't give
  higher cost for heap tables than for normal tables. One effect of
  this is that heap and derived tables stored in heap will prefer
  key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
  multi_range_read_info_const(). All index cost calculation is now
  done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
  so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
2020-03-27 03:58:32 +02:00
Monty
f36ca142f7 Added page_range to records_in_range() to improve range statistics
Prototype change:
-  virtual ha_rows records_in_range(uint inx, key_range *min_key,
-                                   key_range *max_key)
+  virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+                                   const key_range *max_key,
+                                   page_range *res)

The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
  key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
  This can be used for a better calculation of IO seeks when we
  estimate the cost of a range index scan.

The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
2020-03-27 03:54:45 +02:00
Monty
37393bea23 Replace handler::primary_key_is_clustered() with handler::pk_is_clustering_key()
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.

As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
2020-03-24 21:00:04 +02:00
Sergey Vojtovich
da82e75901 handler::rebind()
- rename PFS specific rebind_psi() to generic rebind()
- call rebind independently of PFS compilation status
- allow rebind() return an error
2020-03-24 20:47:41 +02:00