Commit graph

99 commits

Author SHA1 Message Date
Sergei Golubchik
ae8600d514 MDEV-27408 DESC index on a Mroonga table causes wrong order of result set
disallow descending indexes in mroonga
2022-01-26 18:43:06 +01:00
Sergei Golubchik
0299ec29d4 cleanup: MY_BITMAP mutex
in about a hundred of users of MY_BITMAP, only two were using its
built-in mutex, and only one of those two was actually needing it.

Remove the mutex from MY_BITMAP, remove all associated conditions
and checks in bitmap functions. Use an external LOCK_temp_pool
mutex and temp_pool_set_next/temp_pool_clear_bit acccessors.

Remove bitmap_init/bitmap_free, always use my_* versions.
2021-08-26 23:39:52 +02:00
Vicențiu Ciorbaru
13cf8f5e9a cleanup: Refactor select_limit in select lex
Replace
  * select_lex::offset_limit
  * select_lex::select_limit
  * select_lex::explicit_limit
with select_lex::Lex_select_limit

The Lex_select_limit already existed with the same elements and was used in
by the yacc parser.

This commit is in preparation for FETCH FIRST implementation, as it
simplifies a lot of the code.

Additionally, the parser is simplified by making use of the stack to
return Lex_select_limit objects.

Cleanup of init_query() too. Removes explicit_limit= 0 as it's done a bit later
in init_select() with limit_params.empty()
2021-04-21 14:08:58 +03:00
Daniel Black
553ef1a78b MDEV-13115: Implement SELECT SKIP LOCKED
Adds an implementation for SELECT ... FOR UPDATE SKIP LOCKED /
SELECT ... LOCK IN SHARED MODE SKIP LOCKED

This is implemented only InnoDB at the moment, not in RockDB yet.

This adds a new hander flag HA_CAN_SKIP_LOCKED than
will be used when the storage engine advertises the flag.

When a storage engine indicates this flag it will get
TL_WRITE_SKIP_LOCKED and TL_READ_SKIP_LOCKED transaction types.

The Lex structure has been updated to store both the FOR UPDATE/LOCK IN
SHARE as well as the SKIP LOCKED so the SHOW CREATE VIEW
implementation is simplier.

"SELECT FOR UPDATE ... SKIP LOCKED" combined with CREATE TABLE AS or
INSERT.. SELECT on the result set is not safe for STATEMENT based
replication. MIXED replication will replicate this as row based events."

Thanks to guidance from Facebook commit
193896c466
This helped verify basic test case, and components that need implementing
(even though every part was implemented differently).

Thanks Marko for guidance on simplier InnoDB implementation.

Reviewers: Marko, Monty
2021-04-08 16:51:36 +10:00
Marko Mäkelä
03ff588d15 Merge 10.5 into 10.6 2021-03-05 16:05:47 +02:00
Marko Mäkelä
10d544aa7b Merge 10.4 into 10.5 2021-03-05 12:54:43 +02:00
Marko Mäkelä
fcc9f8b10c Remove unused HA_EXTRA_FAKE_START_STMT
This is fixup for commit f06a0b5338.
2021-03-05 10:40:16 +02:00
Marko Mäkelä
94b4578704 Merge 10.5 into 10.6 2021-02-17 19:39:05 +02:00
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Nikita Malyavin
c88fcf07d9 fixup MDEV-17556: fix mroonga 2021-02-01 18:05:47 +10:00
Marko Mäkelä
3cef4f8f0f MDEV-515 Reduce InnoDB undo logging for insert into empty table
We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.

For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().

Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.

Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.

This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.

In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.

Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.

dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.

ins_node_t::bulk_insert: Whether bulk insert was initiated.

trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.

trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().

trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.

row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.

lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.

btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().

trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.

ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.

row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

btr_root_page_init(): Refactored from btr_create().

dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.

ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.

This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
2021-01-25 18:41:27 +02:00
Sergei Golubchik
c55c292832 introduce hton->drop_table() method
first step in moving drop table out of the handler.
todo: other methods that don't need an open table

for now hton->drop_table is optional, for backward compatibility
reasons
2020-07-04 01:44:46 +02:00
Monty
eca5c2c67f Added support for more functions when using partitioned S3 tables
MDEV-22088 S3 partitioning support

All ALTER PARTITION commands should now work on S3 tables except

REBUILD PARTITION
TRUNCATE PARTITION
REORGANIZE PARTITION

In addition, PARTIONED S3 TABLES can also be replicated.
This is achived by storing the partition tables .frm and .par file on S3
for partitioned shared (S3) tables.

The discovery methods are enchanced by allowing engines that supports
discovery to also support of the partitioned tables .frm and .par file

Things in more detail

- The .frm and .par files of partitioned tables are stored in S3 and kept
  in sync.
- Added hton callback create_partitioning_metadata to inform handler
  that metadata for a partitoned file has changed
- Added back handler::discover_check_version() to be able to check if
  a table's or a part table's definition has changed.
- Added handler::check_if_updates_are_ignored(). Needed for partitioning.
- Renamed rebind() -> rebind_psi(), as it was before.
- Changed CHF_xxx hadnler flags to an enum
- Changed some checks from using table->file->ht to use
  table->file->partition_ht() to get discovery to work with partitioning.
- If TABLE_SHARE::init_from_binary_frm_image() fails, ensure that we
  don't leave any .frm or .par files around.
- Fixed that writefrm() doesn't leave unusable .frm files around
- Appended extension to path for writefrm() to be able to reuse to function
  for creating .par files.
- Added DBUG_PUSH("") to a a few functions that caused a lot of not
  critical tracing.
2020-04-19 17:33:51 +03:00
Nikita Malyavin
045510cb92 fix mroonga: change field's table as well as ptr for ad-hoc // fixes ptr_in_record usage 2020-03-31 17:42:34 +02:00
Monty
f36ca142f7 Added page_range to records_in_range() to improve range statistics
Prototype change:
-  virtual ha_rows records_in_range(uint inx, key_range *min_key,
-                                   key_range *max_key)
+  virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+                                   const key_range *max_key,
+                                   page_range *res)

The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
  key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
  This can be used for a better calculation of IO seeks when we
  estimate the cost of a range index scan.

The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
2020-03-27 03:54:45 +02:00
Monty
37393bea23 Replace handler::primary_key_is_clustered() with handler::pk_is_clustering_key()
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.

As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
2020-03-24 21:00:04 +02:00
Sergey Vojtovich
da82e75901 handler::rebind()
- rename PFS specific rebind_psi() to generic rebind()
- call rebind independently of PFS compilation status
- allow rebind() return an error
2020-03-24 20:47:41 +02:00
Sergei Golubchik
7af733a5a2 perfschema compilation, test and misc fixes 2020-03-10 19:24:23 +01:00
Vicențiu Ciorbaru
5aebd78e27 MDEV-18650: Options deprecated in previous versions - mroonga_default_parser
Variable is marked as deprecated since 10.1.6. Update tests to not make
use of it.
2020-02-13 13:42:01 +02:00
Monty
97dd057702 Fixed issues when running mtr with --valgrind
- Note that some issues was also fixed in 10.2 and 10.4. I also fixed them
  here to be able to continue with making 10.5 valgrind safe again
- Disable connection threads warnings when doing shutdown
2019-08-23 22:03:54 +02:00
Marko Mäkelä
efb8485d85 Merge 10.3 into 10.4, except for MDEV-20265
The MDEV-20265 commit e746f451d5
introduces DBUG_ASSERT(right_op == r_tbl) in
st_select_lex::add_cross_joined_table(), and that assertion would
fail in several tests that exercise joins. That commit was skipped
in this merge, and a separate fix of MDEV-20265 will be necessary in 10.4.
2019-08-23 08:06:17 +03:00
Marko Mäkelä
32ec5fb979 Merge 10.2 into 10.3 2019-08-21 15:23:45 +03:00
Monty
6dd1d6cb90 MDEV-20379 Mroonga has memory leak in ha_mroonga::is_foreign_key_field 2019-08-20 15:28:01 +03:00
Alexander Barkov
afe6eb499d Revert "MDEV-20342 Turn Field::flags from a member to a method"
This reverts commit e86010f909.

Reverting on Monty's request, as this change makes merging
things from 10.5 to 10.2 much harder.
2019-08-14 20:27:00 +04:00
Alexander Barkov
e86010f909 MDEV-20342 Turn Field::flags from a member to a method 2019-08-14 13:33:01 +04:00
Marko Mäkelä
624dd71b94 Merge 10.4 into 10.5 2019-08-13 18:57:00 +03:00
Alexander Barkov
feb2695ed3 MDEV-20004 Move Field_geom from field.cc to sql_type_geom.cc 2019-07-09 19:47:57 +04:00
Eugene Kosov
c9aa495fb6 MDEV-19955 make argument of handler::ha_write_row() const
MDEV-19486 and one more similar bug appeared because handler::write_row() interface
welcomes to modify buffer by storage engine. But callers are not ready for that
thus bugs are possible in future.

handler::write_row():
handler::ha_write_row(): make argument const
2019-07-05 13:14:19 +03:00
Alexander Barkov
8d0983330f MDEV-19833 Reuse new I_S table definition helper classes for Mronga 2019-06-22 06:03:33 +04:00
Robert Bindar
bf70430ead MDEV-17709 Remove handlerton::state 2019-06-06 22:09:31 +04:00
Marko Mäkelä
826f9d4f7e Merge 10.4 into 10.5 2019-05-23 10:32:21 +03:00
Monty
007f68c37f Replace ha_notify_table_changed() with notify_tabledef_changed()
Reason for the change was that ha_notify_table_changed() was done
after table open when .frm had been replaced, which caused failure
in engines that checks on open if .frm matches the engines table
definition.

Other changes:
- Remove not needed open/close call at end of inline alter table.
  Some test that depended on the table beeing in the table cache after
  ALTER TABLE had to be updated.
2019-05-23 01:20:18 +03:00
Sergey Vojtovich
00e533c78b Fixed Mroonga to follow THD ha_data protocol
Use thd_get_ha_data()/thd_set_ha_data() which protect against plugin
removal until it has THD ha_data.

Do not reset THD ha_data in mrn_close_connection(), cleaner approach
is to let ha_close_connection() do it.

Part of MDEV-19515 - Improve connect speed
2019-05-21 17:55:09 +04:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
cb248f8806 Merge branch '5.5' into 10.1 2019-05-11 22:19:05 +03:00
Marko Mäkelä
514b305dfb Merge 10.3 into 10.4
The MDEV-17262 commit 26432e49d3
was skipped. In Galera 4, the implementation would seem to require
changes to the streaming replication.

In the tests archive.rnd_pos main.profiling, disable_ps_protocol
for SHOW STATUS and SHOW PROFILE commands until MDEV-18974
has been fixed.
2019-03-20 10:41:32 +02:00
Sergei Golubchik
b64fde8f38 Merge branch '10.2' into 10.3 2019-03-17 13:06:41 +01:00
Sergei Golubchik
f1134d5676 post-merge: gcc 8 warnings
note: Inherit String from Sql_alloc,
to get operators new and new[] in sync

in rocksdb gcc was complaining that non-lvalue was cast to const.
2019-03-15 21:00:50 +01:00
Sergei Golubchik
0508d327ae Merge branch '10.1' into 10.2 2019-03-15 21:00:41 +01:00
Sergei Golubchik
3d2d060b62 fix gcc 8 compiler warnings
There were two newly enabled warnings:
1. cast for a function pointers. Affected sql_analyse.h, mi_write.c
   and ma_write.cc, mf_iocache-t.cc, mysqlbinlog.cc, encryption.cc, etc

2. memcpy/memset of nontrivial structures. Fixed as:
* the warning disabled for InnoDB
* TABLE, TABLE_SHARE, and TABLE_LIST got a new method reset() which
  does the bzero(), which is safe for these classes, but any other
  bzero() will still cause a warning
* Table_scope_and_contents_source_st uses `TABLE_LIST *` (trivial)
  instead of `SQL_I_List<TABLE_LIST>` (not trivial) so it's safe to
  bzero now.
* added casts in debug_sync.cc and sql_select.cc (for JOIN)
* move assignment method for MDL_request instead of memcpy()
* PARTIAL_INDEX_INTERSECT_INFO::init() instead of bzero()
* remove constructor from READ_RECORD() to make it trivial
* replace some memcpy() with c++ copy assignments
2019-03-14 16:33:17 +01:00
Marko Mäkelä
c67b306e4f Merge 10.3 into 10.4 2019-03-08 11:19:48 +02:00
Marko Mäkelä
2d0dd62cf7 Merge 10.2 into 10.3 2019-03-08 00:26:55 +02:00
Sergei Golubchik
2b4027e633 mronga: fix a memory leak
use the correct delete operator

This fixes mroonga/storage.column_generated_stored_add_column failures
in ASAN_OPTIONS="abort_on_error=1" runs
2019-03-06 15:28:27 +01:00
Sergei Golubchik
5f105e756b MDEV-18625 ASAN unknown-crash in my_copy_fix_mb / ha_mroonga::storage_inplace_alter_table_add_column
disable inplace alter for adding stored generated columns.

This fixes mroonga/storage.column_generated_stored_add_column failures
in ASAN_OPTIONS="abort_on_error=1" runs

Also, add a test case that shows the bug without ASAN.
2019-03-06 15:28:27 +01:00
Sachin
d00f19e832 MDEV-371 Unique Index for long columns
This patch implements engine independent unique hash index.

Usage:- Unique HASH index can be created automatically for blob/varchar/test column whose key
 length > handler->max_key_length()
or it can be explicitly specified.

  Automatic Creation:-
   Create TABLE t1 (a blob unique);
  Explicit Creation:-
   Create TABLE t1 (a int , unique(a) using HASH);

Internal KEY_PART Representations:-
 Long unique key_info will have 2 representations.
 (lets understand this with an example create table t1(a blob, b blob , unique(a, b)); )

 1. User Given Representation:- key_info->key_part array will be similar to what user has defined.
 So in case of example it will have 2 key_parts (a, b)

 2. Storage Engine Representation:- In this case there will be only one key_part and it will point to
 HASH_FIELD. This key_part will be always after user defined key_parts.

 So:- User Given Representation          [a] [b] [hash_key_part]
                  key_info->key_part ----^
  Storage Engine Representation          [a] [b] [hash_key_part]
                  key_info->key_part ------------^

 Table->s->key_info will have User Given Representation, While table->key_info will have Storage Engine
 Representation.Representation can be changed into each other by calling re/setup_keyinfo_hash function.

Working:-

1. So when user specifies HASH_INDEX or key_length is > handler->max_key_length(), In mysql_prepare_create_table
One extra vfield is added (for each long unique key). And key_info->algorithm is set to HA_KEY_ALG_LONG_HASH.

2. In init_from_binary_frm_image values for hash_keypart is set (like fieldnr , field and flags)

3. In parse_vcol_defs, HASH_FIELD->vcol_info is created. Item_func_hash is used with list of Item_fields,
   When Explicit length is given by user then Item_left is used to concatenate Item_field values.

4. In ha_write_row/ha_update_row check_duplicate_long_entry_key is called which will create the hash key from
table->record[0] and then call ha_index_read_map , if we found duplicated hash , we will compare the result
field by field.
2019-02-22 00:35:40 +01:00
Alexander Barkov
4447a02cf1 MDEV-16991 Rounding vs truncation for TIME, DATETIME, TIMESTAMP 2018-11-26 08:10:47 +04:00
Marko Mäkelä
074c684099 Merge 10.3 into 10.4 2018-11-06 16:24:16 +02:00