Commit graph

770 commits

Author SHA1 Message Date
Marko Mäkelä
37c14690fc Merge 10.4 into 10.5 2020-03-30 19:07:25 +03:00
Marko Mäkelä
e2f1f88fa6 Merge 10.3 into 10.4 2020-03-30 14:50:23 +03:00
Marko Mäkelä
1a9b6c4c7f Merge 10.2 into 10.3 2020-03-30 11:12:56 +03:00
Monty
eb483c5181 Updated optimizer costs in multi_range_read_info_const() and sql_select.cc
- multi_range_read_info_const now uses the new records_in_range interface
- Added handler::avg_io_cost()
- Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost is
  not 1.0.  In this case we trust the avg_io_cost() from the handler.
- Changed test_quick_select to use TIME_FOR_COMPARE instead of
  TIME_FOR_COMPARE_IDX to align this with the rest of the code.
- Fixed bug when using test_if_cheaper_ordering where we didn't use
  keyread if index was changed
- Fixed a bug where we didn't use index only read when using order-by-index
- Added keyread_time() to HEAP.
  The default keyread_time() was optimized for blocks and not suitable for
  HEAP. The effect was the HEAP prefered table scans over ranges for btree
  indexes.
- Fixed get_sweep_read_cost() for HEAP tables
- Ensure that range and ref have same cost for simple ranges
  Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to ensure
  we favior ref for range for simple queries.
- Fixed that matching_candidates_in_table() uses same number of records
  as the rest of the optimizer
- Added avg_io_cost() to JT_EQ_REF cost. This helps calculate the cost for
  HEAP and temporary tables better. A few tests changed because of this.
- heap::read_time() and heap::keyread_time() adjusted to not add +1.
  This was to ensure that handler::keyread_time() doesn't give
  higher cost for heap tables than for normal tables. One effect of
  this is that heap and derived tables stored in heap will prefer
  key access as this is now regarded as cheap.
- Changed cost for index read in sql_select.cc to match
  multi_range_read_info_const(). All index cost calculation is now
  done trough one function.
- 'ref' will now use quick_cost for keys if it exists. This is done
  so that for '=' ranges, 'ref' is prefered over 'range'.
- scan_time() now takes avg_io_costs() into account
- get_delayed_table_estimates() uses block_size and avg_io_cost()
- Removed default argument to test_if_order_by_key(); simplifies code
2020-03-27 03:58:32 +02:00
Monty
f36ca142f7 Added page_range to records_in_range() to improve range statistics
Prototype change:
-  virtual ha_rows records_in_range(uint inx, key_range *min_key,
-                                   key_range *max_key)
+  virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+                                   const key_range *max_key,
+                                   page_range *res)

The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
  key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
  This can be used for a better calculation of IO seeks when we
  estimate the cost of a range index scan.

The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
2020-03-27 03:54:45 +02:00
Monty
37393bea23 Replace handler::primary_key_is_clustered() with handler::pk_is_clustering_key()
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.

As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
2020-03-24 21:00:04 +02:00
Vladislav Vaintroub
82b3f1a80f MDEV-21930 RocksDB does not compile anymore, with Visual Studio
Update submodule, change the source file list
rename CACHE_LINE_SIZE in ut0counter, so it does not conflics with
rocksdb sources, which also defines this constant now.
2020-03-23 11:25:01 +01:00
Rasmus Johansson
9e1b3af4a4 MDEV-21303 Make executables MariaDB named
To change all executables to have a mariadb name I had to:
- Do name changes in every CMakeLists.txt that produces executables
- CREATE_MARIADB_SYMLINK was removed and GET_SYMLINK added by Wlad to reuse the function in other places also
- The scripts/CMakeLists.txt could make use of GET_SYMLINK instead of introducing redundant code, but I thought I'll leave that for next release
- A lot of changes to debian/.install and debian/.links files due to swapping of real executable and symlink. I did not however change the name of the manpages, so the real name is still mysql there and mariadb are symlinks.
- The Windows part needed a change now when we made the executables mariadb -named. MSI (and ZIP) do not support symlinks and to not break backward compatibility we had to include mysql named binaries also. Done by Wlad
2020-03-21 20:20:29 +01:00
Marko Mäkelä
6fb59d525b MDEV-742: Fix clang -Winconsistent-missing-override 2020-03-21 12:39:58 +02:00
Monty
305cffebab merge 10.4 to 10.5 2020-03-18 12:00:38 +02:00
Monty
1242eb3d32 Removed double records_in_range calls from multi_range_read_info_const
This was to remove a performance regression between 10.3 and 10.4
In 10.5 we will have a better implementation of records_in_range
that will enable us to get more statistics.
This change was not done in 10.4 because the 10.5 will be part of
a larger change that is not suitable for the GA 10.4 version

Other things:
- Changed default handler block_size to 8192 to fix things statistics
  for engines that doesn't set the block size.
- Fixed a bug in spider when using multiple part const ranges
  (Patch from Kentoku)
2020-03-17 02:16:48 +02:00
Sergei Golubchik
79499b597a update the test result for new perfschema 2020-03-16 01:13:01 +01:00
Andrei Elkin
c8ae357341 MDEV-742 XA PREPAREd transaction survive disconnect/server restart
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.

Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of

    - binary logging extension to write prepared XA part of
      transaction signified with
      its XID in a new XA_prepare_log_event. The concusion part -
      with Commit or Rollback decision - is logged separately as
      Query_log_event.
      That is in the binlog the XA consists of two separate group of
      events.

      That makes the whole XA possibly interweaving in binlog with
      other XA:s or regular transaction but with no harm to
      replication and data consistency.

      Gtid_log_event receives two more flags to identify which of the
      two XA phases of the transaction it represents. With either flag
      set also XID info is added to the event.

      When binlog is ON on the server XID::formatID is
      constrained to 4 bytes.

    - engines are made aware of the server policy to keep up user
      prepared XA:s so they (Innodb, rocksdb) don't roll them back
      anymore at their disconnect methods.

    - slave applier is refined to cope with two phase logged XA:s
      including parallel modes of execution.

This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.

CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc

Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
   completion with ER_XA_RBROLLBACK.

2. binlog-* filtered XA when it changes engine data is regarded as
   loggable even when nothing got cached for binlog.  An empty
   XA-prepare group is recorded. Consequent Commit-or-Rollback
   succeeds in the Engine(s) as well as recorded into binlog.

3. The same applies to the non-transactional engine XA.

4. @@skip_log_bin=OFF does not record anything at XA-prepare
   (obviously), but the completion event is recorded into binlog to
   admit inconsistency with slave.

The following actions are taken by the patch.

At XA-prepare:
   when empty binlog cache - don't do anything to binlog if RO,
   otherwise write empty XA_prepare (assert(binlog-filter case)).

At Disconnect:
   when Prepared && RO (=> no binlogging was done)
     set Xid_cache_element::error := ER_XA_RBROLLBACK
     *keep* XID in the cache, and rollback the transaction.

At XA-"complete":
   Discover the error, if any don't binlog the "complete",
   return the error to the user.

Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
2020-03-14 22:45:48 +02:00
Oleksandr Byelkin
fad47df995 Merge branch '10.4' into 10.5 2020-03-11 17:52:49 +01:00
Alexander Barkov
a1e330de5a MDEV-21743 Split up SUPER privilege to smaller privileges 2020-03-10 23:49:47 +04:00
Sergei Golubchik
cbede21d0d cleanup: pass trxid by value 2020-03-10 19:24:23 +01:00
Sergei Golubchik
81cffda2e6 perfschema transaction instrumentation related changes 2020-03-10 19:24:23 +01:00
Sergei Golubchik
0d837e8153 perfschema table io instrumentation related changes 2020-03-10 19:24:23 +01:00
Sergei Golubchik
7c58e97bf6 perfschema memory related instrumentation changes 2020-03-10 19:24:22 +01:00
Marko Mäkelä
9e488653ae Cleanup: Make MONITOR_LSN_CHECKPOINT_AGE a value.
Compute MONITOR_LSN_CHECKPOINT_AGE on demand in
srv_mon_process_existing_counter().
This allows us to remove the overhead of MONITOR_SET
calls for the counter.
2020-03-04 12:59:20 +02:00
Sergei Petrunia
adcfea710f Fix compile failure, compare_key_parts in handler shadowed by MyRocks
The two functions have different signature.
Use "using ..." to prevent shadowing
2020-02-19 14:57:47 +03:00
Marko Mäkelä
20a7f75fbf MDEV-15058: Revert the changes to INFORMATION_SCHEMA
For compatibility with diagnostic software, let us
return a dummy buffer pool identifier 0 and restore
the columns that were initially deleted in
commit 1a6f708ec5:

	information_schema.innodb_buffer_page.pool_id
	information_schema.innodb_buffer_page_lru.pool_id
	information_schema.innodb_buffer_pool_stats.pool_id
	information_schema.innodb_cmpmem.buffer_pool_instance
	information_schema.innodb_cmpmem_reset.buffer_pool_instance

Thanks to Vladislav Vaintroub for pointing this out.
2020-02-12 20:54:59 +02:00
Marko Mäkelä
1a6f708ec5 MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances
Our benchmarking efforts indicate that the reasons for splitting the
buf_pool in commit c18084f71b
have mostly gone away, possibly as a result of
mysql/mysql-server@ce6109ebfd
or similar work.

Only in one write-heavy benchmark where the working set size is
ten times the buffer pool size, the buf_pool->mutex would be
less contended with 4 buffer pool instances than with 1 instance,
in buf_page_io_complete(). That contention could be alleviated
further by making more use of std::atomic and by splitting
buf_pool_t::mutex further (MDEV-15053).

We will deprecate and ignore the following parameters:

	innodb_buffer_pool_instances
	innodb_page_cleaners

There will be only one buffer pool and one page cleaner task.

In a number of INFORMATION_SCHEMA views, columns that indicated
the buffer pool instance will be removed:

	information_schema.innodb_buffer_page.pool_id
	information_schema.innodb_buffer_page_lru.pool_id
	information_schema.innodb_buffer_pool_stats.pool_id
	information_schema.innodb_cmpmem.buffer_pool_instance
	information_schema.innodb_cmpmem_reset.buffer_pool_instance
2020-02-12 14:45:21 +02:00
Marko Mäkelä
8b6cfda631 Merge 10.4 into 10.5 2020-02-07 08:51:20 +02:00
Monty
4d61f1247a Fixed compiler warnings from gcc 7.4.1
- Fixed possible error in rocksdb/rdb_datadic.cc
2020-01-29 23:23:55 +02:00
Alexander Barkov
f1e13fdc8d MDEV-21581 Helper functions and methods for CHARSET_INFO 2020-01-28 12:29:23 +04:00
Marko Mäkelä
8cc15c036d Merge 10.4 into 10.5 2019-12-27 21:17:16 +02:00
Marko Mäkelä
4c25e75ce7 Merge 10.3 into 10.4 2019-12-27 18:20:28 +02:00
Marko Mäkelä
5ab70e7f68 Merge 10.2 into 10.3 2019-12-27 15:14:48 +02:00
Sergei Golubchik
0d70f40bdb Disable -Werror for rocksdb submodule
We cannot have -Werror for third-party submodules because
whenever they break the build we cannot fix it as needed.
2019-12-17 21:22:57 +01:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Oleksandr Byelkin
a15234bf4b Merge branch '10.3' into 10.4 2019-12-09 15:09:41 +01:00
Vicențiu Ciorbaru
8a46b706aa Update innodb_i_s_tables_disabled.result post typo fix 2019-12-03 14:47:55 +02:00
Faustin Lammler
2df2238cb8 Lintian complains on spelling error
The lintian check complains on spelling error:
https://salsa.debian.org/mariadb-team/mariadb-10.3/-/jobs/95739
2019-12-02 12:41:13 +02:00
Oleksandr Byelkin
3ad37ed0eb Merge 10.4 into 10.5 2019-11-07 08:52:30 +01:00
Oleksandr Byelkin
6818f40adf Postmerge fix of rocksdb test results 2019-11-02 08:40:43 +01:00
Oleksandr Byelkin
f566889a79 Merge branch '10.3' into 10.4 2019-11-01 20:01:44 +01:00
Sergei Petrunia
9c6fec88b1 MDEV-17171: RocksDB Tables do not have "Creation Date"
- Add SLEEP() calls to the testcase to make it really test that the
  time doesn't change.
- Always use .frm file creation time as a creation timestamp (attempts
  to re-use older create_time when the table DDL changes are not good
  because then create_time will change after server restart)
- Use the same method names as the upstream patch does
- Use std::atomic for m_update_time
2019-11-01 21:40:10 +03:00
Sergei Petrunia
9c72963d2a MDEV-17171: RocksDB Tables do not have "Creation Date"
Variant#5 of the patch:
- take creation date from the .frm file, like InnoDB does
- Update_time is in-memory only (like in InnoDB).
2019-11-01 08:57:56 +01:00
Sergei Petrunia
fcd65b0376 MDEV-17171: RocksDB Tables do not have "Creation Date"
Variant#5 of the patch:
- take creation date from the .frm file, like InnoDB does
- Update_time is in-memory only (like in InnoDB).
2019-10-31 19:45:25 +03:00
Sergei Petrunia
eafc9d8516 MDEV-17171: RocksDB Tables do not have "Creation Date"
Temporarily revert the patch
2019-10-31 19:45:25 +03:00
Marko Mäkelä
b42294bc64 MDEV-19514 Defer change buffer merge until pages are requested
We will remove the InnoDB background operation of merging buffered
changes to secondary index leaf pages. Changes will only be merged as a
result of an operation that accesses a secondary index leaf page,
such as a SQL statement that performs a lookup via that index,
or is modifying the index. Also ROLLBACK and some background operations,
such as purging the history of committed transactions, or computing
index cardinality statistics, can cause change buffer merge.
Encryption key rotation will not perform change buffer merge.

The motivation of this change is to simplify the I/O logic and to
allow crash recovery to happen in the background (MDEV-14481).
We also hope that this will reduce the number of "mystery" crashes
due to corrupted data. Because change buffer merge will typically
take place as a result of executing SQL statements, there should be
a clearer connection between the crash and the SQL statements that
were executed when the server crashed.

In many cases, a slight performance improvement was observed.

This is joint work with Thirunarayanan Balathandayuthapani
and was tested by Axel Schwenke and Matthias Leich.

The InnoDB monitor counter innodb_ibuf_merge_usec will be removed.

On slow shutdown (innodb_fast_shutdown=0), we will continue to
merge all buffered changes (and purge all undo log history).

Two InnoDB configuration parameters will be changed as follows:

innodb_disable_background_merge: Removed.
This parameter existed only in debug builds.
All change buffer merges will use synchronous reads.

innodb_force_recovery will be changed as follows:
* innodb_force_recovery=4 will be the same as innodb_force_recovery=3
(the change buffer merge cannot be disabled; it can only happen as
a result of an operation that accesses a secondary index leaf page).
The option used to be capable of corrupting secondary index leaf pages.
Now that capability is removed, and innodb_force_recovery=4 becomes 'safe'.
* innodb_force_recovery=5 (which essentially hard-wires
SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED)
becomes safe to use. Bogus data can be returned to SQL, but
persistent InnoDB data files will not be corrupted further.
* innodb_force_recovery=6 (ignore the redo log files)
will be the only option that can potentially cause
persistent corruption of InnoDB data files.

Code changes:

buf_page_t::ibuf_exist: New flag, to indicate whether buffered
changes exist for a buffer pool page. Pages with pending changes
can be returned by buf_page_get_gen(). Previously, the changes
were always merged inside buf_page_get_gen() if needed.

ibuf_page_exists(const buf_page_t&): Check if a buffered changes
exist for an X-latched or read-fixed page.

buf_page_get_gen(): Add the parameter allow_ibuf_merge=false.
All callers that know that they may be accessing a secondary index
leaf page must pass this parameter as allow_ibuf_merge=true,
unless it does not matter for that caller whether all buffered
changes have been applied. Assert that whenever allow_ibuf_merge
holds, the page actually is a leaf page. Attempt change buffer
merge only to secondary B-tree index leaf pages.

btr_block_get(): Add parameter 'bool merge'.
All callers of btr_block_get() should know whether the page could be
a secondary index leaf page. If it is not, we should avoid consulting
the change buffer bitmap to even consider a merge. This is the main
interface to requesting index pages from the buffer pool.

ibuf_merge_or_delete_for_page(), recv_recover_page(): Replace
buf_page_get_known_nowait() with much simpler logic, because
it is now guaranteed that that the block is x-latched or read-fixed.

mlog_init_t::mark_ibuf_exist(): Renamed from mlog_init_t::ibuf_merge().
On crash recovery, we will no longer merge any buffered changes
for the pages that we read into the buffer pool during the last batch
of applying log records.

buf_page_get_gen_known_nowait(), BUF_MAKE_YOUNG, BUF_KEEP_OLD: Remove.

btr_search_guess_on_hash(): Merge buf_page_get_gen_known_nowait()
to its only remaining caller.

buf_page_make_young_if_needed(): Define as an inline function.
Add the parameter buf_pool.

buf_page_peek_if_young(), buf_page_peek_if_too_old(): Add the
parameter buf_pool.

fil_space_validate_for_mtr_commit(): Remove a bogus comment
about background merge of the change buffer.

btr_cur_open_at_rnd_pos_func(), btr_cur_search_to_nth_level_func(),
btr_cur_open_at_index_side_func(): Use narrower data types and scopes.

ibuf_read_merge_pages(): Replaces buf_read_ibuf_merge_pages().
Merge the change buffer by invoking buf_page_get_gen().
2019-10-11 17:28:15 +03:00
Marko Mäkelä
1333da90b5 Merge 10.4 into 10.5 2019-09-24 10:07:56 +03:00
Marko Mäkelä
5a92ccbaea Merge 10.3 into 10.4
Disable MDEV-20576 assertions until MDEV-20595 has been fixed.
2019-09-23 17:35:29 +03:00
Marko Mäkelä
c016ea660e Merge 10.2 into 10.3 2019-09-23 10:25:34 +03:00
Marko Mäkelä
2931fd2917 After-merge fix: Adjust a result 2019-09-23 09:04:51 +03:00
Marko Mäkelä
bb4214272a Merge 10.1 into 10.2 2019-09-18 16:24:48 +03:00
Sergei Petrunia
be6beb73e9 MDEV-16560: [counter] rocksdb.ttl_secondary_read_filtering fail in buildbot
It is not reproducible, but the issue seems to be the same as with
MDEV-20490 and rocksdb.ttl_primary_read_filtering - a compaction caused
by DROP TABLE gets behind and compacts away the expired rows for the next
test. Fix this in the same way.
2019-09-11 23:05:12 +03:00
Alexander Barkov
0636645e7e Merge remote-tracking branch 'origin/10.4' into 10.5 2019-09-11 11:46:31 +04:00
Marko Mäkelä
e980cf91cd Merge mariadb-10.4.8 to 10.4 2019-09-10 16:09:14 +03:00