The error
"Unsupported collation on string indexed column %s Use
binary collation (latin1_bin, binary, utf8_bin)."
is misleading. Change it:
- It is now a warning
- It is printed only for collations that do not support index-only access
(reversible collations that use unpack_info are ok)
- The new warning text is:
Indexed column %s.%s uses a collation that does not allow index-only
access in secondary key and has reduced disk space efficiency
in primary key.
The crash (sometimes an assert) in MYSQL_BIN_LOG::mark_xid_done was caused by
the fact that log.cc:binlog_background_thread_queue could become a cyclic list.
This can happen with two checkpoint-capable engines that execute
TC_LOG_BINLOG::commit_checkpoint_notify() in succession before the
binlog_background thread gets control. The thread then loops endlessly in
while(queue) and eventually accesses freed memory.
It is fixed by counting notifications of the same kind in
commit_checkpoint_notify instead of enlisting them again as before. The
while(queue) loop of the binlog background thread is refined to take the new
counter into account. As a result, no access to freed memory is possible.
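The counting approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in log.cc: the type and member names (CheckpointEntry, NotifyQueue, notify_count) are hypothetical, and the locking a real server needs is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a binlog checkpoint entry; not the real type.
struct CheckpointEntry {
  CheckpointEntry *next_in_queue = nullptr;
  bool queued = false;
  unsigned notify_count = 0;  // pending notifications for this entry
};

// Singly linked queue consumed by a background thread (locking omitted).
struct NotifyQueue {
  CheckpointEntry *head = nullptr;

  // Called once per engine notification. A repeated notification for an
  // entry that is already queued only bumps its counter, so the entry is
  // linked at most once and the list cannot become cyclic.
  void notify(CheckpointEntry *e) {
    ++e->notify_count;
    if (!e->queued) {
      e->queued = true;
      e->next_in_queue = head;
      head = e;
    }
  }

  // Background-thread side: detach the list, then account for every
  // counted notification. Returns the number of notifications handled.
  std::size_t drain() {
    std::size_t handled = 0;
    CheckpointEntry *e = head;
    head = nullptr;
    while (e) {
      CheckpointEntry *next = e->next_in_queue;
      handled += e->notify_count;
      e->notify_count = 0;
      e->queued = false;
      e->next_in_queue = nullptr;
      e = next;
    }
    return handled;
  }
};
```

With two engines notifying about the same entry back to back, the entry is linked once and drain() sees a count of two, where linking it twice would have made next_in_queue point back at the entry itself.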
Part2: make MyRocks add its directory into @@ignore_db_dirs when starting.
This is necessary because apparently not everybody is using the plugin's
my.cnf. They load ha_rocksdb.{so,dll} manually and then hit MDEV-12451,
MDEV-14461 etc.
TABLE_SHARE::init_from_binary_frm_image() calls handler_file->index_flags()
before it has set TABLE_SHARE::primary_key (it is 0 while it should be
MAX_KEY in my example).
This causes MyRocks to report wrong index flags (it thinks it's a PK while
it is not), which causes invalid query plans later on.
Do the only thing that seems feasible: adjust field->part_of_key to have
the correct value in ha_rocksdb::open.
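The ordering problem can be illustrated with a small model. Everything here is a simplified assumption for illustration: ShareSketch, index_flags_sketch and FLAG_CLUSTERED are hypothetical names, and the value of MAX_KEY is picked arbitrarily.

```cpp
#include <cassert>

// Illustrative constants; the names echo the text, the values are assumed.
constexpr unsigned MAX_KEY = 64;             // "table has no primary key"
constexpr unsigned long FLAG_CLUSTERED = 1;  // hypothetical flag bit

struct ShareSketch {
  unsigned primary_key = 0;  // zero until the real value is assigned
};

// Engine-side check: key `idx` is treated as the PK if it matches the share.
unsigned long index_flags_sketch(const ShareSketch &share, unsigned idx) {
  return idx == share.primary_key ? FLAG_CLUSTERED : 0;
}
```

Asking for the flags of key 0 before primary_key is assigned reports it as the PK. Once primary_key is MAX_KEY the same call returns 0, which is why the values computed early have to be corrected later.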
is set to true, as it should.
Copy and modify original io_win.h header file to a different location
(as we cannot patch anything in submodule). Make sure modified header is
used.
- FB/MySQL 5.6's MyRocks has START TRANSACTION WITH CONSISTENT
ROCKSDB SNAPSHOT, which returns binlog position.
- MariaDB has a cross-engine START TRANSACTION WITH CONSISTENT
SNAPSHOT. It can be used for the same purpose. Binlog position
can be obtained from Binlog_snapshot_file/position status vars.
Apply fix for https://github.com/facebook/mysql-5.6/issues/748
A few tests in rocksdb suite fail with --ps-protocol
They fail because --ps-protocol uses different data format on the
wire. Work around that by doing a dummy CONCAT operation which forces
the data to be transferred in text form (like it is done without
--ps-protocol)
Port the previous patch:
- Implement MariaDB's Group Commit API. This is a first
attempt which lacks the expected performance.
To newer MariaDB (which includes newer MyRocks)
- Fix win64 pointer truncation warnings
(usually coming from misusing 0x%lx and long cast in DBUG)
- Also fix printf-format warnings
Make the above mentioned warnings fatal.
- fix pthread_join on Windows to set return value.
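The win64 pointer truncation mentioned above comes from `long` being 32 bits wide on 64-bit Windows, so a pointer cast to long and printed with 0x%lx loses its upper half. A minimal sketch of a portable alternative follows; format_ptr is a hypothetical helper and not something the patch adds.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a pointer as hex without going through `long`. uintptr_t is wide
// enough for a pointer, and PRIxPTR is the matching printf conversion.
std::string format_ptr(const void *p) {
  char buf[2 + 2 * sizeof(std::uintptr_t) + 1];
  std::snprintf(buf, sizeof(buf), "0x%" PRIxPTR,
                reinterpret_cast<std::uintptr_t>(p));
  return buf;
}
```

Plain "%p" also avoids the truncation, but its output format is implementation-defined, which matters when the text ends up in test results.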
Upstream cset we are merging from:
commit 184a4a2d82f4f6f3cbcb1015bcdb32bebe73315c
Author: Abhinav Sharma <abhinavsharma@fb.com>
Date: Thu Sep 14 11:40:08 2017 -0700
Bump rocksdb submodule
Summary:
Bump rocksdb to include the fix for rocksdb.trx_info_rpl
Lots of conflicts, got the code to compile but tests are likely to
be broken
Make rocksdb.cardinality test faster (77 -> 42 sec with --mem) by
loading records in batches.
(loading everything as one bulk load batch will remove the purpose
of the test)
This allows basic master crash-safety
- Un-comment and update relevant parts of the code
- Make rocksdb_rpl suite work like other MyRocks testsuites
(load the MyRocks plugin, don't start if it is not compiled in, etc)
- For now, disable all tests in the rocksdb_rpl suite.
- MariaDB-fication of rpl_rocksdb_2p_crash_recover test.
- Add include/index_merge*. Upstream has different files than MariaDB,
use copies of theirs, not ours.
- There was a problem with running "DDL-like" commands with binlog=ON:
MariaDB sets binlog_format=STATEMENT for the duration of such command
to prevent RBR replication from catching (and replicating) updates to
system tables.
However, MyRocks tries to prevent any writes to MyRocks tables with
binlog_format!=ROW.
- Added exceptions for DDL-type commands (ANALYZE TABLE, OPTIMIZE TABLE)
- Added special handling for "LOCK TABLE(s) myrocks_table WRITE".
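The binlog_format check with its DDL exceptions can be sketched like this. The enums and function names are simplified stand-ins, not the server's definitions, and the special handling for LOCK TABLES is deliberately not modelled because the text does not say what it consists of.

```cpp
#include <cassert>

// Simplified stand-ins for server enums; the real ones have more values.
enum class BinlogFormat { Statement, Mixed, Row };
enum class SqlCommand { Insert, Update, AnalyzeTable, OptimizeTable };

// DDL-like commands run with binlog_format=STATEMENT for their duration,
// so they must not trip the "ROW only" check.
bool is_ddl_like(SqlCommand cmd) {
  return cmd == SqlCommand::AnalyzeTable || cmd == SqlCommand::OptimizeTable;
}

// Returns true if a write to a MyRocks table should be allowed.
bool write_allowed(BinlogFormat fmt, SqlCommand cmd) {
  if (fmt == BinlogFormat::Row) return true;
  return is_ddl_like(cmd);
}
```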
Apply this patch from upstream:
commit 2c8deddfb67f1cd41ea3d1ac95aa1aa9327e3406
Author: Yoshinori Matsunobu <yoshinorim@users.noreply.github.com>
Date: Tue Aug 15 16:21:58 2017 -0700
Set exclusive_manual_compaction = false on manual compactions
Summary:
Combining exclusive manual compaction and
non-exclusive manual compaction may hit rocksdb assertion errors.
This diff makes all MyRocks internal manual compactions non exclusive.
Closes https://github.com/facebook/mysql-5.6/pull/682
Differential Revision: D5633619
Pulled By: yoshinorim
fbshipit-source-id: a90786d
The test mis-used MTR's "restart the server if it crashed or exited"
feature to try starting MyRocks plugin with invalid arguments.
Changed the test to use the --default-storage-engine=myisam which
allows the server to start when MyRocks fails to start.
This removes the need to "start the server with the arguments which
will cause it to fail to start", and so removes the race conditions
with MTR server restart code and mysqld.*.expect file.
It may produce test failures like this because of non-deterministic
cost calculations:
-1 SIMPLE t1 # col1 col1 259 NULL # Using where
+1 SIMPLE t1 # col1 NULL NULL NULL # Using where
- Fix the bad merge in drop_table.test
- Remove the obsolete rocksdb_info_log_level=info_level option
which caused warnings to be found in the error log.
commit 394d0712d3d46a87a8063e14e998e9c22336e3a6
Author: Anca Agape <anca@fb.com>
Date: Thu Jul 27 15:43:07 2017 -0700
Fix rpl.rpl_4threads_deadlock test broken by D5005670
Summary:
In D5005670 in fill_fields_processlist() function we introduced a point
where we were trying to take the LOCK_thd_data before the
synchronization point used by test
processlist_after_LOCK_thd_count_before_LOCK_thd_data. This was
happening in get_attached_srv_session() function called. Replaced this
with get_attached_srv_session_safe() and moved it after lock is acquired.
Reviewed By: tianx
Differential Revision: D5505992
fbshipit-source-id: bc53924
ha_partition creates temporary ha_XXX objects for its partitions when
performing DDL operations. The objects were created on a MEM_ROOT and
never deleted.
This works as long as ha_XXX objects free all data in ha_XXX::close() and
don't rely on a proper destructor invocation. Unfortunately, ha_rocksdb
includes String members which need to be delete'd properly.
Fixed the bug by having ha_partition::~ha_partition delete these temporary
objects.
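Why arena-allocated objects need an explicit destructor call can be shown with a toy model. Arena stands in for a MEM_ROOT and PartHandler for a per-partition handler; both are hypothetical simplifications, with std::string playing the role of a String member.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Toy arena: hands out raw memory and frees it in bulk, never running
// destructors for objects placed in it.
class Arena {
  std::vector<char *> blocks_;
 public:
  void *alloc(std::size_t n) {
    blocks_.push_back(static_cast<char *>(::operator new(n)));
    return blocks_.back();
  }
  ~Arena() { for (char *b : blocks_) ::operator delete(b); }
};

// Hypothetical per-partition handler that owns heap memory via a member.
struct PartHandler {
  static int live;     // number of constructed, not yet destroyed objects
  std::string buffer;  // allocates outside the arena
  PartHandler() : buffer(1024, 'x') { ++live; }
  ~PartHandler() { --live; }
};
int PartHandler::live = 0;

// Owner that creates handlers in the arena and destroys them explicitly.
struct Owner {
  Arena arena;
  std::vector<PartHandler *> parts;
  void add_part() {
    parts.push_back(new (arena.alloc(sizeof(PartHandler))) PartHandler);
  }
  // Without this loop the arena memory is still released, but each
  // handler's `buffer` would leak.
  ~Owner() { for (PartHandler *p : parts) p->~PartHandler(); }
};
```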
Disable memory leak check in debug server, if rocksdb is loaded.
There is some subtle bug somewhere in 3rd party code we cannot
do much about.
The bug manifests as follows:
RocksDB does not shut down its worker threads when the plugin is shut down.
Thus the OS does not unload the library, since there are still active threads
using this library's code. Thus global destructors in the library do not run,
and there is still some memory allocated when the server exits.
The workaround disables server's memory leak check, if rocksdb engine was
loaded.
(from: http://buildbot.askmonty.org/buildbot/builders/p8-rhel6-bintar/builds/820/steps/test/logs/stdio)
Errors like the following indicate a potential endian storage issue:
rocksdb.rocksdb_range w1 [ fail ]
Test ended at 2017-04-27 18:56:11
CURRENT_TEST: rocksdb.rocksdb_range
--- /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/rocksdb/mysql-test/rocksdb/r/rocksdb_range.result 2017-04-27 17:41:27.740050347 -0400
+++ /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/rocksdb/mysql-test/rocksdb/r/rocksdb_range.reject 2017-04-27 18:56:11.230050346 -0400
@@ -25,15 +25,15 @@
select * from t2 force index (a) where a=0;
pk a b
0 0 0
-1 0 1
-2 0 2
-3 0 3
-4 0 4
-5 0 5
-6 0 6
-7 0 7
-8 0 8
-9 0 9
+16777216 0 1
+33554432 0 2
+50331648 0 3
+67108864 0 4
+83886080 0 5
+100663296 0 6
+117440512 0 7
+134217728 0 8
+150994944 0 9
# The rest are for code coverage:
explain
select * from t2 force index (a) where a=2;
@@ -41,23 +41,23 @@
1 SIMPLE t2 ref a a 4 const #
select * from t2 force index (a) where a=2;
pk a b
-20 2 20
-21 2 21
-22 2 22
-23 2 23
-24 2 24
-25 2 25
-26 2 26
-27 2 27
-28 2 28
-29 2 29
+335544320 2 20
+352321536 2 21
+369098752 2 22
+385875968 2 23
+402653184 2 24
+419430400 2 25
+436207616 2 26
+452984832 2 27
+469762048 2 28
+486539264 2 29
explain
select * from t2 force index (a) where a=3 and pk=33;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t2 const a a 8 const,const #
select * from t2 force index (a) where a=3 and pk=33;
pk a b
-33 3 33
+553648128 3 33
select * from t2 force index (a) where a=99 and pk=99;
pk a b
select * from t2 force index (a) where a=0 and pk=0;
...
Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
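The rejected pk values in the diff above are consistent with a 32-bit byte swap of the expected ones (1 becomes 16777216, which is 1 << 24). A small sketch to check that reading follows; byteswap32 is a helper written for this illustration and is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the byte order of a 32-bit value.
constexpr std::uint32_t byteswap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Applying it to 1, 20 and 33 gives 16777216, 335544320 and 553648128, matching the values in the .reject file.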
- Update rocksdb submodule to revision
d616ebea23fa88cb9c2c8588533526a566d9cfab
- Normally this should be done by doing a merge from upstream
MyRocks, but now we are just updating rocksdb, MyRocks merge will
follow later.
- Apply a part of 4f6f072f6c74513087004755508eb6d6c432c5c1
use_direct_writes was renamed to use_direct_io_for_flush_and_compaction
- Update build_rocksdb.cmake - RocksDB have moved files around
use CMAKE_CXX_STANDARD to set C++11 flags with CMake 3.1+ (Apple's flags are somehow different from standard clang)
port htonbe16/32/64 macros for rocksdb
use reinterpret_cast<size_t> to cast macOS's pthread_t (pointer type) to size_t , for rocksdb
remove hard-coded paths (that assumed we're in a source tree)
remove various shell/perl/awk/whatnot scripts, use mysqltest and perl
remove numerous --exec /some/unix/tool commands, use mysqltest and perl
namely, restart_mysqld_with_option.inc and kill_and_restart_mysqld.inc -
use restart_mysqld.inc instead.
Also remove innodb_wl6501_crash_stripped.inc that wasn't used anywhere.
Either we are building from a source package, in which case all sources
should be present, or we are building from a repository. The repository
needs to fetch the rocksdb submodule before building rocksdb.
Change the returned error code to be ER_CANT_CREATE_TABLE.
Emit the warning text ourselves.
(When a query produces both an error and a warning, command-line client
with default settings will not provide any indication that the warning
is present, unfortunately. Need \W)
This .result file is not a statement of which storage engine
should be used for any particular table in mysql database.
This is just a check that a query against I_S doesn't crash.
Most tests use CREATE TABLE ... ENGINE=ROCKSDB, but there are some
exceptions: rpl_savepoint, rpl_row_stats.
In order to avoid any "oh we are using the wrong storage engine"
surprises, set the default for the whole testsuite.
- Disable rocksdb.show_engine
- Disable rocksdb.rpl_row_not_found
- Run rocksdb.blind_delete_without_tx_api only with binlog_format=row
(like its .cnf file specifies)
The default value of 1 causes many tests to time out (primary reason is
that many tests populate tables with one-row INSERT statements that
run with autocommit=1).
commit ba00e640f658ad8d0a4dff09a497a51b8a4de935
Author: Herman Lee <herman@fb.com>
Date: Wed Feb 22 06:30:06 2017 -0800
Improve add_index_alter_cardinality test
Summary:
Split add_index_inplace_cardinality test out and add a debug_sync point
to it so that the flush of the memtable occurs while the alter is
running.
Closes https://github.com/facebook/mysql-5.6/pull/539
Reviewed By: alxyang
Differential Revision: D4597887
Pulled By: hermanlee
fbshipit-source-id: faedda2
#define __STDC_FORMAT_MACROS. Unfortunately there is no single location
that would be #include'd before everything else. Have to put the #define
into each .cc file.
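What such a .cc file looks like at the top can be sketched as follows. format_u64 is a hypothetical helper used only to show the macro in action. The #ifndef guard is an addition of this sketch for the case where the build system already defines the macro.

```cpp
// Must come before the first include of <inttypes.h> in this translation
// unit: older C++ toolchains only expose the PRI* macros when it is set.
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>

#include <cassert>
#include <cstdio>
#include <string>

// Formats a 64-bit value portably; PRIu64 expands to the length modifier
// that matches uint64_t on the current platform.
std::string format_u64(uint64_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%" PRIu64, v);
  return buf;
}
```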
- Get the suite to work with dynamically-linked plugin (ha_rocksdb.so)
- Due to the push to keep everything MyRocks-related in storage/rocksdb,
there is no mysql-test/include/have_rocksdb.* anymore.
Make a copy of storage/rocksdb/mysql-test/rocksdb/include/have_rocksdb*,
hopefully these files won't be changed [often].
- Maria-fication of rocksdb_persistent_cache_path test.
This change should have been a part of
Merge 'merge-myrocks' into 'bb-10.2-mariarocks'
Merged cset:
Copy of
commit d1bb19b8f751875472211312c8e810143a7ba4b6
We probably should make submodule info a part of the mergetree process.
Merged cset:
Copy of
commit d1bb19b8f751875472211312c8e810143a7ba4b6
Author: Manuel Ung <mung@fb.com>
Date: Fri Feb 3 11:50:34 2017 -0800
...
Add cardinality stats to information schema
Test suite parameters for 'rocksdb' test suite were disabled in order
to get mysqld to start at all when ha_rocksdb is a dynamic plugin.
A lot of tests depend on these parameters being enabled, though. Put
them back by using the loose- form.
This change adds WITH_ROCKSDB_{LZ4,BZIP2,ZSTD,snappy} CMake variables
that can be set to ON/OFF/AUTO.
If a variable has the default value AUTO, rocksdb links with the corresponding
compression library if it is available. OFF disables compiling/linking with the
specific compression library, ON forces compiling with it (cmake throws an
error if the library is not available).
Support for ZLIB is added unconditionally, as it is always there.
commit d1bb19b8f751875472211312c8e810143a7ba4b6
Author: Manuel Ung <mung@fb.com>
Date: Fri Feb 3 11:50:34 2017 -0800
Add cardinality stats to information schema
Summary: This adds cardinality stats to the INFORMATION_SCHEMA.ROCKSDB_INDEX_FILE_MAP table. This is the only missing user collected properties from SST files that we don't expose, which is useful for debugging cardinality bugs.
Reviewed By: hermanlee
Differential Revision: D4509156
fbshipit-source-id: 2d3918a
- Put back the assert on SQL layer at the right location
- Adjust rdb_pack_with_make_sort_key to work around the assert (like
it is done at other places): MyRocks may need to pack a column
value even when the column is not in the read set.
- It turns out, ha_rocksdb::table_flags() can return
HA_PRIMARY_KEY_IN_READ_INDEX for all kinds of tables (as its meaning
is "if there is a PK, PK columns contribute to the secondary index
tuple"). There is no assumption that a certain PK column can be decoded
from the secondary index.
(Should probably be fixed in the upstream, too, but I was unable to
construct a testcase showing this is necessary).
- Following the above, we can undo the init_with_fields() changes in
table.cc. MyRocks calls init_with_fields() from ha_rocksdb::open()
which sets index-only read capabilities properly.
- Use rocksdb_sys_vars/my.cnf so that one can run tests from that suite
by just "./mtr rocksdb_sys_vars.$TESTNAME"
- Add rocksdb and rocksdb_sys_vars to the set of default test suites.
Don't run with embedded server, yet.
"Userstat" feature in MariaDB does not have
I_S.table_statistics.rows_requested column.
We'll use I_S.table_statistics.rows_read instead. The testcase
doesn't do anything where rows_requested != rows_read.
MariaDB doesn't have NO_CLEAR_EVENT support in DEBUG_SYNC facility.
Luckily, the test can be re-written to use two different sync points
instead. (I've checked that the modified test fails with fb/mysql-5.6
without the fix for e004fd9f (PR #394))
- Fix the test cases to not use userstat counters specific to
facebook/mysql-5.6
- Make testcase also check MariaDB's ICP counters
- Remove ha_rocksdb::check_index_cond(), call handler_index_cond_check
instead.
In MySQL 5.6, QUICK_SELECT_DESC calls handler->set_end_range() to
inform the storage engine about the bounds of the range being scanned.
MariaDB doesn't have that (handler::set_end_range call was back-ported
but it is not called from QUICK_SELECT_DESC).
Instead, it got prepare_*scan() methods from TokuDB.
Implement these methods so that MyRocks has information about the range
end.
- rocksdb.tmpdir works (however @@rocksdb_tmpdir has no effect yet!)
- trx_info_rpl is only run in RBR mode
- type_char_indexes_collation now works
= take into account that characters with the same weight can have
any order after sorting (and they do in MariaDB)
= MariaDB doesn't use index-only for extended keys that have partially-
covered columns.
- Fix include paths, add suite.opt
- Add a test for @@rocksdb_supported_compression_types
Now all tests pass, except rocksdb_sysvars.rocksdb_rpl_skip_tx_api_basic
The 'combinations' system in MTR ignores settings from $testname.cnf,
and tries to run RBR test with binlog_format=mixed.
Fixed by using "source include/have_binlog_format_row.inc" which tells
MTR to only run the test with binlog_format=ROW.
The test still needs its $testname.cnf to include suite/rpl/my.cnf. This
is necessary to set up replication.
(Using "source include/master-slave.inc" will have MTR set up replication
for the test, but only as long as the testsuite doesn't have its own
suite/rocksdb/my.cnf. We do have that file (and it doesn't set up
replication), so we need to have $testname.cnf to setup replication).
- port Regex_list_handler from facebook/mysql-5.6/sql/handler.cc
put it into a separate file in storage/rocksdb directory
- Adjust the build process so that the main library is built with
Regex_list_handler (which has dependencies on the server),
while RocksDB tools are built without it.
- Un-comment @@rdb_collation_exceptions handling in ha_rocksdb.cc
- Also adjust rocksdb_set_collation_exception_list() to free the
old variable value and alloc the new one.
- Make ha_rocksdb::check_if_supported_inplace_alter() take into
account the Alter_inplace_info::ALTER_PARTITIONED flag
- Adjust the testcase to work in MariaDB
Failure to do so caused a failure in rocksdb.rocksdb test.
When test_if_cheaper_ordering computes is_covering= ...,
- MySQL calls table->file->primary_key_is_clustered()
- MariaDB calls (table->file->index_flags(nr, 0, 1) &
HA_CLUSTERED_INDEX)
The first produces true, the second used to produce false.
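The disagreement between the two checks can be modelled in a few lines. This is a simplified sketch: HandlerSketch, CLUSTERED_FLAG and the two covering_* functions are hypothetical, index_flags is reduced to one argument, and only the two call shapes are taken from the text.

```cpp
#include <cassert>

// Hypothetical flag bit; stands in for HA_CLUSTERED_INDEX.
constexpr unsigned long CLUSTERED_FLAG = 1UL << 0;

struct HandlerSketch {
  unsigned primary_key;
  bool report_flag_for_pk;  // does index_flags() advertise the PK as clustered?

  bool primary_key_is_clustered() const { return true; }

  unsigned long index_flags(unsigned idx) const {
    return (report_flag_for_pk && idx == primary_key) ? CLUSTERED_FLAG : 0;
  }
};

// MySQL-style check: ask the engine whether its PK is clustered.
bool covering_mysql(const HandlerSketch &h, unsigned idx) {
  return idx == h.primary_key && h.primary_key_is_clustered();
}

// MariaDB-style check: look at the per-index flags instead.
bool covering_mariadb(const HandlerSketch &h, unsigned idx) {
  return (h.index_flags(idx) & CLUSTERED_FLAG) != 0;
}
```

An engine that answers primary_key_is_clustered() with true but does not set the flag in index_flags() gets different plans from the two servers. Reporting the flag for the PK brings the checks back in line.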
... due to different index statistics
Make statistics calculations in MariaRocks produce the same values
that MyRocks produces.
Added a comment in rdb_datadic.cc
- EXPLAIN result differences are due to MariaDB's MDEV-11172
- Don't print the value of rocksdb_supported_compression_types
to .result file
- The rest is trivial Maria-fication
- EXPLAIN is different
- error message is
- the output order is different, because MySQL knows when to use
ha_partition::handle_unordered_scan_next_partition.
Reading the table data without any ordering happens to produce
MariaDB uses ha_partition::handle_ordered_index_scan for this index
scan (this is a deficiency), which causes it to produce the row with
pk=1 first.
MariaDB uses
This cset just re-uses the approach from facebook/mysql-5.6 (Perhaps we
will have something different for MariaDB in the end).
For now this is:
Port this fix
dd7eeae69503cb8ab6ddc8fd9e2fef451cc31a32
Issue#250: MyRocks/Innodb different output from query with order by on table with index and decimal type
Summary:
Make open_binary_frm() set TABLE_SHARE::primary_key before it computes
Also add the patch for
https://github.com/facebook/mysql-5.6/issues/376
- MariaDB produces a warning instead of error when the key
length is too long
- Trivial test results updates
- rocksdb.rocksdb still fails but this commit makes some progress.
The warning
"ORDER BY ignored as there is a user-defined clustered index in the table 't1'"
was missing.
The reason is different condition in copy_data_between_tables():
MariaDB has a change, it uses
to->file->ha_table_flags() & HA_TABLE_SCAN_ON_INDEX
while MySQL uses:
to->file->primary_key_is_clustered().
For some reason, MyRocks didn't have HA_TABLE_SCAN_ON_INDEX flag.
It should have one, will raise that with upstream, too.
rocksdb.lock: LOCK TABLE t1 LOW_PRIORITY WRITE does not produce a
warning in MariaDB
rocksdb.unique_check:
- MariaDB's mtr prints connection actions
- New (but temporary) ER_LOCK_WAIT_TIMEOUT text
rocksdb.allow_pk_no_concurrent_insert:
- Fix path
rocksdb.locking_issues
- Fix path
- The test still fails but for a different reason now
- Introduce @@rocksdb_supported_compression_types read-only variable.
It has a comma-separated list of compiled-in compression algorithms.
- Make rocksdb.compression_zstd test skip itself when ZSTD support
is not compiled in
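How a comma-separated list of compiled-in algorithms can be built and queried is sketched below. join_types and has_type are hypothetical helpers, the algorithm names in the usage note are illustrative, and the matching is case-sensitive by assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Joins algorithm names into a comma-separated list.
std::string join_types(const std::vector<std::string> &types) {
  std::string out;
  for (std::size_t i = 0; i < types.size(); ++i) {
    if (i) out += ',';
    out += types[i];
  }
  return out;
}

// Returns true if `name` is one of the comma-separated entries in `list`.
bool has_type(const std::string &list, const std::string &name) {
  std::istringstream in(list);
  std::string item;
  while (std::getline(in, item, ','))
    if (item == name) return true;
  return false;
}
```

A test can then skip itself when has_type(value, "ZSTD") is false for the value read from the variable, for example with a value such as "Snappy,Zlib".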