This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .
Code style changes have been done on top. The result of this change
leads to the following improvements:
1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.
2. Compiler can better understand the intent of the code, thus it leads
to more optimization possibilities. Additionally it enabled detecting
unused variables that had an empty default constructor but not marked
so explicitly.
Particular change required following this patch in sql/opt_range.cc
result_keys, an unused template class Bitmap now correctly issues
unused variable warnings.
Setting Bitmap template class constructor to default allows the compiler
to identify that there are no side-effects when instantiating the class.
Previously the compiler could not issue the warning as it assumed Bitmap
class (being a template) would not be performing a NO-OP for its default
constructor. This prevented the "unused variable warning".
clang15 finally errors on old prototype definations.
Its also a lot fussier about variables that aren't used
as is the case a number of time with loop counters that
aren't examined.
RocksDB was complaining that its get_range function was
declared without the array length in ha_rocksdb.h. While
a constant is used rather than trying to import the
Rdb_key_def::INDEX_NUMBER_SIZE header (was causing a lot of
errors on the defination of other orders). If the constant
does change can be assured that the same compile warnings will
tell us of the error.
The ha_rocksdb::index_read_map_impl DBUG_EXECUTE_IF was similar
to the existing endless functions used in replication tests.
Its rather moot point as the rocksdb.force_shutdown test that
uses myrocks_busy_loop_on_row_read is currently disabled.
Fixing a few problems relealed by UBSAN in type_float.test
- multiplication overflow in dtoa.c
- uninitialized Field::geom_type (and Field::srid as well)
- Wrong call-back function types used in combination with SHOW_FUNC.
Changes in the mysql_show_var_func data type definition were not
properly addressed all around the code by the following commits:
b4ff64568c18feb62fee0ee879ff8a
Adding a helper SHOW_FUNC_ENTRY() function and replacing
all mysql_show_var_func declarations using SHOW_FUNC
to SHOW_FUNC_ENTRY, to catch mysql_show_var_func in the future
at compilation time.
Add a --rocksdb_ignore_datadic_errors plugin option for MyRocks.
The default is 0, and this means MyRocks will call abort() if it detects
a DDL mismatch.
Setting rocksdb_ignore_datadic_errors=1 makes MyRocks to try to ignore the
errors and allow to start the server for repairs.
In ha_rocksdb::open(), check if the number of indexes seen from the
SQL layer matches the number of indexes in the internal MyRocks data
dictionary.
Produce an error if there is a mismatch. (If we don't produce this error,
we are likely to crash as soon as we attempt to use an index)
Drop and add same key is considered rename (look ALTER_RENAME_INDEX in
fill_alter_inplace_info()). But in this case order of keys may be
changed, because mysql_prepare_alter_table() yet does not know about
rename and treats 2 operations: drop and add.
In that case we disable inplace algorithm for such engines as Memory,
MyISAM and Aria with ALTER_INDEX_ORDER flag. These engines have no
specialized check_if_supported_inplace_alter() and default
handler::check_if_supported_inplace_alter() sees an unknown flag and
returns HA_ALTER_INPLACE_NOT_SUPPORTED.
ha_innobase::check_if_supported_inplace_alter() works differently and
inplace is not disabled (with the help of modified
INNOBASE_INPLACE_IGNORE). add_drop_v_cols fork was also tweaked as it
wrongly failed with MSG_UNSUPPORTED_ALTER_ONLINE_ON_VIRTUAL_COLUMN
when it seen ALTER_INDEX_ORDER.
No-op operation must be still no-op no matter of ALTER_INDEX_ORDER
presence, so we tweek its condition as well.
rocksdb_checkpoint_request() should call FlushWAL(sync=true) (which does
write-out and sync), not just SyncWAL() (which just syncs without writing
out)
Followup: the test requires debug sync facility
(This is a backport to 10.5)
Starting with MariaDB 10.5, roughly after MDEV-23855 was fixed,
we are observing sporadic hangs during the execution of the
RESET MASTER statement. We are hoping to fix the hangs with these
changes, but due to the rather infrequent occurrence of the hangs
and our inability to reliably reproduce the hangs, we cannot be
sure of this.
What we do know is that innodb_force_recovery=2 (or a larger setting)
will prevent srv_master_callback (the former srv_master_thread) from
running. In that mode, periodic log flushes would never occur and
RESET MASTER could hang indefinitely. That is demonstrated by the new
test case that was developed by Andrei Elkin. We fix this case by
implementing a special case for it.
This also includes some code cleanup and renames of misleadingly
named code. The interface has nothing to do with log checkpoints in
the storage engine; it is only about requesting log writes to be
persistent.
handlerton::commit_checkpoint_request,
commit_checkpoint_notify_ha(): Remove the unused parameter hton.
log_requests.start: Replaces pending_checkpoint_list.
log_requests.end: Replaces pending_checkpoint_list_end.
log_requests.mutex: Replaces pending_checkpoint_mutex.
log_flush_notify_and_unlock(), log_flush_notify(): Replaces
innobase_mysql_log_notify(). The new implementation should be
functionally equivalent to the old one.
innodb_log_flush_request(): Replaces innobase_checkpoint_request().
Implement a fast path for common cases, and reduce the mutex hold time.
POSSIBLE FIX OF THE HANG: We will invoke commit_checkpoint_notify_ha()
for the current request if it is already satisfied, as well as invoke
log_flush_notify_and_unlock() for any satisfied requests.
log_write(): Invoke log_flush_notify() when the write is already durable.
This was missing WITH_PMEM when the log is in persistent memory.
Reviewed by: Vladislav Vaintroub
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
Insert worked incorrect as well. RocksDB used table->record[0] internally to store some
intermediate results for key conversion, during index searching among other operations.
So table->record[0] is spoiled during ha_rnd_index_map in ha_check_overlaps, so in turn
the broken record data was inserted.
The fix is to store RocksDB intermediate result in its own buffer instead of table->record[0].
`rocksdb` MTR suite is is checked and runs fine.
No need for additional tests. The existing overlaps.test covers the case completely.
However, I am not going to add anything related to rocksdb to suite, to keep it away
from additional dependencies.
To run tests with RocksDB engine, one can add following to engines.combinations:
[rocksdb]
plugin-load=$HA_ROCKSDB_SO
default-storage-engine=rocksdb
rocksdb
This reverts commit 6f1f911497.
because it doesn't do anything now (the server doesn't check
my_disable_leak_check) and it never did anything before
(because without `extern` it simply created a local instance of
my_disable_leak_check, did not affect server's my_disable_leak_check).
Rowid Filter check is just like Index Condition Pushdown check: before
we check the filter, we must check if we have walked out of the range
we are scanning. (If we did, we should return, and not continue the scan).
Consequences of this:
- Rowid filtering doesn't work for keys that have partially-covered
blob columns (just like Index Condition Pushdown)
- The rowid filter function has three return values: CHECK_POS (passed)
CHECK_NEG (filtered out), CHECK_OUT_OF_RANGE.
All of the above is implemented in this patch
Prototype change:
- virtual ha_rows records_in_range(uint inx, key_range *min_key,
- key_range *max_key)
+ virtual ha_rows records_in_range(uint inx, const key_range *min_key,
+ const key_range *max_key,
+ page_range *res)
The handler can ignore the page_range parameter. In the case the handler
updates the parameter, the optimizer can deduce the following:
- If previous range's last key is on the same block as next range's first
key
- If the current key range is in one block
- We can also assume that the first and last block read are cached!
This can be used for a better calculation of IO seeks when we
estimate the cost of a range index scan.
The parameter is fully implemented for MyISAM, Aria and InnoDB.
A separate patch will update handler::multi_range_read_info_const() to
take the benefits of this change and also remove the double
records_in_range() calls that are not anymore needed.
This was done to both simplify the code and also to be easier to handle
storage engines that are clustered on some other index than the primary
key.
As pk_is_clustering_key() and is_clustering_key now are using only
index_flags, these where removed from all storage engines.
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.
Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of
- binary logging extension to write prepared XA part of
transaction signified with
its XID in a new XA_prepare_log_event. The concusion part -
with Commit or Rollback decision - is logged separately as
Query_log_event.
That is in the binlog the XA consists of two separate group of
events.
That makes the whole XA possibly interweaving in binlog with
other XA:s or regular transaction but with no harm to
replication and data consistency.
Gtid_log_event receives two more flags to identify which of the
two XA phases of the transaction it represents. With either flag
set also XID info is added to the event.
When binlog is ON on the server XID::formatID is
constrained to 4 bytes.
- engines are made aware of the server policy to keep up user
prepared XA:s so they (Innodb, rocksdb) don't roll them back
anymore at their disconnect methods.
- slave applier is refined to cope with two phase logged XA:s
including parallel modes of execution.
This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.
CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc
Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
completion with ER_XA_RBROLLBACK.
2. binlog-* filtered XA when it changes engine data is regarded as
loggable even when nothing got cached for binlog. An empty
XA-prepare group is recorded. Consequent Commit-or-Rollback
succeeds in the Engine(s) as well as recorded into binlog.
3. The same applies to the non-transactional engine XA.
4. @@skip_log_bin=OFF does not record anything at XA-prepare
(obviously), but the completion event is recorded into binlog to
admit inconsistency with slave.
The following actions are taken by the patch.
At XA-prepare:
when empty binlog cache - don't do anything to binlog if RO,
otherwise write empty XA_prepare (assert(binlog-filter case)).
At Disconnect:
when Prepared && RO (=> no binlogging was done)
set Xid_cache_element::error := ER_XA_RBROLLBACK
*keep* XID in the cache, and rollback the transaction.
At XA-"complete":
Discover the error, if any don't binlog the "complete",
return the error to the user.
Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
- Add SLEEP() calls to the testcase to make it really test that the
time doesn't change.
- Always use .frm file creation time as a creation timestamp (attempts
to re-use older create_time when the table DDL changes are not good
because then create_time will change after server restart)
- Use the same method names as the upstream patch does
- Use std::atomic for m_update_time