Commit graph

22777 commits

Author SHA1 Message Date
Marko Mäkelä
8cc15c036d Merge 10.4 into 10.5 2019-12-27 21:17:16 +02:00
Marko Mäkelä
b86e0f25f8 Cleanup: Use more page_id_t in crash recovery 2019-12-27 20:44:34 +02:00
Marko Mäkelä
4c25e75ce7 Merge 10.3 into 10.4 2019-12-27 18:20:28 +02:00
Marko Mäkelä
5ab70e7f68 Merge 10.2 into 10.3 2019-12-27 15:14:48 +02:00
Marko Mäkelä
16bce0f6fe Cleanup: Remove dict_delete_tablespace_and_datafiles()
The function was only called by innobase_drop_tablespace(),
which was removed in commit 494e4b99a4
and added in commit 2e814d4702.
2019-12-27 11:23:28 +02:00
Alexander Barkov
4c57ab34d4 Merge remote-tracking branch 'origin/10.3' into 10.4 2019-12-25 13:33:28 +04:00
Eugene Kosov
373443903b MDEV-21362 do something with -fno-builtin-memcmp for rem0cmp.cc
Such CMake check is not relevant as the oldest supported gcc is 4.8 and
compiler issue was solved in gcc-4.6 already.
https://godbolt.org/z/7G64qo

cmp_data(): simplify code
2019-12-24 17:06:04 +08:00
Eugene Kosov
cbc170adf2 MDEV-21362 do something with -fno-builtin-memcmp for rem0cmp.cc
innodb.cmake: restrict -fno-builtin-memcmp for GCC versions older that 4.6
where optimization issue was solved.
2019-12-24 17:03:27 +08:00
Thirunarayanan Balathandayuthapani
90ba87cb9e MDEV-19176 Reduce the memory usage during recovery
- post-push to fix the compilation issue
2019-12-23 15:56:57 +05:30
Thirunarayanan Balathandayuthapani
bba59abb03 MDEV-19176 Reduce the memory usage during recovery
- Moved the recv_sys->heap memory condition inside recv_parse_log_recs().
So that, InnoDB can mark the status as STORE_NO earlier.

- InnoDB uses one third of buffer pool chunk size for reading the redo
log records. In that case, we can avoid the scenario where buffer ran
out of memory issue during recovery.
2019-12-23 15:51:02 +05:30
Marko Mäkelä
a59a015a75 Plug memory leaks from numa_get_mems_allowed() 2019-12-23 07:47:16 +02:00
Marko Mäkelä
73985d8301 Merge 10.1 into 10.2 2019-12-23 07:14:51 +02:00
Eugene Kosov
496532b5c5 MDEV-20950: Fix 32-bit Windows build 2019-12-21 21:36:25 +02:00
Alexey Botchkov
9dadfdcde5 MDEV-14024 PCRE2.
Related changes in the server code.
2019-12-21 10:34:02 +01:00
Sergei Golubchik
aade6e53d3 fix a bad merge
in 10.1+ one should use

MY_CHECK_AND_SET_COMPILER_FLAG("-Wno-address-of-packed-member")

and it's already done in storage/tokudb/PerconaFT/CMakeLists.txt
2019-12-20 19:47:31 +01:00
Marko Mäkelä
8174e68895 MDEV-21371 Assertion failure in page_rec_get_next_low() during innodb_gis.rtree_compress
A debug assertion that was added in
commit ed0793e096
turns out to be too strict. In the test innodb_gis.rtree_compress,4k
the function is sometimes being invoked by purge for a
spatial index root page that is not a leaf page (PAGE_LEVEL is 1).
2019-12-20 19:21:37 +02:00
Marko Mäkelä
ce70573f62 MDEV-21353 Assertion failures due to btr_pcur_restore_pos() misbehaving
In commit befde6e97e
a bogus condition was added, aiming to avoid debug assertion failures
during ALTER TABLE...IMPORT TABLESPACE.

page_cur_delete_rec(): Remove the bogus condition, and replace the
use of buf_block_modify_clock_inc() with a lower-level operation
that skips the debug checks that would fail during IMPORT.
2019-12-20 17:57:32 +02:00
Eugene Kosov
f6b58d2916 MDEV-21239 ASAN use-after-poison in a server shutdown in innodb.innodb_buffer_pool_resize
If we manually poison memory region, ASAN requires that we manually unpoison
it too. We didn't do it and munmap() didn't do it too. That was the issue.

os_mem_free_large(): unpoison memory region. Just in case.

https://github.com/google/sanitizers/issues/182
86acaa9457/compiler-rt/include/sanitizer/asan_interface.h (L21)
2019-12-20 19:59:58 +08:00
Eugene Kosov
305081a735 check I/O buffer size and alignment in InnoDB 2019-12-19 23:46:49 +08:00
Eugene Kosov
a5a433e256 fix windows compilation 2019-12-18 23:35:33 +08:00
Marko Mäkelä
44be8652c5 Cleanup: Remove fil_space_get_flags()
Replace fil_space_get_flags(space) == ULINT_UNDEFINED
with the functionally equivalent fil_space_get_size(space) == 0.
2019-12-18 16:27:26 +02:00
Eugene Kosov
3d3d8f1012 MDEV-21337 fix aligned_malloc()
do not fallback to malloc(), always return properly aligned buffer
2019-12-18 20:09:52 +08:00
Sergei Golubchik
0d70f40bdb Disable -Werror for rocksdb submodule
We cannot have -Werror for third-party submodules because
whenever they break the build we cannot fix it as needed.
2019-12-17 21:22:57 +01:00
Marko Mäkelä
fb4a897fd9 MDEV-12353 preparation: Remove UNIV_LOG_LSN_DEBUG
The debug instrumentation with the MLOG_LSN pseudo-record has not been
used for debugging for years. Let us remove this code now.
It would have to be removed as part of MDEV-12353 or MDEV-14425 anyway,
when implementing a new redo log file format.
2019-12-17 15:39:21 +02:00
Marko Mäkelä
59a088744d Remove unused mlog_catenate_ulint_compressed()
The function was only used by trx_undo_page_init_log()
(writing the MLOG_UNDO_INIT record), which was removed in
commit ccb3550221.
2019-12-16 13:40:00 +02:00
Marko Mäkelä
c24253d0fa MDEV-21174: Fix the WITH_INNODB_EXTRA_DEBUG build
One reference to PageConverter::m_page_zip_ptr was not adjusted
commit 745fd4b39f when the
data member was removed.
2019-12-16 13:14:05 +02:00
Alexander Barkov
3d98892232 Merge remote-tracking branch 'origin/5.5' into 10.1 2019-12-16 13:08:17 +04:00
Marko Mäkelä
28c89b7151 Merge 10.4 into 10.5 2019-12-16 07:47:17 +02:00
Marko Mäkelä
745fd4b39f MDEV-21174: Remove some mlog_write_initial_log_record_fast()
Pass buf_block_t* to more functions that write redo log.

page_zip_write_node_ptr(), page_zip_write_blob_ptr(),
page_zip_compress_write_log_no_data():
Take buf_block_t* as parameter, and do not tolerate mtr=NULL.

page_zip_compress(): Do not tolerate mtr=NULL.

page_zip_dir_insert(): Take page_cur_t* as parameter.

mlog_write_initial_log_record(): Remove. This function was unused.

RecIterator::remove(): Remove the redundant page_zip parameter.

PageConverter::m_page_zip_ptr: Remove.
2019-12-13 18:15:51 +02:00
Marko Mäkelä
2b5a269cb4 MDEV-21174: Clean up record insertion
page_cur_insert_rec_low(): Take page_cur_t* as a parameter,
and do not tolerate mtr=NULL.

page_cur_insert_rec_zip(): Do not tolerate mtr=NULL.
2019-12-13 18:15:51 +02:00
Marko Mäkelä
befde6e97e MDEV-12353 preparation: Clean up page_cur_delete_rec()
page_cur_delete_rec(): Do not tolerate mtr=NULL.

page_delete_rec(): Merge with the only caller, RecIterator::remove().

RecIterator::m_mtr: New data member: a dummy mini-transaction.
2019-12-13 18:15:35 +02:00
Marko Mäkelä
8fa759a576 Merge 10.3 into 10.4
We disable the MDEV-21189 test galera.galera_partition
because it times out.
2019-12-13 17:30:37 +02:00
Eugene Kosov
a9ea0056c7 MDEV-21133: use aligned memcpy in redo log and buffer pool 2019-12-13 21:03:50 +07:00
Sergei Golubchik
794911a27a tokudb: disable check_huge_pages_in_practice()
crashes on Debian 10
2019-12-13 11:24:23 +01:00
Sergei Golubchik
91c3d99804 tokudb: fix to compile with gcc 9.2.0 2019-12-13 11:24:23 +01:00
Marko Mäkelä
3466b47b0d Merge 10.2 into 10.3 2019-12-13 10:08:57 +02:00
Eugene Kosov
f0aa073f2b MDEV-20950 Reduce size of record offsets
offset_t: this is a type which represents one record offset.
It's unsigned short int.

a lot of functions: replace ulint with offset_t

btr_pcur_restore_position_func(),
page_validate(),
row_ins_scan_sec_index_for_duplicate(),
row_upd_clust_rec_by_insert_inherit_func(),
row_vers_impl_x_locked_low(),
trx_undo_prev_version_build():
  allocate record offsets on the stack instead of waiting for rec_get_offsets()
  to allocate it from mem_heap_t. So, reducing  memory allocations.

RECORD_OFFSET, INDEX_OFFSET:
  now it's less convenient to store pointers in offset_t*
  array. One pointer occupies now several offset_t. And those constant are start
  indexes into array to places where to store pointer values

REC_OFFS_HEADER_SIZE: adjusted for the new reality

REC_OFFS_NORMAL_SIZE:
  increase size from 100 to 300 which means less heap allocations.
  And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which
  is smaller than previous 800 bytes.

REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality

rem0rec.h, rem0rec.ic, rem0rec.cc:
  various arguments, return values and local variables types were changed to
  fix numerous integer conversions issues.

enum field_type_t:
  offset types concept was introduces which replaces old offset flags stuff.
  Like in earlier version, 2 upper bits are used to store offset type.
  And this enum represents those types.

REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed

get_type(), set_type(), get_value(), combine():
  these are convenience functions to work with offsets and it's types

rec_offs_base()[0]:
  still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL

rec_offs_base()[i]:
  these have type offset_t now. Two upper bits contains type.
2019-12-13 00:26:50 +07:00
Eugene Kosov
014e125830 optimize crash recovery
recv_dblwr_t::list is used for appending to the beginning and iterating
through its elements. std::deque fits better for that purpose because
it does less allocations than std::forward_list and provides better memory
locality.
2019-12-12 22:19:41 +07:00
Eugene Kosov
ce41a9075a fix a memory leak introduced by f4b4284650 2019-12-12 21:29:51 +07:00
Marko Mäkelä
0a20e5ab77 Merge 10.2 into 10.3 2019-12-12 14:41:51 +02:00
Vladislav Vaintroub
3304004a57 MDEV-21260 Innodb/Wjndows do not report error when trying open volumes on UNC paths
fil_node_t::find_metadata() tries to find out whether file
is on an SSD, and the disk sector size.
On Windows, it opens the corresponding volume for finding this data.

This does not go well, if datadir is on network path/UNC. The volume name
is invalid, CreateFile() function fails, and a cryptic (from the end user
perspective) error is reported. Like this

[ERROR] InnoDB: File \\.\\\workpc\work: 'CreateFile()' returned OS error 203.

The fix is not to report error if open volume failed, and the path was not
on fixed disk, i.e not on HDD or SSD. This is not a fatal error, there is
a fallback anyway.
2019-12-12 11:50:18 +01:00
Eugene Kosov
f4b4284650 MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks
Use std::queue backed by std::deque instead of list because it does less
allocations which still having O(1) push and pop operations.

Also store trx_purge_rec_t directly, because its only 16 bytes and allocating
it is to wasteful. This should be faster.

purge_node_t::purge_node_t: container is already empty after creation

purge_node_t::end(): replace clearing container with assertion that it's clear
2019-12-11 23:32:50 +07:00
Vladislav Vaintroub
8d2f6d3ca5 Fix overly chatty connect cmake, once again 2019-12-11 16:57:59 +01:00
Marko Mäkelä
41e6a154ec MDEV-14482 - Cache line contention on ut_rnd_interval()
InnoDB RNG maintains global state, causing otherwise unnecessary bus
traffic. Even worse, this is cross-mutex traffic. That is, different
mutexes suffer from contention.

Fixed delay of 4 was verified to give best throughput by OLTP update
index and read-write benchmarks on Intel Broadwell (2/20/40) and
ARM (1/46/46).

This is a backport of ce04790065 from
MariaDB Server 10.3.
2019-12-10 17:01:36 +02:00
Marko Mäkelä
b1f2d3a8c8 MDEV-21256: Replace the 64-bit LCG with a 32-bit Galois LFSR
We should not need anywhere near 32 bits of entropy, so we might
just limit ourselves to a 32-bit random number generator.

Also, it might be cheaper to use exclusive-or, bit shifting and
conditional jumps, instead of multiplication and addition.

We use relaxed atomic operations on the global random number generator
state in order in an attempt to silence any warnings about race conditions.
There is an obvious race condition between the load and store in
ut_rnd_gen(), but we do not think that it matters much that the
state of the random number generator could 'stutter'.

This change seems makes the 'uncompress_ops' nondeterministic
in innodb_zip.cmp_per_index after the restart. It looks like
there is an inherent race condition in the test, because the
table could be opened for InnoDB statistics recalculation
already before innodb_cmp_per_index_enabled was set. We might
end up having uncompress_ops anywhere between 0 and 9, or perhaps
even more. Let us remove that part of the test.
2019-12-10 16:59:34 +02:00
Marko Mäkelä
d146e3dcfe MDEV-21256: Simplify ut_rnd_interval()
ut_rnd_interval(): Remove the first parameter, which was mostly
passed as 0. Implement as a simple wrapper around ut_rnd_gen().
Trivially return 0 if the size of the interval is smaller than 2.

ut_rnd_ulint_counter, ut_rnd_gen_next_ulint(), ut_rnd_gen_ulint(): Remove.
2019-12-10 16:58:28 +02:00
Marko Mäkelä
51fc8ab73e MDEV-21256: Reduce the use of ut_rnd_gen_next_ulint()
ut_rnd_set_seed(): Unused function; remove.

ut_rnd_gen(): Renamed from page_cur_lcg_prng().

ut_rnd_current: The internal state of ut_rnd_gen().

page_cur_open_on_rnd_user_rec(): Replace linear search with
page_rec_get_nth().
2019-12-10 16:58:28 +02:00
Marko Mäkelä
adb117cf69 MDEV-16678: Fix a problem with duplicate #sql2 table names
row_drop_table_for_mysql(): If a #sql2 table is open in another
thread (purge) while we attempting to drop it, rename it to #sql-ib
name to hide it from the SQL layer.

Adjust tests accordingly to hide #sql-ib tables, which can
continue to exist until the background DROP TABLE completes.
2019-12-10 16:18:30 +02:00
Marko Mäkelä
ea37b14409 MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks
This is joint work with Thirunarayanan Balathandayuthapani.
The MDL interface between InnoDB and the rest of the server
(in storage/innobase/dict/dict0dict.cc and in include/)
is my work, while most everything else is Thiru's.

The collection of InnoDB persistent statistics and the
defragmentation were not refactored to use MDL. They will
keep relying on lower-level interlocking with
fil_check_pending_operations().

The purge of transaction history and the background operations on
fulltext indexes will use MDL. We will revert
commit 2c4844c9e7
(MDEV-17813) because thanks to MDL, purge cannot conflict
with DDL operations anymore. For a similar reason, we will remove
the MDEV-16222 test case from gcol.innodb_virtual_debug_purge.

Purge is essentially replacing all use of the global dict_sys.latch
with MDL. Purge will skip the undo log records for tables whose names
start with #sql-ib or #sql2. Theoretically, such tables might
be renamed back to visible table names if TRUNCATE fails to
create a new table, or the final rename in ALTER TABLE...ALGORITHM=COPY
fails. In that case, purge could permanently leave some garbage
in the table. Such garbage will be tolerated; the table would not
be considered corrupted.

To avoid repeated MDL releases and acquisitions,
trx_purge_attach_undo_recs() will sort undo log records by table_id,
and purge_node_t will keep the MDL and table handle open for multiple
successive undo log records.

get_purge_table(): A new accessor, used during the purge of
history for indexed virtual columns. This interface should ideally
not exist at all.

thd_mdl_context(): Accessor of THD::mdl_context.
Wrapped in a new thd_mdl_service.

dict_get_db_name_len(): Define inline.

dict_acquire_mdl_shared(): Acquire explicit shared MDL on a table name
if needed.

dict_table_open_on_id(): Return MDL_ticket, if requested.

dict_table_close(): Release MDL ticket, if requested.

dict_fts_index_syncing(), dict_index_t::index_fts_syncing: Remove.
row_drop_table_for_mysql() no longer needs to check these, because
MDL guarantees that a fulltext index sync will not be in progress
while MDL_EXCLUSIVE is protecting a DDL operation.

dict_table_t::parse_name(): Parse the table name for acquiring MDL.

purge_node_t::undo_recs: Change the type to std::list<trx_purge_rec_t*>
(different container, and storing also roll_ptr).

purge_node_t: Add mdl_ticket, last_table_id, purge_thd, mdl_hold_recs
for acquiring MDL and for keeping the table open across multiple
undo log records.

purge_vcol_info_t, row_purge_store_vsec_cur(), row_purge_restore_vsec_cur():
Remove. We will acquire the MDL earlier.

purge_sys_t::heap: Added, for reading undo log records.

fts_sync_during_ddl(): Invoked during ALGORITHM=INPLACE operations
to ensure that fts_sync_table() will not conflict with MDL_EXCLUSIVE.
Uses fts_t::sync_message for bookkeeping.
2019-12-10 15:42:50 +02:00
Vladislav Vaintroub
66de4fef76 MDEV-16264 - some improvements
- wait notification, tpool_wait_begin/tpool_wait_end - to notify the
threadpool that current thread is going to wait

Use it to wait for IOs to complete and also when purge waits for workers.
2019-12-09 21:12:13 +01:00