Commit graph

450 commits

Author SHA1 Message Date
Marko Mäkelä
39d248fa55 MDEV-16092 Crash in encryption.create_or_replace
If the tablespace is dropped or truncated after the
space->is_stopping() check in fil_crypt_get_page_throttle_func(),
we would proceed to request the page, and eventually report a fatal
error.

buf_page_get_gen(): Do not retry reading if mode==BUF_GET_POSSIBLY_FREED.

lock_rec_block_validate(): Be prepared for a NULL return value when
invoking buf_page_get_gen() with mode=BUF_GET_POSSIBLY_FREED.
2018-05-04 22:44:33 +03:00
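A minimal sketch of the calling contract described above, using a toy page cache instead of InnoDB's buffer pool. Only the mode names BUF_GET and BUF_GET_POSSIBLY_FREED are borrowed. toy_pool, page_t and validate_block() are hypothetical. In the "possibly freed" mode a failed read returns nullptr immediately, instead of being retried until a fatal error. The caller, modelled loosely on lock_rec_block_validate(), treats nullptr as "the page no longer exists".

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <utility>

// Toy stand-ins; only the names of the two modes are borrowed from InnoDB.
enum get_mode { BUF_GET, BUF_GET_POSSIBLY_FREED };

struct page_t {
  uint32_t space_id;
  uint32_t page_no;
};

struct toy_pool {
  std::map<std::pair<uint32_t, uint32_t>, page_t> pages;

  // Simulated file read: fails if the tablespace was dropped or truncated.
  bool read_from_file(uint32_t space, uint32_t page_no, bool space_exists) {
    if (!space_exists) return false;
    pages[{space, page_no}] = page_t{space, page_no};
    return true;
  }

  const page_t* get(uint32_t space, uint32_t page_no, get_mode mode,
                    bool space_exists) {
    for (int attempt = 0; attempt < 3; attempt++) {
      auto it = pages.find({space, page_no});
      if (it != pages.end()) return &it->second;
      if (read_from_file(space, page_no, space_exists)) continue;  // look up again
      // The read failed. A caller that expects possibly freed pages gets
      // nullptr right away instead of further retries.
      if (mode == BUF_GET_POSSIBLY_FREED) return nullptr;
    }
    std::fprintf(stderr, "fatal: cannot read page %u:%u\n",
                 static_cast<unsigned>(space), static_cast<unsigned>(page_no));
    std::abort();
  }
};

// Caller in the style of lock_rec_block_validate(): tolerates nullptr.
static bool validate_block(toy_pool& pool, uint32_t space, uint32_t page_no,
                           bool space_exists) {
  const page_t* p = pool.get(space, page_no, BUF_GET_POSSIBLY_FREED, space_exists);
  if (p == nullptr) return false;  // page is gone: nothing to validate
  assert(p->space_id == space && p->page_no == page_no);
  return true;
}
```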
Marko Mäkelä
3498a656c9 MDEV-14705: Follow-up fixes
buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting for the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().

log_write_up_to(): Use correct format.

srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
2018-04-06 12:29:25 +03:00
Daniel Black
1479273cdb MDEV-14705: slow innodb startup/shutdown can exceed systemd timeout
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress

Move towards progress measures rather than pure time based measures.

Progress is reported at numerous shutdown/startup locations, including:
* With innodb_fast_shutdown=0, in trx_roll_must_shutdown(), while rolling back incomplete transactions.
* While merging the change buffer (in srv_shutdown(bool ibuf_merge)).
* While purging history (in srv_do_purge()).

Thanks Marko for feedback and suggestions.
2018-04-06 09:58:14 +03:00
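A sketch of progress-based, rate-limited timeout extension, under stated assumptions. The class progress_notifier and its interval policy are hypothetical. The follow-up commit above only says the output "must be rate-limited somehow". The helper only formats the notification text. In a build linked against libsystemd, the string would presumably be handed to sd_notify(). EXTEND_TIMEOUT_USEC= and STATUS= are real sd_notify variables; EXTEND_TIMEOUT_USEC= needs a recent systemd.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a systemd EXTEND_TIMEOUT_USEC notification
// at most once per `interval`, so that a long loop (rollback, change
// buffer merge, purge) does not flood the service manager.
class progress_notifier {
 public:
  using clock = std::chrono::steady_clock;
  explicit progress_notifier(clock::duration interval) : interval_(interval) {}

  // Returns the message to send, or an empty string if rate-limited.
  std::string report(clock::time_point now, std::chrono::seconds extend,
                     const char* what) {
    assert(what != nullptr);
    if (sent_ && now - last_ < interval_) return {};
    sent_ = true;
    last_ = now;
    long long usec =
        std::chrono::duration_cast<std::chrono::microseconds>(extend).count();
    char buf[128];
    // sd_notify() accepts several newline-separated assignments.
    std::snprintf(buf, sizeof buf, "EXTEND_TIMEOUT_USEC=%lld\nSTATUS=%s",
                  usec, what);
    return buf;
  }

 private:
  clock::duration interval_;
  clock::time_point last_{};
  bool sent_ = false;
};
```

Reporting on progress events, rather than on a fixed timer, matches the commit's stated aim. systemd keeps waiting only while the server keeps making measurable progress.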
Vicențiu Ciorbaru
24b353162f Merge branch '10.0-galera' into 10.1 2018-03-19 15:21:01 +02:00
Daniel Black
8b54c31486 MDEV-8743: where O_CLOEXEC is available, use for innodb buf_dump
As this is the only moderately critical file that is fopen()ed for
writing, create an alternative path that uses open() and fdopen() on
non-glibc platforms that support O_CLOEXEC (the BSDs).

Tested on Linux (by modifying the GLIBC definition) to take this
alternative path:

$ cd /proc/23874
$ more fdinfo/71
pos:    0
flags:  02100001
mnt_id: 24
$ ls -la fd/71
l-wx------. 1 dan dan 64 Mar 14 13:30 fd/71 -> /dev/shm/var_auto_i7rl/mysqld.1/data/ib_buffer_pool.incomplete
2018-03-15 12:07:43 +02:00
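A sketch of the open()+fdopen() path described above. The function name fopen_cloexec_for_write() and the 0640 file mode are assumptions for illustration. The point is that O_CLOEXEC is applied atomically at open() time. The descriptor therefore never exists without the close-on-exec flag, unlike a later fcntl(F_SETFD) call. On glibc, fopen() with the "e" mode flag achieves the same thing.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Open `path` for writing with close-on-exec set atomically, so the
// descriptor cannot leak into a child created by fork()+exec().
static FILE* fopen_cloexec_for_write(const char* path) {
#ifdef O_CLOEXEC
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0640);
  if (fd < 0) return nullptr;
  FILE* f = fdopen(fd, "w");
  if (f == nullptr) close(fd);  // fdopen() failed: do not leak the descriptor
  return f;
#else
  return std::fopen(path, "w");  // no O_CLOEXEC: fall back to plain fopen()
#endif
}
```

The descriptor flag can be checked the same way as in the /proc transcript above. Alternatively, fcntl(fd, F_GETFD) should report FD_CLOEXEC.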
Jan Lindström
564891c532 MDEV-14508: encryption.innodb-compressed-blob failed in buildbot, assertion in btr0cur.cc line 1398
Before that line there is a call to buf_page_get_gen() that can
return block = NULL when decrypting a page fails. In that case we
should also set the error to something other than DB_SUCCESS. The
error log contained an error about decompression, and in that code
there is one case where the error is not set correctly.
2018-02-09 17:17:32 +02:00
Sergei Golubchik
d4df7bc9b1 Merge branch 'github/10.0' into 10.1 2018-02-02 10:09:44 +01:00
Jan Lindström
c7e5feb259 Merge tag 'mariadb-10.0.34' into 10.0-galera
Conflicts:
	storage/innobase/lock/lock0lock.cc
	storage/xtradb/lock/lock0lock.cc
	storage/xtradb/lock/lock0wait.cc
	support-files/mysql.server.sh
2018-02-01 14:09:48 +02:00
Vicențiu Ciorbaru
b20f821e07 Fix Innodb ASAN error on init
Backport 7c03edf2fe from xtradb to innodb
2018-01-24 15:18:36 +02:00
Vicențiu Ciorbaru
d833bb65d5 Merge remote-tracking branch '5.5' into 10.0 2018-01-24 12:29:31 +02:00
Marko Mäkelä
8637931f11 Add ASAN instrumentation (and more strict Valgrind) to InnoDB
mem_heap_free_heap_top(): Remove UNIV_MEM_ASSERT_W() and unpoison
the memory region first, because part of it may have been poisoned
by an earlier mem_heap_free_top() call.
Poison the address range at the end.

mem_heap_block_free(): Poison the address range at the end.

UNIV_MEM_ASSERT_AND_ALLOC(): Replace with UNIV_MEM_ALLOC().
We want to keep the address ranges poisoned (inaccessible) for as
long as possible.

UNIV_MEM_ASSERT_AND_FREE(): Replace with UNIV_MEM_FREE().
2018-01-23 20:34:05 +02:00
Jan Lindström
07aa985979 MDEV-14776: InnoDB Monitor output generated by specific error is flooding error logs
innodb/buf_LRU_get_free_block
	Add debug instrumentation to produce an error message about
	there being no free pages. Print the error message only once,
	and do not enable the InnoDB monitor.

xtradb/buf_LRU_get_free_block
	Add debug instrumentation to produce an error message about
	there being no free pages. Print the error message only once,
	and do not enable the InnoDB monitor. Remove code that does
	not seem to be used.

innodb-lru-force-no-free-page.test
	New test case that forces the desired error message to be produced.
2018-01-09 12:48:31 +02:00
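A minimal sketch of "print the error message only once" under concurrency. The flag, function name and message text are hypothetical and not taken from buf_LRU_get_free_block(). std::atomic<bool>::exchange() returns the previous value. Exactly one caller therefore observes false and emits the message, even if several threads run out of free pages at the same moment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical once-only flag for the "no free pages" report.
static std::atomic<bool> no_free_page_reported{false};

// Returns true if this call emitted the message.
static bool report_no_free_page(unsigned n_iterations) {
  assert(n_iterations > 0);  // only called after a failed search
  if (no_free_page_reported.exchange(true)) return false;  // already reported
  std::fprintf(stderr,
               "InnoDB: no free pages found after %u iterations"
               " (reported once)\n", n_iterations);
  return true;
}
```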
wlad
b6d72ed44d MDEV-14283 : Fix Solaris 10 build.
- introduce system check for posix_memalign (not available on Solaris 10)
- Disable dtrace probes, to fix weird link errors in mariabackup
2017-11-21 21:14:06 +01:00
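A sketch of what a build-time check for posix_memalign typically guards. The macro HAVE_POSIX_MEMALIGN is assumed to be set by the configure check, and it is left undefined here so that the fallback is exercised. The helper names are hypothetical. The fallback over-allocates with malloc() and stores the original pointer just below the aligned address, so that the matching free can recover it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// HAVE_POSIX_MEMALIGN: assumed to come from the build system's check.
static void* aligned_alloc_compat(std::size_t alignment, std::size_t size) {
  assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
#ifdef HAVE_POSIX_MEMALIGN
  void* p;
  return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#else
  // Fallback (e.g. Solaris 10): over-allocate, then round up.
  void* raw = std::malloc(size + alignment + sizeof(void*));
  if (raw == nullptr) return nullptr;
  std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
  std::uintptr_t aligned =
      (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  reinterpret_cast<void**>(aligned)[-1] = raw;  // remember malloc()'s pointer
  return reinterpret_cast<void*>(aligned);
#endif
}

static void aligned_free_compat(void* p) {
#ifdef HAVE_POSIX_MEMALIGN
  std::free(p);
#else
  if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
#endif
}
```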
Marko Mäkelä
5691109689 Merge 10.0 into 10.1 2017-11-06 18:10:23 +02:00
Marko Mäkelä
51b4366bfb MDEV-13328 ALTER TABLE…DISCARD TABLESPACE takes a lot of time
With a big buffer pool that contains many data pages,
DISCARD TABLESPACE took a long time, because it would scan the
entire buffer pool to remove any pages that belong to the tablespace.
This was especially wasteful when the table to be discarded was empty.

The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.

If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.

It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.

Import bypasses the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.

buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.

buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.

buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.

buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.

PageConverter::m_mtr: A dummy mini-transaction buffer

PageConverter::PageConverter(): Complete the member initialization list.

PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages of the file being imported.

row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
2017-11-06 18:08:33 +02:00
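A toy model of the idea behind BUF_EVICT_IF_IN_POOL, not InnoDB's real data structures. DISCARD no longer scans the whole buffer pool. Instead, IMPORT looks up each page number it processes and evicts a stale "shadow" copy if one happens to be cached. toy_buf_pool and evict_shadow_pages() are hypothetical. The assert mirrors the stated precondition that there must be no unwritten changes for the page.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

struct toy_block { bool dirty; };

class toy_buf_pool {
 public:
  void add(uint32_t space, uint32_t page_no, bool dirty) {
    blocks_[{space, page_no}] = toy_block{dirty};
  }
  bool contains(uint32_t space, uint32_t page_no) const {
    return blocks_.count({space, page_no}) != 0;
  }
  // Evicts the page if it is cached; returns whether anything was evicted.
  bool evict_if_in_pool(uint32_t space, uint32_t page_no) {
    auto it = blocks_.find({space, page_no});
    if (it == blocks_.end()) return false;
    assert(!it->second.dirty);  // no unwritten changes allowed
    blocks_.erase(it);
    return true;
  }

 private:
  std::map<std::pair<uint32_t, uint32_t>, toy_block> blocks_;
};

// The number of lookups depends on the size of the imported file,
// not on the size of the buffer pool.
static unsigned evict_shadow_pages(toy_buf_pool& pool, uint32_t space,
                                   uint32_t n_pages) {
  unsigned evicted = 0;
  for (uint32_t page_no = 0; page_no < n_pages; page_no++)
    evicted += pool.evict_if_in_pool(space, page_no);
  return evicted;
}
```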
Marko Mäkelä
57ba66b9ab Remove redundant function parameters
buf_flush_or_remove_pages(), buf_flush_dirty_pages(): Remove the
redundant parameter flush=(trx!=NULL).
2017-11-06 18:08:33 +02:00
Marko Mäkelä
6a524fcfdd MDEV-14140 IMPORT TABLESPACE must not go beyond FSP_FREE_LIMIT
ibuf_check_bitmap_on_import(): Only access the pages that
are below FSP_FREE_LIMIT. It is possible, especially with
ROW_FORMAT=COMPRESSED, that FSP_SIZE is much bigger than
FSP_FREE_LIMIT and that the bitmap pages (page_size*N, 1+page_size*N)
are filled with zero bytes.

buf_page_is_corrupted(), buf_page_io_complete(): Make the
fault injection compatible with MariaDB 10.2.

Backport the IMPORT tests from 10.2.
2017-11-06 14:55:34 +02:00
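A sketch of the page-number arithmetic described above. It takes the layout from the message: descriptor pages at page_size*N and change buffer bitmap pages at 1+page_size*N, where page_size is the physical page size in bytes. It then keeps only bitmap pages below FSP_FREE_LIMIT. The function name ibuf_bitmap_pages_below() is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the change buffer bitmap page numbers (1 + page_size*N) that
// lie below free_limit (FSP_FREE_LIMIT); pages at or beyond it are skipped.
static std::vector<uint32_t> ibuf_bitmap_pages_below(uint32_t page_size,
                                                     uint32_t free_limit) {
  assert(page_size != 0);
  std::vector<uint32_t> pages;
  for (uint64_t base = 0; base + 1 < free_limit; base += page_size)
    pages.push_back(static_cast<uint32_t>(base + 1));
  return pages;
}
```

For example, with 16KiB pages and FSP_FREE_LIMIT=40000, only pages 1, 16385 and 32769 would be checked, however large FSP_SIZE is.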
Sachin Setiya
3cecb1bab3 Merge tag 'mariadb-10.0.33' into bb-10.0-galera 2017-11-03 12:34:05 +05:30
Marko Mäkelä
f9b50c0657 MDEV-13512 buf_flush_update_zip_checksum() corrupts SPATIAL INDEX in ROW_FORMAT=COMPRESSED tables
In MariaDB Server 10.1, this problem manifests itself only as
a debug assertion failure in page_zip_decompress() when an insert
requires a page to be decompressed.

In MariaDB 10.1, the encryption of InnoDB data files repurposes the
previously unused field FILE_FLUSH_LSN for an encryption key version.
This field was only used in the first page of each file of the system
tablespace. For ROW_FORMAT=COMPRESSED tables, the field was always
written as 0 until encryption was implemented.

There is no bug in the encryption, because the buffer pool blocks will
not be written to files. Instead, copies of the blocks will be encrypted.
In these encrypted copies, the key version field will be updated before
the buffer is written to the file. The field in the buffer pool is
basically garbage that does not really matter.

Already in MariaDB 10.0, the memset() calls to reset this unused field
in buf_flush_update_zip_checksum() and buf_flush_write_block_low()
are unnecessary, because fsp_init_file_page_low() would guarantee that
the field is always 0 in the buffer pool (unless 10.1 encryption is
used).

Removing the unnecessary memset() calls makes page_zip_decompress()
happy and will prevent a SPATIAL INDEX corruption bug in
MariaDB Server 10.2. In MySQL 5.7.5, as part of WL#6968, the same
field was repurposed for an R-tree split sequence number (SSN) and
these memset() calls were removed. (Because of the repurposing, MariaDB
encryption is not available for tables that contain SPATIAL INDEX.)
2017-10-06 17:51:29 +03:00
Marko Mäkelä
cd694d76ce Merge 10.0 into 10.1 2017-09-06 15:32:56 +03:00
Marko Mäkelä
6b45355e6b MDEV-13103 Assertion `flags & BUF_PAGE_PRINT_NO_CRASH' failed in buf_page_print
buf_page_print(): Remove the parameter 'flags',
and when a server abort is intended, perform that in the caller.

In this way, page corruption reports due to different reasons
can be distinguished better.

This is non-functional code refactoring that does not fix any
page corruption issues. The change is only made to avoid falsely
grouping together unrelated causes of page corruption.
2017-09-06 14:01:15 +03:00
Jan Lindström
eba0120d8f Fix test failures on embedded server.
The problem was an incorrect definition of the wsrep_recovery,
trx_sys_update_wsrep_checkpoint and
trx_sys_read_wsrep_checkpoint functions, which caused
innodb_plugin not to load because of undefined symbols.
2017-08-31 14:04:02 +03:00
Jan Lindström
b29f26d774 Fix test failures on embedded server.
The problem was an incorrect definition of the wsrep_recovery,
trx_sys_update_wsrep_checkpoint and
trx_sys_read_wsrep_checkpoint functions, which caused
innodb_plugin not to load because of undefined symbols.
2017-08-31 08:38:26 +03:00
Jan Lindström
c23efc7d50 Merge remote-tracking branch 'origin/10.0-galera' into 10.1 2017-08-21 13:35:00 +03:00
Jan Lindström
109b858258 MDEV-13432: Assertion failure in buf0rea.cc line 577
A page read can return DB_PAGE_CORRUPTED, which should be
reported and passed to the upper layer. In case of an unknown
error code, we should print both the number and the string.
2017-08-17 07:19:12 +03:00
Daniele Sciascia
3ef3c467ad MW-365 Do not load/dump innodb buffer pool with wsrep_recover 2017-08-11 14:15:27 +03:00
Jan Lindström
2ef7a5a13a MDEV-13443: Port innochecksum tests from 10.2 innodb_zip suite to 10.1
This is basically a port of WL6045: Improve Innochecksum, with some
code refactoring of innochecksum.

Added the page0size.h include from 10.2 to make the 10.1 and 10.2
versions of innochecksum as identical as possible.

Added a page 0 checksum check; if it fails, the whole test fails.
2017-08-07 12:39:38 +03:00
Jan Lindström
8b019f87dd MDEV-11939: innochecksum mistakes a file for an encrypted one (page 0 invalid)
Always read the full page 0 to determine whether the tablespace contains
encryption metadata. For tablespaces that are page compressed, or
page compressed and encrypted, do not compare the checksum, as it
does not exist. For encrypted tables, use the checksum verification
written for encrypted tables; for normal tables, use the normal method.

buf_page_is_checksum_valid_crc32
buf_page_is_checksum_valid_innodb
buf_page_is_checksum_valid_none
	Add innochecksum logging to file.

buf_page_is_corrupted
	Remove ib_logf and page_warn_strict_checksum calls in the
	innochecksum compilation. Add innochecksum logging to file.

fil0crypt.cc fil0crypt.h
	Modify so that they can be used in the innochecksum compilation,
	and move fil_space_verify_crypt_checksum to the end of the file.
	Add innochecksum logging to file.

univ.i
	Add the innochecksum strict_verify, log_file and cur_page_num
	variables as extern.

page_zip_verify_checksum
	Add innochecksum logging to file.

innochecksum.cc
	Many changes, most notably the ability to read encryption
	metadata from page 0 of the tablespace.

Added a test case where we intentionally corrupt
FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION (encryption key version)
FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION+4 (post encryption checksum)
FIL_DATA+10 (data)
2017-08-03 08:29:36 +03:00
Marko Mäkelä
e555540ab6 MDEV-13105 InnoDB fails to load a table with PAGE_COMPRESSION_LEVEL after upgrade from 10.1.20
When using innodb_page_size=16k, InnoDB tables
that were created in MariaDB 10.1.0 to 10.1.20 with
PAGE_COMPRESSED=1 and
PAGE_COMPRESSION_LEVEL=2 or PAGE_COMPRESSION_LEVEL=3
would fail to load.

fsp_flags_is_valid(): When using innodb_page_size=16k, use a
more strict check for .ibd files, with the assumption that
nobody would try to use different-page-size files.
2017-07-05 14:55:56 +03:00
Marko Mäkelä
fa57479fcd Merge 10.0 into 10.1 2017-06-12 14:26:32 +03:00
Marko Mäkelä
417434f12d MDEV-13039 innodb_fast_shutdown=0 may fail to purge all undo log
When a slow shutdown is performed soon after spawning some work for
background threads that can create or commit transactions, it is possible
that new transactions are started or committed after the purge has finished.
This violates the specification of innodb_fast_shutdown=0, namely that
the purge must be completed. (None of the history of the recent transactions
would be purged.)

Also, it is possible that the purge threads would exit in slow shutdown
while there exist active transactions, such as recovered incomplete
transactions that are being rolled back. Thus, the slow shutdown could
fail to purge some undo log that becomes purgeable after the transaction
commit or rollback.

srv_undo_sources: A flag that indicates if undo log can be generated
or the persistent, whether by background threads or by user SQL.
Even when this flag is clear, active transactions that already exist
in the system may be committed or rolled back.

innodb_shutdown(): Renamed from innobase_shutdown_for_mysql().
Do not return an error code; the operation never fails.
Clear the srv_undo_sources flag, and also ensure that the background
DROP TABLE queue is empty.

srv_purge_should_exit(): Do not allow the purge to exit if
srv_undo_sources is set or the background DROP TABLE queue is not
empty, or in slow shutdown, if any active transactions exist
(and are being rolled back).

srv_purge_coordinator_thread(): Remove some previous workarounds
for this bug.

innobase_start_or_create_for_mysql(): Set buf_page_cleaner_is_active
and srv_dict_stats_thread_active directly. Set srv_undo_sources before
starting the purge subsystem, to prevent immediate shutdown of the purge.
Create dict_stats_thread and fts_optimize_thread immediately
after setting srv_undo_sources, so that shutdown can use this flag to
determine if these subsystems were started.

dict_stats_shutdown(): Shut down dict_stats_thread. Backported from 10.2.

srv_shutdown_table_bg_threads(): Remove (unused).
2017-06-09 16:20:42 +03:00
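A sketch of the exit condition described for srv_purge_should_exit(). It uses a hypothetical snapshot struct whose field names are illustrative, not InnoDB's. One further assumption goes beyond the text: in slow shutdown, the sketch also waits for the history list to drain, because that is what "the purge must be completed" requires.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the state consulted by the purge coordinator.
struct purge_exit_state {
  bool shutting_down;         // shutdown has been requested
  bool slow_shutdown;         // innodb_fast_shutdown=0
  bool undo_sources;          // srv_undo_sources: new undo log may appear
  std::size_t bg_drop_queue;  // pending background DROP TABLE entries
  std::size_t active_trx;     // e.g. recovered transactions being rolled back
  std::size_t history_len;    // undo log records still to purge (assumption)
};

static bool purge_should_exit(const purge_exit_state& s) {
  if (!s.shutting_down) return false;
  // More undo log could still be generated: keep purging.
  if (s.undo_sources || s.bg_drop_queue != 0) return false;
  if (!s.slow_shutdown) return true;  // fast shutdown: purge need not finish
  // Slow shutdown: rollback must finish and the history must be purged.
  return s.active_trx == 0 && s.history_len == 0;
}
```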
Jan Lindström
58c56dd7f8 MDEV-12610: MariaDB start is slow
The problem appears to be that the function fsp_flags_try_adjust()
is invoked unconditionally on every .ibd file on startup.
Performance investigation showed that the top function,
fsp_header_get_crypt_offset(), also needs to be addressed.

Ported the implementation of the fsp_header_get_encryption_offset()
function from 10.2 to fsp_header_get_crypt_offset().

Introduced a new function, fil_crypt_read_crypt_data(),
to read page 0 if it has not been read yet.

fil_crypt_find_space_to_rotate(): Now that page 0 of every .ibd
file is no longer read on startup, we need to check whether page 0
has been read for the tablespace that we examine for key rotation;
if it has not been read, we read it.

fil_space_crypt_get_status(): Now that page 0 of every .ibd
file is no longer read on startup, we also need to read page 0
here if it has not been read yet. This is needed because tests
use an INFORMATION_SCHEMA query to wait until background encryption
or decryption has finished, and this function is used to
produce the results.

fil_crypt_thread(): Add an is_stopping condition for the tablespace
so that we do not rotate pages if usage of the tablespace should
be stopped. This was needed for a failure seen in regression
testing.

fil_space_create: Remove page_0_crypt_read and extra
unnecessary info output.

fil_open_single_table_tablespace(): We call fsp_flags_try_adjust()
only when no errors have occurred, the server was not started
in read-only mode, and either tablespace validation was requested
or the flags contain table options other than the low-order bits
up to the FSP_FLAGS_POS_PAGE_SSIZE position.

fil_space_t::page_0_crypt_read removed.

Added the test case innodb-first-page-read, which tests startup with
encryption on and with encryption off, to check that page 0 is not
read for every table on startup.
2017-06-09 13:15:39 +03:00
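A minimal sketch of deferring the page 0 read until the crypt metadata is first needed, for example by key rotation or a status query. toy_space, crypt_data and the page 0 "parse" are hypothetical stand-ins. A mutex keeps the one-time read safe if several threads ask at once. The counter shows that nothing is read at "startup".

```cpp
#include <cassert>
#include <mutex>

struct crypt_data { unsigned key_version; };

class toy_space {
 public:
  explicit toy_space(unsigned key_version_on_disk) : on_disk_(key_version_on_disk) {}

  // Reads page 0 on first use only; later calls reuse the cached result.
  crypt_data get_crypt_data() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (!loaded_) {
      ++n_page0_reads_;
      data_ = crypt_data{on_disk_};  // stands in for parsing page 0
      loaded_ = true;
    }
    return data_;
  }

  unsigned page0_reads() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return n_page0_reads_;
  }

 private:
  unsigned on_disk_;
  mutable std::mutex mutex_;
  bool loaded_ = false;
  crypt_data data_{0};
  unsigned n_page0_reads_ = 0;
};
```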
Marko Mäkelä
fbeb9489cd Cleanup of MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M
The doublewrite buffer pages must fit in the first InnoDB system
tablespace data file. The checks that were added in the initial patch
(commit 112b21da37)
were at too high a level and did not cover all cases.

innodb.log_data_file_size: Test all innodb_page_size combinations.

fsp_header_init(): Never return an error. Move the change buffer creation
to the only caller that needs to do it.

btr_create(): Clean up the logic. Remove the error log messages.

buf_dblwr_create(): Try to return an error on non-fatal failure.
Check that the first data file is big enough for creating the
doublewrite buffers.

buf_dblwr_process(): Check if the doublewrite buffer is available.
Display the message only if it is available.

recv_recovery_from_checkpoint_start_func(): Remove a redundant message
about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already
been initiated.

fil_report_invalid_page_access(): Simplify the message.

fseg_create_general(): Do not emit messages to the error log.

innobase_init(): Revert the changes.

trx_rseg_create(): Refactor (no functional change).
2017-06-08 11:55:47 +03:00
Marko Mäkelä
30df297c2f Merge 10.0 into 10.1
Rewrite the test encryption.innodb-checksum-algorithm not to
require any restarts or re-bootstrapping, and to cover all
innodb_page_size combinations.

Test innodb.101_compatibility with all innodb_page_size combinations.
2017-06-06 10:59:54 +03:00
Jan Lindström
6b6987154a MDEV-12114: install_db shows corruption for rest encryption and innodb_checksum_algorithm=strict_none
The problem was that the checksum check produced false positives,
reporting a page as both not encrypted and encrypted, when
checksum_algorithm was strict_none.

The encryption checksum will use only crc32, regardless of the setting.

buf_zip_decompress: If decompression fails, report an error message
containing the space name if available (it is not available during
import), and note whether the space could be encrypted.

buf_page_get_gen: Do not assert if decompression fails,
instead unfix the page and return NULL to upper layer.

fil_crypt_calculate_checksum: Use only crc32 method.

fil_space_verify_crypt_checksum: Here we need to check
crc32, innodb and none method for old datafiles.

fil_space_release_for_io: Allow null space.

encryption.innodb-compressed-blob is now run with crc32 and none
combinations.

Note that with the none and strict_none methods there is no real
way to detect page corruption, including corruption caused by
decrypting a page with an incorrect key.

New test innodb-checksum-algorithm to test different checksum
algorithms with encrypted, row compressed and page compressed
tables.
2017-06-01 14:07:48 +03:00
Jan Lindström
1af8bf39ca MDEV-12113: install_db shows corruption for rest encryption with innodb_data_file_path=ibdata1:3M;
The problem was that the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field,
which on encrypted pages (even in system data files, except on the
very first page 0:0) should contain the key_version, was overwritten
with the flush LSN after encryption.

Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1
The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during
InnoDB startup.

At startup, InnoDB reads the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION
from the first page of each file in the InnoDB system tablespace.
If there are multiple files, the minimum and maximum LSN can differ.
These numbers are passed to InnoDB startup.

Having the number in files other than the first file of the InnoDB
system tablespace does not provide much additional value. It
conflicts with other uses of the field, such as on InnoDB R-tree
index pages and for the encryption key_version.

This worklog stops writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to
files other than the first file of the InnoDB system tablespace
(page number 0:0) when the system tablespace is encrypted. If the
tablespace is not encrypted, we continue writing
FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to the first page of every system
tablespace file, to avoid unnecessary warnings on downgrade.

open_or_create_data_files(): pass only one flushed_lsn parameter

xb_load_tablespaces(): pass only one flushed_lsn parameter.

buf_page_create(): Improve comment about where
FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set.

fil_write_flushed_lsn(): A new function, merged from
fil_write_lsn_and_arch_no_to_file() and
fil_write_flushed_lsn_to_data_files().
If the tablespace is encrypted, write only to the first page of the
system tablespace (page 0:0); otherwise, write the first page of every
system tablespace file and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE)
afterwards.

fil_read_first_page(): Read flush_lsn and crypt_data only from the
first data file.

fil_open_single_table_tablespace(): Remove output of LSN, because it
was only valid for the system tablespace and the undo tablespaces, not
user tablespaces.

fil_validate_single_table_tablespace(): Remove output of LSN.

checkpoint_now_set(): Use fil_write_flushed_lsn and output
an error if the operation fails.

Remove lsn variable from fsp_open_info.

recv_recovery_from_checkpoint_start(): Remove unnecessary second
flush_lsn parameter.

log_empty_and_mark_files_at_shutdown(): Use fil_write_flushed_lsn
and output an error if it fails.

open_or_create_data_files(): Pass only one flushed_lsn variable.
2017-06-01 14:07:48 +03:00
Marko Mäkelä
c2ef0bb6ce Merge 5.5 into 10.0 2017-05-29 13:15:36 +03:00
Marko Mäkelä
2cb94aa1b7 MDEV-11626 innodb.innodb-change-buffer-recovery fails for xtradb
buf_page_get_gen(): Remove the error log messages about
page flushing and eviction when
innodb_change_buffering_debug=1 is in effect.
2017-05-29 13:07:23 +03:00
Jan Lindström
90c52e5291 MDEV-12615: InnoDB page compression method snappy mostly does not compress pages
The Snappy compression method requires that the output buffer used
for compression be bigger than the input buffer. Similarly, LZO
requires an additional work memory buffer. Increase the allocated
buffer accordingly.

buf_tmp_buffer_t: removed unnecessary lzo_mem, crypt_buf_free and
comp_buf_free.

buf_pool_reserve_tmp_slot: Use alligned_alloc; if Snappy is
available, size the allocation based on snappy_max_compressed_length,
and if LZO is available, increase the buffer by LZO1X_1_15_MEM_COMPRESS.

fil_compress_page: Remove the unneeded LZO memory (we use the same
buffer), and if the output buffer is not yet allocated, allocate it
in the same way as above.

Decompression does not require additional work area.

Modify the test to be the same as the tests for the other compression methods.
2017-05-20 21:51:34 +03:00
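A sketch of the buffer sizing described above, under two stated assumptions. First, Snappy's worst-case output size is 32 + n + n/6, which is the formula behind snappy_max_compressed_length() in the Snappy sources. Second, the LZO work-area size is passed in as a parameter rather than hard-coding LZO1X_1_15_MEM_COMPRESS. tmp_comp_buf_size() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Worst-case Snappy output for n input bytes (assumed formula, see above).
static std::size_t snappy_worst_case(std::size_t n) { return 32 + n + n / 6; }

// Hypothetical sizing of the temporary compression buffer for one page.
static std::size_t tmp_comp_buf_size(std::size_t page_size, bool have_snappy,
                                     bool have_lzo, std::size_t lzo_work_mem) {
  assert(page_size > 0);
  std::size_t size = page_size;
  if (have_snappy) size = snappy_worst_case(page_size);  // output may exceed input
  if (have_lzo) size += lzo_work_mem;  // work area shares the same allocation
  return size;
}
```

For a 16KiB page, Snappy alone would need 32 + 16384 + 2730 = 19146 bytes. This explains why a page-sized output buffer made Snappy give up on most pages.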
Marko Mäkelä
a4d4a5fe82 After-merge fix for MDEV-11638
In commit 360a4a0372
some debug assertions were introduced to the page flushing code
in XtraDB. Add these assertions to InnoDB as well, and adjust
the InnoDB shutdown so that these assertions will not fail.

logs_empty_and_mark_files_at_shutdown(): Advance
srv_shutdown_state from the first phase SRV_SHUTDOWN_CLEANUP
only after no page-dirtying activity is possible
(well, except by srv_master_do_shutdown_tasks(), which will be
fixed separately in MDEV-12052).

rotate_thread_t::should_shutdown(): Already exit the key rotation
threads at the first phase of shutdown (SRV_SHUTDOWN_CLEANUP).

page_cleaner_sleep_if_needed(): Do not sleep during shutdown.
This change is originally from XtraDB.
2017-05-20 08:41:34 +03:00
Marko Mäkelä
65e1399e64 Merge 10.0 into 10.1
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
2017-05-20 08:41:20 +03:00
Marko Mäkelä
13a350ac29 Merge 10.0 into 10.1 2017-05-19 12:29:37 +03:00
Vicențiu Ciorbaru
45898c2092 Merge remote-tracking branch 'origin/10.0' into 10.0 2017-05-18 15:45:55 +03:00
Vicențiu Ciorbaru
b87873b221 Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.

Bug #23481444	OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
                       INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.

Solution:
=========

If the traversal in row_search_mvcc() exceeds 100 and no ICP is
involved, convert the last InnoDB record of the page to MySQL format
and compare it with the end of the range. If it is out of range,
InnoDB can avoid the whole-table traversal. The code needed a little
refactoring to make it compile.

Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
2017-05-17 14:53:28 +03:00
Marko Mäkelä
956d2540c4 Remove redundant UT_LIST_INIT() calls
The macro UT_LIST_INIT() zero-initializes the UT_LIST_NODE.
There is no need to call this macro on a buffer that has
already been zero-initialized by mem_zalloc() or mem_heap_zalloc()
or similar.

For some reason, the statement UT_LIST_INIT(srv_sys->tasks) in
srv_init() caused a SIGSEGV on server startup when compiling with
GCC 7.1.0 for AMD64 using -O3. The zero-initialization was attempted
by the instruction movaps %xmm0,0x50(%rax), while the proper offset
of srv_sys->tasks would seem to have been 0x48.
2017-05-17 10:33:49 +03:00
Vicențiu Ciorbaru
0af9818240 5.6.36 2017-05-15 17:17:16 +03:00
Marko Mäkelä
03dca7a333 Merge 10.0 into 10.1 2017-05-12 13:12:45 +03:00
Marko Mäkelä
ff16609374 MDEV-12674 Innodb_row_lock_current_waits has overflow
There is a race condition related to the variable
srv_stats.n_lock_wait_current_count, which is only
incremented and decremented by the function lock_wait_suspend_thread().

The incrementing is protected by lock_sys->wait_mutex, but the
decrementing does not appear to be protected by anything.
This mismatch could allow the counter to be corrupted when a
transactional InnoDB table or record lock wait is terminating
roughly at the same time with the start of a wait on a
(possibly different) lock.

ib_counter_t: Remove some unused methods. Prevent instantiation for N=1.
Add an inc() method that takes a slot index as a parameter.

single_indexer_t: Remove.

simple_counter<typename Type, bool atomic=false>: A new counter wrapper.
Optionally use atomic memory operations for modifying the counter.
Aligned to the cache line size.

lsn_ctr_1_t, ulint_ctr_1_t, int64_ctr_1_t: Define as simple_counter<Type>.
These counters are either only incremented (and we do not care about
losing some increment operations), or the increment/decrement operations
are protected by some mutex.

srv_stats_t::os_log_pending_writes: Document that the number is protected
by log_sys->mutex.

srv_stats_t::n_lock_wait_current_count: Use simple_counter<ulint, true>,
that is, atomic inc() and dec() operations.

lock_wait_suspend_thread(): Release the mutexes before incrementing
the counters. Avoid acquiring the lock mutex if the lock wait has
already been resolved. Atomically increment and decrement
srv_stats.n_lock_wait_current_count.

row_insert_for_mysql(), row_update_for_mysql(),
row_update_cascade_for_mysql(): Use the inc() method with the trx->id
as the slot index. This is a non-functional change, just using
inc() instead of add(1).

buf_LRU_get_free_block(): Replace the method add(index, n) with inc().
There is no slot index in the simple_counter.
2017-05-12 12:24:53 +03:00
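A sketch in the spirit of simple_counter<Type, bool atomic=false>. It is not the actual InnoDB class: the 64-byte cache line size and all names are assumptions. With atomic=true, inc() and dec() are single read-modify-write operations, so concurrent updates cannot be lost. This is the property n_lock_wait_current_count needs. With atomic=false, the value is read and written separately, and concurrent updates may be lost. The commit accepts that for pure statistics or mutex-protected counters.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Cache-line aligned counter; the 64-byte line size is an assumption.
template <typename Type, bool atomic = false>
struct alignas(64) simple_counter_sketch {
  std::atomic<Type> value{0};

  void add(Type n) {
    if (atomic)
      value.fetch_add(n, std::memory_order_relaxed);  // one RMW: no lost updates
    else
      value.store(value.load(std::memory_order_relaxed) + n,
                  std::memory_order_relaxed);  // separate load/store: may lose updates
  }
  void sub(Type n) {
    if (atomic)
      value.fetch_sub(n, std::memory_order_relaxed);
    else
      value.store(value.load(std::memory_order_relaxed) - n,
                  std::memory_order_relaxed);
  }
  void inc() { add(1); }
  void dec() { sub(1); }
  operator Type() const { return value.load(std::memory_order_relaxed); }
};
```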
Marko Mäkelä
9d2c1d09aa MDEV-12253 post-push fix: buf_read_page_low() can return DB_ERROR
The function buf_read_page_low() invokes fil_io(), which can return
DB_ERROR when the requested page is out of bounds (such as when
restoring a buffer pool dump). The callers should be handling that.
2017-05-09 14:36:15 +03:00
Marko Mäkelä
b82c602db5 MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant
tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
2017-04-28 14:12:52 +03:00
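A minimal sketch of the n_pending_ios protocol described above, with hypothetical toy types. A std::mutex stands in for fil_system->mutex. The lookup-and-increment in acquire_for_io() happens under that mutex, and so does detaching the tablespace. After detaching, no new reference can be taken, so the freeing thread only has to wait for the count to drain to zero before the object may be freed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

struct toy_fil_system {
  std::mutex mutex;  // stands in for fil_system->mutex
};

struct toy_fil_space {
  bool in_cache = true;  // protected by toy_fil_system::mutex
  std::atomic<unsigned> n_pending_ios{0};
};

// Like fil_space_acquire_for_io(): only succeeds while the space is still
// findable; the check and the increment happen under the same mutex.
static bool acquire_for_io(toy_fil_system& sys, toy_fil_space& s) {
  std::lock_guard<std::mutex> guard(sys.mutex);
  if (!s.in_cache) return false;
  s.n_pending_ios.fetch_add(1);
  return true;
}

static void release_for_io(toy_fil_space& s) {
  unsigned prev = s.n_pending_ios.fetch_sub(1);
  assert(prev > 0);
  (void) prev;
}

// Like fil_space_free_and_mutex_exit(): detach under the mutex, release
// it, then wait for in-flight I/O before the object may be freed.
static void detach_and_wait(toy_fil_system& sys, toy_fil_space& s) {
  {
    std::lock_guard<std::mutex> guard(sys.mutex);
    s.in_cache = false;  // future acquire_for_io() calls will fail
  }
  while (s.n_pending_ios.load() != 0)
    std::this_thread::yield();  // the count can only go down from here
}
```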