Commit graph

890 commits

Author SHA1 Message Date
Marko Mäkelä
5e84ea9634 MDEV-12266: Remove dict_table_is_discarded()
The predicate dict_table_is_discarded() checks whether
ALTER TABLE…DISCARD TABLESPACE has been executed.

Replace most occurrences of dict_table_is_discarded() with
checks of dict_table_t::space. A few checks for the flag
DICT_TF2_DISCARDED are necessary; write them inline.

Because !is_readable() implies !space, some checks for
dict_table_is_discarded() were redundant.
2018-05-12 22:12:12 +03:00
Marko Mäkelä
c57e9835ff Replace dict_col_is_virtual(col) with col->is_virtual() 2018-05-12 22:12:12 +03:00
Marko Mäkelä
2b27ac8282 Fix many -Wunused-parameter
Remove unused InnoDB function parameters and functions.

i_s_sys_virtual_fill_table(): Do not allocate heap memory.

mtr_is_block_fix(): Replace with mtr_memo_contains().

mtr_is_page_fix(): Replace with mtr_memo_contains_page().
2018-05-01 16:52:19 +03:00
Marko Mäkelä
b2c4740034 Fix some -Wsign-conversion
InnoDB was using int64_t instead of ha_rows (unsigned 64-bit).
2018-04-29 17:53:21 +03:00
Marko Mäkelä
9ed2b2b2b8 Do not divide or multiply by srv_page_size
Instead, shift by srv_page_size_shift.
2018-04-28 20:52:22 +03:00
Marko Mäkelä
a90100d756 Replace univ_page_size and UNIV_PAGE_SIZE
Try to use one variable (srv_page_size) for innodb_page_size.

Also, replace UNIV_PAGE_SIZE_SHIFT with srv_page_size_shift.
2018-04-28 20:45:45 +03:00
Marko Mäkelä
ba19764209 Fix most -Wsign-conversion in InnoDB
Change innodb_buffer_pool_size, innodb_fill_factor to unsigned.
2018-04-28 20:45:45 +03:00
Marko Mäkelä
6f88bc4511 MDEV-15914: Use buf_block_t* for undo, not page_t*
trx_undof_page_add_undo_rec_log(): Write the undo page number
directly from the buf_block_t descriptor, not by decoding the
fields in the page frame.
2018-04-26 22:53:33 +03:00
Marko Mäkelä
76c62bc69c MDEV-15914: Restore MLOG_UNDO_INSERT
trx_undof_page_add_undo_rec_log(): Write the MLOG_UNDO_INSERT
record instead of the equivalent MLOG_2BYTES and MLOG_WRITE_STRING.
This essentially reverts commit 9ee8917dfd.

In MariaDB 10.3, I attempted to simplify the crash recovery code
by making use of lower-level redo log records. It turns out that
we must keep the redo log parsing code in order to allow crash-upgrade
from older MariaDB versions (MDEV-14848).

Now, it further turns out that the InnoDB redo log record format is
suboptimal for logging multiple changes to a single page. This simple
change to the redo logging of undo log significantly affects the
INSERT and UPDATE performance.

Essentially, we wrote
	(space_id,page_number,MLOG_2BYTES,2 bytes)
	(space_id,page_number,MLOG_WRITE_STRING,N+4 bytes)
instead of the previously written
	(space_id,page_number,MLOG_UNDO_INSERT,N+2 bytes)

The added redo log volume caused a single-threaded INSERT
(without innodb_adaptive_hash_index) of
1,000,000 rows to consume 11 seconds instead of 9 seconds,
and a subsequent UPDATE of 30,000,000 rows to consume 64 seconds
instead of 58 seconds. If we omitted all redo logging for the
undo log, the INSERT would consume only 4 seconds.
2018-04-26 22:53:33 +03:00
Marko Mäkelä
83bd4dd1ee MDEV-15914: Remove trx_t::undo_mutex
The trx_t::undo_mutex covered both some main-memory data structures
(trx_undo_t) and access to undo pages. The trx_undo_t is only
accessed by the thread that is associated with a running transaction.
Likewise, each transaction has its private set of undo pages.
The thread that is associated with an active transaction may
lock multiple undo pages concurrently, but no other thread may
lock multiple pages of a foreign transaction.

Concurrent access to the undo logs of an active transaction is possible,
but trx_undo_get_undo_rec_low() only locks one undo page at a time,
without ever holding any undo_mutex.

It seems that the trx_t::undo_mutex would have been necessary if
multi-threaded execution or rollback of a single transaction
had been implemented in InnoDB.
2018-04-26 22:53:33 +03:00
Marko Mäkelä
f7cac5e26c MDEV-12288/MDEV-15132/MDEV-15158: Adjust a comment 2018-04-26 22:53:33 +03:00
Marko Mäkelä
ff0000cdd2 MDEV-15914: Remove trx_undo_t::empty
Use the value trx_undo_t::top_undo_no == IB_ID_MAX for indicating
that an undo log is empty.
2018-04-26 22:53:33 +03:00
Marko Mäkelä
e3fb8e9569 Remove trx_t::undo_rseg_space
The field undo_rseg_space was only used in a debug check.

trx_roll_check_undo_rec_ordering(): Remove.
2018-04-25 07:56:11 +03:00
Marko Mäkelä
7396dfcca7 Merge 10.2 into 10.3 2018-04-24 20:59:57 +03:00
Marko Mäkelä
7b5543b21d MDEV-15030 Add ASAN instrumentation to trx_t Pool
Pool::mem_free(): Poison the freed memory. Assert that it was
fully initialized, because the reuse of trx_t objects will
assume that the objects were previously initialized.

Pool::~Pool(), Pool::get(): Unpoison the allocated memory,
and mark it initialized.

trx_free(): After invoking Pool::mem_free(), unpoison
trx_t::mutex and trx_t::undo_mutex, because MutexMonitor
will access these even for freed trx_t objects.
2018-04-24 20:33:27 +03:00
Marko Mäkelä
de942c9f61 MDEV-15983 Reduce fil_system.mutex contention further
fil_space_t::n_pending_ops, n_pending_ios: Use a combination of
fil_system.mutex and atomic memory access for protection.

fil_space_t::release(): Replaces fil_space_release().
Does not acquire fil_system.mutex.

fil_space_t::release_for_io(): Replaces fil_space_release_for_io().
Does not acquire fil_system.mutex.
2018-04-23 13:15:54 +03:00
Marko Mäkelä
c6ba758d1d Merge 10.2 into 10.3 2018-04-23 09:49:58 +03:00
Thirunarayanan Balathandayuthapani
211842dd86 MDEV-15374 Server hangs and aborts with long semaphore wait or assertion `len < ((ulint) srv_page_size)' fails in trx_undo_rec_copy upon ROLLBACK on temporary table
Problem:
=======
InnoDB cleans all temporary undo logs during commit. During rollback
of secondary index entry, InnoDB tries to build the previous version
of clustered index. It leads to access of freed undo page during
previous transaction commit and it leads to undo log corruption.

Solution:
=========
During rollback, temporary undo logs should not try to build
the previous version of the record.
2018-04-23 11:22:58 +05:30
Marko Mäkelä
d71a8855ee Merge 10.2 to 10.3
Temporarily disable main.cte_recursive due to hang in
an added test related to MDEV-15575.
2018-04-19 15:23:21 +03:00
Thirunarayanan Balathandayuthapani
341edddc3d MDEV-15826 Purge attempts to free BLOB page after BEGIN;INSERT;UPDATE;ROLLBACK
- During rollback, redo segments priorities over no-redo rollback
segments and it leads to failure of redo rollback segment undo
logs truncation.
2018-04-18 12:39:39 +05:30
Marko Mäkelä
97e51d24cb MDEV-13697 DB_TRX_ID is not always reset
The rollback of the modification of a pre-existing record
should involve a purge-like operation. Before MDEV-12288
the only purge-like operation was the removal of a
delete-marked record.

After MDEV-12288, any rollback of updating an existing record
must reset the DB_TRX_ID column when it is no longer visible
in the purge read view.

row_vers_must_preserve_del_marked(): Remove. It is cleaner to
perform the check directly in row0umod.cc.

row_trx_id_offset(): Auxiliary function to retrieve the byte
offset of DB_TRX_ID in a clustered index leaf page record.

row_undo_mod_must_purge(): Determine if a record should be purged.

row_undo_mod_clust(): For temporary tables, skip the purge checks.
When rolling back an update so that the original record was not
delete-marked, reset DB_TRX_ID if the history is no longer visible.
2018-04-15 14:51:26 +03:00
Vicențiu Ciorbaru
65eefcdc60 Merge remote-tracking branch '10.2' into 10.3 2018-04-12 12:41:19 +03:00
Marko Mäkelä
dd127799bc MDEV-15832 With innodb_fast_shutdown=3, skip the rollback of connected transactions
row_undo_step(): If innodb_fast_shutdown=3 has been requested,
abort the rollback of any non-DDL transactions. Starting with
MDEV-12323, we aborted the rollback of recovered transactions. The
transactions would be rolled back on subsequent server startup.

trx_roll_report_progress(): Renamed from trx_roll_must_shutdown(),
now that the shutdown check has been moved to the only caller.

trx_commit_low(): Allow mtr=NULL for transactions that are aborted
on rollback.

trx_rollback_finish(): Clean up aborted transactions to avoid
assertion failures and memory leaks on shutdown. This code was
previously in trx_rollback_active().

trx_rollback_to_savepoint_low(), trx_rollback_for_mysql_low():
Remove some redundant assertions.
2018-04-11 05:39:36 +03:00
Vicențiu Ciorbaru
45e6d0aebf Merge branch '10.1' into 10.2 2018-04-10 17:43:18 +03:00
Marko Mäkelä
8eff803a1b Revert "MDEV-14705: Do not rollback on InnoDB shutdown"
This reverts commit 76ec37f522.

This behaviour change will be done separately in:
MDEV-15832 With innodb_fast_shutdown=3, skip the rollback
of connected transactions
2018-04-10 08:55:20 +03:00
Eugene Kosov
1513630d30 remove dead code 2018-04-09 17:21:21 +03:00
Marko Mäkelä
0c8d6fd66c MDEV-15364 FOREIGN CASCADE operations in system versioned referenced tables
Merge pull request #667
2018-04-09 11:02:24 +03:00
Marko Mäkelä
df44e75b42 Minor clean-up of purge code
purge_sys_t::n_submitted: Document that it is only accessed by
srv_purge_coordinator_thread.

purge_sys_t::n_completed: Exclusively use my_atomic access.

srv_task_execute(): Simplify the code.

srv_purge_coordinator_thread(): Test the cheaper condition first.

trx_purge(): Atomically access purge_sys.n_completed.
Remove some code duplication.

trx_purge_wait_for_workers_to_complete(): Atomically access
purge_sys.n_completed. Remove an unnecessary local variable.

trx_purge_stop(): Remove a redundant assignment.
2018-04-08 18:11:49 +03:00
Daniel Black
1479273cdb MDEV-14705: slow innodb startup/shutdown can exceed systemd timeout
Use systemd EXTEND_TIMEOUT_USEC to advise systemd of progress

Move towards progress measures rather than pure time based measures.

Progress reporting at numberious shutdown/startup locations incuding:
* For innodb_fast_shutdown=0 trx_roll_must_shutdown() for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge

Thanks Marko for feedback and suggestions.
2018-04-06 09:58:14 +03:00
Marko Mäkelä
76ec37f522 MDEV-14705: Do not rollback on InnoDB shutdown
row_undo_step(): If fast shutdown has been requested, abort the
rollback of any non-DDL transactions. Starting with MDEV-12323,
we aborted the rollback of recovered transactions. These
transactions would be rolled back on subsequent server startup.

trx_roll_report_progress(): Renamed from trx_roll_must_shutdown(),
now that the shutdown check has been moved to the only caller.
2018-04-06 09:58:14 +03:00
Sergey Vojtovich
e6a9ce2759 MDEV-15773 - Simplified away trx_sys_t::m_views
Use trx_sys_t::trx_list instead.
2018-04-04 14:09:37 +04:00
Sergey Vojtovich
3d5f7ad23a MDEV-15773 - Simplified away trx_free_for_(mysql|background) 2018-04-04 14:09:37 +04:00
Sergey Vojtovich
0993d6b81b MDEV-15773 - trx_sys.mysql_trx_list -> trx_sys.trx_list
Replaced "list of transactions created for MySQL" with "list of all
transactions". This simplifies code and allows further removal of
trx_sys.m_views.
2018-04-04 14:09:37 +04:00
Sergey Vojtovich
061c767cce MDEV-15773 - Simplified away trx_t::in_mysql_trx_list 2018-04-04 14:09:37 +04:00
Sergey Vojtovich
d6d58836bb MDEV-15773 - trx_allocate_for_background() -> trx_create()
trx_free_resurrected(): Remove, unused function
2018-04-04 14:09:37 +04:00
Marko Mäkelä
4cad42392a MDEV-12266: Change dict_table_t::space to fil_space_t*
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.

dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.

There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.

There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.

FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.

mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.

fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.

fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.

dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.

dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.

dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.

fil_ibd_open(): Return the tablespace.

fil_space_t::set_imported(): Replaces fil_space_set_imported().

truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.

row_truncate_prepare(): Merge to its only caller.

row_drop_table_from_cache(): Assert that the table is persistent.

dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.

row_import_update_discarded_flag(): Remove a constant parameter.
2018-03-29 22:02:05 +03:00
Marko Mäkelä
9043bec954 MDEV-12266: Make trx_rseg_t::space a pointer
trx_rsegf_get(), trx_undo_get_first_rec(): Change the parameter to
fil_space_t* so that fewer callers need to be adjusted.

trx_undo_free_page(): Remove the redundant parameter 'space'.
2018-03-29 20:47:41 +03:00
Marko Mäkelä
39ed074317 MDEV-12266: Remove trx_undo_t::space 2018-03-29 20:47:41 +03:00
Marko Mäkelä
332e805e2c MDEV-12266: Refactor trx_rseg_header_create() 2018-03-29 20:47:38 +03:00
Marko Mäkelä
c577192d6c MDEV-12266: fsp_flags_try_adjust(): Remove a lookup
fsp_header_init(): Take fil_space_t* as a parameter.
2018-03-29 20:47:29 +03:00
Marko Mäkelä
2ac8b1a907 MDEV-12266: Add fil_system.sys_space, temp_space
Add fil_system_t::sys_space, fil_system_t::temp_space.
These will replace lookups for TRX_SYS_SPACE or SRV_TMP_SPACE_ID.

mtr_t::m_undo_space, mtr_t::m_sys_space: Remove.

mtr_t::set_sys_modified(): Remove.

fil_space_get_type(), fil_space_get_n_reserved_extents(): Remove.

fsp_header_get_tablespace_size(), fsp_header_inc_size():
Merge to the only caller, innobase_start_or_create_for_mysql().
2018-03-29 19:18:11 +03:00
Sergey Vojtovich
4faf34ad63 Clean-up trx_sys.mutex misuse
Currently trx_sys.mutex protects only trx_sys.mysql_trx_list and
trx_sys.m_views, which are not accessed by lock0lock debug routines.
Thus there's no need to bother trx_sys.mutex here.

Removed trx_assert_started(): this assertion is fully covered by
check_trx_state().
2018-03-29 12:24:42 +04:00
Sergey Vojtovich
b36da48ad3 MDEV-15612 - Latching violation in trx_roll_must_shutdown
recv_sys_t::mutex and rw_trx_hash_elementi_t::mutex were acquired
in reverse (to recorded) order.

Fixed by releasing recv_sys_t::mutex, before iterating rw_trx_hash.
Statistics gathering doesn't really need recv_sys_t::mutex protection,
since it is always done in one thread (trx_roll_crash_recv_trx) and
thus it can't go wrong.
2018-03-29 12:24:42 +04:00
Marko Mäkelä
1924594b80 Minor (mainly non-functional) cleanup 2018-03-28 17:32:21 +02:00
Sergei Golubchik
b1818dccf7 Merge branch '10.2' into 10.3 2018-03-28 17:31:57 +02:00
Teemu Ollakka
33aad1d273 MDEV-15505 Fixes to compilation without -DWITH_WSREP:BOOL=ON
Removed including wsrep_api.h from service_wsrep.h. This caused
various kinds of collisions with definitions when wsrep is
not supposed to be built in. Defined functions wsrep_xid_seqno()
and wsrep_xid_uuid() in wsrep_dummy.cc. Replaced wsrep_seqno_t
with long long where wsrep_api.h is not included.

Removed wsrep_xid_seqno() macro from wsrep_mysqld.h and made
wsrep code using wsrep_xid_seqno() in handler.cc to be compiled
in only if WITH_WSREP is ON.

Included wsrep_api.h for mariabackup if WITH_WSREP is ON.
2018-03-21 12:02:09 +02:00
Eugene Kosov
fd73c6dda4 Vers IB: Mark unversioned fields in system versioning tables
Some fields in system-versioned table may be unversioned.
SQL layer marks unversioned.
And this patch makes InnoDB mark unversioned too because of two reasons:
1) by default fields are versioned
2) most of fields are expected to be versioned

dtype_t::vers_sys_field(): fixed return true on row_start/row_end
dict_col_t::vers_sys_field(): fixed return true on row_start/row_end
2018-03-19 17:32:54 +03:00
Marko Mäkelä
bd7ed1b923 MDEV-13935 INSERT stuck at state Unlocking tables
Revert the dead code for MySQL 5.7 multi-master replication (GCS),
also known as
WL#6835: InnoDB: GCS Replication: Deterministic Deadlock Handling
(High Prio Transactions in InnoDB).

Also, make innodb_lock_schedule_algorithm=vats skip SPATIAL INDEX,
because the code does not seem to be compatible with them.

Add FIXME comments to some SPATIAL INDEX locking code. It looks
like Galera write-set replication might not work with SPATIAL INDEX.
2018-03-16 15:50:04 +02:00
Teemu Ollakka
b125ae0a84 MDEV-15505 New wsrep XID format for backwards compatibility
A new wsrep XID format was added to keep the XID implementation
backwards compatible. Original version always reads XID seqno
part in host byte order, the new version in little endian byte
order. Wsrep XID will always be written in the new format.

Included wsrep_api.h from service_wsrep.h for wsrep type definitions.
Removed redundant wsrep XID code from mariabackup and included
service_wsrep.h in order to use
2018-03-12 14:51:49 +02:00
Teemu Ollakka
dd74b94823 MDEV-15505 Fix wsrep XID seqno byte order
The problem is that the seqno part of wsrep XID is always
stored in host byte order. This may cause issues when a physical
backup is restored on a host with different architecture, the
seqno part with XID may have incorrect value.

In order to fix this, wsrep XID seqno is always written into
XID data buffer in little endian byte order using int8store()
and read from data buffer using sint8korr(). For backwards
compatibility the seqno is read from TRX_SYS page in host
byte order during upgrade.

This patch implements byte ordering in wsrep_xid_init(),
wsrep_xid_seqno(), and exposes functions to read wsrep
XID uuid and seqno in wsrep_service_st. Backwards compatibility
for upgrade is provided in trx_rseg_init_wsrep_xid().
2018-03-12 14:46:20 +02:00