Commit graph

229 commits

Author SHA1 Message Date
Marko Mäkelä
09af00cbde MDEV-13564: Remove old crash-upgrade logic in 10.4
Stop supporting the additional *trunc.log files that were
introduced via MySQL 5.7 to MariaDB Server 10.2 and 10.3.

DB_TABLESPACE_TRUNCATED: Remove.

purge_sys.truncate: A new structure to track undo tablespace
file truncation.

srv_start(): Remove the call to buf_pool_invalidate(). It is
no longer necessary, given that we no longer access things in
ways that violate the ARIES protocol. This call was originally
added for innodb_file_format, and it may later have been necessary
for the proper function of the MySQL 5.7 TRUNCATE recovery, which
we are now removing.

trx_purge_cleanse_purge_queue(): Take the undo tablespace as a
parameter.

trx_purge_truncate_history(): Rewrite everything mostly in a
single function, replacing references to undo::Truncate.

recv_apply_hashed_log_recs(): If any redo log is to be applied,
and if the log_sys.log.subformat indicates that separately
logged truncate may have been used, refuse to proceed except if
innodb_force_recovery is set. We will still refuse crash-upgrade
if TRUNCATE TABLE was logged. Undo tablespace truncation would
only be logged in undo*trunc.log files, which we are no longer
checking for.
2018-09-11 21:32:15 +03:00
Marko Mäkelä
67fa97dc2c Merge 10.3 into 10.4 2018-09-11 21:31:47 +03:00
Marko Mäkelä
1bf3e8ab43 Merge 10.3 into 10.4 2018-09-11 21:31:03 +03:00
Marko Mäkelä
5a1868b58d MDEV-13564 Mariabackup does not work with TRUNCATE
This is a merge from 10.2, but the 10.2 version of this will not
be pushed into 10.2 yet, because the 10.2 version would include
backports of MDEV-14717 and MDEV-14585, which would introduce
a crash recovery regression: Tables could be lost on
table-rebuilding DDL operations, such as ALTER TABLE,
OPTIMIZE TABLE or this new backup-friendly TRUNCATE TABLE.
The test innodb.truncate_crash occasionally loses the table due to
the following bug:

MDEV-17158 log_write_up_to() sometimes fails
2018-09-07 22:15:06 +03:00
Marko Mäkelä
e67b1070bb MDEV-17049 Enable innodb_undo tests on buildbot
Remove the innodb_undo suite, and move and adapt the tests.
Remove unnecessary restarts, and add innodb_page_size_small.inc
for combinations.

innodb.undo_truncate is the merge of innodb_undo.truncate
and innodb_undo.truncate_multi_client.

Add the global status variable innodb_undo_truncations.
Without this, the test innodb.undo_truncate would occasionally
report that truncation did not happen. The test was only waiting
for the history list length to reach 0, but the undo tablespace
truncation would only take place some time after that.

Undo tablespace truncation will only occasionally occur with
innodb_page_size=32k, and typically never occur (with this amount
of undo log operations) with innodb_page_size=64k. We disable
these combinations.

innodb.undo_truncate_recover was formerly called
innodb_undo.truncate_recover.
2018-09-07 22:10:02 +03:00
Marko Mäkelä
4901f31c13 Merge 10.2 into 10.3 2018-09-07 22:09:28 +03:00
Marko Mäkelä
59950df533 Remove some debug-only global status variables 2018-09-07 22:06:20 +03:00
Igor Babaev
cab1d63826 Merge branch '10.3' into 10.4 2018-06-03 10:34:41 -07:00
Marko Mäkelä
13f7ac2269 MDEV-15705 Remove global status counter Innodb_pages0_read
MDEV-9931 introduced a counter for keeping track of reads of the
first page of InnoDB data files, because the original implementation
of data-at-rest-encryption for InnoDB introduced new code paths for
reading the pages.

Ultimately, the extra reads of the first page were removed, and
the encryption subsystem will be initialized whenever we first read
the first page of each data file, in fil_node_open_file(). It should not
be that interesting to observe how many times an InnoDB data file was
opened for the first time.
2018-05-28 09:36:18 +03:00
Marko Mäkelä
6f01c42fd6 Merge 10.2 into 10.3 2018-05-24 22:36:40 +03:00
Marko Mäkelä
1c8c6bcd6f MDEV-13779 InnoDB fails to shut down purge, causing hang
thd_destructor_proxy(): Ensure that purge actually exits,
like the logic should have been ever since MDEV-14080.

srv_purge_shutdown(): A new function to wait for the
purge coordinator to exit. Before exiting, the
purge coordinator will ensure that all purge workers have exited.
2018-05-24 15:30:22 +03:00
Marko Mäkelä
2b27ac8282 Fix many -Wunused-parameter
Remove unused InnoDB function parameters and functions.

i_s_sys_virtual_fill_table(): Do not allocate heap memory.

mtr_is_block_fix(): Replace with mtr_memo_contains().

mtr_is_page_fix(): Replace with mtr_memo_contains_page().
2018-05-01 16:52:19 +03:00
Marko Mäkelä
8bbcc0d505 MDEV-12218: Put back mariabackup --innodb-flush-method
Implement innodb_flush_method as an enum parameter in Mariabackup,
instead of ignoring the option and hard-wiring it to a default value.

xb0xb.h: Remove. Only xtrabackup.cc refers to the enum parameters.

innodb_flush_method_names[], innodb_flush_method_typelib[]:
Define as non-static, so that mariabackup can share the definitions.

srv_file_flush_method: Change the type to ulong, to match the
assignment in init_one_value() and handle_options() in mariabackup.
2018-04-30 18:22:52 +03:00
Marko Mäkelä
baa5a43d8c MDEV-16045: Replace log_group_t with log_t::files
There is only one log_sys and only one log_sys.log.

log_t::files::create(): Replaces log_init().

log_t::files::close(): Replaces log_group_close(), log_group_close_all().

fil_close_log_files(): if (free) log_sys.log_close();
The callers that passed free=true used to call log_group_close_all().

log_header_read(): Replaces log_group_header_read().

log_t::files::file_header_bufs_ptr: Use a single allocation.

log_t::files::file_header_bufs[]: Statically allocate the pointers.

log_t::files::set_fields(): Replaces log_group_set_fields().

log_t::files::calc_lsn_offset(): Replaces log_group_calc_lsn_offset().
Simplify the computation by using fewer variables.

log_t::files::read_log_seg(): Replaces log_group_read_log_seg().

log_sys_t::complete_checkpoint(): Replaces log_io_complete_checkpoint().

fil_aio_wait(): Move the logic from log_io_complete().
2018-04-29 09:46:24 +03:00
Marko Mäkelä
d73a898d64 MDEV-16045: Allocate log_sys statically
There is only one redo log subsystem in InnoDB. Allocate the object
statically, to avoid unnecessary dereferencing of the pointer.

log_t::create(): Renamed from log_sys_init().

log_t::close(): Renamed from log_shutdown().

log_t::checkpoint_buf_ptr: Remove. Allocate log_t::checkpoint_buf
statically.
2018-04-29 09:46:24 +03:00
Marko Mäkelä
715e4f4320 MDEV-12218 Clean up InnoDB parameter validation
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".

innodb_change_buffering: Declare as ENUM, not STRING.

innodb_flush_method: Declare as ENUM, not STRING.

innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.

LOG_BUFFER_SIZE: Remove.

SysTablespace::normalize_size(): Renamed from normalize().

innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.

innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.

srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.

SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.

xb_normalize_init_values(): Merge to innodb_init_param().
2018-04-29 09:41:42 +03:00
Marko Mäkelä
a90100d756 Replace univ_page_size and UNIV_PAGE_SIZE
Try to use one variable (srv_page_size) for innodb_page_size.

Also, replace UNIV_PAGE_SIZE_SHIFT with srv_page_size_shift.
2018-04-28 20:45:45 +03:00
Marko Mäkelä
1292a01c7f Simplify simple_counter
Introduce a separate simple_atomic_counter
2018-04-28 20:43:31 +03:00
Sergei Golubchik
b1818dccf7 Merge branch '10.2' into 10.3 2018-03-28 17:31:57 +02:00
Eugene Kosov
8d32959b09 fix data races
srv_last_monitor_time: make all accesses relaxed atomical

WARNING: ThreadSanitizer: data race (pid=12041)
  Write of size 8 at 0x000003949278 by thread T26 (mutexes: write M226445748578513120):
    #0 thd_destructor_proxy storage/innobase/handler/ha_innodb.cc:314:14 (mysqld+0x19b5505)

  Previous read of size 8 at 0x000003949278 by main thread:
    #0 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4180:11 (mysqld+0x1a03404)
    #1 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc5ec73)
    #2 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x134908d)
    #3 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13484f0)
    #4 init_server_components() sql/mysqld.cc:5345:7 (mysqld+0xbf720f)
    #5 mysqld_main(int, char**) sql/mysqld.cc:5940:7 (mysqld+0xbf107d)
    #6 main sql/main.cc:25:10 (mysqld+0xbe971b)

  Location is global 'srv_running' of size 8 at 0x000003949278 (mysqld+0x000003949278)

WARNING: ThreadSanitizer: data race (pid=27869)
  Atomic write of size 4 at 0x7b4800000c00 by thread T8:
    #0 __tsan_atomic32_exchange llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:589 (mysqld+0xbd4eac)
    #1 TTASEventMutex<GenericPolicy>::exit() storage/innobase/include/ib0mutex.h:467:7 (mysqld+0x1a8d4cb)
    #2 PolicyMutex<TTASEventMutex<GenericPolicy> >::exit() storage/innobase/include/ib0mutex.h:609:10 (mysqld+0x1a7839e)
    #3 fil_validate() storage/innobase/fil/fil0fil.cc:5535:2 (mysqld+0x1abd913)
    #4 fil_validate_skip() storage/innobase/fil/fil0fil.cc:204:9 (mysqld+0x1aba601)
    #5 fil_aio_wait(unsigned long) storage/innobase/fil/fil0fil.cc:5296:2 (mysqld+0x1abbae6)
    #6 io_handler_thread storage/innobase/srv/srv0start.cc:340:3 (mysqld+0x21abe1e)

  Previous read of size 4 at 0x7b4800000c00 by main thread (mutexes: write M1273, write M1271):
    #0 TTASEventMutex<GenericPolicy>::state() const storage/innobase/include/ib0mutex.h:530:10 (mysqld+0x21c66e2)
    #1 sync_array_detect_deadlock(sync_array_t*, sync_cell_t*, sync_cell_t*, unsigned long) storage/innobase/sync/sync0arr.cc:746:14 (mysqld+0x21c1c7a)
    #2 sync_array_wait_event(sync_array_t*, sync_cell_t*&) storage/innobase/sync/sync0arr.cc:465:6 (mysqld+0x21c1708)
    #3 TTASEventMutex<GenericPolicy>::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:516:6 (mysqld+0x1a8c206)
    #4 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:635:10 (mysqld+0x1a782c3)
    #5 fil_mutex_enter_and_prepare_for_io(unsigned long) storage/innobase/fil/fil0fil.cc:1131:3 (mysqld+0x1a9a92e)
    #6 fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) storage/innobase/fil/fil0fil.cc:5082:2 (mysqld+0x1ab8de2)
    #7 buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1112:3 (mysqld+0x1cb970a)
    #8 buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1270:3 (mysqld+0x1cb7d70)
    #9 buf_flush_try_neighbors(page_id_t const&, buf_flush_t, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1493:9 (mysqld+0x1cc9674)
    #10 buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:1565:13 (mysqld+0x1cbadf3)
    #11 buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1825:3 (mysqld+0x1cbbcb8)
    #12 buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:1895:16 (mysqld+0x1cbb459)
    #13 buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:2065:2 (mysqld+0x1cbcfe1)
    #14 buf_flush_lists(unsigned long, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:2167:8 (mysqld+0x1cbd5a3)
    #15 log_preflush_pool_modified_pages(unsigned long) storage/innobase/log/log0log.cc:1400:13 (mysqld+0x1eefc3b)
    #16 log_make_checkpoint_at(unsigned long, bool) storage/innobase/log/log0log.cc:1751:10 (mysqld+0x1eefb16)
    #17 buf_dblwr_create() storage/innobase/buf/buf0dblwr.cc:335:2 (mysqld+0x1cd2141)
    #18 innobase_start_or_create_for_mysql() storage/innobase/srv/srv0start.cc:2539:10 (mysqld+0x21b4d8e)
    #19 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4193:8 (mysqld+0x1a5e3d7)
    #20 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc74d33)
    #21 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x1376d5d)
    #22 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13761c0)
    #23 init_server_components() sql/mysqld.cc:5348:7 (mysqld+0xc0d0ff)
    #24 mysqld_main(int, char**) sql/mysqld.cc:5943:7 (mysqld+0xc06f9d)
    #25 main sql/main.cc:25:10 (mysqld+0xbff71b)

WARNING: ThreadSanitizer: data race (pid=29031)
  Write of size 8 at 0x0000039e48e0 by thread T15:
    #0 srv_monitor_thread storage/innobase/srv/srv0srv.cc:1699:24 (mysqld+0x21a254e)

  Previous write of size 8 at 0x0000039e48e0 by thread T14:
    #0 srv_refresh_innodb_monitor_stats() storage/innobase/srv/srv0srv.cc:1165:24 (mysqld+0x21a3124)
    #1 srv_error_monitor_thread storage/innobase/srv/srv0srv.cc:1836:3 (mysqld+0x21a2d40)

  Location is global 'srv_last_monitor_time' of size 8 at 0x0000039e48e0 (mysqld+0x0000039e48e0)
2018-03-22 14:42:15 +04:00
Daniel Black
9b8d0d9ff9 MDEV-11455: test case for status variable innodb_buffer_pool_load_incomplete
Add innodb debug system variable, innodb_buffer_pool_load_pages_abort, to test
the behaviour of innodb_buffer_pool_load_incomplete.
(innodb_buufer_pool_dump_abort_loads.test)
2018-02-22 15:50:50 +11:00
Daniel Black
8440e8fa98 MDEV-11455: create status variable innodb_buffer_pool_load_incomplete
This status variable indicates that an innodb buffer pool load never
completed and dumping at shutdown would result in an incomplete dump file.

This status variable is set to 1 once a buffer pool loads. Upon a successful
load this status variable returns to 0.

With this status variable set, the system variable
innodb_buffer_pool_dump_at_shutdown==1 will have no effect as dumping after
an incomplete load will generate a less complete dump file than the current
one.

If a user aborts a buffer pool load by changing the system variable
innodb_buffer_pool_load_abort=1 will cause the the status variable
innodb_buffer_pool_load_incomplete to remain set to 1.

A shutdown that occurs while innodb is loading the buffer pool will
not save the buffer pool on shutdown.

A user may indirectly set innodb_buffer_pool_load_incomplete
to 0 by:
* Forcing a load, by setting innodb_buffer_pool_load_now=ON, or
* Forcing a dump, by setting innodb_buffer_pool_dump_now=ON

This will enable the next dump on shutdown to complete.

Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
2018-02-22 15:50:47 +11:00
Sergey Vojtovich
64048bafe0 Removed purge_trx_id_age and purge_view_trx_id_age
These were unused status variables available in debug builds only.
Also removed trx_sys.rw_max_trx_id: not used anymore.
2018-01-20 16:10:37 +04:00
Sergei Golubchik
e52a237fe9 remove ifdefs around PSI_THREAD_CALL
same change as for PSI_TABLE_CALL
2018-01-09 14:21:20 +03:00
Alexander Barkov
835cbbcc7b Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3
TODO: enable MDEV-13049 optimization for 10.3
2017-10-30 20:47:39 +04:00
Marko Mäkelä
1b478a7aba MDEV-13311 Presence of old logs in 10.2.7 will corrupt restored instance (change in behavior)
Mariabackup 10.2.7 would delete the redo log files after a successful
--prepare operation. If the user is manually copying the prepared files
instead of using the --copy-back option, it could happen that some old
redo log file would be preserved in the restored location. These old
redo log files could cause corruption of the restored data files when
the server is started up.

We prevent this scenario by creating a "poisoned" redo log file
ib_logfile0 at the end of the --prepare step. The poisoning consists
of simply truncating the file to an empty file. InnoDB will refuse
to start up on an empty redo log file.

copy_back(): Delete all redo log files in the target if the source
file ib_logfile0 is empty. (Previously we did this if the source
file is missing.)

SRV_OPERATION_RESTORE_EXPORT: A new variant of SRV_OPERATION_RESTORE
when the --export option is specified. In this mode, we will keep
deleting all redo log files, instead of truncating the first one.

delete_log_files(): Add a parameter for the first file to delete,
to be passed as 0 or 1.

innobase_start_or_create_for_mysql(): In mariabackup --prepare,
tolerate an empty ib_logfile0 file. Otherwise, require the first
redo log file to be longer than 4 blocks (2048 bytes). Unless
--export was specified, truncate the first log file at the
end of --prepare.
2017-10-10 15:54:11 +03:00
Marko Mäkelä
a4948dafcd MDEV-11369 Instant ADD COLUMN for InnoDB
For InnoDB tables, adding, dropping and reordering columns has
required a rebuild of the table and all its indexes. Since MySQL 5.6
(and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing
concurrent modification of the tables.

This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT
and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously,
with only minor changes performed to the table structure. The counter
innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS
is incremented whenever a table rebuild operation is converted into
an instant ADD COLUMN operation.

ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.

Some usability limitations will be addressed in subsequent work:

MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY
and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE

The format of the clustered index (PRIMARY KEY) is changed as follows:

(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT,
and a new field PAGE_INSTANT will contain the original number of fields
in the clustered index ('core' fields).
If instant ADD COLUMN has not been used or the table becomes empty,
or the very first instant ADD COLUMN operation is rolled back,
the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset
to 0 and FIL_PAGE_INDEX.

(2) A special 'default row' record is inserted into the leftmost leaf,
between the page infimum and the first user record. This record is
distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the
same format as records that contain values for the instantly added
columns. This 'default row' always has the same number of fields as
the clustered index according to the table definition. The values of
'core' fields are to be ignored. For other fields, the 'default row'
will contain the default values as they were during the ALTER TABLE
statement. (If the column default values are changed later, those
values will only be stored in the .frm file. The 'default row' will
contain the original evaluated values, which must be the same for
every row.) The 'default row' must be completely hidden from
higher-level access routines. Assertions have been added to ensure
that no 'default row' is ever present in the adaptive hash index
or in locked records. The 'default row' is never delete-marked.

(3) In clustered index leaf page records, the number of fields must
reside between the number of 'core' fields (dict_index_t::n_core_fields
introduced in this work) and dict_index_t::n_fields. If the number
of fields is less than dict_index_t::n_fields, the missing fields
are replaced with the column value of the 'default row'.
Note: The number of fields in the record may shrink if some of the
last instantly added columns are updated to the value that is
in the 'default row'. The function btr_cur_trim() implements this
'compression' on update and rollback; dtuple::trim() implements it
on insert.

(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new
status value REC_STATUS_COLUMNS_ADDED will indicate the presence of
a new record header that will encode n_fields-n_core_fields-1 in
1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header
always explicitly encodes the number of fields.)

We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for
covering the insert of the 'default row' record when instant ADD COLUMN
is used for the first time. Subsequent instant ADD COLUMN can use
TRX_UNDO_UPD_EXIST_REC.

This is joint work with Vin Chen (陈福荣) from Tencent. The design
that was discussed in April 2017 would not have allowed import or
export of data files, because instead of the 'default row' it would
have introduced a data dictionary table. The test
rpl.rpl_alter_instant is exactly as contributed in pull request #408.
The test innodb.instant_alter is based on a contributed test.

The redo log record format changes for ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPACT are as contributed. (With this change present,
crash recovery from MariaDB 10.3.1 will fail in spectacular ways!)
Also the semantics of higher-level redo log records that modify the
PAGE_INSTANT field is changed. The redo log format version identifier
was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1.

Everything else has been rewritten by me. Thanks to Elena Stepanova,
the code has been tested extensively.

When rolling back an instant ADD COLUMN operation, we must empty the
PAGE_FREE list after deleting or shortening the 'default row' record,
by calling either btr_page_empty() or btr_page_reorganize(). We must
know the size of each entry in the PAGE_FREE list. If rollback left a
freed copy of the 'default row' in the PAGE_FREE list, we would be
unable to determine its size (if it is in ROW_FORMAT=COMPACT or
ROW_FORMAT=DYNAMIC) because it would contain more fields than the
rolled-back definition of the clustered index.

UNIV_SQL_DEFAULT: A new special constant that designates an instantly
added column that is not present in the clustered index record.

len_is_stored(): Check if a length is an actual length. There are
two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL.

dict_col_t::def_val: The 'default row' value of the column.  If the
column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.

dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(),
instant_value().

dict_col_t::remove_instant(): Remove the 'instant ADD' status of
a column.

dict_col_t::name(const dict_table_t& table): Replaces
dict_table_get_col_name().

dict_index_t::n_core_fields: The original number of fields.
For secondary indexes and if instant ADD COLUMN has not been used,
this will be equal to dict_index_t::n_fields.

dict_index_t::n_core_null_bytes: Number of bytes needed to
represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).

dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that
n_core_null_bytes was not initialized yet from the clustered index
root page.

dict_index_t: Add the accessors is_instant(), is_clust(),
get_n_nullable(), instant_field_value().

dict_index_t::instant_add_field(): Adjust clustered index metadata
for instant ADD COLUMN.

dict_index_t::remove_instant(): Remove the 'instant ADD' status
of a clustered index when the table becomes empty, or the very first
instant ADD COLUMN operation is rolled back.

dict_table_t: Add the accessors is_instant(), is_temporary(),
supports_instant().

dict_table_t::instant_add_column(): Adjust metadata for
instant ADD COLUMN.

dict_table_t::rollback_instant(): Adjust metadata on the rollback
of instant ADD COLUMN.

prepare_inplace_alter_table_dict(): First create the ctx->new_table,
and only then decide if the table really needs to be rebuilt.
We must split the creation of table or index metadata from the
creation of the dictionary table records and the creation of
the data. In this way, we can transform a table-rebuilding operation
into an instant ADD COLUMN operation. Dictionary objects will only
be added to cache when table rebuilding or index creation is needed.
The ctx->instant_table will never be added to cache.

dict_table_t::add_to_cache(): Modified and renamed from
dict_table_add_to_cache(). Do not modify the table metadata.
Let the callers invoke dict_table_add_system_columns() and if needed,
set can_be_evicted.

dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the
system columns (which will now exist in the dict_table_t object
already at this point).

dict_create_table_step(): Expect the callers to invoke
dict_table_add_system_columns().

pars_create_table(): Before creating the table creation execution
graph, invoke dict_table_add_system_columns().

row_create_table_for_mysql(): Expect all callers to invoke
dict_table_add_system_columns().

create_index_dict(): Replaces row_merge_create_index_graph().

innodb_update_n_cols(): Renamed from innobase_update_n_virtual().
Call my_error() if an error occurs.

btr_cur_instant_init(), btr_cur_instant_init_low(),
btr_cur_instant_root_init():
Load additional metadata from the clustered index and set
dict_index_t::n_core_null_bytes. This is invoked
when table metadata is first loaded into the data dictionary.

dict_boot(): Initialize n_core_null_bytes for the four hard-coded
dictionary tables.

dict_create_index_step(): Initialize n_core_null_bytes. This is
executed as part of CREATE TABLE.

dict_index_build_internal_clust(): Initialize n_core_null_bytes to
NO_CORE_NULL_BYTES if table->supports_instant().

row_create_index_for_mysql(): Initialize n_core_null_bytes for
CREATE TEMPORARY TABLE.

commit_cache_norebuild(): Call the code to rename or enlarge columns
in the cache only if instant ADD COLUMN is not being used.
(Instant ADD COLUMN would copy all column metadata from
instant_table to old_table, including the names and lengths.)

PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields.
This is repurposing the 16-bit field PAGE_DIRECTION, of which only the
least significant 3 bits were used. The original byte containing
PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B.

page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT.

page_ptr_get_direction(), page_get_direction(),
page_ptr_set_direction(): Accessors for PAGE_DIRECTION.

page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.

page_direction_increment(): Increment PAGE_N_DIRECTION
and set PAGE_DIRECTION.

rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes,
and assume that heap_no is always set.
Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records,
even if the record contains fewer fields.

rec_offs_make_valid(): Add the parameter 'leaf'.

rec_copy_prefix_to_dtuple(): Assert that the tuple is only built
on the core fields. Instant ADD COLUMN only applies to the
clustered index, and we should never build a search key that has
more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR.
All these columns are always present.

dict_index_build_data_tuple(): Remove assertions that would be
duplicated in rec_copy_prefix_to_dtuple().

rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose
number of fields is between n_core_fields and n_fields.

cmp_rec_rec_with_match(): Implement the comparison between two
MIN_REC_FLAG records.

trx_t::in_rollback: Make the field available in non-debug builds.

trx_start_for_ddl_low(): Remove dangerous error-tolerance.
A dictionary transaction must be flagged as such before it has generated
any undo log records. This is because trx_undo_assign_undo() will mark
the transaction as a dictionary transaction in the undo log header
right before the very first undo log record is being written.

btr_index_rec_validate(): Account for instant ADD COLUMN

row_undo_ins_remove_clust_rec(): On the rollback of an insert into
SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the
last column from the table and the clustered index.

row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(),
trx_undo_update_rec_get_update(): Handle the 'default row'
as a special case.

dtuple_t::trim(index): Omit a redundant suffix of an index tuple right
before insert or update. After instant ADD COLUMN, if the last fields
of a clustered index tuple match the 'default row', there is no
need to store them. While trimming the entry, we must hold a page latch,
so that the table cannot be emptied and the 'default row' be deleted.

btr_cur_optimistic_update(), btr_cur_pessimistic_update(),
row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low():
Invoke dtuple_t::trim() if needed.

row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling
row_ins_clust_index_entry_low().

rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number
of fields to be between n_core_fields and n_fields. Do not support
infimum,supremum. They are never supposed to be stored in dtuple_t,
because page creation nowadays uses a lower-level method for initializing
them.

rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the
number of fields.

btr_cur_trim(): In an update, trim the index entry as needed. For the
'default row', handle rollback specially. For user records, omit
fields that match the 'default row'.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
Skip locking and adaptive hash index for the 'default row'.

row_log_table_apply_convert_mrec(): Replace 'default row' values if needed.
In the temporary file that is applied by row_log_table_apply(),
we must identify whether the records contain the extra header for
instantly added columns. For now, we will allocate an additional byte
for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table
has been subject to instant ADD COLUMN. The ROW_T_DELETE records are
fine, as they will be converted and will only contain 'core' columns
(PRIMARY KEY and some system columns) that are converted from dtuple_t.

rec_get_converted_size_temp(), rec_init_offsets_temp(),
rec_convert_dtuple_to_temp(): Add the parameter 'status'.

REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED:
An info_bits constant for distinguishing the 'default row' record.

rec_comp_status_t: An enum of the status bit values.

rec_leaf_format: An enum that replaces the bool parameter of
rec_init_offsets_comp_ordinary().
2017-10-06 09:50:10 +03:00
Marko Mäkelä
e17a282da9 Merge bb-10.2-ext into 10.3 2017-09-18 11:38:07 +03:00
Marko Mäkelä
d9277732d7 Merge 10.1 into 10.2
This should also fix the MariaDB 10.2.2 bug
MDEV-13826 CREATE FULLTEXT INDEX on encrypted table fails.

MDEV-12634 FIXME: Modify innodb-index-online, innodb-table-online
so that they will write and read merge sort files. InnoDB 5.7
introduced some optimizations to avoid using the files for small tables.

Many collation test results have been adjusted for MDEV-10191.
2017-09-17 11:05:33 +03:00
Jan Lindström
fa2701c6f7 MDEV-12634: Uninitialised ROW_MERGE_RESERVE_SIZE bytes written to tem…
…porary file

Fixed by removing writing key version to start of every block that
was encrypted. Instead we will use single key version from log_sys
crypt info.

After this MDEV also blocks writen to row log are encrypted and blocks
read from row log aren decrypted if encryption is configured for the
table.

innodb_status_variables[], struct srv_stats_t
	Added status variables for merge block and row log block
	encryption and decryption amounts.

Removed ROW_MERGE_RESERVE_SIZE define.

row_merge_fts_doc_tokenize
	Remove ROW_MERGE_RESERVE_SIZE

row_log_t
	Add index, crypt_tail, crypt_head to be used in case of
	encryption.

row_log_online_op, row_log_table_close_func
	Before writing a block encrypt it if encryption is enabled

row_log_table_apply_ops, row_log_apply_ops
	After reading a block decrypt it if encryption is enabled

row_log_allocate
	Allocate temporary buffers crypt_head and crypt_tail
	if needed.

row_log_free
	Free temporary buffers crypt_head and crypt_tail if they
	exist.

row_merge_encrypt_buf, row_merge_decrypt_buf
	Removed.

row_merge_buf_create, row_merge_buf_write
	Remove ROW_MERGE_RESERVE_SIZE

row_merge_build_indexes
	Allocate temporary buffer used in decryption and encryption
	if needed.

log_tmp_blocks_crypt, log_tmp_block_encrypt, log_temp_block_decrypt
	New functions used in block encryption and decryption

log_tmp_is_encrypted
	New function to check is encryption enabled.

Added test case innodb-rowlog to force creating a row log and
verify that operations are done using introduced status
variables.
2017-09-14 09:23:20 +03:00
Jan Lindström
016c35a7f2 MDEV-13690: Remove unnecessary innodb_use_mtflush, innodb_mtflush_threads parameters and related code
Users can use innodb-page-cleaners instead.
2017-09-01 18:33:46 +03:00
Sergei Golubchik
bb8e99fdc3 Merge branch 'bb-10.2-ext' into 10.3 2017-08-26 00:34:43 +02:00
Marko Mäkelä
59caf2c3c1 MDEV-13485 MTR tests fail massively with --innodb-sync-debug
The parameter --innodb-sync-debug, which is disabled by default,
aims to find potential deadlocks in InnoDB.

When the parameter is enabled, lots of tests failed. Most of these
failures were due to bogus diagnostics. But, as part of this fix,
we are also fixing a bug in error handling code and removing dead
code, and fixing cases where an uninitialized mutex was being
locked and unlocked.

dict_create_foreign_constraints_low(): Remove an extraneous
mutex_exit() call that could cause corruption in an error handling
path. Also, do not unnecessarily acquire dict_foreign_err_mutex.
Its only purpose is to control concurrent access to
dict_foreign_err_file.

row_ins_foreign_trx_print(): Replace a redundant condition with a
debug assertion.

srv_dict_tmpfile, srv_dict_tmpfile_mutex: Remove. The
temporary file is never being written to or read from.

log_free_check(): Allow SYNC_FTS_CACHE (fts_cache_t::lock)
to be held.

ha_innobase::inplace_alter_table(), row_merge_insert_index_tuples():
Assert that no unexpected latches are being held.

sync_latch_meta_init(): Properly initialize dict_operation_lock_key
at SYNC_DICT_OPERATION. dict_sys->mutex is SYNC_DICT, and
the now-removed SRV_DICT_TMPFILE was wrongly registered at
SYNC_DICT_OPERATION.

buf_block_init(): Correctly register buf_block_t::debug_latch.
It was previously misleadingly reported as LATCH_ID_DICT_FOREIGN_ERR.

latch_level_t: Correct the relative latching order of
SYNC_IBUF_PESS_INSERT_MUTEX,SYNC_INDEX_TREE and
SYNC_FILE_FORMAT_TAG,SYNC_DICT_OPERATION to avoid bogus failures.

row_drop_table_for_mysql(): Avoid accessing btr_defragment_mutex
if the defragmentation thread has not been started. This is the
case during fts_drop_orphaned_tables() in recv_recovery_rollback_active().

fil_space_destroy_crypt_data(): Avoid acquiring fil_crypt_threads_mutex
when it is uninitialized. We may have created crypt_data before the
mutex was created, and the mutex creation would be skipped if
InnoDB startup failed or --innodb-read-only was specified.
2017-08-23 08:44:11 +03:00
Marko Mäkelä
e9e051d297 Follow-up fix to MDEV-12988 backup fails if innodb_undo_tablespaces>0
The fix broke mariabackup --prepare --incremental.

The restore of an incremental backup starts up (parts of) InnoDB twice.
First, all data files are discovered for applying .delta files. Then,
after the .delta files have been applied, InnoDB will be restarted
more completely, so that the redo log records will be applied via the
buffer pool.

During the first startup, the buffer pool is not initialized, and thus
trx_rseg_get_n_undo_tablespaces() must not be invoked. The apply of
the .delta files will currently assume that the --innodb-undo-tablespaces
option correctly specifies the number of undo tablespace files, just
like --backup does.

The second InnoDB startup of --prepare for applying the redo log will
properly invoke trx_rseg_get_n_undo_tablespaces().

enum srv_operation_mode: Add SRV_OPERATION_RESTORE_DELTA for
distinguishing the apply of .delta files from SRV_OPERATION_RESTORE.

srv_undo_tablespaces_init(): In mariabackup --prepare --incremental,
in the initial SRV_OPERATION_RESTORE_DELTA phase, do not invoke
trx_rseg_get_n_undo_tablespaces() because the buffer pool or the
redo logs are not available. Instead, blindly rely on the parameter
--innodb-undo-tablespaces.
2017-08-18 09:12:04 +03:00
Marko Mäkelä
57fea53615 Merge bb-10.2-ext into 10.3 2017-07-07 12:39:43 +03:00
Alexander Barkov
3b9273d203 Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3 2017-07-05 17:43:32 +04:00
Marko Mäkelä
8c71c6aa8b MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2
InnoDB I/O and buffer pool interfaces and the redo log format
have been changed between MariaDB 10.1 and 10.2, and the backup
code has to be adjusted accordingly.

The code has been simplified, and many memory leaks have been fixed.
Instead of the file name xtrabackup_logfile, the file name ib_logfile0
is being used for the copy of the redo log. Unnecessary InnoDB startup and
shutdown and some unnecessary threads have been removed.

Some help was provided by Vladislav Vaintroub.

Parameters have been cleaned up and aligned with those of MariaDB 10.2.

The --dbug option has been added, so that in debug builds,
--dbug=d,ib_log can be specified to enable diagnostic messages
for processing redo log entries.

By default, innodb_doublewrite=OFF, so that --prepare works faster.
If more crash-safety for --prepare is needed, double buffering
can be enabled.

The parameter innodb_log_checksums=OFF can be used to ignore redo log
checksums in --backup.

Some messages have been cleaned up.
Unless --export is specified, Mariabackup will not deal with undo log.
The InnoDB mini-transaction redo log is not only about user-level
transactions; it is actually about mini-transactions. To avoid confusion,
call it the redo log, not transaction log.

We disable any undo log processing in --prepare.

Because MariaDB 10.2 supports indexed virtual columns, the
undo log processing would need to be able to evaluate virtual column
expressions. To reduce the amount of code dependencies, we will not
process any undo log in prepare.

This means that the --export option must be disabled for now.

This also means that the following options are redundant
and have been removed:
	xtrabackup --apply-log-only
	innobackupex --redo-only

In addition to disabling any undo log processing, we will disable any
further changes to data pages during --prepare, including the change
buffer merge. This means that restoring incremental backups should
reliably work even when change buffering is being used on the server.
Because of this, preparing a backup will not generate any further
redo log, and the redo log file can be safely deleted. (If the
--export option is enabled in the future, it must generate redo log
when processing undo logs and buffered changes.)

In --prepare, we cannot easily know if a partial backup was used,
especially when restoring a series of incremental backups. So, we
simply warn about any missing files, and ignore the redo log for them.

FIXME: Enable the --export option.

FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write
a test that initiates a backup while an ALGORITHM=INPLACE operation
is creating indexes or rebuilding a table. An error should be detected
when preparing the backup.

FIXME: In --incremental --prepare, xtrabackup_apply_delta() should
ensure that if FSP_SIZE is modified, the file size will be adjusted
accordingly.
2017-07-05 11:43:28 +03:00
Marko Mäkelä
84e4e4506f Reduce the granularity of innodb_log_file_size
In Mariabackup, we would want the backed-up redo log file size to be
a multiple of 512 bytes, or OS_FILE_LOG_BLOCK_SIZE. However, at startup,
InnoDB would be picky, requiring the file size to be a multiple of
innodb_page_size.

Furthermore, InnoDB would require the parameter to be a multiple of
one megabyte, while the minimum granularity is 512 bytes. Because
the data-file-oriented fil_io() API is being used for writing the
InnoDB redo log, writes will for now require innodb_log_file_size to
be a multiple of the maximum innodb_page_size (65536 bytes).

To complicate matters, InnoDB startup divided srv_log_file_size by
UNIV_PAGE_SIZE, so that initially, the unit was bytes, and later it
was innodb_page_size. We will simplify this and keep srv_log_file_size
in bytes at all times.

innobase_log_file_size: Remove. Remove some obsolete checks against
overflow on 32-bit systems. srv_log_file_size is always 64 bits, and
the maximum size 512GiB in multiples of innodb_page_size always fits
in ulint (which is 32 or 64 bits). 512GiB would be 8,388,608*64KiB or
134,217,728*4KiB.

log_init(): Remove the parameter file_size that was always passed as
srv_log_file_size.

log_set_capacity(): Add a parameter for passing the requested file size.

srv_log_file_size_requested: Declare static in srv0start.cc.

create_log_file(), create_log_files(),
innobase_start_or_create_for_mysql(): Invoke fil_node_create()
with srv_log_file_size expressed in multiples of innodb_page_size.

innobase_start_or_create_for_mysql(): Require the redo log file sizes
to be multiples of 512 bytes.
2017-06-29 23:15:05 +03:00
Marko Mäkelä
b3171607e3 Avoid InnoDB messages about recovery after creating redo logs
srv_log_files_created: A debug flag to ensure that InnoDB redo log
files can only be created once in the server lifetime, and that
after log files have been created, no crash recovery will take place.

recv_scan_log_recs(): Detect the special case where the log consists
of a sole MLOG_CHECKPOINT record, such as immediately after creating
the redo logs.

recv_recovery_from_checkpoint_start(): Skip the recovery message
if the redo log is logically empty.
2017-06-28 11:58:43 +03:00
Marko Mäkelä
3e1d0ff574 Fix a merge error in commit 8f643e2063
A merge error caused InnoDB bootstrap to fail when
innodb_undo_tablespaces was set to more than 2.
This was because of a bug that was introduced to
srv_undo_tablespaces_init() by the merge.

Furthermore, some adjustments for Oracle Bug#25551311 aka
Bug#23517560 changes were forgotten. We must minimize direct
references to srv_undo_tablespaces_open and use predicates
instead.

srv_undo_tablespaces_init(): Increment srv_undo_tablespaces_open
once, not twice, per loop iteration.

is_system_or_undo_tablespace(): Remove (unused function).

is_predefined_tablespace(): Invoke srv_is_undo_tablespace().
2017-06-27 21:23:12 +03:00
Marko Mäkelä
1e3886ae80 Merge bb-10.2-ext into 10.3 2017-06-19 17:28:08 +03:00
Marko Mäkelä
0c92794db3 Remove deprecated InnoDB file format parameters
The following options will be removed:

innodb_file_format
innodb_file_format_check
innodb_file_format_max
innodb_large_prefix

They have been deprecated in MySQL 5.7.7 (and MariaDB 10.2.2) in WL#7703.

The file_format column in two INFORMATION_SCHEMA tables will be removed:

innodb_sys_tablespaces
innodb_sys_tables

Code to update the file format tag at the end of page 0:5
(TRX_SYS_PAGE in the InnoDB system tablespace) will be removed.
When initializing a new database, the bytes will remain 0.

All references to the Barracuda file format will be removed.
Some references to the Antelope file format (meaning
ROW_FORMAT=REDUNDANT or ROW_FORMAT=COMPACT) will remain.

This basically ports WL#7704 from MySQL 8.0.0 to MariaDB 10.3.1:

commit 4a69dc2a95995501ed92d59a1de74414a38540c6
Author: Marko Mäkelä <marko.makela@oracle.com>
Date:   Wed Mar 11 22:19:49 2015 +0200
2017-06-02 09:36:14 +03:00
Marko Mäkelä
3d615e4b1a Merge branch 'bb-10.2-ext' into 10.3
This excludes MDEV-12472 (InnoDB should accept XtraDB parameters,
warning that they are ignored). In other words, MariaDB 10.3 will not
recognize any XtraDB-specific parameters.
2017-06-02 09:36:04 +03:00
Marko Mäkelä
8f643e2063 Merge 10.1 into 10.2 2017-05-23 11:09:47 +03:00
Marko Mäkelä
65e1399e64 Merge 10.0 into 10.1
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
2017-05-20 08:41:20 +03:00
Vicențiu Ciorbaru
b87873b221 Merge branch 'merge-innodb-5.6' into bb-10.0-vicentiu
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.

Bug #23481444	OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
                       INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does whole table traversal for range query
even though the end range is passed. Whole table traversal happens
when the record is not with in transaction read view.

Solution:
=========

Convert the innodb last record of page to mysql format and compare
with end range if the traversal of row_search_mvcc() exceeds 100,
no ICP involved. If it is out of range then InnoDB can avoid the
whole table traversal. Need to refactor the code little bit to
make it compile.

Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
2017-05-17 14:53:28 +03:00
Vicențiu Ciorbaru
0af9818240 5.6.36 2017-05-15 17:17:16 +03:00
Marko Mäkelä
f9069a3dc0 MDEV-12674 Post-merge fix: Include accidentally omitted changes
In 10.2, the definition of simple_counter resides in the file
sync0types.h, not in the file os0sync.h which has been removed.
2017-05-12 15:44:17 +03:00
Marko Mäkelä
b9474b5d51 Merge 10.1 into 10.2 2017-05-12 13:29:24 +03:00
Marko Mäkelä
03dca7a333 Merge 10.0 into 10.1 2017-05-12 13:12:45 +03:00