Summary
=======
With FULL_NODUP mode, before image inclues all columns and after
image inclues only the changed columns. flashback will swap the
value of changed columns from after image to before image.
For example:
BI: c1, c2, c3_old, c4_old
AI: c3_new, c4_new
flashback will reconstruct the before and after images to
BI: c1, c2, c3_new, c4_new
AI: c3_old, c4_old
Implementation
==============
When parsing the before and after image, position and length of
the fields are collected into ai_fields and bi_fields, if it is an
Update_rows_event and the after image doesn't includes all columns.
The changed fields are swapped between bi_fields and ai_fields.
Then it recreates the before image and after image by using
bi_fields and ai_fields. nullbit will be set to 1 if the
field is NULL, otherwise nullbit will be 0.
It also optimized flashback a little bit.
- calc_row_event_length is used instead of print_verbose_one_row
- swap_buff1 and swap_buff2 are removed.
os_file_set_size(): Let us invoke the Linux system call fallocate(2)
directly, because the GNU libc posix_fallocate() implements a fallback
that writes to the file 1 byte every 4096 or fewer bytes. In one
environment, invoking fallocate() directly would lead to 4 times the
file growth rate during ALTER TABLE. Presumably, what happened was
that the NFS server used a smaller allocation block size than 4096 bytes
and therefore created a heavily fragmented sparse file when
posix_fallocate() was used. For example, extending a file by 4 MiB
would create 1,024 file fragments. When the file is actually being
written to with data, it would be "unsparsed".
The built-in EOPNOTSUPP fallback in os_file_set_size() writes a buffer
of 1 MiB of NUL bytes. This was always used on musl libc and other
Linux implementations of posix_fallocate().
BASE 62 uses 0-9, A-Z and then a-z to give the numbers 0-61. This patch
increases the range of the string functions to cover this.
Based on ideas and tests in PR #2589, but re-written into the charset
functions.
Includes fix by Sergei, UBSAN complained:
ctype-simple.c:683:38: runtime error: negation of -9223372036854775808
cannot be represented in type 'long long int'; cast to an unsigned
type to negate this value to itself
Co-authored-by: Weijun Huang <huangweijun1001@gmail.com>
Co-authored-by: Sergei Golubchik <serg@mariadb.org>
Modify the NS_ZERO state in the JSON number parser to allow
exponential notation with a zero coefficient (e.g. 0E-4).
The NS_ZERO state transition on 'E' was updated to move to the
NS_EX state rather than returning a syntax error. Similar change
was made for the NS_ZE1 (negative zero) starter state.
This allows accepted number grammar to include cases like:
- 0E4
- -0E-10
which were previously disallowed. Numeric parsing remains
the same for all other states.
Test cases are added to func_json.test to validate parsing for
various exponential numbers starting with zero coefficients.
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services.
The parameter innodb_undo_log_truncate=ON enables a multi-phased logic:
1. Any "producers" (new starting transactions) are prohibited
from using the rollback segments that reside in the undo tablespace.
2. Any transactions that use any of the rollback segments must be
committed or aborted.
3. The purge of committed transaction history must process all the
rollback segments.
4. The undo tablespace is truncated and rebuilt.
5. The rollback segments are re-enabled for new transactions.
There was one flaw in this logic: The first step was not being invoked
as often as it could be, and therefore innodb_undo_log_truncate=ON
would have no chance to work during a heavy write workload.
Independent of innodb_undo_log_truncate, even after
commit 86767bcc0f
we are missing some chances to free processed undo log pages.
If we prohibited the creation of new transactions in one busy
rollback segment at a time, we would be eventually guaranteed
to be able to free such pages.
purge_sys_t::skipped_rseg: The current candidate rollback segment
for shrinking the history independent of innodb_undo_log_truncate.
purge_sys_t::iterator::free_history_rseg(): Renamed from
trx_purge_truncate_rseg_history(). Implement the logic
around purge_sys.m_skipped_rseg.
purge_sys_t::truncate_undo_space: Renamed from truncate.
purge_sys.truncate_undo_space.last: Changed the type to integer
to get rid of some pointer dereferencing and conditional branches.
purge_sys_t::truncating_tablespace(), purge_sys_t::undo_truncate_try():
Refactored from trx_purge_truncate_history().
Set purge_sys.truncate_undo_space.current if applicable,
or return an already set purge_sys.truncate_undo_space.current.
purge_coordinator_state::do_purge(): Invoke
purge_sys_t::truncating_tablespace() as part of the normal work loop,
to implement innodb_undo_log_truncate=ON as often as possible.
trx_purge_truncate_rseg_history(): Remove a redundant parameter.
trx_undo_truncate_start(): Replace dead code with a debug assertion.
Correctness tested by: Matthias Leich
Performance tested by: Axel Schwenke
Reviewed by: Debarun Banerjee
Since 0930eb86cb, system table creation
needed for spider init is delayed to the signal_ddl_recovery_done
callback. Since it is part of the init, failure should result in
spider deinit.
We also remove the call to spider_init_system_tables() from
spider_db_init(), as it was removed in the commit mentioned above and
accidentally restored in a merge.
- InnoDB fails to find the space id from the page0 of
the tablespace. In that case, InnoDB can use
doublewrite buffer to recover the page0 and write
into the file.
- buf_dblwr_t::init_or_load_pages(): Loads only the pages
which are valid.(page lsn >= checkpoint). To do that,
InnoDB has to open the redo log before system
tablespace, read the latest checkpoint information.
recv_dblwr_t::find_first_page():
1) Iterate the doublewrite buffer pages and find the 0th page
2) Read the tablespace flags, space id from the 0th page.
3) Read the 1st, 2nd and 3rd page from tablespace file and
compare the space id with the space id which is stored
in doublewrite buffer.
4) If it matches then we can write into the file.
5) Return space which matches the pages from the file.
SysTablespace::read_lsn_and_check_flags(): Remove the
retry logic for validating the first page. After
restoring the first page from doublewrite buffer,
assign tablespace flags by reading the first page.
recv_recovery_read_max_checkpoint(): Reads the maximum
checkpoint information from log file
recv_recovery_from_checkpoint_start(): Avoid reading
the checkpoint header information from log file
Datafile::validate_first_page(): Throw error in case
of first page validation fails.
PageBulk::init(): Unnecessary reserves the extent before
allocating a page for bulk insert. btr_page_alloc()
capable of handing the extending of tablespace.
The problem is the test is skipped after sourcing include/master-slave.inc.
This leaves the slave threads running after the test is skipped, causing a
following test to fail during rpl setup.
Also rename have_normal_bzip.inc to the more appropriate _zlib.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
- Closes PR #2839
- Usage of `Column_definition_fix_attributes()` suggested by Alexandar
Barkov - thanks bar, that is better than hook in server code
(reverted 22f3ebe4bf)
- This method is called after parsing the data type:
* in `CREATE/ALTER TABLE`
* in SP: return data type, parameter data type, variable data type
- We want to disallow all these use cases of MYSQL_JSON.
- Reviewer: bar@mariadb.comcvicentiu@mariadb.org
Changing the way how a the following conditions are evaluated:
WHERE timestamp_column=datetime_const_expr
(for all comparison operators: =, <=>, <, >, <=, >=, <> and for NULLIF)
Before the change it was always performed as DATETIME.
That was not efficient, as involved per-row TIMESTAMP->DATETIME conversion
for timestamp_column. For example, in case of the SYSTEM time zone
it involved a localtime_r() call, which is known to be slow.
After the change it's performed as TIMESTAMP in many cases.
This allows to avoid per-row conversion, as it works the other way around:
datetime_const_expr is converted to TIMESTAMP once before the execution stage.
Note, datetime_const_expr must be inside monotone continuous periods of
the current time zone, i.e. not near these anomalies:
- DST changes (spring forward, fall back)
- leap seconds
Field_new_decimal::store_value(const my_decimal*, int*)
Analysis
========
When rpl applier is unpacking a before row image, Field::reset() will be
called before setting a field to null if null bit of the field is set in
the row image. For Field_new_decimal::reset(), it calls
Field_new_decimal::store_value() to reset the value. store_value() asserts
that the field is in the write_set bitmap since it thinks the field is
updating.
But that is not true for the row image generated in FULL_NODUP
mode. In the mode, the before image includes all fields and the after
image includes only updated fields.
Fix
===
In the case unpacking binlog row images, the assertion is meaningless.
So the unpacking field is marked in write_set temporarily to avoid the
assertion failure.
- We don't test `json` MySQL tables from `std_data` since the error `ER_TABLE_NEEDS_REBUILD
` is invoked. However MDEV-32235 will override this test after merge,
but leave it to show behavior and historical changes.
- Closes PR #2833
Reviewer: <cvicentiu@mariadb.org>
<serg@mariadb.com>
Fixed typo in mysql_admin_table which cused call of
close_unused_temporary_table_instances alwas for the first table
instead of the current table.
Added ASSERT that close_unused_temporary_table_instances should not
remove all instances of user created temporary table.
The spider init bug fixes remove any race conditions during spider
init.
Also remove the add_suppressions in spider/bugfix.mdev_27575 which is
a similar issue.
All Spider tables are recorded in the system table
mysql.spider_tables. Deleting a spider table removes the corresponding
rows from the system table, among other things. This patch makes it so
that if spider could not find any record in the system table to delete
for a given table, it should correctly report that no such Spider
table exists.
When innodb_undo_log_truncate=ON causes an InnoDB undo tablespace
to be truncated, we must guarantee that the undo tablespace will
be rebuilt atomically: After mtr_t::commit_shrink() has durably
written the mini-transaction that rebuilds the undo tablespace,
we must not write any old pages to the tablespace.
To guarantee this, in trx_purge_truncate_history() we used to
traverse the entire buf_pool.flush_list in order to acquire
exclusive latches on all pages for the undo tablespace that
reside in the buffer pool, so that those pages cannot be written
and will be evicted during mtr_t::commit_shrink(). But, this
traversal may interfere with the page writing activity of
buf_flush_page_cleaner(). It would be better to lazily discard
the old pages of the truncated undo tablespace.
fil_space_t::is_being_truncated, fil_space_t::clear_stopping(): Remove.
fil_space_t::create_lsn: A new field, identifying the LSN of the
latest rebuild of a tablespace.
buf_page_t::flush(), buf_flush_try_neighbors(): Evict pages whose
FIL_PAGE_LSN is below fil_space_t::create_lsn.
mtr_t::commit_shrink(): Update fil_space_t::create_lsn and
fil_space_t::size right before the log is durably written and the
tablespace file is being truncated.
fsp_page_create(), trx_purge_truncate_history(): Simplify the logic.
Reviewed by: Thirunarayanan Balathandayuthapani, Vladislav Lesin
Performance tested by: Axel Schwenke
Correctness tested by: Matthias Leich