MariaDB Server 10.0 and 10.1 support non-indexed virtual columns,
which are hidden from the storage engine. Starting with MDEV-5800
in MariaDB 10.2.2, the virtual columns are visible to storage engines.
calc_row_difference(): Follow-up to the MDEV-17199 fix, which forgot
to increment num_v when skipping virtual columns in tables that
were created before MariaDB 10.2.2. This corrupted the update vector
whenever an updated persistent column was preceded by virtual columns.
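A minimal sketch of the corrected counting logic (omit_virtual is an assumed
flag for the pre-10.2.2 check; the loop body is condensed):

	ulint	num_v = 0;
	for (uint i = 0; i < table->s->fields; i++) {
		Field*	field = table->field[i];
		if (!field->stored_in_db()) {
			if (omit_virtual) {
				/* Table created before 10.2.2: the column is
				hidden from InnoDB, but num_v must still
				advance, or the update vector offsets of the
				following stored columns are corrupted. */
				num_v++;
				continue;
			}
			/* ... build the virtual-column part of the update
			vector ... */
			num_v++;
			continue;
		}
		/* ... compare old and new values of the stored column, using
		(i - num_v) as its position among stored columns ... */
	}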
The macro innobase_is_v_fld() turns out to be equivalent to the
negation of Field::stored_in_db(). Remove the macro and invoke
the member function directly.
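For illustration, the mechanical replacement at each call site looks like
this (diff-style, with a generic field variable):

	-	if (innobase_is_v_fld(field)) {
	+	if (!field->stored_in_db()) {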
innodb_base_col_setup_for_stored(): Simplify a condition to only
check Field::vcol_info.
innobase_create_index_def(): Replace some redundant code with
DBUG_ASSERT().
Before MDEV-5800 in MariaDB 10.2.2, indexed virtual columns were not
supported; non-persistent virtual columns were hidden from storage
engines. Only starting with MDEV-5800 does InnoDB create internal
metadata on virtual columns.
On TRUNCATE TABLE, an old .frm file from before MDEV-5800 may be
used as the table schema. When the table is being re-created by
InnoDB, the old schema must be used; that is, we must hide
the existence of virtual columns from InnoDB.
create_table_check_doc_id_col(): Remove the assertion that failed.
This function correctly deals with virtual columns in tables
created before MariaDB 10.2.2 introduced MDEV-5800.
create_table_info_t::create_table_def(): Do not create metadata for
virtual columns if the table definition was created before MariaDB 10.2.2.
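A sketch of the intended check in create_table_def(); omit_virtual_cols() is
an assumed name for the pre-10.2.2 schema test:

	/* When the .frm predates MDEV-5800 (MariaDB 10.2.2), InnoDB must not
	create any metadata for the virtual columns. */
	const bool	omit_virtual = omit_virtual_cols(*m_form->s);

	for (uint i = 0; i < m_form->s->fields; i++) {
		const Field*	field = m_form->field[i];
		if (omit_virtual && !field->stored_in_db()) {
			continue;	/* hide the column from InnoDB */
		}
		/* ... add the column to the InnoDB data dictionary ... */
	}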
This bug was introduced by MDEV-12288, which made InnoDB use
a single undo log for persistent transactions, instead of
maintaining separate insert_undo and update_undo logs.
trx_undo_reuse_cached(): Initialize the TRX_UNDO_PAGE_TYPE field
after reusing a cached undo log page.
Failure to do so can cause trx_undo_mem_create_at_db_start()
to misclassify new undo log records as TRX_UNDO_INSERT.
This in turn would trigger an assertion failure in
trx_roll_pop_top_rec_of_trx() due to undo==insert.
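A sketch of the missing initialization (the exact value written is an
assumption; anything other than TRX_UNDO_INSERT avoids the misclassification):

	/* Reset the legacy page-type field when reusing a cached undo page,
	so that trx_undo_mem_create_at_db_start() will not classify the
	records on it as TRX_UNDO_INSERT. */
	mlog_write_ulint(undo_page + TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE,
			 TRX_UNDO_UPDATE, MLOG_2BYTES, mtr);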
os_file_fsync_posix(): If fsync() returns a fatal error,
do include errno in the error message.
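For example, the message could carry errno along these lines (a sketch; the
exact wording and logging call in the source may differ):

	ib::fatal() << "fsync() returned " << errno
		    << " (" << strerror(errno) << ")";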
In the future, we might handle fsync() or write or allocation failures
on InnoDB data files a little more gracefully: flag the affected index
or table as corrupted, and deny any subsequent writes to the table.
If a write to the undo log or redo log fails, an alternative to
killing the server could be to deny any writes to InnoDB tables
until the server has been restarted.
In MDEV-10814, a missing argument caused a later optional argument
(the bool true) to be treated as the size. This memory is unmapped
during shutdown and when resizing the InnoDB buffer pool. As a result
the memory is leaked (it stays allocated) until shutdown completes.
This patch contains a fix for the MDEV-17262/17243 issues and a
new mtr test.
These issues (MDEV-17262/17243) have two causes:
1) After an intermediate commit, a transaction loses its status of
being registered with the MySQL 2pc coordinator (in InnoDB), because
since version 10.2 the write_row() function (located in ha_innodb.cc)
does not call trx_register_for_2pc(m_prebuilt->trx) while processing
split transactions. It is necessary to restore this call inside
write_row() when an intermediate commit has been made (for a split
transaction).
Similarly, we need to set the started-statement flag
(m_prebuilt->sql_stat_start) after an intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) call made from the
wsrep_load_data_split() function (located in sql_load.cc) would also
do this, but too late. As a result, the call to the wsrep_append_keys()
function from the InnoDB engine may be lost, or the function may be
called with an invalid transaction identifier (see the code sketch
below).
2) If a transaction with a LOAD DATA statement is divided into
logical mini-transactions (of 10K rows each) and the binlog is rotated,
then in rare cases, due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
the splitting of LOAD DATA into mini-transactions is a technicality,
these mini-transactions should not be allowed to fall into separate
binlogs. Therefore, it is necessary to prohibit binlog rotation in
the middle of processing a LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
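A sketch of the restored calls in write_row() for point (1);
intermediate_commit_done is a hypothetical flag marking the 10K-row split
point:

	if (intermediate_commit_done) {
		/* Restore the 2pc registration that was lost at the
		intermediate commit and mark a new statement as started,
		so that wsrep_append_keys() sees a valid transaction. */
		trx_register_for_2pc(m_prebuilt->trx);
		m_prebuilt->sql_stat_start = true;
	}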
MDEV-15418 changed InnoDB to use the READ UNCOMMITTED isolation level
when transaction recovery is disabled by setting
innodb_force_recovery to 5 or 6. Alas, CHECK TABLE would still
internally use REPEATABLE READ. If the server was restarted before
purge had run to completion (and had reset all DB_TRX_ID, thanks to
MDEV-12288), this would cause errors about any DB_TRX_ID that
had not yet been reset to 0.
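A sketch of the corresponding CHECK TABLE adjustment; the guard shown
(high_level_read_only) is an assumption about the exact condition:

	/* Mirror MDEV-15418: read uncommitted data when transaction recovery
	is disabled, so that stale DB_TRX_ID values are not reported. */
	m_prebuilt->trx->isolation_level = high_level_read_only
		? TRX_ISO_READ_UNCOMMITTED
		: TRX_ISO_REPEATABLE_READ;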
In the function make_cond_for_table_from_pred(), a call of fix_fields()
was missing a check of its return code. As a result an extracted constant
condition could be malformed, and this caused an assertion failure.
In commit 1dc78d35a0beb9620bae1f4841cc07389b425707, a
deallocate_large(dontdump=true) call was passed a wrong value.
To avoid accidentally calling the large-memory functions that have
DODUMP/DONTDUMP options with a missing argument, the functions
have been given distinct names.
MDEV-10814 introduced a bug where the size argument to
deallocate_large() was passed true, evaluating to 1, as the size.
When this was passed to munmap(), it resulted in EINVAL and the
page not being released. This only occurred in buf_pool_free_instance()
when called on shutdown, so there was no impact, as process termination
correctly frees the memory.
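A self-contained illustration of the argument-shift pitfall (not the server's
actual declarations):

	#include <cstddef>
	#include <cstdio>

	/* A trailing defaulted parameter lets a forgotten argument shift
	silently into the wrong position. */
	static void deallocate_large(void* ptr, size_t size, bool dontdump = false)
	{
		std::printf("freeing %zu bytes, dontdump=%d\n", size, (int) dontdump);
		(void) ptr;
	}

	int main()
	{
		char buf[4096];
		/* Intended: deallocate_large(buf, sizeof buf, true);
		Buggy call: `true` binds to `size`, so size == 1, and the
		real munmap() call then fails with EINVAL. */
		deallocate_large(buf, true);
		return 0;
	}

Giving the DODUMP/DONTDUMP variants distinct names turns this mistake into a
compile error instead of a silent misinterpretation.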
Includes:
MDEV-17302 Add support for ALTER USER command in prepared statement
and
MDEV-17673 main.cte_recursive fails in bb-10.4-ps branch in --ps
Set correct SELECT_LEX linkage for recursive CTEs.
Do not delegate this job to TABLE_LIST::set_as_with_table,
because it is only run on prepare, while With_element::move_anchors_ahead
is run on both prepare and execute (fix by Igor).
When there is a huge transaction in the undo log, the purge threads
may get stuck in trx_purge_attach_undo_recs() for a long time,
causing the server to hang on a normal shutdown (innodb_fast_shutdown>0).
Apparently innodb_purge_batch_size does not work correctly, or
n_pages_handled is not being incremented correctly. We do not fix that
for now; instead, we check whether shutdown has been initiated,
allowing the purge threads to shut down without delay.
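The check can be as simple as the following sketch inside the
batch-collection loop of trx_purge_attach_undo_recs():

	/* Abandon the current purge batch once shutdown has begun, instead
	of draining a potentially huge transaction first. */
	if (srv_shutdown_state != SRV_SHUTDOWN_NONE) {
		break;
	}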
There were two newly enabled warnings:
1. casts of function pointers. Affected sql_analyse.h, mi_write.c,
ma_write.cc, mf_iocache-t.cc, mysqlbinlog.cc, encryption.cc, etc.
2. memcpy/memset of non-trivial structures. Fixed as follows:
* the warning is disabled for InnoDB
* TABLE, TABLE_SHARE, and TABLE_LIST got a new method reset() which
does the bzero() (see the sketch after this list); this is safe for
these classes, but any other bzero() will still cause a warning
* Table_scope_and_contents_source_st uses `TABLE_LIST *` (trivial)
instead of `SQL_I_List<TABLE_LIST>` (non-trivial), so it is safe to
bzero now
* casts added in debug_sync.cc and sql_select.cc (for JOIN)
* a move assignment method for MDL_request instead of memcpy()
* PARTIAL_INDEX_INTERSECT_INFO::init() instead of bzero()
* the constructor removed from READ_RECORD to make it trivial
* some memcpy() calls replaced with C++ copy assignments
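The reset() idea for point 2, roughly (a sketch; the actual method bodies
may differ per class):

	void TABLE::reset()
	{
		/* The historical bzero() semantics live in one audited
		method, so any other memset()/bzero() of this non-trivial
		class still triggers the warning. */
		bzero((void*) this, sizeof(*this));
	}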
The symptom of the bug was that one got the following in
the Aria recovery log:
"Table 'xxx', id 57, has create_rename_lsn (1,0x12dee) more recent than LOGREC_FILE_ID's LSN (1,0x12dc4), ignoring open request"
After this, all future redo entries were marked with
"For table of short id 57, table skipped, so skipping record"
Analysis:
When ending a batch insert, create_rename_lsn for the table
is updated to signal that earlier redo entries for the
table cannot be applied. The problem was that future redo
entries were also ignored, as the redo code assumed they were
for the old table.
Fixed by calling translog_deassign_id_from_share(), which causes
future redo entries to be seen as belonging to the
updated table.
On startup, if the InnoDB doublewrite buffer can be used to
recover a corrupted page, raising an ERROR about a recoverable
error seems inappropriate. Issue a Note instead, and adjust
tests accordingly.
Also, correctly validate the tablespace ID in the files.
row_drop_tables_for_mysql_in_background(): Copy the table name
before closing the table handle, to avoid heap-use-after-free if
another thread succeeds in dropping the table before
row_drop_table_for_mysql_in_background() completes the table name lookup.
dict_mem_create_temporary_tablename(): With innodb_safe_truncate=ON
(the default), generate a simple, unique, collision-free table name
using only the id, no pseudorandom component. This is safe, because
on startup, we will drop any #sql tables that might exist in InnoDB.
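A sketch of the simplified name generation (buffer handling and the exact
format string are assumptions):

	/* With innodb_safe_truncate=ON the temporary name only has to be
	unique for this server's lifetime, because leftover #sql-ib tables
	are dropped on startup, so the id alone suffices. */
	snprintf(name, name_len, "%s#sql-ib%llu",
		 db_prefix, (unsigned long long) id);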
This is a backport from 10.3. It should have been backported already
as part of backporting MDEV-14717 and MDEV-14585, which were
prerequisites for the MDEV-13564 backup-friendly TRUNCATE TABLE.
This seems to reduce the chance of table creation failures in
ha_innobase::truncate().
ha_innobase::truncate(): Do not invoke close(), but instead
mimic it, so that we can restore the original table handle
in case opening the truncated copy of the table fails.
The problem was that we skipped background persistent statistics
calculation on applier nodes if the thread was marked as high
priority (a.k.a. BF). However, on applier nodes all replicated DDL
is executed as high priority, i.e. BF.
Fixed by allowing background persistent statistics calculation on
applier nodes even when the thread is marked as BF. This could lead
to BF lock waits, but queries on that node need those statistics.
Removed the redundant plugin_thdvar_cleanup() from end_connection(): it is
called by THD::free_connection(), which always follows end_connection().
This saves at least one lock(LOCK_plugin) and one
rdlock(LOCK_system_variables_hash).
Benchmarked on a 2-socket/20-core/40-thread Broadwell system using the
sysbench connect benchmark at 40 threads (with SELECT 1 disabled).
10.2 shows a moderate improvement: 136219.93 -> 137766.31 CPS.
The improvement in 10.3 is somewhat better: 93018.29 -> 101379.77 CPS.
Also backported a MyRocks memory leak fix from 10.4, which turned out to
be unrelated.
We do want to ignore InnoDB's internal #sql-ib*.ibd at shutdown,
because those tables will be dropped on the next startup.
Failure to filter out these table names occasionally causes some
unwanted output for tests that restart InnoDB soon after dropping
or truncating tables, for example innodb.recovery_shutdown.
The issue here was that when the select list contained an expression with
both a subquery and a window function, the subquery was computed after the
window function computation. This produced incorrect results because the
subquery was correlated and its fields were pointing to the base table
instead of the temporary table.
The fix is to add an additional field in the temporary table
for the subquery and to execute the subquery before window function
execution. After execution, the values of the subquery are stored in the
temporary table, and when we need to calculate the expression, we simply
read the subquery values back from the temporary table.
In 10.3, all records will be processed by purge due to MDEV-12288.
However, insert undo records do not contain a transaction identifier.
row_purge_parse_undo_rec(): Use node->trx_id=TRX_ID_MAX for the
insert undo records. We cannot skip table lookups for these records
after DISCARD TABLESPACE other than by 'detaching' the table from
the undo logs by updating SYS_TABLES.ID on both DISCARD TABLESPACE
and IMPORT TABLESPACE.
Also, remove a redundant condition that was introduced
in the merge commit 814205f306.