Commit graph

1145 commits

Author SHA1 Message Date
Marko Mäkelä
1ec8d45c4d Merge bb-10.2-ext into 10.3 2017-12-20 23:18:41 +02:00
Marko Mäkelä
b4165985c9 MDEV-14585 Automatically remove #sql- tables in InnoDB dictionary during recovery
Now that MDEV-14717 made RENAME TABLE crash-safe within InnoDB,
it should be safe to drop the #sql- tables within InnoDB during
crash recovery. These tables can be one of two things:

(1) #sql-ib related to deferred DROP TABLE (follow-up to MDEV-13407)
or to table-rebuilding ALTER TABLE...ALGORITHM=INPLACE
(since MDEV-14378, only related to the intermediate copy of a table),

(2) #sql- related to the intermediate copy of a table during
ALTER TABLE...ALGORITHM=COPY

We will not drop tables whose name starts with #sql2, because
the server can be killed during an ALGORITHM=COPY operation at
a point where the original table was renamed to #sql2 but the
finished intermediate copy was not yet renamed from #sql-
to the original table name.
2017-12-20 22:47:01 +02:00
Marko Mäkelä
2534b5cb99 Merge bb-10.2-ext into 10.3 2017-12-20 22:37:24 +02:00
Marko Mäkelä
0bc36758ba MDEV-14717 RENAME TABLE in InnoDB is not crash-safe
InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.

If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.

fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.

fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.

dict_table_rename_in_cache(): Invoke fil_name_write_rename().

trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.

trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.

row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.

row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.
2017-12-20 22:21:03 +02:00
Marko Mäkelä
0436a0ff3c Merge bb-10.2-ext into 10.3 2017-12-19 17:28:22 +02:00
Marko Mäkelä
88aff5f471 Follow-up to MDEV-13407 innodb.drop_table_background failed in buildbot with "Tablespace for table exists"
The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.

Because the queue is not persistent, it is not crash-safe. We should
in some way ensure that the deferred-dropped tables will be dropped
after server restart.

The existence of two separate transactions complicates the error handling
of CREATE TABLE...SELECT. We should really not break locks in DROP TABLE.

Our solution to these problems is to rename the table to a temporary
name, and to drop such-named tables on InnoDB startup. Also, the
queue will use table IDs instead of names from now on.

check-testcase.test: Ignore #sql-ib*.ibd files, because tables may enter
the background DROP TABLE queue shortly before the test finishes.

innodb.drop_table_background: Test CREATE...SELECT and the creation of
tables whose file name starts with #sql-ib.

innodb.alter_crash: Adjust the recovery, now that the #sql-ib tables
will be dropped on InnoDB startup.

row_mysql_drop_garbage_tables(): New function, to drop all #sql-ib tables
on InnoDB startup.

row_drop_table_for_mysql_in_background(): Remove an unnecessary and
misplaced call to log_buffer_flush_to_disk(). (The call should have been
after the transaction commit. We do not care about flushing the redo log
here, because the table would be dropped again at server startup.)

Remove the entry from the list after the table no longer exists.

If server shutdown has been initiated, empty the list without actually
dropping any tables. They will be dropped again on startup.

row_drop_table_for_mysql(): Do not call lock_remove_all_on_table().
Instead, if locks exist, defer the DROP TABLE until they do not exist.
If the table name does not start with #sql-ib, rename it to that prefix
before adding it to the background DROP TABLE queue.
2017-12-19 17:12:39 +02:00
Marko Mäkelä
028e91f380 Merge 10.2 into bb-10.2-ext 2017-12-19 17:12:14 +02:00
Marko Mäkelä
8d70097c21 Merge 10.1 to 10.2
Follow-up fix to MDEV-14008: Let Field_double::val_uint() silently
return 0 on error
2017-12-19 16:48:28 +02:00
Marko Mäkelä
09c5bbf471 Merge 10.0 into 10.1 2017-12-18 20:05:50 +02:00
Marko Mäkelä
40088bfc7e MDEV-13407 innodb.drop_table_background failed in buildbot with "Tablespace for table exists"
The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.

Because the queue is not persistent, it is not crash-safe. In stable
versions of MariaDB, we can only try harder to drop all enqueued
tables before server shutdown.

row_mysql_drop_t::table_id: Replaces table_name.

row_drop_tables_for_mysql_in_background():
Do not remove the entry from the list as long as the table exists.
In this way, the table should eventually be dropped.
2017-12-18 19:51:44 +02:00
Marko Mäkelä
866ccc8890 Merge bb-10.2-ext into 10.3 2017-12-14 11:34:30 +02:00
Marko Mäkelä
1b5f0cbd46 Merge 10.2 into bb-10.2-ext 2017-12-14 09:53:19 +02:00
Marko Mäkelä
ece9c54e10 Merge 10.1 into 10.2 2017-12-14 08:40:01 +02:00
Marko Mäkelä
46305b006b Merge 10.0 into 10.1 2017-12-13 19:12:16 +02:00
Marko Mäkelä
b1977a39de MDEV-12323 Rollback progress log messages during crash recovery are intermixed with unrelated log messages
trx_roll_must_shutdown(): During the rollback of recovered transactions,
report progress and check if the rollback should be interrupted because
of a pending shutdown.

trx_roll_max_undo_no, trx_roll_progress_printed_pct: Remove, along with
the messages that were interleaved with other messages.
2017-12-13 18:56:22 +02:00
Marko Mäkelä
b46fa627ca MDEV-12352 InnoDB shutdown should not be blocked by a large transaction rollback
row_undo_step(), trx_rollback_active(): Abort the rollback of a
recovered ordinary transaction if fast shutdown has been initiated.

trx_rollback_resurrected(): Convert an aborted-rollback transaction
into a fake XA PREPARE transaction, so that fast shutdown can proceed.
2017-12-13 18:02:09 +02:00
Marko Mäkelä
58eb4e5db9 MDEV-14422 Assertion failure in trx_purge_run() on shutdown
row_quiesce_table_start(), row_quiesce_table_complete():
Use the more appropriate predicate srv_undo_sources for skipping
purge control. (This change alone is insufficient; it is possible
that this predicate will change during the call to trx_purge_stop()
or trx_purge_run().)

trx_purge_stop(), trx_purge_run(): Tolerate PURGE_STATE_EXIT.
It is very well possible to initiate shutdown soon after the statement
FLUSH TABLES FOR EXPORT has been submitted to execution.

srv_purge_coordinator_thread(): Ensure that the wait for purge_sys->event
in trx_purge_stop() will terminate when the coordinator thread exits.
2017-12-13 14:17:26 +02:00
Marko Mäkelä
34841d2305 Merge bb-10.2-ext into 10.3 2017-12-12 09:57:17 +02:00
Marko Mäkelä
e312a407b8 Merge 10.2 into bb-10.2-ext 2017-12-11 15:06:11 +02:00
Marko Mäkelä
13b9ec651a MDEV-14589 InnoDB should not lock a delete-marked record
When the transaction isolation level is SERIALIZABLE, or when
a locking read is performed in the REPEATABLE READ isolation level,
InnoDB must lock delete-marked records in order to prevent another
transaction from inserting something.

However, at READ UNCOMMITTED or READ COMMITTED isolation level or
when the parameter innodb_locks_unsafe_for_binlog is set, the
repeatability of the reads does not matter, and there is no need
to lock any records.

row_search_mvcc(): Skip locks on delete-marked committed records upfront,
instead of invoking row_unlock_for_mysql() afterwards. The unlocking
never worked for secondary index records.
2017-12-11 14:39:53 +02:00
Marko Mäkelä
1e6ac94451 Correct the comment of row_vers_impl_x_locked() 2017-12-11 13:56:36 +02:00
Marko Mäkelä
0af52734a7 Remove the unused function row_is_magic_monitor_table()
Before MySQL 5.7 or MariaDB 10.2.2, there used to be some
magic InnoDB table names that would assign some InnoDB flags
on CREATE TABLE or DROP TABLE.
2017-12-08 16:36:19 +02:00
Sergey Vojtovich
5b624f00fc MDEV-14529 - InnoDB rw-locks: optimize memory barriers
Remove volatile modifier from waiters: it's not supposed for inter-thread
communication, use appropriate atomic operations instead.

Changed waiters to int32_t, my_atomic friendly type.
2017-12-08 17:55:41 +04:00
Marko Mäkelä
07e9ff1fe1 MDEV-14378 In ALGORITHM=INPLACE, use a common name for the intermediate tables or partitions
Allow DROP TABLE `#mysql50##sql-...._.` to drop tables that were
being rebuilt by ALGORITHM=INPLACE

NOTE: If the server is killed after the table-rebuilding ALGORITHM=INPLACE
commits inside InnoDB but before the .frm file has been replaced, then
the recovery will involve something else than DROP TABLE.

NOTE: If the server is killed in a true inplace ALTER TABLE commits
inside InnoDB but before the .frm file has been replaced, then we
are really out of luck. To properly handle that situation, we would
need a transactional mysql.ddl_fixup table that directs recovery to
rename or remove files.

prepare_inplace_alter_table_dict(): Use the altered_table->s->table_name
for generating the new_table_name.

table_name_t::part_suffix: The start of the partition name suffix.

table_name_t::dbend(): Return the end of the schema name.

table_name_t::dblen(): Return the length of the schema name, in bytes.

table_name_t::basename(): Return the name without the schema name.

table_name_t::part(): Return the partition name, or NULL if none.

row_drop_table_for_mysql(): Assert for #sql, not #sql-ib.
2017-12-08 10:07:51 +02:00
Marko Mäkelä
3aa618a969 MDEV-13820 trx_id_check() fails during row_log_table_apply()
When logging ROW_T_INSERT or ROW_T_UPDATE records, we did not normalize
the DB_TRX_ID of the current transaction into 0 if the current transaction
had started (modifying other tables) before the ALTER TABLE started.

MDEV-13654 introduced this normalization for ROW_T_DELETE
and for all operations with ADD PRIMARY KEY, in row_log_table_get_pk().
2017-12-07 14:35:32 +02:00
Marko Mäkelä
976f6fb1b6 Merge bb-10.2-ext into 10.3 2017-12-06 19:36:33 +02:00
Marko Mäkelä
ce07676502 Merge 10.2 into bb-10.2-ext 2017-12-06 19:34:03 +02:00
Marko Mäkelä
7dc6066dea MDEV-14511 Use fewer transactions for updating InnoDB persistent statistics
dict_stats_exec_sql(): Expect the caller to always provide a transaction.
Remove some redundant assertions. The caller must hold dict_sys->mutex,
but holding dict_operation_lock is only necessary for accessing
data dictionary tables, which we are not accessing.

dict_stats_save_index_stat(): Acquire dict_sys->mutex
for invoking dict_stats_exec_sql().

dict_stats_save(), dict_stats_update_for_index(), dict_stats_update(),
dict_stats_drop_index(), dict_stats_delete_from_table_stats(),
dict_stats_delete_from_index_stats(), dict_stats_drop_table(),
dict_stats_rename_in_table_stats(), dict_stats_rename_in_index_stats(),
dict_stats_rename_table(): Use a single caller-provided
transaction that is started and committed or rolled back by the caller.

dict_stats_process_entry_from_recalc_pool(): Let the caller provide
a transaction object.

ha_innobase::open(): Pass a transaction to dict_stats_init().

ha_innobase::create(), ha_innobase::discard_or_import_tablespace():
Pass a transaction to dict_stats_update().

ha_innobase::rename_table(): Pass a transaction to
dict_stats_rename_table(). We do not use the same transaction
as the one that updated the data dictionary tables, because
we already released the dict_operation_lock. (FIXME: there is
a race condition; a lock wait on SYS_* tables could occur
in another DDL transaction until the data dictionary transaction
is committed.)

ha_innobase::info_low(): Pass a transaction to dict_stats_update()
when calculating persistent statistics.

alter_stats_norebuild(), alter_stats_rebuild(): Update the
persistent statistics as well. In this way, a single transaction
will be used for updating the statistics of a whole table, even
for partitioned tables.

ha_innobase::commit_inplace_alter_table(): Drop statistics for
all partitions when adding or dropping virtual columns, so that
the statistics will be recalculated on the next handler::open().
This is a refactored version of Oracle Bug#22469660 fix.

RecLock::add_to_waitq(), lock_table_enqueue_waiting():
Do not allow a lock wait to occur for updating statistics
in a data dictionary transaction, such as DROP TABLE. Instead,
return the previously unused error code DB_QUE_THR_SUSPENDED.

row_merge_lock_table(), row_mysql_lock_table(): Remove dead code
for handling DB_QUE_THR_SUSPENDED.

row_drop_table_for_mysql(), row_truncate_table_for_mysql():
Drop the statistics as part of the data dictionary transaction.
After TRUNCATE TABLE, the statistics will be recalculated on
subsequent ha_innobase::open(), similar to how the logic after
the above-mentioned Oracle Bug#22469660 fix in
ha_innobase::commit_inplace_alter_table() works.

btr_defragment_thread(): Use a single transaction object for
updating defragmentation statistics.

dict_stats_save_defrag_stats(), dict_stats_save_defrag_stats(),
dict_stats_process_entry_from_defrag_pool(),
dict_defrag_process_entries_from_defrag_pool(),
dict_stats_save_defrag_summary(), dict_stats_save_defrag_stats():
Add a parameter for the transaction.

dict_stats_empty_table(): Make public. This will be called by
row_truncate_table_for_mysql() after dropping persistent statistics,
to clear the memory-based statistics as well.
2017-12-06 18:52:28 +02:00
Marko Mäkelä
7cb3520c06 Merge bb-10.2-ext into 10.3 2017-11-30 08:16:37 +02:00
Alexander Barkov
5b697c5a23 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext 2017-11-29 12:06:48 +04:00
Sergei Golubchik
7f1900705b Merge branch '10.1' into 10.2 2017-11-21 19:47:46 +01:00
Marko Mäkelä
366bb162fe Merge 10.2 into bb-10.2-ext 2017-11-21 18:59:30 +02:00
Marko Mäkelä
f233c9778e Adjust the MySQL 5.7 tests for MariaDB 10.2 2017-11-20 16:05:41 +02:00
Marko Mäkelä
ce64a65f27 MDEV-14310 Possible corruption by table-rebuilding or index-creating ALTER TABLE…ALGORITHM=INPLACE
Also, MDEV-14317 When ALTER TABLE is aborted, do not write garbage pages to data files

As pointed out by Shaohua Wang, the merge of MDEV-13328 from
MariaDB 10.1 (based on MySQL 5.6) to 10.2 (based on 5.7)
was performed incorrectly.

Let us always pass a non-NULL FlushObserver* when writing
to data files is desired.

FlushObserver::is_partial_flush(): Check if this is a bulk-load
(partial flush of the tablespace).

FlushObserver::is_interrupted(): Check for interrupt status.

buf_LRU_flush_or_remove_pages(): Instead of trx_t*, take
FlushObserver* as a parameter.

buf_flush_or_remove_pages(): Remove the parameters flush, trx.
If observer!=NULL, write out the data pages. Use the new predicate
observer->is_partial() to distinguish a partial tablespace flush
(after bulk-loading) from a full tablespace flush (export).
Return a bool (whether all pages were removed from the flush_list).

buf_flush_dirty_pages(): Remove the parameter trx.
2017-11-20 13:26:56 +02:00
Alexander Barkov
4a8039b04e Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext 2017-11-20 11:12:08 +04:00
Jan Lindström
d7349e204b MDEV-9663: InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX
MariaDB adjustments to test case innodb-replace-debug and
add missing instrumentation to row0ins.cc.

MariaDB 10.1 does not seem to be affected.
2017-11-16 14:05:49 +02:00
Jan Lindström
d8ccc61f76 MDEV-9663: InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX
Add missing instrumentation to row0ins.cc.
2017-11-16 14:03:02 +02:00
Jan Lindström
0c4d11e819 MDEV-13206: INSERT ON DUPLICATE KEY UPDATE foreign key fail
This is caused by following change:

commit 95d29c99f01882ffcc2259f62b3163f9b0e80c75
Author: Marko Mäkelä <marko.makela@oracle.com>
Date:   Tue Nov 27 11:12:13 2012 +0200

    Bug#15920445 INNODB REPORTS ER_DUP_KEY BEFORE CREATE UNIQUE INDEX COMPLETED

    There is a phase during online secondary index creation where the index has
    been internally completed inside InnoDB, but does not 'officially' exist yet.
    We used to report ER_DUP_KEY in these situations, like this:

    ERROR 23000: Can't write; duplicate key in table 't1'

    What we should do is to let the 'offending' operation complete, but report an
    error to the
    ALTER TABLE t1 ADD UNIQUE KEY (c2):

    ERROR HY000: Index c2 is corrupted
    (This misleading error message should be fixed separately:
    Bug#15920713 CREATE UNIQUE INDEX REPORTS ER_INDEX_CORRUPT INSTEAD OF DUPLICATE)

    row_ins_sec_index_entry_low(): flag the index corrupted instead of
    reporting a duplicate, in case the index has not been published yet.

    rb:1614 approved by Jimmy Yang

Problem is that after we have found duplicate key on primary key
we continue to get necessary gap locks in secondary indexes to
block concurrent transactions from inserting the searched records.
However, search from unique index used in foreign key constraint
could return DB_NO_REFERENCED_ROW if INSERT .. ON DUPLICATE KEY UPDATE
does not contain value for foreign key column. In this case
we should return the original DB_DUPLICATE_KEY error instead
of DB_NO_REFERENCED_ROW.

Consider as a example following:

create table child(a int not null primary key,
b int not null,
c int,
unique key (b),
foreign key (b) references
parent (id)) engine=innodb;

insert into child values (1,1,2);

insert into child(a) values (1) on duplicate key update c = 3;

Now primary key value 1 naturally causes duplicate key error that will be
stored on node->duplicate. If there was no duplicate key error, we should
return the actual no referenced row error. As value for column b used in
both unique key and foreign key is not provided, server uses 0 as a
search value. This is naturally, not found leading to DB_NO_REFERENCED_ROW.
But, we should update the row with primay key value 1 anyway as
requested by on duplicate key update clause.
2017-11-16 11:05:24 +02:00
Marko Mäkelä
e94c9d24f6 MDEV-14378 In ALGORITHM=INPLACE, use a common name for the intermediate tables or partitions
Allow DROP TABLE `#mysql50##sql-...._.` to drop tables that were
being rebuilt by ALGORITHM=INPLACE

NOTE: If the server is killed after the table-rebuilding ALGORITHM=INPLACE
commits inside InnoDB but before the .frm file has been replaced, then
the recovery will involve something else than DROP TABLE.

NOTE: If the server is killed in a true inplace ALTER TABLE commits
inside InnoDB but before the .frm file has been replaced, then we
are really out of luck. To properly handle that situation, we would
need a transactional mysql.ddl_fixup table that directs recovery to
rename or remove files.

prepare_inplace_alter_table_dict(): Use the altered_table->s->table_name
for generating the new_table_name.

table_name_t::part_suffix: The start of the partition name suffix.

table_name_t::dbend(): Return the end of the schema name.

table_name_t::dblen(): Return the length of the schema name, in bytes.

table_name_t::basename(): Return the name without the schema name.

table_name_t::part(): Return the partition name, or NULL if none.

row_drop_table_for_mysql(): Assert for #sql, not #sql-ib.
2017-11-13 11:12:29 +02:00
Marko Mäkelä
c19ef508b8 InnoDB: Remove ut_snprintf() and the use of my_snprintf(); use snprintf() 2017-11-13 02:11:48 +02:00
Marko Mäkelä
a48aa0cd56 Merge bb-10.2-ext into 10.3 2017-11-10 16:12:45 +02:00
Marko Mäkelä
386e5d476e Merge 10.2 into bb-10.2-ext 2017-11-10 16:07:01 +02:00
Marko Mäkelä
9618c04e3f Follow-up fix of MDEV-13795/MDEV-14332
row_log_table_apply_op(): Remove references to dict_table_t::n_vcols.
Virtual column information is no longer being written to the log.

row_log_t: Remove the unused fields n_old_col, n_old_vcol.
2017-11-10 15:58:52 +02:00
Marko Mäkelä
1ca72a0c0d Merge 10.2 into bb-10.2-ext 2017-11-10 08:27:09 +02:00
Sergei Golubchik
2a4e4335c4 Merge branch 'github/10.0-galera' into 10.1 2017-11-10 01:38:03 +01:00
Marko Mäkelä
5d142f9958 MDEV-13795/MDEV-14332 Corruption during online table-rebuilding ALTER when VIRTUAL columns exist
When MySQL 5.7 introduced indexed virtual columns, it introduced
several bugs into the online table-rebuilding ALTER, that is,
the row_log_table_apply() family of functions.

The online_log format that was introduced for online table-rebuilding
ALTER in MySQL 5.6 should be sufficient. Ideally, any indexed virtual
column values would be evaluated based on the log records in the temporary
file. There is no need to log virtual column values.

(For ADD INDEX, that is row_log_apply(), we always must log the values of
the keys, no matter if the columns are virtual.)

Because omitting the virtual column values removes any chance of
row_log_table_apply() working with indexed virtual columns, we
will for now refuse LOCK=NONE in table-rebuilding ALTER operations
when indexes on virtual columns exist. This restriction would be
lifted in MDEV-14341.

innobase_indexed_virtual_exist(): New predicate, to determine if
indexed virtual columns exist in a table definition.

ha_innobase::check_if_supported_inplace_alter(): Refuse online rebuild
if indexed virtual columns exist.

rec_get_converted_size_temp_v(), rec_convert_dtuple_to_temp_v(): Remove.

row_log_table_delete(), row_log_table_update(, row_log_table_insert():
Remove parameters for virtual columns.

trx_undo_read_v_rows(): Remove the col_map parameter.

row_log_table_apply(): Do not deal with virtual columns.
2017-11-09 23:39:12 +02:00
Monty
0bb0d52221 Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext
Conflicts:
	mysql-test/r/cte_recursive.result
	mysql-test/r/derived_cond_pushdown.result
	mysql-test/t/cte_recursive.test
	mysql-test/t/derived_cond_pushdown.test
	sql/datadict.cc
	sql/handler.cc
2017-11-09 23:21:41 +02:00
sjaakola
f4f2e8fa2a MW-402 cascading FK issues
* created tests focusing in multi-master conflicts during cascading foreign key
  processing
* in row0upd.cc, calling wsrep_row_ups_check_foreign_constraints only when
  running in cluster
* in row0ins.cc fixed regression from MW-369, which caused crash with MW-402.test
2017-11-08 14:15:41 +02:00
Marko Mäkelä
843e4508c0 Merge 10.1 into 10.2 2017-11-07 23:02:39 +02:00
Marko Mäkelä
51b4366bfb MDEV-13328 ALTER TABLE…DISCARD TABLESPACE takes a lot of time
With a big buffer pool that contains many data pages,
DISCARD TABLESPACE took a long time, because it would scan the
entire buffer pool to remove any pages that belong to the tablespace.
With a large buffer pool, this would take a lot of time, especially
when the table-to-discard is empty.

The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.

If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.

It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.

Import is bypassing the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.

buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.

buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.

buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.

buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.

PageConverter::m_mtr: A dummy mini-transaction buffer

PageConverter::PageConverter(): Complete the member initialization list.

PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages that exist in the being-imported file.

row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
2017-11-06 18:08:33 +02:00