is set to true, as it should be.
Copy and modify the original io_win.h header file to a different location
(as we cannot patch anything in the submodule). Make sure the modified
header is used.
This is caused by the following change:
commit 95d29c99f01882ffcc2259f62b3163f9b0e80c75
Author: Marko Mäkelä <marko.makela@oracle.com>
Date: Tue Nov 27 11:12:13 2012 +0200
Bug#15920445 INNODB REPORTS ER_DUP_KEY BEFORE CREATE UNIQUE INDEX COMPLETED
There is a phase during online secondary index creation where the index has
been internally completed inside InnoDB, but does not 'officially' exist yet.
We used to report ER_DUP_KEY in these situations, like this:
ERROR 23000: Can't write; duplicate key in table 't1'
What we should do is to let the 'offending' operation complete, but report an
error to the
ALTER TABLE t1 ADD UNIQUE KEY (c2):
ERROR HY000: Index c2 is corrupted
(This misleading error message should be fixed separately:
Bug#15920713 CREATE UNIQUE INDEX REPORTS ER_INDEX_CORRUPT INSTEAD OF DUPLICATE)
row_ins_sec_index_entry_low(): flag the index corrupted instead of
reporting a duplicate, in case the index has not been published yet.
rb:1614 approved by Jimmy Yang
The problem is that after we have found a duplicate key on the primary key,
we continue to acquire the necessary gap locks in secondary indexes to
block concurrent transactions from inserting the searched records.
However, a search from a unique index used in a foreign key constraint
could return DB_NO_REFERENCED_ROW if INSERT .. ON DUPLICATE KEY UPDATE
does not contain a value for the foreign key column. In this case
we should return the original DB_DUPLICATE_KEY error instead
of DB_NO_REFERENCED_ROW.
Consider the following example (the parent table is included here so the
statements can be run as-is):
create table parent(id int not null primary key) engine=innodb;
insert into parent values (1);
create table child(a int not null primary key,
                   b int not null,
                   c int,
                   unique key (b),
                   foreign key (b) references
                   parent (id)) engine=innodb;
insert into child values (1,1,2);
insert into child(a) values (1) on duplicate key update c = 3;
Now the primary key value 1 naturally causes a duplicate key error that will
be stored in node->duplicate. If there was no duplicate key error, we should
return the actual no-referenced-row error. As the value for column b, used in
both the unique key and the foreign key, is not provided, the server uses 0
as the search value. This is naturally not found, leading to
DB_NO_REFERENCED_ROW. But we should update the row with primary key value 1
anyway, as requested by the ON DUPLICATE KEY UPDATE clause.
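A minimal sketch of that precedence rule in C (resolve_insert_error() and
duplicate_seen are hypothetical names for illustration, not the actual
InnoDB code):

#include <stdbool.h>
#include <stdio.h>

enum db_err { DB_SUCCESS, DB_DUPLICATE_KEY, DB_NO_REFERENCED_ROW };

/* A duplicate-key error that was already recorded for the row (as in
   node->duplicate) must take precedence over a later
   DB_NO_REFERENCED_ROW from the foreign key check, so that
   ON DUPLICATE KEY UPDATE can proceed. */
static enum db_err resolve_insert_error(enum db_err fk_err,
                                        bool duplicate_seen)
{
    if (fk_err == DB_NO_REFERENCED_ROW && duplicate_seen)
        return DB_DUPLICATE_KEY;
    return fk_err;
}

int main(void)
{
    /* The scenario above: the duplicate on the primary key came first. */
    printf("%d\n", resolve_insert_error(DB_NO_REFERENCED_ROW, true));
    return 0;
}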
row_log_table_apply_op(): Remove references to dict_table_t::n_vcols.
Virtual column information is no longer being written to the log.
row_log_t: Remove the unused fields n_old_col, n_old_vcol.
When MySQL 5.7 introduced indexed virtual columns, it introduced
several bugs into the online table-rebuilding ALTER, that is,
the row_log_table_apply() family of functions.
The online_log format that was introduced for online table-rebuilding
ALTER in MySQL 5.6 should be sufficient. Ideally, any indexed virtual
column values would be evaluated based on the log records in the temporary
file. There is no need to log virtual column values.
(For ADD INDEX, that is row_log_apply(), we always must log the values of
the keys, no matter if the columns are virtual.)
Because omitting the virtual column values removes any chance of
row_log_table_apply() working with indexed virtual columns, we
will for now refuse LOCK=NONE in table-rebuilding ALTER operations
when indexes on virtual columns exist. This restriction would be
lifted in MDEV-14341.
innobase_indexed_virtual_exist(): New predicate, to determine if
indexed virtual columns exist in a table definition.
ha_innobase::check_if_supported_inplace_alter(): Refuse online rebuild
if indexed virtual columns exist.
rec_get_converted_size_temp_v(), rec_convert_dtuple_to_temp_v(): Remove.
row_log_table_delete(), row_log_table_update(), row_log_table_insert():
Remove parameters for virtual columns.
trx_undo_read_v_rows(): Remove the col_map parameter.
row_log_table_apply(): Do not deal with virtual columns.
FlushObserver::flush(): Never discard unwritten changes.
We do not want to risk corrupting the system tablespace
or .ibd files that are not part of a table-rebuilding ALTER.
Ever since MDEV-10813 cleaned up InnoDB use of atomic memory operations
and made buf_block_fix() an atomic operation, some conditions around
buf_block_fix() have been unnecessary.
With a big buffer pool that contains many data pages, DISCARD TABLESPACE
took a long time, because it would scan the entire buffer pool to remove
any pages that belong to the tablespace, even when the to-be-discarded
table is empty.
The minimum amount of work that DISCARD TABLESPACE must do is to
remove the pages of the to-be-discarded table from the
buf_pool->flush_list because any writes to the data file must be
prevented before the file is deleted.
If DISCARD TABLESPACE does not evict the pages from the buffer pool,
then IMPORT TABLESPACE must do it, because we must prevent pre-DISCARD,
not-yet-evicted pages from being mistaken for pages of the imported
tablespace.
It would not be a useful fix to simply move the buffer pool scan to
the IMPORT TABLESPACE step. What we can do is to actively evict those
pages that could be mistaken for imported pages. In this way, when
importing a small table into a big buffer pool, the import should
still run relatively fast.
Import is bypassing the buffer pool when reading pages for the
adjustment phase. In the adjustment phase, if a page exists in
the buffer pool, we could replace it with the page from the imported
file. Unfortunately I did not get this to work properly, so instead
we will simply evict any matching page from the buffer pool.
buf_page_get_gen(): Implement BUF_EVICT_IF_IN_POOL, a new mode
where the requested page will be evicted if it is found. There
must be no unwritten changes for the page.
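As a toy illustration of the evict-if-in-pool idea (a simplified page
cache, not the actual buf_page_get_gen() interface):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct cached_page {
    unsigned space_id; /* tablespace id */
    unsigned page_no;
    bool     dirty;    /* unwritten changes pending */
    bool     in_use;
};

/* Evict the page if it is present and clean. A dirty page would be a
   bug here, because DISCARD TABLESPACE must already have removed the
   tablespace's pages from buf_pool->flush_list. */
static bool evict_if_in_pool(struct cached_page *pool, size_t n,
                             unsigned space_id, unsigned page_no)
{
    for (size_t i = 0; i < n; i++) {
        struct cached_page *p = &pool[i];
        if (p->in_use && p->space_id == space_id && p->page_no == page_no) {
            if (p->dirty)
                return false; /* must not discard unwritten changes */
            p->in_use = false; /* drop the stale 'shadow' page */
            return true;
        }
    }
    return false; /* not cached; nothing to do */
}

int main(void)
{
    struct cached_page pool[] = {{42, 3, false, true}};
    printf("%d\n", evict_if_in_pool(pool, 1, 42, 3)); /* 1: evicted */
    return 0;
}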
buf_remove_t: Remove. Instead, use trx!=NULL to signify that a write
to file is desired, and use a separate parameter bool drop_ahi.
buf_LRU_flush_or_remove_pages(), fil_delete_tablespace():
Replace buf_remove_t.
buf_LRU_remove_pages(), buf_LRU_remove_all_pages(): Remove.
PageConverter::m_mtr: A dummy mini-transaction buffer.
PageConverter::PageConverter(): Complete the member initialization list.
PageConverter::operator()(): Evict any 'shadow' pages from the
buffer pool so that pre-existing (garbage) pages cannot be mistaken
for pages that exist in the being-imported file.
row_discard_tablespace(): Remove a bogus comment that seems to
refer to IMPORT TABLESPACE, not DISCARD TABLESPACE.
ibuf_check_bitmap_on_import(): Only access the pages that
are below FSP_FREE_LIMIT. It is possible that especially with
ROW_FORMAT=COMPRESSED, the FSP_SIZE will be much bigger than
the FSP_FREE_LIMIT, and the bitmap pages (page_size*N, 1+page_size*N)
are filled with zero bytes.
buf_page_is_corrupted(), buf_page_io_complete(): Make the
fault injection compatible with MariaDB 10.2.
Backport the IMPORT tests from 10.2.
On some old GNU/Linux systems, invoking posix_fallocate() with
offset=0 would sometimes cause already allocated bytes in the
data file to be overwritten.
Fix a correctness regression that was introduced in
commit 420798a81a
by invoking posix_fallocate() in a safer way.
A similar change was made in MDEV-5746 earlier.
os_file_get_size(): Avoid changing the state of the file handle,
by invoking fstat() instead of lseek().
os_file_set_size(): Determine the current size of the file
by os_file_get_size(), and then extend the file from that point
onwards.
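A minimal portable sketch of that approach (file_size_fstat() is a
hypothetical helper; the real function is os_file_get_size()):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Determine the file size without touching the file position, so that
   concurrent users of the same handle are not disturbed. */
static long long file_size_fstat(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

int main(void)
{
    int fd = open("/etc/hosts", O_RDONLY);
    if (fd >= 0) {
        printf("%lld bytes\n", file_size_fstat(fd));
        close(fd);
    }
    return 0;
}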
os_file_set_size(): If posix_fallocate() returns EINVAL, fall back
to writing zero bytes to the file. Also, remove some error log output,
and make it possible for a server shutdown to interrupt the fall-back
code.
MariaDB used to ignore any possible return value from posix_fallocate()
ever since innodb_use_fallocate was introduced in MDEV-4338. If EINVAL
was returned, the file would not be extended.
Starting with MDEV-11520, MariaDB would treat EINVAL as a hard error.
Why is the EINVAL returned? The GNU posix_fallocate() function
would first try the fallocate() system call, which would return
-EOPNOTSUPP for many file systems (notably, not ext4). Then, it
would fall back to extending the file one block at a time by invoking
pwrite(fd, "", 1, offset) where offset is 1 less than a multiple of
the file block size. This would fail with EINVAL if the file is in
O_DIRECT mode, because O_DIRECT requires aligned operation.
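A hedged sketch of the fallback in C (extend_file() and the 4096-byte
block size are assumptions for illustration; the real logic lives in
os_file_set_size()):

#define _XOPEN_SOURCE 600 /* posix_fallocate(), posix_memalign(), pwrite() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Extend fd from 'current' to 'desired' bytes. If posix_fallocate()
   fails with EINVAL (for example, because its pwrite() fallback
   violates O_DIRECT alignment rules), write aligned zero-filled
   blocks instead. 'current' is assumed to be block-aligned. */
static int extend_file(int fd, off_t current, off_t desired)
{
    int err = posix_fallocate(fd, current, desired - current);
    if (err != EINVAL)
        return err; /* 0 on success; other errors are fatal */

    const size_t blk = 4096; /* assumed O_DIRECT-compatible block size */
    void *buf;
    if (posix_memalign(&buf, blk, blk))
        return ENOMEM;
    memset(buf, 0, blk);

    for (off_t ofs = current; ofs < desired; ofs += (off_t) blk) {
        if (pwrite(fd, buf, blk, ofs) != (ssize_t) blk) {
            err = errno;
            free(buf);
            return err;
        }
    }
    free(buf);
    return 0;
}

int main(void)
{
    int fd = open("testfile", O_CREAT | O_WRONLY, 0600);
    return fd < 0 ? 1 : extend_file(fd, 0, 1 << 20);
}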
- FB/MySQL 5.6's MyRocks has START TRANSACTION WITH CONSISTENT
ROCKSDB SNAPSHOT, which returns binlog position.
- MariaDB has a cross-engine START TRANSACTION WITH CONSISTENT
SNAPSHOT. It can be used for the same purpose. Binlog position
can be obtained from Binlog_snapshot_file/position status vars.
MariaDB 10.2 used to handle the EINVAL return value from posix_fallocate()
before commit b731a5bcf2, which refactored os_file_set_size() to try
posix_fallocate().
When MariaDB 10.1.0 introduced table options for encryption and
compression, it unnecessarily changed
ha_innobase::check_if_supported_inplace_alter() so that ALGORITHM=COPY
is forced when these parameters differ.
A better solution is to move the check to innobase_need_rebuild().
In that way, the ALGORITHM=INPLACE interface (yes, the syntax is
very misleading) can be used for rebuilding the table much more
efficiently, with merge sort, with no undo logging, and allowing
concurrent DML operations.
If InnoDB or XtraDB recovered committed transactions at server
startup, but the processing of recovered transactions was
prevented by innodb_read_only or by innodb_force_recovery,
an assertion would fail at shutdown.
This bug was originally reproduced when Mariabackup executed
InnoDB shutdown after preparing (applying redo log into) a backup.
trx_free_prepared(): Allow TRX_STATE_COMMITTED_IN_MEMORY.
trx_undo_free_prepared(): Allow any undo log state. For transactions
that were resurrected in TRX_STATE_COMMITTED_IN_MEMORY
the undo log state would have been reset by trx_undo_set_state_at_finish().
ReadFile/WriteFile operations: InnoDB opens files with FILE_FLAG_OVERLAPPED,
and lpNumberOfBytesRead/Written are documented to be potentially inaccurate
in this case (possibly even if async operations complete synchronously?).
The fix is to always pass NULL for the corresponding parameters, as
recommended by MSDN, and to read the actual counts with
GetQueuedCompletionStatus() or GetOverlappedResult().
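A sketch of that pattern in C (event-based wait shown here for brevity;
InnoDB reads the counts from an I/O completion port via
GetQueuedCompletionStatus() instead):

#include <windows.h>

/* Issue an overlapped read, passing NULL for lpNumberOfBytesRead as
   MSDN recommends for FILE_FLAG_OVERLAPPED handles, then obtain the
   accurate byte count from GetOverlappedResult(). */
static BOOL read_at(HANDLE h, void *buf, DWORD len, ULONGLONG offset,
                    DWORD *bytes_read)
{
    OVERLAPPED ov = {0};
    ov.Offset     = (DWORD) (offset & 0xFFFFFFFFu);
    ov.OffsetHigh = (DWORD) (offset >> 32);
    ov.hEvent     = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (!ov.hEvent)
        return FALSE;

    BOOL ok = ReadFile(h, buf, len, NULL /* unreliable count; pass NULL */,
                       &ov);
    if (!ok && GetLastError() != ERROR_IO_PENDING) {
        CloseHandle(ov.hEvent);
        return FALSE;
    }
    /* bWait=TRUE blocks until the asynchronous operation completes. */
    ok = GetOverlappedResult(h, &ov, bytes_read, TRUE);
    CloseHandle(ov.hEvent);
    return ok;
}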
os_file_get_size(): Use fstat() instead of calling lseek() 3 times.
In this way, concurrent calls to this function should not interfere
with each other.
Suggested by Vladislav Vaintroub.
The predicate page_is_root() would not hold if btr_create() fails
before the root page is fully initialized. Move the debug assertion
from btr_free_root_invalidate() to its other caller,
btr_free_if_exists(). In that caller, we actually already checked
for page_is_root().