accessing freed memory.
Before XMLCOL::WriteColumn() is called, Tdbp->Clist is assigned
a node list in
Clist = RowNode->SelectNodes(g, Colname, Clist);
which is RowNode->Doc->Xop->nodesetval.
In XMLCOL::WriteColumn(), the call
ValNode = ColNode->SelectSingleNode(g, Xname, Vxnp);
invokes LIBXMLDOC::GetNodeList() again, which frees the previous
XPath object Xop and replaces it with a new one.
In this case RowNode->Doc == ColNode->Doc, so Clist->Listp
now points to freed memory.
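As an illustration only, here is a minimal standalone model of the hazard
(hypothetical names, not the CONNECT source): two node lists share the
XPath result owned by the same document, so re-evaluating an XPath frees
the array that the older list still points to.

  #include <cstdlib>

  struct Document                    // models LIBXMLDOC
  {
    int *nodesetval= nullptr;        // models Xop->nodesetval

    int *evaluate(unsigned n)        // models GetNodeList()
    {
      std::free(nodesetval);         // frees the previous XPath result
      nodesetval= static_cast<int*>(std::calloc(n, sizeof *nodesetval));
      return nodesetval;
    }
  };

  struct NodeList { int *Listp; };   // keeps a raw pointer into the result

  int main()
  {
    Document doc;
    NodeList Clist{doc.evaluate(4)}; // SelectNodes() on the row
    NodeList Vxnp{doc.evaluate(1)};  // SelectSingleNode() on a column value
    // Clist.Listp is now dangling; dereferencing it would be the
    // use-after-free described above.
    (void) Clist; (void) Vxnp;
    return 0;
  }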
Even though Apple Xcode is based on and similar to Clang,
it does not support the asm goto construct which was added
in commit 668a5f3d12 (MDEV-26720)
to work around a compiler deficiency that results in suboptimal
code being generated for IA-32 and AMD64.
We will disable this manual optimization if __APPLE_CC__ is defined.
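As an illustration, a standalone sketch (not the actual MariaDB code) of
the kind of guard this implies: the asm goto fast path is compiled only
when __APPLE_CC__ is not defined, and Apple builds take the portable
fallback.

  #include <atomic>
  #include <cstdint>

  /* Set a bit and report whether it was already set. */
  static bool set_bit_and_test(std::atomic<uint32_t> &word, uint32_t bit)
  {
  #if defined __GNUC__ && (defined __i386__ || defined __x86_64__) && \
      !defined __APPLE_CC__
    /* Fast path: lock bts copies the old bit value into the carry flag. */
    __asm__ goto("lock btsl %1, %0\n\t"
                 "jc %l[was_set]"
                 : : "m" (word), "Ir" (bit) : "cc", "memory" : was_set);
    return false;
  was_set:
    return true;
  #else
    /* Portable fallback; the compiler may emit a heavier
       read-modify-write sequence here. */
    const uint32_t mask= uint32_t{1} << bit;
    return (word.fetch_or(mask) & mask) != 0;
  #endif
  }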
Similar constructs are also used in sync/srw_lock.cc, but that code
is not used on Apple, because we have not implemented a Linux futex
compatible interface on Apple macOS.
Thanks to Valerii Kravchuk for reporting and testing this.
Most of this was likely already fixed by MDEV-27017.
On one implementation of the AMD64 ISA, the test
innodb_fts.concurrent_insert would still occasionally hang,
with both dict_sys_t::evict_table_LRU() and
dict_index_set_merge_threshold() waiting in dict_sys_t::lock(),
several threads waiting in dict_sys_t::freeze(),
no thread holding exclusive dict_sys.latch, and no thread
in the stack traces apparently holding any dict_sys.latch,
even though dict_sys.latch_readers == 1.
To prevent this scenario, we will remove the dict_sys.latch
acquisition from dict_index_set_merge_threshold(). It is actually
not needed, because dict_sys.sys_indexes will not change after
InnoDB startup. The SYS_INDEXES leaf page will be sufficiently
protected by the page latch.
There potentially is a bug in the srw_lock implementation,
which will have to be investigated further.
In recv_sys_t::apply(), we were unnecessarily looking up pages
in buf_pool.page_hash and potentially waiting for exclusive page latches.
Before buf_page_get_low() would return an x-latched page,
that page would have to be read, and buf_page_read_complete() would
have invoked recv_recover_page() to apply the log to the page.
Therefore, it suffices to invoke recv_read_in_area() to trigger
a transition from RECV_NOT_PROCESSED.
recv_read_in_area(): Take the iterator as a parameter, and remove
page_id lookups. Should the page already be in buf_pool.page_hash,
buf_page_init_for_read() will return nullptr to buf_read_page_low()
and buf_read_page_background().
recv_sys_t::apply(): Replace goto, remove dead code, and add assertions
to guarantee that the iteration will make progress.
Reviewed by: Vladislav Lesin
cmp_data(): Compare different-length CHAR fields with
the new strnncollsp_nchars function that will pad spaces if needed.
Any InnoDB ROW_FORMAT except the original one that was named
ROW_FORMAT=REDUNDANT in MySQL 5.0.3 will internally store
CHAR(n) columns as variable-length if the character encoding is
variable length. Spaces may be trimmed from the end.
For NOT NULL values, the minimum length is always n*mbminlen.
In cmp_data() we only know the lengths in bytes and we cannot
easily know the ROW_FORMAT.
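As an illustration, a byte-level standalone sketch of the padding idea
(a hypothetical helper, not the server's strnncollsp_nchars, which is
collation-aware and counts characters rather than bytes): the shorter
value is compared as if padded with 0x20 bytes.

  #include <cstddef>
  #include <string>

  static int cmp_char_padded(const std::string &a, const std::string &b)
  {
    const std::size_t n= a.size() > b.size() ? a.size() : b.size();
    for (std::size_t i= 0; i < n; i++)
    {
      /* Missing trailing bytes compare as the space padding character. */
      const unsigned char ca= i < a.size() ? a[i] : ' ';
      const unsigned char cb= i < b.size() ? b[i] : ' ';
      if (ca != cb)
        return ca < cb ? -1 : 1;
    }
    return 0;
  }

  /* Example: a space-padded value "abc  " and a trimmed value "abc"
     compare as equal: cmp_char_padded("abc  ", "abc") == 0. */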
is_strnncoll_compatible(): Refactored from innobase_mysql_cmp().
innobase_mysql_cmp(): Merged to cmp_whole_field().
cmp_whole_field(): Invoke strnncollsp_nchars for DATA_MYSQL
(the CHAR type with any collation other than latin1_swedish_ci).
Reviewed by: Alexander Barkov
Tested by: Roel Van de Paar
The 10.5 version of the patch.
Removing DEFAULT from INFORMATION_SCHEMA columns.
DEFAULT in read-only tables is rather meaningless.
Upgrade should go smoothly.
Also fixes:
MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables
Because commit 24773bf380
made dict_v_col_t encapsulate v_indexes, we must invoke
dict_v_col_t::~dict_v_col_t() to destruct the container.
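As a standalone illustration of the pattern (hypothetical types, not the
InnoDB source): when an object constructed in arena memory encapsulates a
container, releasing the arena alone leaks the container's allocation, so
the destructor must be invoked explicitly.

  #include <new>
  #include <vector>

  struct v_col_model                 // models dict_v_col_t with v_indexes
  {
    std::vector<int> v_indexes;
  };

  int main()
  {
    /* Models allocation from a memory heap: raw storage, never deleted. */
    alignas(v_col_model) unsigned char arena[sizeof(v_col_model)];
    v_col_model *col= new (arena) v_col_model();
    col->v_indexes.push_back(1);
    /* Without this explicit call the vector's buffer would leak when the
       arena is discarded; this corresponds to invoking
       dict_v_col_t::~dict_v_col_t(). */
    col->~v_col_model();
    return 0;
  }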
This basically is a fixup of the merge
commit 5008171b05
of the 10.2
commit cf2c6b7f8d (MDEV-24971).
I did not debug why no leaks are reported for 10.2 or 10.3.
It is misleading to compare the numbers of columns and fields and
report them to the user. It is better to remove that check and let the
user see a subsequent error message about a missing or misplaced column.
row_import::match_schema(): remove misleading check
The InnoDB DATA DIRECTORY attribute is not implemented via
symbolic links but via something similar: *.isl files that contain
the names of data files.
InnoDB failed to ignore the DATA DIRECTORY attribute even though
the server was started with --skip-symbolic-links.
Native ALTER TABLE in InnoDB will retain the DATA DIRECTORY attribute
of the table, regardless of whether the table is rebuilt.
Generic ALTER TABLE (with ALGORITHM=COPY) as well as TRUNCATE TABLE
will discard the DATA DIRECTORY attribute.
All tests have been run with and without the ./mtr option
--mysqld=--skip-symbolic-links
and some tests that use the InnoDB DATA DIRECTORY attribute
have been adjusted for this.
Problem:
=======
InnoDB ran out of memory during recovery and failed to
flush the dirty LRU blocks. The reason is that the buffer pool
can run out of blocks before the LRU list length reaches the
BUF_LRU_OLD_MIN_LEN (256) threshold.
Fix:
====
During recovery, InnoDB should write out and evict all
dirty blocks.
Add a --rocksdb_ignore_datadic_errors plugin option for MyRocks.
The default is 0, and this means MyRocks will call abort() if it detects
a DDL mismatch.
Setting rocksdb_ignore_datadic_errors=1 makes MyRocks try to ignore the
errors and allows the server to start for repairs.
In ha_rocksdb::open(), check if the number of indexes seen from the
SQL layer matches the number of indexes in the internal MyRocks data
dictionary.
Produce an error if there is a mismatch. (If we don't produce this error,
we are likely to crash as soon as we attempt to use an index)
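A simplified standalone sketch of the intended check (hypothetical
function and variable names, not the MyRocks source):

  #include <cstdio>

  /* Models the plugin option; the default 0 keeps the strict behaviour. */
  static bool rocksdb_ignore_datadic_errors= false;

  /* Returns nonzero on a DDL mismatch unless the option says to ignore it. */
  static int check_index_count(unsigned sql_layer_keys, unsigned dict_keys)
  {
    if (sql_layer_keys != dict_keys)
    {
      if (!rocksdb_ignore_datadic_errors)
        return 1;            /* report an error instead of crashing later */
      std::fprintf(stderr,
                   "MyRocks: ignoring DDL mismatch: %u vs %u indexes\n",
                   sql_layer_keys, dict_keys);
    }
    return 0;
  }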
The test innodb.log_file_size would occasionally fail with
an assertion failure !buf_pool.any_io_pending(). Let us wait
for the page cleaner thread to become idle already in
srv_prepare_to_delete_redo_log_file(), like we used to.
.. to be the same as startup.
In resolving MDEV-27461, BUF_LRU_MIN_LEN (256) was established as the
minimum number of pages for the InnoDB buffer pool. Obviously we need more
than just the pages being flushed. Taking the 16k page size and its default
minimum, an extra 25% on top of the flushing pages is needed to make a
workable buffer pool.
The minimum innodb_buffer_pool_chunk_size (1M) further restricts the
minimum; otherwise we would have a pool made up of different chunk sizes.
The resulting minimum InnoDB buffer pool sizes are:

Page size   Previous minimum (startup)   With change
4k          5M                           2M
8k          5M                           3M
16k         5M                           5M
32k         24M                          10M
64k         24M                          20M
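A standalone sketch that reproduces the table above under the assumptions
stated earlier: BUF_LRU_MIN_LEN (256) pages plus 25%, rounded up to the
1M minimum innodb_buffer_pool_chunk_size.

  #include <cstdio>
  #include <initializer_list>

  int main()
  {
    const unsigned long long MiB= 1ULL << 20;
    const unsigned long long chunk= 1 * MiB;    /* minimum chunk size */
    for (unsigned long long kib : {4, 8, 16, 32, 64})
    {
      const unsigned long long page_size= kib << 10;
      unsigned long long min_size= 256 * page_size * 5 / 4; /* +25% */
      min_size= (min_size + chunk - 1) / chunk * chunk;     /* chunk align */
      std::printf("%lluk: %lluM\n", kib, min_size / MiB);
    }
    return 0;
  }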
With this patch, SET GLOBAL innodb_buffer_pool_size minimums are
enforced.
The advertised minimum of the innodb_buffer_pool_size system variable
is 2M; however, this is only settable when using a 4k page size. As
the order in which page_size and buffer_pool_size are set is not fixed,
we cannot hide this change.
Subsequent changes:
* innodb_buffer_pool_resize_with_chunks.test - raised the pool resize sizes
  due to the new minimums. The chunk size also needed to increase, as the
  test relies on pool_size < chunk_size to generate a warning.
* Removed srv_buf_pool_min_size and replaced use with MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Removed srv_buf_pool_def_size and replaced the constant definition in
  MYSQL_SYSVAR_LONGLONG(buffer_pool_size)
* Reordered ha_innodb to allow for direct use of MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* Moved buf_pool_size_align into ha_innodb to give it access to MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
* loose-innodb_disable_resize_buffer_pool_debug is needed in the
innodb.restart.opt test so that under debug mode, resizing of the
innodb buffer pool can occur.
There were no InnoDB changes in the MySQL 5.7.37 release that would be
relevant to MariaDB Server. We will merely update the reported
InnoDB version number.
The code was backported from the 10.6 commit bd03c0e516.
See that commit message for details.
Apart from the above commit, trx_lock_t::wait_trx was also backported from
MDEV-24738. trx_lock_t::wait_trx is protected with lock_sys.wait_mutex
in 10.6, but that mutex was implemented only in MDEV-24789. As there is no
need to backport MDEV-24789 for MDEV-27025,
trx_lock_t::wait_trx is protected with the same mutexes as
trx_lock_t::wait_lock.
This fix should not break innodb-lock-schedule-algorithm=VATS. This
algorithm uses an Eldest-Transaction-First (ETF) heuristic, which prefers
older transactions over new ones. In this fix we just insert the granted
lock just before the last granted lock of the same transaction, which does
not change the transactions' execution order.
The changes in lock_rec_create_low() should not break Galera Cluster;
there is a big "if" branch for WSREP. This branch is necessary to provide
the correct transaction execution order, and should not be changed for
the current bug fix.
When a lock is checked for conflict, ignore other locks on the record
if they wait for the requesting transaction.
lock_rec_has_to_wait_in_queue() does not iterate over all locks for
the page, but only over the locks located before the waiting lock in the
queue. So there is an invariant: any lock in the queue can wait only for
a lock which is located before it in the queue.
In the case when a conflicting lock waits for the transaction of the
requesting lock, we need to place the requesting lock before that waiting
lock in the queue to preserve the invariant. That is why we look for the
first lock that waits for the requesting transaction and place the new
lock just after the last granted lock of the requesting transaction that
precedes this first waiting lock.
Example:
trx1 waiting lock, trx1 granted lock, ..., trx2 lock - waiting for trx1
place new lock here -----------------^
There are also implicit locks which are lazily converted to explicit
ones, and we need to place the newly created explicit lock in the correct
place in the queue. All explicit locks converted from implicit ones are
placed just after the last non-waiting lock of the same transaction that
precedes the first lock waiting for that transaction.
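A simplified standalone model of the placement shown in the example
(hypothetical types, not lock0lock.cc): the new granted lock of the
requesting transaction is inserted just before the first lock in the queue
that waits for that transaction, which preserves the invariant above; as
described, the actual code additionally keeps it after the transaction's
own granted locks.

  #include <cstddef>
  #include <vector>

  struct lock_model
  {
    int trx_id;               /* owning transaction */
    int waits_for;            /* transaction it waits for, -1 if granted */
  };

  /* Queue position where the new granted lock of requesting_trx goes. */
  static std::size_t granted_lock_position(const std::vector<lock_model> &queue,
                                           int requesting_trx)
  {
    for (std::size_t i= 0; i < queue.size(); i++)
      if (queue[i].waits_for == requesting_trx)
        return i;             /* just before the first lock waiting for us */
    return queue.size();      /* no such waiter: append at the end */
  }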
Code review and cleanup were done by Marko Mäkelä.
In commit 4c3ad24413 (MDEV-27416)
an unnecessarily strict wait condition was introduced in the
function buf_flush_wait(). Most callers actually only care that
the pages have been flushed, not that a checkpoint has completed.
Only in the buf_flush_sync() call for log resizing, we might care
about the log checkpoint. But, in fact,
srv_prepare_to_delete_redo_log_file() is explicitly disabling
checkpoints. So, we can simply remove the unnecessary wait loop.
Thanks to Krunal Bauskar for reporting this performance regression
that we failed to repeat in our testing.
buf_pool_t::realloc(): Invoke page_cleaner_wakeup()
if buf_LRU_get_free_only() returns a null pointer.
Ever since commit 7b1252c03d (MDEV-24278)
the page cleaner would remain in untimed sleep, expecting explicit
calls to buf_pool_t::page_cleaner_wakeup() when the ratio of dirty pages
could change.
Failure to wake up the page cleaner will cause all page writes to be
initiated by buf_flush_LRU_list_batch(). That might work too,
provided that the buffer pool size is at least BUF_LRU_MIN_LEN (256)
pages, but it would not advance the log checkpoint.
In commit c5fd9aa562 (MDEV-25919)
we prevented the function dict_stats_save_index_stat()
from being called in read-only mode in dict_stats_save(),
but not elsewhere.
dict_stats_save_defrag_summary(), dict_stats_save_defrag_stats():
If the transaction is in read-only mode, return DB_READ_ONLY
and do not attempt to lock or modify anything.
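A minimal standalone sketch of the added guard (simplified model; the real
functions and error codes are InnoDB's):

  enum db_err_model { MODEL_DB_SUCCESS, MODEL_DB_READ_ONLY };

  static db_err_model save_defrag_stats_model(bool read_only)
  {
    if (read_only)                /* transaction is in read-only mode */
      return MODEL_DB_READ_ONLY;  /* do not attempt to lock or modify anything */
    /* ... lock the data dictionary and persist the statistics ... */
    return MODEL_DB_SUCCESS;
  }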
The commit e954d9de gave different lifetimes to wide_share and
partition_handler_share. This introduced the possibility that
partition_handler_share could be accessed even after it was freed.
We stop sharing partition_handler_share and make it belong to
a single wide_handler to fix the problem.
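A standalone sketch of the ownership change (hypothetical types): instead
of several handlers sharing one partition_handler_share, each wide_handler
owns its own instance, so the share cannot be freed while its owner still
uses it.

  #include <memory>

  struct partition_handler_share_model { /* per-partition state */ };

  struct wide_handler_model
  {
    /* Owned exclusively: created and destroyed with this wide handler. */
    std::unique_ptr<partition_handler_share_model> partition_handler_share=
        std::make_unique<partition_handler_share_model>();
  };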
In MDEV-14425, an early plan was to introduce a separate log file
for file-level records and checkpoint information. The reasoning was
that fil_system.mutex contention would be reduced by not having to
maintain fil_system.named_spaces. The mutex contention was actually
fixed in MDEV-23855 by making some data fields in fil_space_t and
fil_node_t use std::atomic.
Using a single circular log file simplifies recovery and backup.
The function buf_page_free() that was introduced
in commit a35b4ae898 (MDEV-15528)
failed to remove any adaptive hash index entries for the page
before freeing the page.
This caused an assertion failure in the function
buf_pool_t::clear_hash_index() on shutdown of a 10.6 server,
with the expression:
(s >= buf_page_t::UNFIXED || s == buf_page_t::REMOVE_HASH).
The assertion would fail for a block that is in the freed state.
The failing assertion was added in
commit aaef2e1d8c
in the 10.6 branch.
Thanks to Matthias Leich for finding the bug and testing the fix.