Reason:
=======
- MDEV-16239 does apply the DML logs after bulk insert for
ALTER TABLE..ALGORITHM=COPY, but InnoDB fails to reset the bulk_insert
in ha_innobase::extra(HA_EXTRA_END_ALTER_COPY). This leads to a crash
while applying DML logs.
Solution:
=======
ha_innobase::extra(HA_EXTRA_END_ALTER_COPY): Reset TRX_DDL_BULK at the
end of bulk insert operation
A statement SET GLOBAL innodb_buffer_pool_size=...
could fail for no good reason when the buffer pool contains many
pages that can actually be evicted.
buf_flush_LRU_list_batch(): Keep evicting as long as the buffer pool
is being shrunk, for at most innodb_lru_scan_depth extra blocks.
Disregard the flush limit for pages that are marked as freed in files.
buf_flush_LRU_to_withdraw(): Update the to_withdraw target during
buf_flush_LRU_list_batch().
buf_pool_t::will_be_withdrawn(): Allow also ptr=nullptr (the condition
will not hold for it).
This fixes a regression that was introduced in
commit b6923420f3 (MDEV-29445)
and caught by the test innodb.temp_truncate_freed in MariaDB Server 11.4.
Tested by: Thirunarayanan Balathandayuthapani
Reviewed by: Thirunarayanan Balathandayuthapani
The solution is to check in lock_clust_rec_read_check_and_lock() whether
the transaction that modified a record is still active. If yes, then just
request a lock. If no, then, depending on whether the current transaction's
read view can see the changes, either return DB_RECORD_CHANGED or request a
lock.
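The decision described above can be sketched as a small pure function.
This is only an illustration, not the InnoDB code: the type and function
names are hypothetical, and it assumes that a change which the read view
cannot see is the case that yields DB_RECORD_CHANGED.

```cpp
#include <cassert>

// Hypothetical outcome codes; only DB_RECORD_CHANGED is named in the text.
enum class Outcome { REQUEST_LOCK, DB_RECORD_CHANGED };

// Hypothetical description of the transaction that last modified the record.
struct ModifierState {
  bool still_active;     // the modifying transaction has not finished yet
  bool visible_to_view;  // the reader's read view can see the change
};

// Decide what a snapshot isolation reader does with a modified record.
inline Outcome check_modified_record(const ModifierState &m) {
  if (m.still_active)
    return Outcome::REQUEST_LOCK;  // wait: the owner may still roll back
  // Assumption: a finished change that the read view cannot see means
  // that the record changed after the snapshot was taken.
  return m.visible_to_view ? Outcome::REQUEST_LOCK
                           : Outcome::DB_RECORD_CHANGED;
}
```

Because the function is invoked again after the lock wait ends, the same
check naturally covers both the commit and the rollback of the modifier.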
We can do the check in lock_clust_rec_read_check_and_lock() because,
after the transaction resumes and the cursor position is restored, the
transaction tries to set a lock on the record that the cursor points to.
If the lock already exists, then we don't request the lock again. But for
the current commit it's important that
lock_clust_rec_read_check_and_lock() will be invoked again for the same
record, so that we can repeat the check after the transaction that
modified the record has been committed or rolled back.
MDEV-33802 (4aa9291) is partially reverted. If some transaction holds an
implicit lock on a record and a transaction with snapshot isolation level
requests a conflicting lock on the same record, the requester should be
blocked instead of receiving DB_RECORD_CHANGED, so that it can continue
execution if the implicit lock owner is rolled back.
The construction
--------------------------------------------------------------------------
let $wait_condition=
select count(*) = 1 from information_schema.processlist
where state = 'Updating' and info = 'UPDATE t SET b = 2 WHERE a';
--source include/wait_condition.inc
--------------------------------------------------------------------------
is not reliable enough to make sure that the transaction is blocked in the
test case; the test failed sporadically with the
--------------------------------------------------------------------------
./mtr --max-test-fail=1 --parallel=96 lock_isolation{,,,,,,,}{,,,}{,,} \
--repeat=500
--------------------------------------------------------------------------
command. That's why it was replaced with debug sync-points.
Reviewed by: Marko Mäkelä
during mariabackup --prepare
Reason:
======
During --prepare of a partial backup, if InnoDB encounters a redo log
record for an excluded tablespace, it stores the space id in the dirty
tablespace list during recovery, anticipating that it may encounter
FILE_* redo log records later. However, even when a FILE_* record for
the excluded tablespace is encountered, we fail to replace the
name in the dirty tablespace list. This leads to an error about missing
FILE_* redo log records.
Solution:
========
fil_name_process(): Replace the empty file name "" with the name
encountered in the FILE_* record
recv_init_missing_space(): Correct the condition for printing the warning
message about a missing tablespace during the mariabackup restore process.
- InnoDB fails to check whether the table is being dropped or evicted
while acquiring the MDL for the table when the table open operation
mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED. This is caused by
the commit 337bf8ac4b (MDEV-36122)
Fix:
===
dict_acquire_mdl_shared(): If the table is evicted or dropped when the
table operation mode is DICT_TABLE_OP_OPEN_ONLY_IF_CACHED then return
nullptr
ha_innobase::info_low(): Assert that dict_table_t::stat_initialized()
only within the critical section. Changes of this field should be
protected by dict_table_t::lock_latch.
Problem:
========
- After commit cc8eefb0dc (MDEV-33087),
InnoDB does use bulk insert operation for ALTER TABLE.. ALGORITHM=COPY
and CREATE TABLE..SELECT as well. InnoDB fails to clear the bulk
buffer when it encounters an error during CREATE..SELECT. The problem
is that during transaction cleanup, InnoDB fails to identify
the bulk insert for a DDL operation.
Fix:
====
- Represent bulk_insert in trx by 2 bits. By doing that, InnoDB
can distinguish between TRX_DML_BULK, TRX_DDL_BULK. During DDL,
set bulk insert value for transaction to TRX_DDL_BULK.
- Introduce a parameter HA_EXTRA_ABORT_ALTER_COPY which rolls back
only a TRX_DDL_BULK transaction.
- bulk_insert_apply() for a TRX_DDL_BULK transaction happens
only during the HA_EXTRA_END_ALTER_COPY extra() call.
...when using io_uring on a potentially affected kernel"
Remove version check on the kernel as it now corresponds to
a working RHEL9 kernel and the problem was only there in
pre-release kernels that shouldn't have been used in production.
This reverts commit 1193a793c4.
Remove version check on the kernel as it now corresponds to
a working RHEL9 kernel and the problem was only there in
pre-release kernels that shouldn't have been used in production.
This reverts commit 3dc0d884ec.
Starting with mysql/mysql-server@02f8eaa998
and commit 2e814d4702 the index ID of
indexes on virtual columns was being encoded insufficiently in
InnoDB undo log records. Only the least significant 32 bits were
being written. This could lead to some corruption of the affected
indexes on ROLLBACK, as well as to missed chances to remove some
history from such indexes when purging the history of committed
transactions that included DELETE or an UPDATE in the indexes.
dict_hdr_create(): In debug instrumented builds, initialize the
DICT_HDR_INDEX_ID close to the 32-bit barrier, instead of initializing
it to DICT_HDR_FIRST_ID (10). This will allow the changed code to
be exercised while running ./mtr --suite=gcol,vcol.
trx_undo_log_v_idx(): Encode large index->id in a similar way as
mysql/mysql-server@e00328b4d0
but using a different implementation.
trx_undo_read_v_idx_low(): Decode large index->id in a similar way
as mach_u64_read_much_compressed().
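To illustrate why writing only the low 32 bits is lossy, and how an
escape byte can carry the upper half, here is a simplified sketch. The
byte layout is hypothetical (a tag byte followed by fixed-width words)
and is not the actual undo log format or the mach_* encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in big-endian byte order.
inline void put_u32(std::vector<uint8_t> &out, uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back(static_cast<uint8_t>(v >> shift));
}

// Read a 32-bit big-endian value.
inline uint32_t get_u32(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) << 24 |
         static_cast<uint32_t>(p[1]) << 16 |
         static_cast<uint32_t>(p[2]) << 8 | static_cast<uint32_t>(p[3]);
}

// Hypothetical format: tag 0x00 + 4 bytes for IDs that fit in 32 bits,
// tag 0xFF + 8 bytes (high word, low word) otherwise.
inline void encode_index_id(std::vector<uint8_t> &out, uint64_t id) {
  const uint32_t high = static_cast<uint32_t>(id >> 32);
  out.push_back(high ? 0xFF : 0x00);
  if (high)
    put_u32(out, high);
  put_u32(out, static_cast<uint32_t>(id));
}

// Decode one ID starting at *pos and advance *pos past it.
inline uint64_t decode_index_id(const std::vector<uint8_t> &in,
                                std::size_t *pos) {
  assert(*pos + 5 <= in.size());
  uint64_t id = 0;
  if (in[(*pos)++] == 0xFF) {
    assert(*pos + 8 <= in.size());
    id = static_cast<uint64_t>(get_u32(&in[*pos])) << 32;
    *pos += 4;
  }
  id |= get_u32(&in[*pos]);
  *pos += 4;
  return id;
}
```

A plain static_cast<uint32_t>(id) of an ID above the 32-bit barrier
silently drops the high word, which is the kind of truncation described
above.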
Reviewed by: Debarun Banerjee
Added retry logic to certain file operations during installation as a
workaround for issues caused by buggy antivirus software on Windows.
Retry logic was added for WritePrivateProfileString (mysql_install_db.cc)
and for renaming a file in InnoDB.
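A generic retry wrapper of this kind could look as follows. This is a
sketch only: the helper name, the attempt count and the delay are
assumptions, and the operation is passed in as a callback rather than
being a real Windows API call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Call op() until it succeeds or max_attempts is exhausted, sleeping
// between attempts. Returns true if any attempt succeeded.
inline bool retry(const std::function<bool()> &op, int max_attempts,
                  std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (op())
      return true;
    if (attempt < max_attempts)
      std::this_thread::sleep_for(delay);  // give the scanner time to let go
  }
  return false;
}
```

The delay matters because a file that is temporarily held open by another
process usually becomes available again after a short time.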
The problem was that a thread was holding lock_sys.wait_mutex while
streaming replication transaction rollback was handled, and
wsrep-lib then requested the THD::LOCK_thd_kill mutex, causing
wrong mutex usage (thd->reset_globals()).
The fix is to remove streaming replication rollback handling,
i.e. the wsrep_handle_SR_rollback call, from Deadlock::report().
The purpose of Deadlock::report() is to find a cycle in the
waits-for graph if one exists, report it, mark the victim transaction
as the deadlock victim and release the locks it is waiting for.
The actual streaming replication rollback, which can take a longer
time, can be handled later in trx_t::rollback, where
lock_sys.wait_mutex is not held.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
log_t::clear_mmap(): Do not modify buf_size; we may have
file_size==0 here during bootstrap.
log_t::set_recovered(): If we are writing to a memory-mapped log,
update log_sys.buf_size to the record payload area of log_sys.buf.
This fixes up commit acd071f599
(MDEV-21923).
buf_buddy_alloc_from(): Pass the correct argument to
buf_pool.contains_zip(). This fixes a failure of the test
encryption.innochecksum when the code is built with
cmake -DWITH_UBSAN=ON -DCMAKE_BUILD_TYPE=Debug
- With the help of MDEV-14795, InnoDB implemented a way to shrink
the InnoDB system tablespace after undo tablespaces have been moved
to separate files (MDEV-29986). Until now, there was no way to
defragment the pages of InnoDB system tables. Being able to do so makes
shrinking of the system tablespace more effective. This patch deals with
defragmentation of system tables inside ibdata1.
The following steps are performed to defragment the system
tablespace:
1) Make sure that no user tables exist in ibdata1
2) Iterate through all extent descriptor pages in the system tablespace
and note their states.
3) Find a free earlier extent to replace the last used
extents in the system tablespace.
4) Iterate through all indexes of the system tablespace and defragment
the tree level by level.
5) Iterate over the level from the left page to the right page and find
out whether the page falls within the extent to be replaced. If it does,
then do step (6), else step (4)
6) Prepare the allocation of the new extent by latching the necessary
pages. If any error happens, then no page has been modified
up to step (5).
7) Allocate the new page from the new extent
8) Prepare the associated pages for the block to be modified
9) Prepare the freeing of the page
10) If any error happens while preparing the associated pages or
the freeing of the page, then restore the page that was modified
during new page allocation
11) Copy the old page content to the new page
12) Change the associated pages, such as the left, right and parent page
13) Complete the freeing of the old page
Allocating a page from the new extent, changing the related pages and
freeing a page are each done in 2 steps. The first is prepare, which
latches the pages to be modified and validates them.
The second is complete(), which performs the operation.
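The prepare/complete split can be sketched generically as below. This is
a simplification with hypothetical names: it assumes that no prepare
callback modifies anything, whereas in the steps above the new page
allocation does modify a page and is restored on error.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One operation split into a fallible and an infallible half.
struct Step {
  std::function<bool()> prepare;   // latch and validate; may fail
  std::function<void()> complete;  // perform the change; must not fail
};

// Run all prepare phases first. Only if every one of them succeeds are
// the complete phases executed, so a failure leaves nothing half done.
inline bool run_two_phase(const std::vector<Step> &steps) {
  for (const Step &s : steps)
    if (!s.prepare())
      return false;  // nothing has been modified yet
  for (const Step &s : steps)
    s.complete();
  return true;
}
```

Collecting every possible failure in the prepare phase is what allows the
complete phase to proceed without needing a rollback path of its own.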
fseg_validate(): Validate the lists that exist in the inode segment
Defragmentation is enabled only when :autoextend exists in the
innodb_data_file_path variable.
Update cmake_minimum_required to 2.8...3.12 in the root cmake and mroonga.
This will update the "Policy Version" to 3.12, which will not prevent
builds with even higher cmake versions. There is also a reason to stay on
a "policy version" that is compatible with Windows, so 3.12 is
conservatively chosen.
On the other hand, at least version 2.8 will be required.
The parameter innodb_log_spin_wait_delay will be deprecated and
ignored, because there is no spin loop anymore.
Thanks to commit 685d958e38
and commit a635c40648
multiple mtr_t::commit() can concurrently copy their slice of
mtr_t::m_log to the shared log_sys.buf. Each writer would allocate
their own log sequence number by invoking log_t::append_prepare()
while holding a shared log_sys.latch. This function was too heavy,
because it would invoke a minimum of 4 atomic read-modify-write
operations as well as system calls in the supposedly fast code path.
It turns out that with a simpler data structure, instead of having
several data fields that needed to be kept consistent with each other,
we only need one Atomic_relaxed<uint64_t> write_lsn_offset, on which
we can operate using fetch_add(), fetch_sub() as well as a single-bit
fetch_or(), which reasonably modern compilers (GCC 7, Clang 15 or later)
can translate into loop-free code on AMD64.
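The idea of packing the state into one atomic word can be sketched with
std::atomic. The bit layout, the flag name and the method names below are
assumptions for illustration; they do not reproduce log_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag bit kept in the same word as the offset.
constexpr uint64_t FLAG_BIT = uint64_t{1} << 63;

struct LogOffsetSketch {
  std::atomic<uint64_t> write_lsn_offset{0};

  // Reserve `size` bytes; returns the offset where the caller may copy.
  uint64_t reserve(uint64_t size) {
    return write_lsn_offset.fetch_add(size, std::memory_order_relaxed) &
           ~FLAG_BIT;
  }

  // Set the flag; returns true only for the caller that set it first.
  bool set_flag() {
    return !(write_lsn_offset.fetch_or(FLAG_BIT, std::memory_order_relaxed) &
             FLAG_BIT);
  }

  // Subtract bytes that were accounted for elsewhere (e.g. in a base LSN).
  void rebase(uint64_t consumed) {
    write_lsn_offset.fetch_sub(consumed, std::memory_order_relaxed);
  }

  // Current offset without the flag bit.
  uint64_t offset() const {
    return write_lsn_offset.load(std::memory_order_relaxed) & ~FLAG_BIT;
  }
};
```

Each writer needs a single read-modify-write to obtain its slice, and no
second field has to be kept consistent with the first.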
Before anything can be written to the log, log_sys.clear_mmap()
must be invoked.
log_t::base_lsn: The LSN of the last write_buf() or persist().
This is a rough approximation of log_sys.lsn, which will be removed.
log_t::write_lsn_offset: An Atomic_relaxed<uint64_t> that buffers
updates of write_to_buf and base_lsn.
log_t::buf_free, log_t::max_buf_free, log_t::lsn. Remove.
Replaced by base_lsn and write_lsn_offset.
log_t::buf_size: Always reflects the usable size in append_prepare().
log_t::lsn_lock: Remove. For the memory-mapped log in resize_write(),
there will be a resize_wrap_mutex.
log_t::get_lsn_approx(): Return a lower bound of get_lsn().
This should be exact unless append_prepare_wait() is pending.
log_get_lsn(): A wrapper for log_sys.get_lsn(), which must be invoked
while holding an exclusive log_sys.latch.
recv_recovery_from_checkpoint_start(): Do not invoke fil_names_clear();
it would seem to be unnecessary.
In many places, references to log_sys.get_lsn() are replaced with
log_sys.get_flushed_lsn(), which remains a simple std::atomic::load().
Reviewed by: Debarun Banerjee
recv_sys_t::report_progress(): Display the largest currently known LSN.
recv_scan_log(): Display an error with fewer function calls.
Reviewed by: Debarun Banerjee
With view protocol, a SELECT statement is transformed into two
statements:
1. CREATE OR REPLACE VIEW mysqltest_tmp_v AS SELECT ...
2. SELECT * FROM mysqltest_tmp_v
The first statement reconstructs the query, which is executed in the
second statement.
The reconstruction often replaces aliases in ORDER BY by the original
item.
For example, in the test spider/bugfix.mdev_29008 the query
SELECT MIN(t2.a) AS f1, t1.b AS f2 FROM tbl_a AS t1 JOIN tbl_a AS t2 GROUP BY f2 ORDER BY f1, f2;
is transformed to
"select min(`t2`.`a`) AS `f1`,`t1`.`b` AS `f2` from (`auto_test_local`.`tbl_a` `t1` join `auto_test_local`.`tbl_a` `t2`) group by `t1`.`b` order by min(`t2`.`a`),`t1`.`b`"
In such cases, spider constructs different queries to execute at the
data node. So we disable view protocol for such queries.
With view protocol, often during optimization, the GBH is not created
because join->tables_list is the view mysqltest_tmp_v which has MEMORY
as engine which does not have GBH implemented.
In such cases, if without view protocol the test takes a path that
does create a spider GBH, the resulting queries sent to the data node
often differ.
Therefore we disable view protocol for these statements.
Spider needs to lock the spider table when executing the udf, but the
server layer would have already locked tables in view protocol because
it transforms the query:
select spider_copy_table('t', 0, 1)
to two queries
create or replace view mysqltest_tmp_v as select
spider_copy_table('t', 0, 1);
select * from mysqltest_tmp_v;
So spider justifiably errors out in this case by checking on
thd->derived_tables and thd->locks in spider_copy_tables_body()
If one of the selected fields is a MIN or MAX and it has been optimized
into a constant, it is not added to the temp table used by a group by
handler (GBH). The GBH therefore cannot store results to this missing
field.
On the other hand, when SELECTing from a view or a derived table,
TMP_TABLE_ALL_COLUMNS is set. If the query has no group by or order
by, an Item_temptable_field is created for this MIN/MAX field and
added to the JOIN. Since the GBH could not store results to the
corresponding field in the temp table, the value of this
Item_temptable_field remains NULL. And the NULL value is passed to the
record, then the temp row, and finally output as the (wrong) result.
To fix this, we opt not to create a spider GBH when a view or
derived table is involved.
This fixes spider/bugfix.mdev_26345 for --view-protocol
Also fixed a comment:
TABLE_LIST::belong_to_derived is NULL if the table belongs to a
derived table that has non-MERGE type.
Running mtr --view-protocol transforms SELECT statements to a CREATE
OR REPLACE VIEW of the statement, followed by SELECT from the view.
Thus, when spider tests check the query log for select statements,
they often output a different one with --view-protocol compared to
without.
Fix this by adding disable/enable_view_protocol pairs to these
statements. Most of these statements are surrounded by existing
disable/enable_ps[2]_protocol pairs.
Acked-by: Yuchen Pei <ycp@mariadb.com>
Connect engine fails to build with libxml2 2.14.0.
Connect engine uses "#ifndef BASE_BUFFER_SIZE" to determine if libxml2 is
available. If libxml2 is unavailable, it redefines the xmlElementType enum
of libxml/tree.h. The reason for this redefinition is unclear; most
probably some of these constants were used when connect was compiled with
MSXML, while libxml2 was disabled.
However, the BASE_BUFFER_SIZE constant was recently removed from libxml2.
As a result, connect fails to build due to the redefinition of
xmlElementType constants.
Use LIBXML2_SUPPORT instead of BASE_BUFFER_SIZE for libxml2 availability
check.
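The shape of such a guard can be sketched as follows. The fallback enum
here is a shortened illustration (the first three values match libxml2),
and is_element() is a hypothetical helper; the point is that the check
relies on a macro defined by the build, not on a constant that happens to
be defined by a libxml2 header.

```cpp
#include <cassert>

#if defined(LIBXML2_SUPPORT)
# include <libxml/tree.h>  // provides the real xmlElementType
#else
// Fallback subset, used only when libxml2 is not compiled in.
enum xmlElementType {
  XML_ELEMENT_NODE = 1,
  XML_ATTRIBUTE_NODE = 2,
  XML_TEXT_NODE = 3
};
#endif

// Works the same way with either definition of the enum.
inline bool is_element(int node_type) {
  return node_type == XML_ELEMENT_NODE;
}
```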
The value of dv[0].data being null showed up
in the mtr tests:
mroonga/storage.alter_table_fulltext_add_no_primary_key
as:
/source/storage/mroonga/vendor/groonga/lib/ii.c:2052:37: runtime error: applying non-zero offset 28 to null pointer
Correct this by also entering the if condition when the pointer value is
null. The free is valid, and data of the required size is allocated.
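For context, the UBSAN report is about pointer arithmetic on a null
pointer, which is undefined behaviour even if the result is never
dereferenced. A generic guard, not the groonga fix itself and with a
hypothetical helper name, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return a pointer `offset` bytes into `data`, or nullptr if there is
// no buffer. Never applies a non-zero offset to a null pointer.
inline const uint8_t *at_offset(const uint8_t *data, std::size_t offset) {
  if (!data)
    return nullptr;
  return data + offset;
}
```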
buf_block_t::initialise(): Remove a redundant call to page.lock.init()
that was already executed in buf_pool_t::create() or
buf_pool_t::resize().
This fixes a regression that was introduced in
commit b6923420f3 (MDEV-29445).
Prepare for a more modern CMake version than the current minimum.
- Use CMAKE_MSVC_RUNTIME_LIBRARY instead of the custom MSVC_CRT_TYPE.
- Replace CMAKE_{C,CXX}_FLAGS modifications with
add_compile_definitions/options and add_link_options.
The older method already broke with new pcre2.
- Fix clang-cl compilation and ASAN build.
- Avoid modifying CMAKE_C_STANDARD_LIBRARIES/CMAKE_CXX_STANDARD_LIBRARIES,
as this is discouraged by CMake.
- Reduce system checks.
- fix several Windows-specific "variable set but not used",
or "variable unused" warnings.
- correctly initialize std::atomic_flag (ATOMIC_FLAG_INIT)
- fix Ninja build for spider on Windows
- adjust check for sizeof(MYSQL) for Windows compilers
Problem:
=======
- While loading the foreign key constraints for the parent table,
if the child table wasn't open, then InnoDB uses the parent table heap
to store the child table name in the fk_tables list. If loading a
subsequent foreign key relation for the parent table fails with an error,
InnoDB evicts the parent table from memory. But InnoDB accesses the
evicted table memory again in dict_sys.load_table()
Solution:
========
dict_load_table_one(): In case of error, remove the child table
names that were added during dict_load_foreigns()
Problem:
========
- When InnoDB performs consecutive instant ALTER operations and the first
instant DDL fails, it fails to reset the old instant information in the
table during rollback. This leads the subsequent instant ALTER to make
wrong assumptions about the existing instant column information.
Fix:
====
dict_table_t::instant_column(): Duplicate the instant information
field of the table. By doing this, InnoDB alter retains the old
instant information and restores it during the rollback operation
It prevents a crash in wsrep_report_error() which happened when appliers would run
with FK and UK checks disabled and erroneously execute plain inserts as bulk inserts.
Moreover, in release builds such a behavior could lead to deadlocks between two applier
threads if a thread waiting for a table-level lock was ordered before the lock holder.
In that case the lock holder would proceed to commit order and wait forever for the
now-blocked other applier thread to commit before.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>