1. only call calc_sum_of_all_status() if a global
SHOW_xxx_STATUS variable is to be returned
2. only lock LOCK_status when copying global_status_var,
but not when iterating all threads
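The two rules above can be illustrated with a minimal sketch. All type and member names here (`StatusVars`, `Server`, `lock_thread_list`) are hypothetical simplifications, not the server's real structures, and the separate thread-list lock is an assumption about how iteration stays safe without LOCK_status:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-ins for the server's status structures.
struct StatusVars {
  long long questions = 0;
};

struct Server {
  std::mutex lock_status;           // stands in for LOCK_status
  std::mutex lock_thread_list;      // hypothetical lock protecting the thread list
  StatusVars global_status_var;
  std::vector<StatusVars> threads;  // per-thread status counters

  // LOCK_status is held only while copying global_status_var,
  // not while walking the list of threads.
  StatusVars calc_sum_of_all_status() {
    StatusVars sum;
    {
      std::lock_guard<std::mutex> guard(lock_status);
      sum = global_status_var;  // short critical section: a plain copy
    }
    std::lock_guard<std::mutex> guard(lock_thread_list);
    for (const StatusVars &t : threads)
      sum.questions += t.questions;
    return sum;
  }
};

// Only pay for the summation when a global SHOW_xxx_STATUS value is requested.
long long show_questions(Server &srv, bool global_scope, const StatusVars &own) {
  if (!global_scope)
    return own.questions;
  return srv.calc_sum_of_all_status().questions;
}
```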
The SQL standard (2016) allows a <collate clause> in two places in a
<column definition>: as part of the <data type> or at the very end.
Let's do that too.
Side effect: in a column/SP declaration, `COLLATE cs_coll` automatically
implies `CHARACTER SET cs` (unless the charset was specified explicitly).
See changes in sp-ucs2.result
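The side effect can be pictured as a small resolution rule: an explicit CHARACTER SET wins, otherwise the collation determines the character set. This is a sketch only; the collation table is a hypothetical three-entry subset, and the function does not model the error raised when an explicit charset and collation disagree:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical subset of the collation -> character set mapping.
const std::map<std::string, std::string> kCollationCharset = {
    {"ucs2_general_ci", "ucs2"},
    {"utf8mb4_bin", "utf8mb4"},
    {"latin1_swedish_ci", "latin1"},
};

// Resolve the character set of a column/SP variable declaration.
std::optional<std::string> resolve_charset(const std::optional<std::string> &explicit_cs,
                                           const std::optional<std::string> &collation,
                                           const std::string &default_cs) {
  if (explicit_cs)
    return explicit_cs;  // CHARACTER SET given explicitly
  if (collation) {
    auto it = kCollationCharset.find(*collation);
    if (it == kCollationCharset.end())
      return std::nullopt;  // unknown collation: an error in a real parser
    return it->second;      // COLLATE cs_coll implies CHARACTER SET cs
  }
  return default_cs;
}
```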
Server part:
kill_handlerton() was accessing thd->ha_data[] for some other thd,
while it could be concurrently modified by its owner thd.
protect thd->ha_data[] modifications with a mutex.
take this mutex when accessing thd->ha_data[] from kill_handlerton.
InnoDB part:
on close_connection, detach trx from thd before freeing the trx
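A minimal sketch of the server-side locking discipline, assuming a simplified THD with a fixed number of engine slots; `Thd`, `lock_ha_data` and `kMaxEngines` are hypothetical names, not the real server identifiers:

```cpp
#include <cassert>
#include <mutex>

constexpr int kMaxEngines = 4;  // hypothetical number of handlerton slots

struct Thd {
  std::mutex lock_ha_data;                 // hypothetical name for the new mutex
  void *ha_data[kMaxEngines] = {nullptr};  // per-engine connection data

  // Owner thread: every modification of ha_data[] takes the mutex.
  void set_ha_data(int slot, void *data) {
    std::lock_guard<std::mutex> guard(lock_ha_data);
    ha_data[slot] = data;
  }
};

// Killer thread: reads another THD's ha_data[] only under the same mutex,
// so the owner cannot replace or clear a pointer while it is being used.
int kill_handlerton(Thd &victim, void (*kill_fn)(void *)) {
  std::lock_guard<std::mutex> guard(victim.lock_ha_data);
  int killed = 0;
  for (void *data : victim.ha_data) {
    if (data) {
      kill_fn(data);
      ++killed;
    }
  }
  return killed;
}
```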
This reverts the server part of commit 775fccea0
but keeps the InnoDB part (which reverted MDEV-17092 5530a93f4).
So after this both MDEV-23536 and MDEV-17092 are reverted,
and the original bug is resurrected.
Inplace alter fails to report error when fts_doc_id column with
wrong data type is added.
prepare_inplace_alter_table_dict(): Should check whether the column
is fts_doc_id. Such a column must be of BIGINT type, must be NOT NULL,
and its name must be in capital letters.
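The three conditions can be expressed as one validation function. This is a sketch over a hypothetical, simplified column descriptor; it encodes only the rules stated above (BIGINT, NOT NULL, upper-case name), not InnoDB's full checks:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical, simplified column descriptor.
enum class ColType { Int, BigInt, Varchar };

struct ColumnDef {
  std::string name;
  ColType type;
  bool not_null;
};

static bool iequals(const std::string &a, const std::string &b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return std::tolower(static_cast<unsigned char>(x)) ==
                  std::tolower(static_cast<unsigned char>(y));
         });
}

// Returns false if adding this column should be rejected with an error.
bool fts_doc_id_column_ok(const ColumnDef &col) {
  if (!iequals(col.name, "FTS_DOC_ID"))
    return true;  // not an fts_doc_id column: nothing to check
  return col.name == "FTS_DOC_ID"  // must be spelled in capital letters
         && col.type == ColType::BigInt && col.not_null;
}
```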
A race condition may occur between the execution of transaction commit,
and an execution of a KILL statement that would attempt to abort that
transaction.
MDEV-17092 worked around this race condition by modifying InnoDB code.
After that issue was closed, Sergey Vojtovich pointed out that this
race condition would better be fixed above the storage engine layer:
If you look carefully into the above, you can conclude that
thd->free_connection() can be called concurrently with
KILL/thd->awake(). Which is the bug. And it is partially fixed in
THD::~THD(), that is destructor waits for KILL completion:
Fix: Add the necessary mutex operations to THD::free_connection()
and move the WSREP-specific code there as well. This ensures that no
one is using the THD while free_connection() runs. These mutexes
also ensure that there can be no concurrent KILL/THD::awake().
innobase_kill_query(): We can now remove the use of trx_sys_mutex
introduced in MDEV-17092.
trx_t::free(): Poison trx->state and trx->mysql_thd.
This patch is validated with an RQG run similar to the one that
reproduced MDEV-17092.
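The idea of the fix is that KILL and connection teardown serialize on one mutex, so awake() never sees a half-freed transaction. A minimal sketch; `Thd`, `Trx` and `lock_thd_kill` are hypothetical simplifications, not the real THD/trx_t:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a connection's transaction object.
struct Trx {
  bool active = true;
};

struct Thd {
  std::mutex lock_thd_kill;  // hypothetical: taken by both KILL and free_connection()
  Trx *trx = nullptr;
  bool killed = false;

  // KILL path: may run in another thread; touches trx only under the mutex.
  void awake() {
    std::lock_guard<std::mutex> guard(lock_thd_kill);
    killed = true;
    if (trx)
      trx->active = false;  // ask the engine to abort the transaction
  }

  // Teardown: detaches and frees trx under the same mutex, so a concurrent
  // awake() runs either entirely before or entirely after it.
  void free_connection() {
    std::lock_guard<std::mutex> guard(lock_thd_kill);
    delete trx;
    trx = nullptr;
  }
};
```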
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was a debug-only failure, though the problem is much wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are pointers to MY_BITMAP
* The remaining MY_BITMAPs are stored in TABLE and TABLE_SHARE
* Pointers to these stored MY_BITMAPs, like orig_read_set etc., and
sometimes all_set and tmp_set, are assigned to those pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes the bitmaps themselves are modified directly, as in
TABLE::update_virtual_field(), where bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is almost
always the case), make the program flow cumbersome and impossible to follow,
not to mention the errors they cause, like this MDEV-17556, where a pointer
to tmp_set was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by a dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the following rule should be applied:
* Never substitute bitmaps! This patch enforces it.
The orig_* and all_set bitmaps are already never substituted.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions will now substitute read_set/write_set/vcol_set directly
and won't touch the underlying bitmaps.
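The new prototypes can be sketched with simplified stand-ins for MY_BITMAP and TABLE (the struct layouts here are hypothetical). The point is that only the pointer is swapped and later restored, while the raw bitmap it pointed to stays untouched:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for MY_BITMAP and TABLE (hypothetical layouts).
struct MyBitmap {
  uint64_t bits;  // the raw bitmap (my_bitmap_map in the server)
};

struct Table {
  MyBitmap all_set{~0ULL};   // every column
  MyBitmap def_read_set{0};  // the table's own read set
  MyBitmap *read_set = &def_read_set;
};

// Swap the pointer, never the raw bitmap underneath it; return the old pointer.
MyBitmap *tmp_use_all_columns(Table *table, MyBitmap **bitmap) {
  MyBitmap *old = *bitmap;
  *bitmap = &table->all_set;
  return old;
}

// Restore the pointer saved by tmp_use_all_columns().
void tmp_restore_column_map(MyBitmap **bitmap, MyBitmap *old) {
  *bitmap = old;
}
```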
Problem:
=======
Upon deleting or updating a row in a parent table (with a primary key), if
the child table has a virtual column and an associated key with ON UPDATE
CASCADE/ON DELETE CASCADE, the slave crashes.
Analysis:
========
Tables that are related through foreign keys require prelocking similar to
triggers, i.e. if a table has triggers/foreign keys, all tables
and routines used by them should be added to the prelocking set. This
prelocking happens during the 'open_and_lock_tables' call. Each table being
opened is checked for foreign key references. If a foreign key reference
exists, the child table is opened and linked to the table_list. Upon any
modification to the parent table, its corresponding child tables are
retrieved from table_list and updated accordingly. This prelocking works
fine on the master.
On the slave, prelocking works in the following cases:
- Statement/mixed-based replication
- Row-based replication when trigger execution is enabled through
'slave_run_triggers_for_rbr=YES/LOGGING/ENFORCE'
Otherwise it results in an assert/crash, as the parent table will not find
the corresponding child table, which will be NULL. Dereferencing the NULL
pointer makes the slave server exit.
Fix:
===
Introduce a new 'slave_fk_event_map' flag similar to 'trg_event_map'. This
flag ensures that when foreign keys are enabled in row-based replication,
all the parent and child tables are prelocked, so that the parent is able to
locate the child table.
Note: This issue is specific to slave, hence only slave needs to be
upgraded.
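The cases listed in the analysis can be summarized as a decision table. This is a hypothetical sketch: `ApplierContext` and both functions are illustrative names, and the "after" rule only models the effect described for the new flag, not how slave_fk_event_map is actually stored or consulted:

```cpp
#include <cassert>

enum class BinlogFormat { Statement, Mixed, Row };

// Hypothetical summary of the state relevant to FK prelocking on a replica.
struct ApplierContext {
  bool is_slave_applier;
  BinlogFormat format;
  bool run_triggers_for_rbr;  // slave_run_triggers_for_rbr != NO
  bool foreign_key_checks;    // foreign key handling enabled
};

// Before the fix: are FK child tables added to the prelocking set?
bool prelock_fk_children_before_fix(const ApplierContext &c) {
  if (!c.is_slave_applier)
    return true;  // master: prelocking already worked
  return c.format != BinlogFormat::Row || c.run_triggers_for_rbr;
}

// After the fix: row-based appliers with foreign keys enabled prelock too.
bool prelock_fk_children_after_fix(const ApplierContext &c) {
  return prelock_fk_children_before_fix(c) ||
         (c.format == BinlogFormat::Row && c.foreign_key_checks);
}
```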
Analysis: SELECT ... INTO OUTFILE always creates files with 0666 permissions,
regardless of umask environment variables and OS-level umask settings.
The mode seems to be hardcoded.
Fix: change 0666 to 0644, which lets anybody read the file but not
change it.
This corresponds to 10.5 commit 39378e1366.
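On POSIX systems, the permission bits of a newly created file are the requested mode with the process umask bits cleared. The arithmetic below is a sketch (`effective_mode` is a hypothetical helper, not a server function) showing why requesting 0644 means group and others can never write the file, even with a umask of 0:

```cpp
#include <cassert>

// Effective permission bits: requested mode with the umask bits cleared.
constexpr unsigned effective_mode(unsigned requested, unsigned umask_bits) {
  return requested & ~umask_bits & 0777u;
}
```

With a typical umask of 022, both modes end up as 0644; the difference only shows when the umask is 0.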
With a patched version of the test innodb.ibuf_not_empty (so that
it would trigger crash recovery after using the change buffer),
and patched code that would change the os_thread_sleep() in
recv_apply_hashed_log_recs() to 1ms as well as add a sleep of
the same duration to the end of recv_recover_page() when
recv_sys->n_addrs=0, we can demonstrate a race condition.
After disabling some debug checks in buf_all_freed_instance(),
buf_pool_invalidate_instance() and buf_validate(), we managed to
trigger an assertion failure in fseg_free_step(), on the XDES_FREE_BIT.
In other words, a trx_undo_seg_free() call during
trx_rollback_resurrected() was attempting a double-free of a page.
This was reproduced about once in 400 to 500 test runs. With the fix
applied, the test passed 2,000 runs.
recv_apply_hashed_log_recs(): Do not only wait for recv_sys->n_addrs
to reach 0, but also wait for buf_get_n_pending_read_ios() to reach 0,
to guarantee that buf_page_io_complete() will not be executing
ibuf_merge_or_delete_for_page().
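The stronger wait condition can be sketched with two counters. The atomics below are hypothetical stand-ins for recv_sys->n_addrs and buf_get_n_pending_read_ios(), and the 1ms polling interval is an illustrative choice:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-ins for recv_sys->n_addrs and buf_get_n_pending_read_ios().
std::atomic<unsigned long> n_addrs{0};
std::atomic<unsigned long> n_pending_read_ios{0};

// Done only when no log records remain to be applied AND no page read, whose
// completion could still run a change buffer merge, is in flight.
bool recovery_apply_done() {
  return n_addrs.load() == 0 && n_pending_read_ios.load() == 0;
}

void wait_for_recovery_apply() {
  while (!recovery_apply_done())
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
```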
Issues MDEV-23851 and MDEV-24229 are probably duplicates and are caused by the new self-asserting function lock0lock.cc:wsrep_assert_no_bf_bf_wait().
The criteria for asserting are too strict and do not take into consideration scenarios of "false positive" lock conflicts, which are resolved by replaying the local transaction.
As a fix, this PR relaxes the assert criteria with two conditions, which skip the assert if the high-priority transactions are locking in the correct order, or if the conflicting high-priority lock holder is aborting and has just not yet released the lock.
An alternative fix would be to remove wsrep_assert_no_bf_bf_wait() altogether, or to remove the assert in this function and let it only print warnings in the error log.
But in my high-conflict-rate multi-master test scenario, this relaxed asserting appears to be safe.
This PR also removes two wsrep_report_bf_lock_wait() calls in the InnoDB lock manager, which caused a mutex access assert in debug builds.
Foreign key appending did not handle the FLOAT and DOUBLE data types during INSERT execution. This is not directly related to the actual issue here, but it is fixed in this PR nevertheless. Missing these foreign key values in certification could cause problems in some multi-master load scenarios.
Finally, some problem reports suggest that some of the issues reported in MDEV-23851 might relate to false positive lock conflicts over unique secondary index gaps. There is separate work on relaxing UK index gap locking for replication appliers, and a separate PR will be submitted for it, with a related mtr test as well.
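The two relaxing conditions can be sketched as a predicate. `BfTrx` is a hypothetical view of a high-priority transaction, and interpreting "locking in correct order" as "the holder has the lower seqno" is an assumption made for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a high-priority (brute-force, BF) transaction.
struct BfTrx {
  uint64_t seqno;  // certification order
  bool aborting;   // being aborted, lock not yet released
};

// Returns true only for BF-BF waits that should still trigger the assert.
bool bf_bf_wait_is_bug(const BfTrx &requester, const BfTrx &holder) {
  if (holder.seqno < requester.seqno)
    return false;  // assumed "correct order": the earlier transaction holds the lock
  if (holder.aborting)
    return false;  // holder is aborting and will release the lock
  return true;
}
```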
Part II.
It's still possible to bypass Item_func_like::escape
initialization in Item_func_like::fix_fields().
This requires ESCAPE argument being a cacheable subquery
that uses tables and is inside a derived table which
is used in multi-update.
Instead of implementing a complex or expensive fix for
this particular ridiculously artificial case, let's simply disallow it.
In queries like
create view v1 as select 2 like 1 escape (3 in (select 0 union select 1));
select 2 union select * from v1;
Item_func_like::escape was left uninitialized, because
Item_in_optimizer is const_during_execution()
but not actually const_item() during execution.
It's not, because const subquery evaluation was disabled for derived tables.
Practically it only needs to be disabled for multi-update
that runs fix_fields() before all tables are locked.
row_upd_clust_step() calls row_upd_del_mark_clust_rec() which would
allocate some memory in row_ins_foreign_fill_virtual(). Then,
row_upd_store_row() would access the allocated memory, but only after
potentially freeing that memory by invoking mem_heap_empty(),
leading to ASAN heap-use-after-free diagnostics.
row_ins_foreign_fill_virtual(): Use a more appropriate memory heap with a
longer lifetime.
Due to this bug the server reported bogus messages about lack of SELECT
privileges for base tables used in the specifications of CTE tables.
It happened only if such a CTE was referred to at least twice.
For any non-recursive reference to a CTE that is not the primary one, the
specification of the CTE is cloned. The function check_table_access() is
called for such a reference. The function checks privileges of the tables
referenced in the specification. As no name resolution had been performed
for CTE references whose definitions occurred outside the specification
before check_table_access() was called to check the access rights of the
underlying tables, these references were considered references to base
tables rather than references to CTEs. Yet for CTEs, as for derived tables,
no privileges are needed, and thus none can be granted.
The patch ensures proper name resolution of all references to CTEs before
any acl checks.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
If log_slave_updates==OFF, wsrep applier threads used to be configured
with option: thd->variables.option_bits&= ~(OPTION_BIN_LOG);
(i.e. like sql_log_bin=OFF), regardless of the log-bin configuration.
With a configuration of --log-bin and --log-slave-updates=OFF,
local threads used binlogging, but applier threads did not. Furthermore,
local threads went through binlog group commit, while applier threads did
direct commits. This resulted in a situation where applier threads entered
wsrep XID checkpointing earlier and could sync their wsrep XIDs out of order.
A later local thread commit would see that a higher seqno had already been
checkpointed, and fire an assert because of this.
As a fix, applier threads are now forced to enable binlogging regardless of
the log-slave-updates configuration.
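The before/after behaviour reduces to setting or clearing one option bit. In this sketch the bit value of `OPTION_BIN_LOG` and both function names are hypothetical; the server defines its own constant and configures appliers elsewhere:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit value; the server defines its own OPTION_BIN_LOG.
constexpr uint64_t OPTION_BIN_LOG = 1ULL << 20;

// Old behaviour: appliers dropped binlogging when log_slave_updates was OFF.
uint64_t applier_options_before_fix(uint64_t bits, bool log_slave_updates) {
  if (!log_slave_updates)
    bits &= ~OPTION_BIN_LOG;  // same effect as sql_log_bin=OFF
  return bits;
}

// New behaviour: appliers always binlog, so they take the same group commit
// path as local threads and checkpoint wsrep XIDs in order.
uint64_t applier_options_after_fix(uint64_t bits) {
  return bits | OPTION_BIN_LOG;
}
```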
This PR comes with a new mtr test, galera.MDEV-24327, which creates a scenario
where an applier transaction is applied and committed while an earlier local
transaction is parked before entering the commit order monitor. A buggy MariaDB
version would fail an assertion because of a wsrep XID checkpoint order violation.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
This bug could cause a crash when executing queries that used mutually
recursive CTEs with the system variable big_tables set to 1. It happened due
to several bugs in the code that handled recursive table references
referring to mutually recursive CTEs. For each recursive table reference a
temporary table is created that contains all rows generated for the
corresponding recursive CTE table on the previous step of recursion.
This temporary table should be created in the same way as the temporary
table created for a regular materialized derived table using the
method select_union::create_result_table(). In this case when the
temporary table is created it uses the select_union::TMP_TABLE_PARAM
structure as the parameter for the table construction. However the
code created the temporary table using just the function create_tmp_table()
and passed pointers to certain fields of the TMP_TABLE_PARAM structure
used for accumulation of rows of the recursive CTE table as parameters
for update. This was a mistake because different temporary tables
cannot share some TMP_TABLE_PARAM fields in the general case. Besides,
depending on how mutually recursive CTE tables were defined and which
of them were referred to in the executed query, the select_union object
allocated for a recursive table reference could be allocated again after
the temporary table had been created. In this case the TMP_TABLE_PARAM
object associated with the temporary table created for the recursive
table reference contained unassigned fields needed for execution when
the Aria engine is employed as the engine for temporary tables.
This patch ensures that
- select_union object is created only once for any recursive table
reference
- any temporary table created for recursive CTEs uses its own
TMP_TABLE_PARAM structure
The patch also fixes a problem caused by incomplete cleanup of join tables
associated with recursive table references.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
The last_updated column of innodb_table_stats and innodb_index_stats
hasn't been DATA_FIXBINARY for many years.
InnoDB represents TIMESTAMP as an INT of length 4. Let's test it with this
and stop hiding the result in the mysql_upgrade test.
Reviewer: Marko
Basic variant of the fix: do not consider conditions in form
unique_key NOT IN (c1,c2...)
to be sargable. If there are only a few constants, the condition
is not selective. If there are a lot of constants, the overhead of
processing such a huge range list is not worth it.
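The cost argument follows from how NOT IN turns into ranges: n distinct constants leave n+1 gaps, which together cover almost the whole key domain. A sketch with hypothetical helpers; the `long` limits stand in for minus and plus infinity:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

using Range = std::pair<long, long>;  // open interval (lo, hi)

// `key NOT IN (c1, ..., cn)` expands to the n+1 gaps between sorted constants.
std::vector<Range> not_in_ranges(std::vector<long> consts) {
  std::sort(consts.begin(), consts.end());
  consts.erase(std::unique(consts.begin(), consts.end()), consts.end());
  std::vector<Range> ranges;
  long lo = std::numeric_limits<long>::min();  // stands for -infinity
  for (long c : consts) {
    ranges.emplace_back(lo, c);
    lo = c;
  }
  ranges.emplace_back(lo, std::numeric_limits<long>::max());  // +infinity
  return ranges;
}

// Basic variant of the fix: never treat `unique_key NOT IN (...)` as sargable.
bool not_in_is_sargable(bool unique_key) {
  return !unique_key;
}
```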
(Backport to 10.2)
The policy is not set for 10.2
If it is set, CMake would complain about the bundled zlib, for which the policy
is not set.
Fix:
- Set the policy for 10.2 in the top-level project.
For 10.3+ it was already set.
- Clean up zlib to remove unneeded stuff. It is an internal static library;
it needs none of PROJECT, library versioning, or an RC file on Windows.
The name of the library on Unix does not make any difference, since it is
static and compiled in.
failed in Diagnostics_area::set_ok_status on INSERT
Analysis: No error is returned when strict mode is enabled and the value is
truncated because the double is outside the range.
Fix: Return HA_ERR_AUTOINC_ERANGE if an error was reported because the double
is outside the range.
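A range check of this kind can be sketched as follows. `AutoincResult` and `double_to_autoinc` are hypothetical names standing in for HA_ERR_AUTOINC_ERANGE and the engine code, and treating negative values as out of range for an unsigned counter is an assumption of the sketch:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result codes; the server uses its own HA_ERR_* values.
enum AutoincResult { AUTOINC_OK = 0, AUTOINC_ERANGE = 1 };

// Convert a DOUBLE auto-increment value to an unsigned counter, reporting
// out-of-range values instead of silently truncating them.
AutoincResult double_to_autoinc(double value, uint64_t max_value, uint64_t *out) {
  // 2^64 is exactly representable as a double; anything at or above it
  // (and NaN or negatives) cannot be converted safely.
  if (std::isnan(value) || value < 0.0 || value >= 18446744073709551616.0)
    return AUTOINC_ERANGE;
  uint64_t v = static_cast<uint64_t>(value);  // safe: 0 <= value < 2^64
  if (v > max_value)
    return AUTOINC_ERANGE;  // exceeds the column's maximum
  *out = v;
  return AUTOINC_OK;
}
```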