There was a race condition between delayed insert thread and connection thread
actually performing INSERT/REPLACE DELAYED. It was triggered by concurrent
INSERT/REPLACE DELAYED and statements that flush the same table either
explicitely or implicitely (like FLUSH TABLE, ALTER TABLE, ...).
This race condition was caused by a gap in delayed thread shutdown logic,
which allowed concurrent connection running INSERT/REPLACE DELAYED to change
essential data consequently leaving table in semi-consistent state.
Specifically query thread could decrease "tables_in_use" reference counter in
this gap, causing delayed insert thread to shutdown without releasing auto
increment and table lock.
Fixed by extending condition so that delayed insert thread won't shutdown
until there're locked tables.
Also removed volatile qualifier from tables_in_use and stacked_inserts since
they're supposed to be protected by mutexes.
INCORRECT ERROR.
Analysis
========
INSERT with DUPLICATE KEY UPDATE and REPLACE on a table
where foreign key constraint is defined fails with an
incorrect 'duplicate entry' error rather than foreign
key constraint violation error.
As part of the bug fix for BUG#22037930, a new flag
'HA_CHECK_FK_ERROR' was added while checking for non fatal
errors to manage FK errors based on the 'IGNORE' flag. For
INSERT with DUPLICATE KEY UPDATE and REPLACE queries, the
foreign key constraint violation error was marked as non-fatal,
even though IGNORE was not set. Hence it continued with the
duplicate key processing resulting in an incorrect error.
Fix:
===
Foreign key violation errors are treated as non fatal only when
the IGNORE is not set in the above mentioned queries. Hence reports
the appropriate foreign key violation error.
1. the same message text for INSERT and INSERT IGNORE
2. no new warnings in UPDATE IGNORE yet (big change for 5.5)
and replace a commonly used expression with a
named constant
This is a backport of the patch for MDEV-9653 (fixed earlier in 10.1.13).
The code in Item_func_case::fix_length_and_dec() did not
calculate max_length and decimals properly.
In case of any numeric result (DECIMAL, REAL, INT) a generic method
Item_func_case::agg_num_lengths() was called, which could erroneously result
into a DECIMAL item with max_length==0 and decimals==0, so the constructor of
Field_new_decimals tried to create a field of DECIMAL(0,0) type,
which caused a crash.
Unlike Item_func_case, the code responsible for merging attributes in
Item_func_coalesce::fix_length_and_dec() works fine: it has specific execution
branches for all distinct numeric types and correctly creates a DECIMAL(1,0)
column instead of DECIMAL(0,0) for the same set of arguments.
The fix does the following:
- Moves the attribute merging code from Item_func_coalesce::fix_length_and_dec()
to a new method Item_func_hybrid_result_type::fix_attributes()
- Removes the wrong code from Item_func_case::fix_length_and_dec()
and reuses fix_attributes() in both Item_func_coalesce::fix_length_and_dec()
and Item_func_case::fix_length_and_dec()
- Fixes count_real_length() and count_decimal_length() to get an array
of Items as an argument, instead of using Item::args directly.
This is needed for Item_func_case::fix_length_and_dec().
- Moves methods Item_func::count_xxx_length() from "public" to "protected".
- Removes Item_func_case::agg_num_length(), as it's not used any more.
- Additionally removes Item_func_case::agg_str_length(),
as it also was not used (dead code).
special treatment for temporal values in
create_tmp_field_from_item().
old code only did it when result_type() was STRING_RESULT,
but Item_cache_temporal::result_type() is INT_RESULT
ANALYSIS:
=========
A LEX_STRING structure pointer is processed during the
validation of a stored program name. During this processing,
there is a possibility of null pointer dereference.
FIX:
====
check_routine_name() is invoked by the parser by supplying a
non-empty string as the SP name. To avoid any potential calls
to check_routine_name() with NULL value, a debug assert has
been added to catch such cases.
FAILURES
Analysis:
=========
Test script is not ensuring that "assert_grep.inc" should be
called only after 'Disk is full' error is written to the
error log.
Test checks for "Queueing master event to the relay log"
state. But this state is set before invoking 'queue_event'.
Actual 'Disk is full' error happens at a very lower level.
It can happen that we might even reset the debug point
before even the actual disk full simulation occurs and the
"Disk is full" message will never appear in the error log.
In order to guarentee that we must have some mechanism where
in after we write "Disk is full" error messge into the error
log we must signal the test to execute SSS and then reset
the debug point. So that test is deterministic.
Fix:
===
Added debug sync point to make script deterministic.
Item_func_ifnull::date_op() and Item_func_coalesce::date_op() could
erroneously return 0000-00-00 instead of NULL when get_date()
was called with the TIME_FUZZY_DATES flag, e.g. from LEAST().
Some statements are always replicated in STATEMENT binlog format.
So upon their execution, the current binlog format is temporarily
switched to STATEMENT even though the session's format is different.
This state, stored in THD's current_stmt_binlog_format, was getting
incorrectly masked by wsrep_forced_binlog_format, causing assertions
and unintended generation of row events.
Backported galera.galera_forced_binlog_format and added a test
specific to this case.
There was a race between end_slave() and cleanup code at the end of
handle_slave_sql(). This could cause access to master_info_index and
global_rpl_thread_pool after they had been freed.
Fix by skipping that cleanup if server shutdown is in progress, as is done
in other parts of the code as well (the cleanup, which stops worker threads
that are not needed anymore, is redundant anyway when the server is shutting
down).
INTERVALS
ISSUE:
------
Some string functions return one or a combination of the
parameters as their result. Here the resultant string's
charset could be incorrectly set to that of the chosen
parameter.
This results in incorrect behavior when an ascii string is
expected.
SOLUTION:
---------
Since an ascii string is expected, val_str_ascii should
explicitly convert the string.
Part of the fix is a backport of Bug#22340858 for mysql-5.5
and mysql-5.6.
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE
Problem:
========
Currently SHOW SLAVE STATUS blocks if IO thread waits for
disk space. This makes automation tools verifying
server health block on taking relevant action. Finally this
will create SHOW SLAVE STATUS piles.
Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.
Fix:
====
Release the data_lock just for the duration of writing into
relay log.
Made change to ensure the following lock order is maintained
to avoid deadlocks.
data_lock, LOCK_log
data_lock is held during relay log rotations to protect
the description event.
REPLICATION
Problem: In RBR mode, merge table updates are not successfully applied on a cascading
replication.
Analysis & Fix: Every type of row event is preceded by one or more table_map_log_events
that gives the information about all the tables that are involved in the row
event. Server maintains the list in RPL_TABLE_LIST and it goes through all the
tables and checks for the compatibility between master and slave. Before
checking for the compatibility, it calls 'open_tables()' which takes the list
of all tables that needs to be locked and opened. In RBR, because of the
Table_map_log_event , we already have all the tables including base tables in
the list. But the open_tables() which is generic call takes care of appending
base tables if the list contains merge tables. There is an assumption in the
current replication layer logic that these tables (TABLE_LIST type objects) are always
added in the end of the list. Replication layer maintains the count of
tables(tables_to_lock_count) that needs to be verified for compatibility check
and runs through only those many tables from the list and rest of the objects
in linked list can be skipped. But this assumption is wrong.
open_tables()->..->add_children_to_list() adds base tables to the list immediately
after seeing the merge table in the list.
For eg: If the list passed to open_tables() is t1->t2->t3 where t3 is merge
table (and t1 and t2 are base tables), it adds t1'->t2' to the list after t3.
New table list looks like t1->t2->t3->t1'->t2'. It looks like it added at the
end of the list but that is not correct. If the list passed to open_tables()
is t3->t1->t2 where t3 is merge table (and t1 and t2 are base tables), the new
prepared list will be t3->t1'->t2'->t1->t2. Where t1' and t2' are of
TABLE_LIST objects which were added by add_children_to_list() call and replication
layer should not look into them. Here tables_to_lock_count will not help as the
objects are added in between the list.
Fix: After investigating add_children_list() logic (which is called from open_tables()),
there is no flag/logic in it to skip adding the children to the list even if the
children are already included in the table list. Hence to fix the issue, a
logic should be added in the replication layer to skip children in the list by
checking whether 'parent_l' is non-null or not. If it is children, we will skip 'compatibility'
check for that table.
Also this patch is not removing 'tables_to_lock_count' logic for the performance issues
if there are any children at the end of the list, those can be easily skipped directly by
stopping the loop with tables_to_lock_count check.
The main.merge test case was failing when tested using row based
binlog format.
While analyzing the issue it was found the following issues:
a) The server is calling binlog related code even when a statement will
not be binlogged;
b) The child table list was not present into table structure by the time
to generate the create table statement;
c) The tables in the child table list will not be opened yet when
generating table create info using row based replication;
d) CREATE TABLE LIKE TEMP_TABLE does not preserve original table storage
engine when using row based replication;
This patch addressed all above issues.
@ sql/sql_class.h
Added a function to determine if the binary log is disabled to
the current session. This is related with issue (a) above.
@ sql/sql_table.cc
Added code to skip binary logging related code if the statement
will not be binlogged. This is related with issue (a) above.
Added code to add the children to the query list of the table that
will have its CREATE TABLE generated. This is related with issue (b)
above.
Added code to force the storage engine to be generated into the
CREATE TABLE. This is related with issue (d) above.
@ storage/myisammrg/ha_myisammrg.cc
Added a test to skip a table getting info about a child table if the
child table is not opened. This is related to issue (c) above.
- avoiding the race condition, by not grabbing thd->LOCK_wsrep_thd for
accessing thd->wsrep_exec_mode. The caller is same thread and exec mode
can only be changed by self.
Fix remaining issues with wsrep_sync_wait and query cache.
- Fixes misplaced call to invalidate query cache in
Rows_log_event::do_apply_event().
Query cache was invalidated too early, and allowed old
entries to be inserted to the cache.
- Reset thd->wsrep_sync_wait_gtid on query cache hit.
THD->cleanup_after_query is not called in such cases,
and thd->wsrep_sync_wait_gtid remained initialized.
* Total order isolation was started twice for FLUSH TABLES, from
reload_acl_and_cache() and from mysql_execute_command(). Removed
the reload_acl_and_cache() part.
* Removed PXC specific stuff from MTR tests
- Eliminates code duplication in query cache patch
- Reduces the number of iterations in mysql-wsrep#201.test
to shorten the execution time
- Adds a new test case that exercises more scenarios
The admin commands in question are:
> OPTIMIZE
> REPAIR
> ANALYZE
For LOCAL or NO_WRITE_TO_BINLOG invocations of these commands, ie
OPTIMIZE LOCAL TABLE <t1>
they are not binlogged as expected.
Also, in addition, they are not executed under TOI.
Hence, they are not propagated to other nodes.
The effect is same as that of wsrep_on=0.
Also added tests for this.
A WSREP_DEBUG for wsrep_register_hton has also been added.
The galera_flush_local test has also been updated for verifying that effects
of NO_WRITE_TO_BINLOG / LOCAL are equivalent to wsrep_on=0 from wsrep
perspective.
(cherry picked from commit 5065122f94a8002d4da231528a46f8d9ddbffdc2)
Conflicts:
sql/sql_admin.cc
sql/sql_reload.cc
sql/wsrep_hton.cc
- Fixes query cache so that it is aware of wsrep_sync_wait.
Query cache would return (possibly stale) results to the
client, regardless of the value of wsrep_sync_wait.
- Includes the test case that reproduced the issue.