mysql_insert() first opens all affected tables (which implicitly
starts a transaction in InnoDB), then stat tables.
A failure to open a stat table caused open_tables() to abort
the current stmt transaction (trans_rollback_stmt()). So, from the
server point of view the following ha_write_row()-s happened outside
of a transactions, and the server didn't bother to commit them.
The server has a mechanism to prevent a transaction being
unexpectedly committed or rolled back in the middle of a statement -
if an operation takes place _in a sub-statement_ it cannot change
the transaction state. Operations on stat tables are exactly that -
they are not allowed to change a transaction state. Put them in
a sub-statement to make sure they don't.
read_statistics_for_tables_if_needed
Regression after 279a907, read_statistics_for_tables_if_needed() was
called after open_normal_and_derived_tables() failure.
Fixed by moving read_statistics_for_tables() call to a branch of
get_schema_stat_record() where result of open_normal_and_derived_tables()
is checked.
Removed THD::force_read_stats, added read_statistics_for_tables() instead.
Simplified away statistics_for_command_is_needed().
Problem:-
When mysql executes INSERT ON DUPLICATE KEY INSERT, the storage engine checks
if the inserted row would generate a duplicate key error. If yes, it returns
the existing row to mysql, mysql updates it and sends it back to the storage
engine.When the table has more than one unique or primary key, this statement
is sensitive to the order in which the storage engines checks the keys.
Depending on this order, the storage engine may determine different rows
to mysql, and hence mysql can update different rows.The order that the
storage engine checks keys is not deterministic. For example, InnoDB checks
keys in an order that depends on the order in which indexes were added to
the table. The first added index is checked first. So if master and slave
have added indexes in different orders, then slave may go out of sync.
Solution:-
Make INSERT...ON DUPLICATE KEY UPDATE unsafe while using stmt or mixed format
When there is more then one unique key.
Although there is two exception.
1. Auto Increment key is not counted because Innodb will get gap lock for
failed Insert and concurrent insert will get a next increment value. But if
user supplies auto inc value it can be unsafe.
2. Count only unique keys for which insertion is performed.
So this patch also addresses the bug id #72921
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about
deprecated `register` keyword.
Too much warnings came from Mroonga and I gave up on it.
Plugin fixed to not lock the LOCK_operations when not active.
Server fixed to lock the LOCK_plugin less - do it once per
thread and then only if a plugin was installed/uninstalled.
truncating a temporary table
TRUNCATE expects only one TABLE instance (which is used by TRUNCATE
itself) to be open. However this requirement wasn't enforced after
"MDEV-5535: Cannot reopen temporary table".
Fixed by closing unused table instances before performing TRUNCATE.
The command SHOW INDEXES ignored setting of the system variable
use_stat_tables to the value of 'preferably' and and showed statistical
data received from the engine. Similarly queries over the table
STATISTICS from INFORMATION_SCHEMA ignored this setting. It happened
because the function fill_schema_table_by_open() did not read any data
from statistical tables.
This patch contains a fix for the MDEV-17262/17243 issues and
new mtr test.
These issues (MDEV-17262/17243) have two reasons:
1) After an intermediate commit, a transaction loses its status
of "transaction that registered in the MySQL for 2pc coordinator"
(in the InnoDB) due to the fact that since version 10.2 the
write_row() function (which located in the ha_innodb.cc) does
not call trx_register_for_2pc(m_prebuilt->trx) during the processing
of split transactions. It is necessary to restore this call inside
the write_row() when an intermediate commit was made (for a split
transaction).
Similarly, we need to set the flag of the started transaction
(m_prebuilt->sql_stat_start) after intermediate commit.
The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the
wsrep_load_data_split() function (which located in sql_load.cc)
will also do this, but it will be too late. As a result, the call
to the wsrep_append_keys() function from the InnoDB engine may be
lost or function may be called with invalid transaction identifier.
2) If a transaction with the LOAD DATA statement is divided into
logical mini-transactions (of the 10K rows) and binlog is rotated,
then in rare cases due to the wsrep handler re-registration at the
boundary of the split, the last portion of data may be lost. Since
splitting of the LOAD DATA into mini-transactions is technical,
I believe that we should not allow these mini-transactions to fall
into separate binlogs. Therefore, it is necessary to prohibit the
rotation of binlog in the middle of processing LOAD DATA statement.
https://jira.mariadb.org/browse/MDEV-17262 and
https://jira.mariadb.org/browse/MDEV-17243
Make mysqltest to use --ps-protocol more
use prepared statements for everything that server supports
with the exception of CALL (for now).
Fix discovered test failures and bugs.
tests:
* PROCESSLIST shows Execute state, not Query
* SHOW STATUS increments status variables more than in text protocol
* multi-statements should be avoided (see tests with a wrong delimiter)
* performance_schema events have different names in --ps-protocol
* --enable_prepare_warnings
mysqltest.cc:
* make sure run_query_stmt() doesn't crash if there's
no active connection (in wait_until_connected_again.inc)
* prepare all statements that server supports
protocol.h
* Protocol_discard::send_result_set_metadata() should not send
anything to the client.
sql_acl.cc:
* extract the functionality of getting the user for SHOW GRANTS
from check_show_access(), so that mysql_test_show_grants() could
generate the correct column names in the prepare step
sql_class.cc:
* result->prepare() can fail, don't ignore its return value
* use correct number of decimals for EXPLAIN columns
sql_parse.cc:
* discard profiling for SHOW PROFILE. In text protocol it's done in
prepare_schema_table(), but in --ps it is called on prepare only,
so nothing was discarding profiling during execute.
* move the permission checking code for SHOW CREATE VIEW to
mysqld_show_create_get_fields(), so that it would be called during
prepare step too.
* only set sel_result when it was created here and needs to be
destroyed in the same block. Avoid destroying lex->result.
* use the correct number of tables in check_show_access(). Saying
"as many as possible" doesn't work when first_not_own_table isn't
set yet.
sql_prepare.cc:
* use correct user name for SHOW GRANTS columns
* don't ignore verbose flag for SHOW SLAVE STATUS
* support preparing REVOKE ALL and ROLLBACK TO SAVEPOINT
* don't ignore errors from thd->prepare_explain_fields()
* use select_send result for sending ANALYZE and EXPLAIN, but don't
overwrite lex->result, because it might be needed to issue execute-time
errors (select_dumpvar - too many rows)
sql_show.cc:
* check grants for SHOW CREATE VIEW here, not in mysql_execute_command
sql_view.cc:
* use the correct function to check privileges. Old code was doing
check_access() for thd->security_ctx, which is invoker's sctx,
not definer's sctx. Hide various view related errors from the invoker.
sql_yacc.yy:
* initialize lex->select_lex for LOAD, otherwise it'll contain garbage
data that happen to fail tests with views in --ps (but not otherwise).
This solves the following issues:
* unlike lex->m_sql_cmd and lex->sql_command, thd->query_plan_flags
is not reset in Prepared_statement::execute, it survives
till the log_slow_statement(), so slow logging behaves correctly in --ps
* using thd->query_plan_flags for both slow_log_filter and
log_slow_admin_statements means the definition of "admin" statements
for the slow log is the same no matter how it is filtered out.
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
In the title of the MDEV-9519 it was proposed to ban start slave on a Galera
if master binlog_format = statement and wsrep_auto_increment_control = 1,
but the problem can be solved without such a restriction.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
https://jira.mariadb.org/browse/MDEV-9519
This problem manifested itself when a join query used two or more
materialized CTE such that each of them employed the same recursive CTE.
The bug caused a crash. The crash happened because the cleanup()
function was performed premature for recursive CTE. This clean up was
induced by the cleanup of the first CTE referenced the recusrsive CTE.
This cleanup destroyed the structures that would allow to read from the
temporary table containing the rows of the recursive CTE and an attempt to read
these rows for the second CTE referencing the recursive CTE triggered a
crash.
The clean up for a recursive CTE R should be performed after the cleanup
of the last materialized CTE that uses R.
This patch introduces support for the system variable eq_range_index_dive_limit
that existed in MySQL starting from 5.6. The variable sets a limit for
index dives into equality ranges. Index dives are performed by optimizer
to estimate the number of rows in range scans. Index dives usually provide
good estimate but they are pretty expensive. To estimate the number of rows
in equality ranges statistical data on indexes can be employed. Its usage gives
not so good estimates but it's cheap. So if the number of equality dives
required by an index scan exceeds the set limit no dives for equality
ranges are performed by the optimizer for this index.
As the new system variable is introduced in a stable version the default
value for it is set to a special value meaning there is no limit for the number
of index dives performed by the optimizer.
The patch partially uses the MySQL code for WL 5957
'Statistics-based Range optimization for many ranges'.
If we have a 2+ node cluster which is replicating from an async master
and the binlog_format is set to STATEMENT and multi-row inserts are executed
on a table with an auto_increment column such that values are automatically
generated by MySQL, then the server node generates wrong auto_increment
values, which are different from what was generated on the async master.
The causes and fixes:
1. We need to improve processing of changing the auto-increment values
after changing the cluster size.
2. If wsrep auto_increment_control switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting of the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain its initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on the different nodes in some scenarios.
3. If wsrep auto_increment_control switched off during operation of the node,
then we must return the original values of the auto_increment_increment and
auto_increment_offset global variables, as the user has set. To make this
possible, we need to add a "shadow copies" of these variables (which stores
the latest values set by the user).
The previous correction of the patch for mdev-16473 did not work
correctly for the databases whose names started with '*'.
Added a test case with a database named "*".
Before this patch if no default database was set the server threw
an error for any table name reference that was not fully qualified by
database name. In particular it happened for table names referenced
CTE tables. This was incorrect.
The error message was thrown at the parser stage when the names referencing
different tables were not resolved yet.
Now if no default database is set and a with clause is used in the
processed statement any table reference is just supplied with a dummy
database name "*none*" at the parser stage. Later after a call
of check_dependencies_in_with_clauses() when the names for CTE tables
can be resolved error messages are thrown only for those names that
refer to non-CTE tables. This is done in open_and_process_table().