reset default fields not for every modified row, but only once,
at the beginning, as the set of modified fields doesn't change.
exception: INSERT ... ON DUPLICATE KEY UPDATE - the set of fields
does change per row and in that case we reset default fields per row.
This bug was seen when parallel replication experienced a deadlock between
transactions T1 and T2, where T2 has reached the commit phase and is waiting
for T1 to commit first. In this case, the deadlock is broken by sending a kill
to T2; that kill error is then later detected and converted to a deadlock
error, which causes T2 to be rolled back and retried.
The problem was that the kill caused ha_commit_trans() to errorneously call
wakeup_subsequent_commits() on T3, signalling it to abort because T2 failed
during commit. This is incorrect, because the error in T2 is only a temporary
error, which will be resolved by normal transaction retry. We should not
signal error to the next transaction until we have executed the code that
handles such temporary errors.
So this patch just removes the calls to wakeup_subsequent_commits() from
ha_commit_trans(). They are incorrect in this case, and they are not needed in
general, as wakeup_subsequent_commits() must in any case be called in
finish_event_group() to wakeup any transactions that may have started to wait
after ha_commit_trans(). And normally, wakeup will in fact have happened
earlier, either from the binlog group commit code, or (in case of no
binlogging) after the fast part of InnoDB/XtraDB group commit.
The symptom of this bug was that replication would break on some transaction
with "Commit failed due to failure of an earlier commit on which this one
depends", but with no such failure of an earlier commit visible anywhere.
The retry of an event group in parallel replication set the wrong value for
the end log position of the event that was retried
(qev->future_event_relay_log_pos). It was too large by the size of the event,
so it pointed into the middle of the following event.
If the retry happened in the very last event of the event group, _and_ the SQL
thread was stopped just after successfully retrying that event, then the SQL
threads's relay log position would be left incorrect. Restarting the SQL
thread could then try to read events from a garbage offset in the relay log,
usually leading to an error about not being able to read the event.
The code in binlog group commit around wait_for_commit that controls commit
order, did the wakeup of subsequent commits early, as soon as a following
transaction is put into the group commit queue, but before any such commit has
actually taken place. This causes problems with too early wakeup of
transactions that need to wait for prior to commit, but do not take part in
the binlog group commit for one reason or the other.
This patch solves the problem, by moving the wakeup to happen only after the
binlog group commit is completed.
This requires a new solution to ensure that transactions that arrive later
than the leader are still able to participate in group commit. This patch
introduces a flag wait_for_commit::commit_started. When this is set, a waiter
can queue up itself in the group commit queue.
This way, effectively the wait_for_prior_commit() is skipped only for
transactions that participate in group commit, so that skipping the wait is
safe. Other transactions still wait as needed for correctness.
setting of innodb_io_capacity_max
(a) Changed the behaviour so that if you set innodb_io_capacity to a
value > innodb_io_capacity_max that the value is accepted AND
that innodb_io_capacity_max = innodb_io_capacity * 2.
(b) If someone wants to reduce innodb_io_capacity_max and
reduce it below innodb_io_capacity then innodb_io_capacity
should be reduced to the same level as innodb_io_capacity_max.
In both cases give a warning to user.
main.information_schema: added a condition to the query to exclude perfschema tables
main.information_schema_all_engines: added a call to the include file to check for the presence of perfschema
When parsing a field declaration, grab type information from LEX before it's overwritten
by further rules. Pass type information through the parser stack to the rule that needs it.
The test complains that the server failed to disappear upon shutdown /
wait_for_disconnect. Trying to solve the probably race condition by
adding a wait before restart.
The test case had a classic mistake: SET DEBUG_SYNC='now SIGNAL xxx'
followed immediately by SET DEBUG_SYNC='RESET'. This makes it
possible for the waiter to miss the signal, if it does not manage
to wake up prior to the RESET.
The test cases had some --replace_result $USER USER. The problem is that the
value of $USER can be anything, depending on the name of the unix account that
runs the test suite. So random parts of the result can be errorneously
replaced, causing test failures.
Fix by making the replacements more specific, so they will match only the
intended stuff regardless of the value of $USER.
Merged Facebook commit 617aef9f911d825e9053f3d611d0389e02031225
authored by Inaam Rana to InnoDB storage engine (not XtraDB)
from https://github.com/facebook/mysql-5.6
WL#7047 - Optimize buffer pool list scans and related batch processing
Reduce excessive scanning of pages when doing flush list batches. The
fix is to introduce the concept of "Hazard Pointer", this reduces the
time complexity of the scan from O(n*n) to O.
The concept of hazard pointer is reversed in this work. Academically
hazard pointer is a pointer that the thread working on it will declar
such and as long as that thread is not done no other thread is allowe
do anything with it.
In this WL we declare the pointer as a hazard pointer and then if any
thread attempts to work on it, it is allowed to do so but it has to a
the hazard pointer to the next valid value. We use hazard pointer sol
reverse traversal of lists within a buffer pool instance.
Add an event to control the background flush thread. The background f
thread wait has been converted to an os event timed wait so that it c
signalled by threads that want to kick start a background flush when
buffer pool is running low on free/dirty pages.
Merge Facebook commit 154c579b828a60722a7d9477fc61868c07453d08
and e8f0052f9b112dc786bf9b957ed5b16a5749f7fd authored
by Steaphan Greene from https://github.com/facebook/mysql-5.6
Optimize prefix index queries to skip cluster index lookup when possible.
Currently InnoDB will always fetch the clustered index (primary key
index) for all prefix columns in an index, even when the value of a
particular record is smaller than the prefix length. This change
optimizes that case to use the record from the secondary index and avoid
the extra lookup.
Also adds two status vars that track how effective this is:
innodb_secondary_index_triggered_cluster_reads:
Times secondary index lookup triggered cluster lookup.
innodb_secondary_index_triggered_cluster_reads_avoided:
Times prefix optimization avoided triggering cluster lookup.
The bug is not very important per se, but it was helpful to move
Item_func_strcmp out of Item_bool_func2 (to Item_int_func),
for the purposes of "MDEV-4912 Add a plugin to field types (column types)".
The test runs a query in one thread, then in another queries the processlist
and expects to find the first thread in the COM_SLEEP state. The problem is
that the thread signals completion to the client before changing to COM_SLEEP
state, so there is a window where the other thread can see the wrong state.
A previous attempt to fix this was ineffective. It set a DEBUG_SYNC to handle
proper waiting, but unfortunately that DEBUG_SYNC point ended up triggering
already at the end of SET DEBUG_SYNC=xxx, so the wait was ineffective.
Fix it properly now (hopefully) by ensuring that we wait for the DEBUG_SYNC
point to trigger at the end of the SELECT SLEEP(), not just at the end of
SET DEBUG_SYNC=xxx.
Merge Facebook commit 4f3e0343fd2ac3fc7311d0ec9739a8f668274f0d
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Adds innodb_idle_flush_pct to enable tuning of the page flushing rate
when the system is relatively idle. We care about this, since doing
extra unnecessary flash writes shortens the lifespan of the flash.
Merge Facebook commit ca40b4417fd224a68de6636b58c92f133703fc68
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
Change the default value for innodb_log_compressed_pages to false
Logging these pages is a waste. We don't want this to be enabled.
One caution here: If the zlib version used by innodb is changed, but
the running version is still the previous version, and the running
version crashes, it is possible crash recovery could fail.
When crash recovery uses a zlib version at all different than the
version used by the crashed instance, it is possible that a redone
compression could fail, where the original did not, because the new
zlib version compresses the same data to a slightly larger size.
Because of the nature of compression, this is even possible when
upgrading to a version of zlib which actually peforms overall better
compression than the previous version.
If this happens, mysql will fail to recover, since a page split can
not be safely triggered during crash recovery.
So, either the exact zlib version must be controlled between builds,
or these rare recovery failures must be accepted. The cost of
logging these pages is quite high, so we consider this limitation to
be worthwhile.
This failure scenario can not happen if there was a clean shutdown.
This is only relevant to restarting crashed instances, or starting an
instance built via a hot backup too (XtraBackup).
New generation hard drives, SSDs and NVM devices support 4K
sector size. Supported sector size can be found using fstatvfs()
or GetDiskFreeSpace() functions.
(Backport to 5.3)
(Attempt #2)
- Don't attempt to use BKA for materialized derived tables. The
table is neither filled nor fully opened yet, so attempt to
call handler->multi_range_read_info() causes crash.
(Backport to 5.3)
(variant #2, with fixed coding style)
- Make Mrr_ordered_index_reader::resume_read() restore index position
only if it was saved before with Mrr_ordered_index_reader::interrupt_read().
- TABLE::create_key_part_by_field() should not set PART_KEY_FLAG in field->flags
= The reason is that it is used by hash join code which calls it to create a hash
table lookup structure. It doesn't create a real index.
= Another caller of the function is TABLE::add_tmp_key(). Made it to set the flag itself.
- The differences in join_cache.result could also be observed before this patch: one
could put "FLUSH TABLES" before the queries and get exactly the same difference.
Merged Facebook commit dd2d11be7aaf3be270e740fb95cbc4eacb52f4d7
authored by Rongrong Zhong from https://github.com/facebook/mysql-5.6
This fixes MySQL Bug #68220 innodb_rows_updated is misleading on slave
http://bugs.mysql.com/bug.php?id=68220
Added innodb_system_rows_read/inserted/updated/deleted counters
that are the equivalent of innodb_rows_* but that only account for
changes made to system databases (mysql, information_schame and
preformance_schema). These counters will be used on slaves to
differentiated the updates made on system databases from those made on
user databases.
innodb_rows_* status counters are not updated when innodb_system_rows_*
are updated.
dd2d11be7a
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6
This change is to fix: http://bugs.mysql.com/62534
This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.
This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.
Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct
Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
(Attempt #2)
- Don't attempt to use BKA for materialized derived tables. The
table is neither filled nor fully opened yet, so attempt to
call handler->multi_range_read_info() causes crash.
The method subselect_union_engine::no_rows() must take
into account the fact that now unit->fake_select_lex is
NULL for for select_union_direct objects.
The test case waits for other threads to complete, but the wait is only 2
seconds. This is likely to sometimes be too little on our heavily loaded
buildbot VMs, that can easily stall for more than 2 seconds from time to time.
So let's try to increase the timeout (to about 40 seconds) and see if it
helps.
The test case runs SHOW SLAVE HOSTS. The output of this is only stable after
all slaves have had time to register with the master; this happens
asynchroneously.
The test was waiting for the slave with server_id=3 to appear in the output,
but it was missing a similar wait for server_id=2. Thus, if server_id=2 was
much slower to connect for some reason, it could be missing from the output,
causing the test to fail.
I think I finally found the problem, managed to reproduce locally using a
sleep in the test case to simulate the particular race condition that causes
the test to fail often in Buildbot.
The test starts an ALTER TABLE that does repair by sort in one thread, then
another thread waits for the sort to be visible in SHOW PROCESSLIST and runs a
SHOW statement in parallel.
The problem happens when the sort manages to run to completion before the
other thread has the time to look at SHOW PROCESSLIST. In this case, the wait
times out because the state looked for has already passed.
Earlier I added some DEBUG_SYNC to prevent this race, but it turns out that
DEBUG_SYNC itself changes the state in the processlist. So when the debug sync
point was hit, the processlist was showing the wrong state, so the wait would
still time out.
Fixed now by looking for the processlist to contain either the "Repair by
sorting" state or the debug sync wait stage.
Also clean up previous attempts to fix it. Set the wait timeout back to
reasonable 60 seconds, and simplify the DEBUG_SYNC operations to work closer
to how the original test case was intended.
1. Do not use NULL `info' field in processlist to select the thread of
interest. This can fail if the read of processlist ends up happening after
REAP succeeds, but before the `info' field is reset. Instead, select on the
CONNECTION_ID(), making sure we still scan the whole list to trigger the same
code as in the original test case.
2. Wait for the query to really complete before reading it in the
processlist. When REAP returns, it only means that ack has been sent to
client, the reset of query stage happens a bit later in the code.
Add missing REAP to the test.
A later test failed with strange incorrect values for COM_SELECT
in information_schema.global_status. Since global_status is updated
at the end of session activity, it seems appropriate to ensure that
all background connections have completed before accessing it.
(I checked that the original bug still triggers the test case after
the modification with REAP).
- Don't attempt to use BKA for materialized derived tables. The
table is neither filled nor fully opened yet, so attempt to
call handler->multi_range_read_info() causes crash.
innodb_buffer_pool_pages_total depends on page size. On Power8 it is 65k
compared to 4k on Intel. As we round allocations on page size we may get
slightly more memory for buffer pool.
- test_if_skip_sort_order()/create_ref_for_key() may change table
access from EQ_REF(index1) to REF(index2).
- Doing so doesn't make much sense from optimization POV, but since
they are doing it, they should update tab->read_record.unlock_row
accordingly.
Extend existing plugins to support
* SHOW QUERY_RESPONSE_TIME
* FLUSH QUERY_RESPONSE_TIME
* SHOW LOCALE
move userstat tables to use the new API instead of
hand-coded syntax
1. @@boolean_var differs from SHOW VARIABLES
2. @@str_var ignored variable charset (which is wrong
for path variables that use filesystem charset)
3. @@signed_int_var in the string context was printed
as unsigned
* ignore the OPTIMIZER_SWITCH_ENGINE_CONDITION_PUSHDOWN bit
* issue a deprecation warning on 'engine_condition_pushdown=on'
* remove unused remains of the old pre-5.5 engine_condition_pushdown variable
Normally, Prepared_statement object rewrites the query on execution
to replace ?-placeholders with values. The rewritten query may be written
to logs (including binlog) or stored in the query cache.
But for compound statements, the whole block is prepared and executed,
while contained statements are logged individually. So it doesn't make
sense to rewrite the original statement block. Instead, we need to rewrite
every contained statement. SP is already doing it to replace SP variables
with values. Let it rewrite PS parameters too in the same loop.
instead of having it 1K everywhere, make it 1K on 32-bit and 2K on 64-bit.
As the latter has larger pointers (and larger sizeof(SHOW_VAR),
it needs a larger buffer to store the same amount of SHOW_VARs
* fix debian patch
* update the copyright
* rename include guards to follow conventions
* restore incorectly deleted test file, add clarification in a comment
* capitalize the first letter of the status variable
Don't double-check privileges for a column in the GROUP BY that refers to
the same column in SELECT clause. Privileges were already checked for SELECT clause.
Added MAX_STATEMENT_TIME user variable to automaticly kill queries after a given time limit has expired.
- Added timer functions based on pthread_cond_timedwait
- Added kill_handlerton() to signal storage engines about kill/timeout
- Added support for GRANT ... MAX_STATEMENT_TIME=#
- Copy max_statement_time to current user, if stored in mysql.user
- Added status variable max_statement_time_exceeded
- Added KILL_TIMEOUT
- Removed digest hash from performance schema tests as they change all the time.
- Updated test results that changed because of the new user variables or new fields in mysql.user
This functionallity is inspired by work done by Davi Arnaut at twitter.
Test case is copied from Davi's work.
Documentation can be found at
https://kb.askmonty.org/en/how-to-limittimeout-queries/
mysql-test/r/mysqld--help.result:
Updated for new help message
mysql-test/suite/perfschema/r/all_instances.result:
Added new mutex
mysql-test/suite/sys_vars/r/max_statement_time_basic.result:
Added testing of max_statement_time
mysql-test/suite/sys_vars/t/max_statement_time_basic.test:
Added testing of max_statement_time
mysql-test/t/max_statement_time.test:
Added testing of max_statement_time
mysys/CMakeLists.txt:
Added thr_timer
mysys/my_init.c:
mysys/mysys_priv.h:
Added new mutex and condition variables
Added new mutex and condition variables
mysys/thr_timer.c:
Added timer functions based on pthread_cond_timedwait()
This can be compiled with HAVE_TIMER_CREATE to benchmark agains timer_create()/timer_settime()
sql/lex.h:
Added MAX_STATEMENT_TIME
sql/log_event.cc:
Safety fix (timeout should be threated as an interrupted query)
sql/mysqld.cc:
Added support for timers
Added status variable max_statement_time_exceeded
sql/share/errmsg-utf8.txt:
Added ER_QUERY_TIMEOUT
sql/signal_handler.cc:
Added support for KILL_TIMEOUT
sql/sql_acl.cc:
Added support for GRANT ... MAX_STATEMENT_TIME=#
Copy max_statement_time to current user
sql/sql_class.cc:
Added timer functionality to THD.
Added thd_kill_timeout()
sql/sql_class.h:
Added timer functionality to THD.
Added KILL_TIMEOUT
Added max_statement_time variable in similar manner as long_query_time was done.
sql/sql_connect.cc:
Added handling of max_statement_time_exceeded
sql/sql_parse.cc:
Added starting and stopping timers for queries.
sql/sql_show.cc:
Added max_statement_time_exceeded for user/connects status in MariaDB 10.0
sql/sql_yacc.yy:
Added support for GRANT ... MAX_STATEMENT_TIME=# syntax, to be enabled in 10.0
sql/structs.h:
Added max_statement_time user resource
sql/sys_vars.cc:
Added max_statement_time variables
mysql-test/suite/roles/create_and_drop_role_invalid_user_table.test
Removed test as we require all fields in mysql.user table.
scripts/mysql_system_tables.sql
scripts/mysql_system_tables_data.sql
scripts/mysql_system_tables_fix.sql
Updated mysql.user with new max_statement_time field
- Fix the crash by making get_column_range_cardinality()
to handle the special case where Column_stats objects
is an all-zeros object (the question of what is the point
of having Field::read_stats point to such object remains a
mystery)
- Added a few comments. Learning the code still.
- if test_if_skip_sort_order() decides to switch to using an index, or
switch from using ref to using quick select, it should set all
members accordingly.
merge from MySQL-5.6, revision:
revno: 3677.2.1
committer: Alfranio Correia <alfranio.correia@oracle.com>
timestamp: Tue 2012-02-28 16:26:37 +0000
message:
BUG#13627921 - MISSING FLAGS IN SQL_COMMAND_FLAGS MAY LEAD TO REPLICATION PROBLEMS
Flags in sql_command_flags[command] are not correctly set for the following
commands:
. SQLCOM_SET_OPTION is missing CF_CAN_GENERATE_ROW_EVENTS;
. SQLCOM_BINLOG_BASE64_EVENT is missing CF_CAN_GENERATE_ROW_EVENTS;
. SQLCOM_REVOKE_ALL is missing CF_CHANGES_DATA;
. SQLCOM_CREATE_FUNCTION is missing CF_AUTO_COMMIT_TRANS;
This may lead to a wrong sequence of events in the binary log. To fix
the problem, we correctly set the flags in sql_command_flags[command].
The problem was that the internal temporary table created
for information_schema overflow to MyISAM because it has
a row width of > 3000 characters, which filled the in memory
temporary tables.
Fix was to increase size for the heap table.
Added comments
Ensure that tokudb test works even if jemalloc is not installed
Removed not referenced function Item::remove_fixed()
mysql-test/suite/rpl/t/rpl_gtid_reconnect.test:
Fixed race condition
sql/item.cc:
Indentation fix
sql/item.h:
Removed not used function
Added comment
sql/sql_select.cc:
Fixed indentation
storage/tokudb/mysql-test/rpl/include/have_tokudb.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_add_index/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_alter_table/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_bugs/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
storage/tokudb/mysql-test/tokudb_mariadb/suite.opt:
Ensure that tokudb test works even if jemalloc is not installed
don't kill statements in the default connection, kill them in a connection that
will be closed - it'll guarantee that `KILL con_id` will not apply to unrelated statements.
We should assume that the store engine will report the first duplicate key for this case.
Old code of suppression of unsafe logging error with LIMIT didn't work, because of wrong usage of my_interval_timer().
Suppress unsafe logging errors to the error log if we get too many unsafe logging errors in a short time.
This is to not overflow the error log with meaningless errors.
- Each error code is suppressed and counted separately.
- We do a 5 minute suppression of new errors if we get more than 10 errors in that time.
Only print unsafe logging errors if log_warnings > 1.
mysql-test/suite/binlog/r/binlog_stm_unsafe_warning.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
mysql-test/suite/binlog/r/binlog_unsafe.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
mysql-test/suite/engines/README:
Fixed typos
mysql-test/suite/rpl/r/rpl_known_bugs_detection.result:
Update test results as INSERT ... ON DUPLICATE KEY UPDATE doesn't get logged anymore
sql/sql_base.cc:
Don't log warning if there are two unique keys used with INSERT .. ON DUPLICATE KEY UPDATE.
We should assume that the store engine will report the first duplicate key for this case.
sql/sql_class.cc:
Suppress error in binary log if we get too many unsafe logging errors in a short time.
Only print unsafe logging errors if log_warnings > 1
have 0x5C as the second byte in a multi-byte character.
- Adding big5 tests covering an unassigned character 0xC840
being stored into char/varchar/text/enum columns.
Fixed the bug itself.
Also, added "SET NAMES swe7" which was forgotten in the previous commit,
so latin1 was actually tested lati1 instead of swe7 in a mistake.
Now it tests swe7.
in the SQL parser.
Various backslash escapes and quote-quote escaped sequences are covered
in combination with single and multi-byte characters.
This is especially important for the character sets that can have 0x5C
as the second byte in a multi-byte character (big5, cp932, gbk, sjis).
swe7 is also a special character set, because in swe7 0x5C is used for
both escape character and for "LATIN CAPITAL LETTER O WITH DIAERESIS".