When we append data to the binlog file, we use fdatasync() to ensure
the data gets to disk so that crash recovery can work.
Unfortunately there seems to be a bug in ext3/ext4 on Linux, where
fdatasync() does not correctly sync all data when the size of a file
is increased. This causes crash recovery to not work correctly (it
loses transactions from the binlog).
As a work-around, use fsync() for the binlog, not fdatasync(). Since
we are increasing the file size, (correct) fdatasync() will most
likely not be faster than fsync() on any file system, and fsync()
does work correctly on ext3/ext4. This avoids the need to try to
detect if we are running on buggy ext3/ext4.
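A minimal sketch of the work-around on a POSIX system (illustrative
only; the helper name and descriptor are not the server's actual
symbols):

  #include <unistd.h>

  /* Flush binlog data appended by the caller. fdatasync() would be
     the natural choice, but appends grow the file, so a correct
     fdatasync() has to flush metadata anyway and is unlikely to be
     cheaper than fsync(); fsync() is also known to work on ext3/ext4. */
  static int sync_binlog_file(int binlog_fd)
  {
    return fsync(binlog_fd);          /* deliberately not fdatasync() */
  }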
The flag is now checked for MYSQL_LOCK_LOG_TABLE and similar
in open_table().
per-file comments:
sql/sql_base.cc
MDEV-495 Table logging does not work in TRANSACTION READ ONLY mode.
1. Field_newdate::get_date should refuse to return a date with zeros when
TIME_NO_ZERO_IN_DATE is set, not when TIME_FUZZY_DATE is unset
2. Item_func_to_days and Item_date_add_interval can only work with valid dates,
no zeros allowed.
Fix Item_func_add_time::get_date() to generate valid dates.
Move the validity check inside get_date_from_daynr()
instead of relying on callers
(5 that had it, and 2 that did not, but should've)
The use of Thread_iterator did not work on Windows (linking problems).
Solution: Change the interface between the thread_pool and the server
to only use simple free functions.
This patch is for 5.5 only (mimics a similar solution in 5.6)
The function collect_statistics_for_table(), when scanning a table,
did not take into account that the handler function ha_rnd_next
could return the code HA_ERR_RECORD_DELETED, which should not be
considered an indication of an error.
Also fixed a potential memory leak in this function.
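A sketch of the corrected scan loop (stand-in helpers; the real code
drives handler::ha_rnd_next() on the table's handler):

  #include "my_base.h"     /* HA_ERR_END_OF_FILE, HA_ERR_RECORD_DELETED */

  extern int next_row(void);     /* stands in for handler::ha_rnd_next() */
  extern void collect_row(void); /* stands in for the statistics update  */

  int collect_statistics_sketch(void)
  {
    int rc;
    while ((rc= next_row()) != HA_ERR_END_OF_FILE)
    {
      if (rc == HA_ERR_RECORD_DELETED)
        continue;              /* a deleted row: skip it, not an error */
      if (rc)
        return rc;             /* a real error: abort the scan */
      collect_row();
    }
    return 0;
  }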
- We use HA_MRR_NO_ASSOC ("optimizer_switch=join_cache_hashed") mode
- Not able to use BKA's buffers yet.
- There is a variable to control batch size
- There are status counters.
- Needed to make some fixes in BKA code (to be checked with Igor)
The problem was that the was_null and null_value variables were reset on each re-execution of the IN subquery, but the engine was re-run only for non-constant subqueries.
Fixed checking of constants in Item_equal sort.
Fixed constant reporting in Item_subselect.
ENABLE AUDIT PLUGIN WHEN DDL
OPERATION HAPPENING
PROBLEM: While unloading the plugin, its state is
not checked before it is reaped.
This can lead to simultaneous freeing of
plugin memory by more than one thread.
Multiple deallocations lead to a server
crash. In the present bug two threads
deallocate the audit_log plugin.
SOLUTION: A check is added to ensure that only
one thread is unloading the plugin.
NOTE: No mtr test is added as it requires
multiple threads to access the critical
section. debug_sync cannot be used in
the current scenario because we don't
have access to the thread pointer in
some of the plugin functions. IMHO no
test case in the current time frame.
NUMBERS
If a system variable was declared as deprecated without mention of an
alternative, the message would look funny, e.g. for @@delayed_insert_limit:
Warning 1287 '@@delayed_insert_limit' is deprecated and
will be removed in MySQL .
The message was meant to display the version number, but it's not
possible to give one when declaring a system variable.
The fix does two things:
1) The definition of the message
ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT is changed so that it does
not display a version number. I.e. in English the message now reads:
Warning 1287 The syntax '@@delayed_insert_limit' is deprecated and
will be removed in a future version.
2) The message ER_WARN_DEPRECATED_SYNTAX_WITH_VER is discontinued in
favor of ER_WARN_DEPRECATED_SYNTAX for system variables. This change
was already done in versions 5.6 and above as part of wl#5265. This
part is simply back-ported from the worklog.
FAILED IN CHECK_LOCK_AND_ST
Problem:
--------
lock_tables() is supposed to invoke check_lock_and_start_stmt()
only for TABLE_LISTs which are directly used by the top level
statement. TABLE_LIST->prelocking_placeholder is set only for
TABLE_LISTs which are used indirectly by stored programs invoked
by the top level statement. Hence check_lock_and_start_stmt()
should always see TABLE_LIST->prelocking_placeholder==false,
but it is observed that this assert fails.
The failure was found during the RQG test rqg_signal_resignal.
Analysis:
---------
open_tables() invokes open_and_process_routines() where it
finds all the TABLE_LISTs that belong to the routine and
adds them to thd->lex->query_tables. During this process, if
open_and_process_routines() fails for some reason,
we are supposed to chop off all the TABLE_LISTs found during
the calls to open_and_process_routines(). But in practice this
is not happening.
thd->lex->query_tables_own_last is supposed to point to the
node in thd->lex->query_tables which is the first TABLE_LIST
used indirectly by stored programs invoked by the top level
statement. This is found to be set incorrectly when we plan
to chop off TABLE_LISTs after open_and_process_routines()
has failed.
close_tables_for_reopen() does chop off all the TABLE_LISTs
added after thd->lex->query_tables_own_last. It is invoked
upon error in open_and_process_routines(). This call does
not work as expected, as thd->lex->query_tables_own_last
is either not set or not set correctly.
Further, when open_tables() restarts the process of finding
the TABLE_LISTs belonging to stored programs, and since
thd->lex->query_tables_own_last points to an incorrect node,
the new iteration may set thd->lex->query_tables_own_last
past some old nodes that belong to stored programs, added
earlier and not removed.
Later when open_tables() completes, lock_tables() ends up
invoking check_lock_and_start_stmt() for TABLE_LISTs which
belong to stored programs, which is not expected behavior,
and hence we hit the assert
TABLE_LIST->prelocking_placeholder==false.
Due to the above behavior, if a user application tries to
execute an SQL statement which invokes some stored function
and the lock grant on the stored function fails due to a
deadlock, then mysqld crashes.
Fix:
----
open_tables() remembers save_query_tables_last, which points
to thd->lex->query_tables_last before the calls to
open_and_process_routines(). If no valid
thd->lex->query_tables_own_last is set, we now set
thd->lex->query_tables_own_last to save_query_tables_last.
This makes sure that the call to close_tables_for_reopen()
chops off the list correctly; in other words we now
remove all the nodes added to thd->lex->query_tables by
previous calls to open_and_process_routines().
Further, it is found that the problem exists starting
from 5.5, due to a code refactoring effort related to
open_tables(). Hence, the fix will be pushed in 5.5, 5.6
and trunk.
With the new code of mysql-5.5 for metadata locking the function
unlock_tables_n_open_system_tables_for_write should not explicitly
unlock tables for which external locks have been set and should not
explicitly reset thd->lock to 0.
Documentation for class Item_outer_ref was wrong:
(*ref) may point to Item_field as well
(see e.g. Item_outer_ref::fix_fields)
So this cast in get_store_key() was wrong:
(*(Item_ref**)((Item_ref*)keyuse->val)->ref)->ref_type()
two tests still fail:
main.innodb_icp and main.range_vs_index_merge_innodb
call records_in_range() with both range ends being open
(which triggers an assert)
The fix backports from MWL#182: Explain running statements the logic that
saves the original JOIN_TAB array of a query plan after optimization. This
array is later used during EXPLAIN to iterate over the original JOIN plan
nodes in the cases when this plan could be changed by early subquery
execution during the optimization phase of the outer query.
The crash happened when combining the query cache, prepared statements, and a read-only cursor.
sql/sql_cache.cc:
Fixed an unlikely error when one adjusts the query cache size in the middle of an operation
sql/sql_cursor.cc:
Disable query cache when using cursors. This fixed lp:1039277
tests/mysql_client_test.c:
Test case for lp:1039277
sql/handler.cc:
SHOW INNODB STATUS sometimes returns 0 even if it has generated an error.
This code is here to catch it until InnoDB some day is fixed.
storage/innobase/handler/ha_innodb.cc:
Catch at least one of the possible errors from SHOW INNODB STATUS to provide a correct return code.
storage/xtradb/handler/ha_innodb.cc:
Catch at least one of the possible errors from SHOW INNODB STATUS to provide a correct return code.
support-files/my-huge.cnf.sh:
Fixed typo
Additional patch to remove the part_id -> ref_buffer offset.
The partitioning id and the associated record buffer can
be found without having to calculate them.
By initializing the buffer for each used partition, and then reusing
the key buffer from the queue, such a map is no longer needed.
Fixed an error in a test that caused the following tests to fail
extra/yassl/taocrypt/src/dsa.cpp:
Fixed compiler warning by adding cast
mysql-test/suite/rpl/t/rpl_start_slave_deadlock_sys_vars.test:
We have to first test for have_debug_sync to not start master wrongly
plugin/auth_pam/auth_pam.c:
Fixed compiler warning
sql/sys_vars.h:
Fixed compiler warning (Sys_var_max_user_conn is now signed)
support-files/compiler_warnings.supp:
Don't give warnings for auth_pam.c (Tried to fix it by changing the code, but could not find an easy way to do that on Solaris)
The buffer for the current read row from each partition
(m_ordered_rec_buffer) used for sorted reads was
allocated on open and freed when the ha_partition handler
was closed or destroyed.
For tables with many partitions and big records this could
take up too much valuable memory.
The solution is to allocate the memory only when it is needed
and free it when no longer needed, i.e. allocate it in
index_init and free it in index_end (and, to handle failures,
also free it on reset, close, etc.).
Also, only the needed memory is allocated, according to
partition pruning.
Manually tested that it does not use as much memory and
releases it after queries.
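A self-contained sketch of the allocation scheme (stand-in types;
only the member name m_ordered_rec_buffer mirrors the handler):

  #include <stdlib.h>

  struct partition_reader
  {
    unsigned char *m_ordered_rec_buffer; /* NULL until an ordered scan    */
    size_t used_partitions;              /* partitions left after pruning */
    size_t rec_length;                   /* per-partition buffer slot     */
  };

  int index_init_sketch(struct partition_reader *r, int sorted)
  {
    if (sorted && !r->m_ordered_rec_buffer)
    {
      r->m_ordered_rec_buffer=
        malloc(r->used_partitions * r->rec_length);
      if (!r->m_ordered_rec_buffer)
        return -1;                       /* HA_ERR_OUT_OF_MEM */
    }
    return 0;
  }

  void index_end_sketch(struct partition_reader *r)
  {
    free(r->m_ordered_rec_buffer);       /* also done on reset/close */
    r->m_ordered_rec_buffer= NULL;
  }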
mysql-test/suite/maria/lock.result:
Added test case
mysql-test/suite/maria/lock.test:
Added test case
sql/sql_table.cc:
One can't call HA_EXTRA_FORCE_REOPEN on something that may be opened twice.
It's safe to remove the call in this case as we will call HA_EXTRA_PREPARE_FOR_DROP for the table anyway.
(One nice side effect is that drop is a bit faster as we are not flushing the cache to disk before the drop anymore)
mysql-test/r/partition.result:
Added test case
mysql-test/t/partition.test:
Added test case
sql/sql_partition.cc:
Do mysql_trans_prepare_alter_copy_data() after all original tables are locked.
(We don't want to disable transactions for the original tables, that still may be in the cache)
sql/sql_table.cc:
Fixed two wrong DBUG_ENTER
sql/item_subselect.cc:
Added purecov info
sql/sql_select.cc:
Added cast
storage/innobase/handler/ha_innodb.cc:
Added cast
storage/xtradb/btr/btr0btr.c:
Added buf_block_get_frame_fast() to avoid compiler warning
storage/xtradb/handler/ha_innodb.cc:
Added cast
storage/xtradb/include/buf0buf.h:
Innodb has buf_block_get_frame(block) defined as (block)->frame.
Didn't want to do a big change to break xtradb as it may use block_get_frame() differently, so I made this quick hack to patch one compiler warning.
Starting the SQL thread might deadlock with reading the values of the
replication filtering options.
The deadlock is due to a lock order violation when the variables are
read or set. For example, reading replicate_ignore_table first
acquires LOCK_global_system_variables in sys_var::value_ptr and later
acquires LOCK_active_mi in Sys_var_rpl_filter::global_value_ptr. This
violates the order established when starting a SQL thread, where
LOCK_active_mi is acquired before start_slave, and ends up creating a
thread (handle_slave_sql) that allocates a THD handle whose
constructor acquires LOCK_global_system_variables in THD::init.
The solution is to unlock LOCK_global_system_variables before the
replication filtering options are set or read. This way the lock
order is preserved and the data being read/set is still protected
given that it acquires LOCK_active_mi.
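A sketch of the resulting locking pattern (the mutex names are the
server's; the surrounding code is illustrative):

  /* Sys_var_rpl_filter accessors enter holding
     LOCK_global_system_variables; release it first so that
     LOCK_active_mi is never acquired underneath it, matching the
     order used when starting the SQL thread. */
  mysql_mutex_unlock(&LOCK_global_system_variables);
  mysql_mutex_lock(&LOCK_active_mi);
  /* ... read or set the filtering option; the data stays protected
     by LOCK_active_mi ... */
  mysql_mutex_unlock(&LOCK_active_mi);
  mysql_mutex_lock(&LOCK_global_system_variables); /* restore for caller */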
MASTER-MASTER AND USING SET USE
Problem:
=======
In a master-master set-up, a master can show a wrong
'SHOW SLAVE STATUS' output.
Requirements:
- master-master
- log_slave_updates
This is caused by using SET user-variables and then using
them to perform writes. From then on the master that performed
the insert will have a SHOW SLAVE STATUS that is wrong and
it will never get updated until a write happens on the other
master. On "Master A" the "exec_master_log_pos" is not
getting updated.
Analysis:
========
Slave receives a "User_var" event from the master and after
applying the event, when "log_slave_updates" option is
enabled the slave tries to write this applied event into
its own binary log. At the time of writing this event the
slave should use the "originating server-id". But in the
above case the sever always logs the "user var events"
by using its global server-id. Due to this in a
"master-master" replication when the event comes back to the
originating server the "User_var_event" doesn't get skipped.
"User_var_events" are context based events and they always
follow with a query event which marks their end of group.
Due to the above mentioned problem with "User_var_event"
logging the "User_var_event" never gets skipped where as
its corresponding "query_event" gets skipped. Hence the
"User_var" event always waits for the next "query event"
and the "Exec_master_log_position" does not get updated
properly.
Fix:
===
The `MYSQL_BIN_LOG::write' function is used to write events
into the binary log. Within this function a new object for
"User_var_log_event" is created, and this new object is used
to write the "User_var" event into the binlog. The "User var"
event is inherited from "Log_event". This "Log_event" has
different overloaded constructors. When a "THD" object
is present, the "Log_event(thd,...)" constructor should be used
to initialise the objects, and in the absence of a valid
"THD" object the minimal "Log_event()" constructor should be
used. In the above mentioned problem the default minimal
constructor was always used, which is incorrect. This minimal
constructor is replaced with "Log_event(thd,...)".
sql/log_event.h:
Replaced the default constructor with another constructor
which takes a "THD" object as an argument.
The bug could cause a crash when the server executed a query with
ORDER BY and sort_buffer_size was set to a small enough number.
It happened because the small sort buffer did not allow all
merge buffers to be allocated in it.
Made sure that the allocated sort buffer is big enough
to contain all possible merge buffers.
client/mysqldump.c:
Slave needs to be initialized with 0
dbug/dbug.c:
Removed a non-existent function
plugin/semisync/semisync_master.cc:
Fixed compiler warning
sql/opt_range.cc:
thd needs to be set early as it's used in some error conditions.
sql/sql_table.cc:
Changed to use uchar* to make array indexing portable
storage/innobase/handler/ha_innodb.cc:
Removed an unused variable
storage/maria/ma_delete.c:
Fixed compiler warning
storage/maria/ma_write.c:
Fixed compiler warning
Fix add_identifier() to distinguish between temporary tables (#sql- prefix) and temporary partitions (#TMP# suffix).
Change add_identifier() to use the same name variant constants as sql_partition.cc does.
When resolving outer fields, Item_field::fix_outer_fields()
creates new Item_refs for each execution of a prepared statement, so
these must be allocated in the runtime memroot. The memroot switching
before resolving JOIN::having causes these to be allocated in the
statement root, leaking memory for each PS execution.
sql/item_subselect.cc:
Addon, fix for 11829691: the item could be created in the
runtime memroot, so we need to use real_item instead.
ROWS THAT ARE EXPECTED
For non range/list partitioned tables (i.e. HASH/KEY):
When prune_partitions finds a multi-range list
(or in this test '<>') for a field of the partition index,
even if it cannot make any use of the multi-range,
it will continue with the next field of the partition index
and use that for pruning (even if the previous
field could not be used). This results in partitions being
pruned away, leaving only partitions that match
the last field in the partition index and excluding
partitions that might match any of the previous fields.
Fixed by skipping the rest of the partitioning key fields/parts
if the current key field/part could not be used (see the
sketch below).
Also notice it is the order of the fields in the CREATE TABLE
statement that triggers this bug, not the order of fields in
the primary/unique key or PARTITION BY KEY ().
The field with the non-single-point range must not be the last
field in the partitioning expression; i.e. the partitioning
index is created with the same field order
as in the CREATE TABLE, and for the bug to appear
the last field must be a single point and some previous field
must be a multi-point range.
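A sketch of the skip logic (stand-in names; the real code walks the
partitioning key parts inside the pruning code):

  /* Walk the partitioning key parts in CREATE TABLE field order.
     As soon as one part is not a usable single-point value, stop:
     no later part may be used for pruning either. */
  for (part= first_part; part; part= part->next)
  {
    if (!is_single_point(part))
      break;                    /* skip this AND all remaining parts */
    mark_part_used(part);
  }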
SHOW 2012 INSTEAD OF 2011
* Added a new macro to hold the current year:
COPYRIGHT_NOTICE_CURRENT_YEAR
* Modified the ORACLE_WELCOME_COPYRIGHT_NOTICE macro
to take the initial year as a parameter and pick the
current year from the above mentioned macro.
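A sketch of the macro arrangement (the names come from the
description above; the bodies here are illustrative, not the exact
server definitions):

  #define COPYRIGHT_NOTICE_CURRENT_YEAR "2012"
  /* Only the macro above needs touching at the next year change;
     every caller passes just its own initial year. */
  #define ORACLE_WELCOME_COPYRIGHT_NOTICE(first_year) \
    "Copyright (c) " first_year ", " COPYRIGHT_NOTICE_CURRENT_YEAR \
    ", Oracle and/or its affiliates. All rights reserved."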
FOREVER MDL LOCK
Analysis:
----------
While granting MDL locks to the requests in the wait queue,
the lock is first granted to the high priority lock types
and then to the low priority lock types.
MDL Priority Matrix:
+-------------+----+---+---+---+----+-----+
|    Locks    |    |   |   |   |    |     |
| has Priority|    |   |   |   |    |     |
| over --->   |  S | SR| SW| SU| SNW| SNRW|
+-------------+----+---+---+---+----+-----+
|      X      |  + | + | + | + |  + |  +  |
+-------------+----+---+---+---+----+-----+
|     SNRW    |  - | + | + | - |  - |  -  |
+-------------+----+---+---+---+----+-----+
|     SNW     |  - | - | + | - |  - |  -  |
+-------------+----+---+---+---+----+-----+
Here '+' means the row lock's priority is higher,
and '-' means it has the same priority.
In the scenario where
*. the lock wait queue has requests of type S/SR/SW/SU,
*. and locks of high priority X/SNRW/SNW are requested
continuously,
the high priority lock requests (X/SNRW/SNW) are always
considered first when granting locks. Low priority
locks (S/SR/SW/SU) will not get a chance and will
wait forever.
In the scenario for which this bug is reported, the application
executed many LOCK TABLES ... WRITE statements concurrently.
These statements request an SNRW lock. Also there were some
connections trying to execute DML statements requesting an SR
lock. Since the SNRW lock request has higher priority (and as
there were too many waiting SNRW requests) the lock is always
granted to it. So the SR lock request will wait forever, resulting
in DML starvation.
How is this handled in 5.1?
---------------------------
Even in 5.1 we had the low priority lock starvation issue.
But in 5.1 thread locking, the system variable
"max_write_lock_count" can be configured to grant
some pending read lock requests. After
"max_write_lock_count" write lock grants, all the low
priority locks are granted.
Why is this issue seen in 5.5/trunk?
---------------------------------
In 5.5/trunk MDL locking, the "max_write_lock_count" system
variable exists but is not used by MDL; only the thread lock uses
it, so "max_write_lock_count" has no effect on MDL locking.
This means that starvation of metadata locks is possible
even if max_write_lock_count is used.
It looks like the customer was using "max_write_lock_count" in
5.1, and when upgrading to 5.5 starvation is seen because
"max_write_lock_count" has no effect in MDL.
Fix:
----------
As a fix, support for max_write_lock_count is added to MDL.
To maintain a write lock counter per MDL_lock object, a new
member "m_hog_lock_count" is added to MDL_lock.
The following logic is added to increment the counter in
the function reschedule_waiters
(reschedule_waiters is called while a thread is
releasing the lock):
- After granting a lock request from the wait queue,
- check if any S/SR/SU/SW requests exist in the wait queue,
- and if yes, increment "m_hog_lock_count".
The following logic is added in the same function to
handle pending S/SU/SR/SW locks:
- Before granting locks,
- check if max_write_lock_count <= m_hog_lock_count;
- if yes, then try to grant the S/SR/SW/SU locks.
(Since all of these have the same priority, all locks are
granted together. But some lock grants may fail because
of grant incompatibility.)
- Reset m_hog_lock_count if there are no low priority lock
requests in the wait queue.
- Return.
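A self-contained sketch of the counter logic just described; the
names m_hog_lock_count and max_write_lock_count are the real ones,
everything else is a stand-in:

  struct mdl_lock_sketch
  {
    unsigned long m_hog_lock_count;   /* high-prio grants in a row   */
    unsigned low_prio_waiters;        /* queued S/SR/SW/SU requests  */
  };

  extern unsigned long max_write_lock_count;  /* system variable     */
  extern void grant_low_priority(struct mdl_lock_sketch *lock);
  extern int  grant_high_priority(struct mdl_lock_sketch *lock);

  void reschedule_waiters_sketch(struct mdl_lock_sketch *lock)
  {
    if (lock->low_prio_waiters &&
        lock->m_hog_lock_count >= max_write_lock_count)
    {
      /* Grant all queued S/SR/SW/SU in one shot; some grants may
         still fail because of grant incompatibility. */
      grant_low_priority(lock);
      lock->m_hog_lock_count= 0;
      return;
    }
    if (grant_high_priority(lock) && lock->low_prio_waiters)
      lock->m_hog_lock_count++;   /* low-prio starved one more time  */
    if (!lock->low_prio_waiters)
      lock->m_hog_lock_count= 0;  /* nothing left to starve: reset   */
  }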
Note:
--------------------------
In the lock priority matrix explained above,
X has priority over SNW and SNRW. X locks are
taken mostly for RENAME, TRUNCATE, CREATE ... operations.
So lock type X may not be requested continuously in a loop
in real world applications, compared to the other lock
request types, and lock requests of types SNW and SNRW are
not starved. So we can grant all S/SR/SU/SW in one shot
without considering SNW & SNRW lock request starvation.
ALTER TABLE operations take an SU lock first and then
upgrade it to SNW if required. All of S, SR, SW, SU have the
same lock priority. So while granting SU, requests of types
SR, SW, S are also granted in one shot. Hence lock requests
of type SU->SNW in a loop will not make other low priority
lock requests starve.
But when there is a request for a lock of type SNRW, lock
requests of lower priority types are not granted. And if
SNRW is requested continuously in a loop then all of
S, SR, SW, SU are starved.
This patch addresses the latter scenario.
When we have S/SR/SW/SU in the wait queue and there are
- continuous SNRW lock requests,
- OR one or more X and continuous SNRW lock requests,
- OR one SNW and continuous SNRW lock requests,
- OR one SNW, one or more X and continuous SNRW lock
requests
in the wait queue, then the S/SR/SW/SU lock requests are starved.
Backport the fix from 5.6 to 5.1
Base bug number : 11765562
sql/item_strfunc.cc:
In Item_func_export_set::val_str, verify that the size of the end
result is within reasonable bounds.
- When JOIN::cleanup(full==TRUE) is called, the select can be in two states:
= Right after the create_sort_index() call, when join->join_tab[0] is used to
read data produced by filesort().
= After create_sort_index(), and after JOIN::reinit() calls, when
join->join_tab[0] has been reset to read the original data.
- We didn't handle the second case correctly, which resulted in an attempt to free
the same SQL_SELECT two times. The fix is to make sure we don't double-free.
The mysql_rm_table_no_locks() function was modified.
When we construct the log record for DROP TABLE, we now
check whether there is a comment before the first table name
and add it to the record if so.
per-file comments:
sql/sql_table.cc
MDEV-340 Save replication comments for DROP TABLE.
comment_length() function implemented to find comments in the query;
it is called in mysql_rm_table_no_locks() and the result is used to form the log record.
mysql-test/suite/binlog/r/binlog_drop_if_exists.result
MDEV-340 Save replication comments for DROP TABLE.
test result updated.
mysql-test/suite/binlog/t/binlog_drop_if_exists.test
MDEV-340 Save replication comments for DROP TABLE.
test case added.
- Moved the definitions of the classes to store data from persistent
statistical tables into statistics.h, leaving in other internal
data structures only references to the corresponding objects.
- Defined class Column_statistics_collected derived from the class
Column_statistics. This is a helper class to collect statistics
on columns.
- Moved references to read statistics to TABLE_SHARE, leaving
the reference to the collected statistics in TABLE.
- Added a new clone method for the class Field, allowing one to clone
fields attached to table shares. It was used to create
fields for min/max values in the memory of the table share.
Also:
- Added procedures to allocate memory for statistical data in
the table share memory and in table memory.
Also:
- Added a test case demonstrating how ANALYZE could work in parallel
to collect statistics on different indexes of the same table.
- Added a test to demonstrate how two connections working
simultaneously could allocate memory for statistical data in the
table share memory.
IS PLACE HOLDER AND USE SERVER-SIDE
Analysis:
LIMIT always takes nonnegative integer constant values.
http://dev.mysql.com/doc/refman/5.6/en/select.html
So parsing of the string value '5' for LIMIT in SELECT fails.
But within a prepared statement, LIMIT parameters can be
specified using '?' markers. The value for the parameter can
be supplied while executing the prepared statement.
Passing string, float or double values for LIMIT
works well from the CLI, because while setting the value
for the parameters from the variable list (added using
SET), if the value is for a LIMIT parameter then it is
converted to an integer value.
But when a prepared statement is executed from other
interfaces such as Connector/J or C applications etc.,
the values for the parameters are sent to the server
with the execute command. Each item in the list has a value and
a data type. So, while setting the parameter values
from this list, the value is set to each parameter
with the same data type as passed.
But here the logic to convert the value to an integer type,
if it is for a LIMIT parameter, is missing.
Because of this, the string '5' is set for LIMIT,
and the same is logged into the binlog file too.
Fix:
Executing a prepared statement with a parameter for LIMIT
worked fine from the CLI, as the value set for the parameter
is converted to an integer; it failed from other
interfaces such as Connector/J or C applications etc., as this
conversion was missing there.
So, as a fix, a check was added while setting the values for the
parameters: if the parameter is for a LIMIT value, it
is converted to an integer value (see the sketch below).
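A sketch of the added check (Item_param::limit_clause_param and
set_int() exist in the server; the packet-decoding helpers here are
illustrative):

  /* While binding one value from the execute packet: */
  if (param->limit_clause_param)
    param->set_int(value_as_longlong(packet),   /* force integer     */
                   MY_INT64_NUM_DECIMAL_DIGITS);
  else
    set_param_by_wire_type(param, packet);      /* type-based setter */

The same bound value then reaches both the executor and the binlog,
so the statement is also logged with a proper integer LIMIT.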
PROBLEM:
MySQL provides a feature wherein a session which is
idle for a period specified by the wait_timeout variable
(whose value is in seconds) is closed.
This feature is not present when we use the thread pool.
FIX:
This patch implements the interface functions which are
required to implement the wait_timeout functionality
in the thread pool plugin.
'MAX_BINLOG_CACHE_SIZE' ERROR
Problem:
=======
MySQL returns the following error on win64:
"ERROR 1197 (HY000): Multi-statement transaction required more than
'max_binlog_cache_size' bytes of storage; increase this mysqld variable
and try again" when the user tries to load a >4G file, even if
max_binlog_cache_size is set to its maximum value. On Linux everything
works fine.
Analysis:
========
The `max_binlog_cache_size' variable is of type `ulonglong'. This
value is set to `ULONGLONG_MAX' at the time of server start up. The
above value is stored in an intermediate variable named
`saved_max_binlog_cache_size' which is of type `ulong'. In the Visual
C++ compiler the `ulong' type is 4 bytes in size, and hence the value
gets truncated to 4GB and the cache is not able to grow beyond
4GB in size. The same limitation is observed with
"max_binlog_stmt_cache_size" as well. A similar fix has been applied.
Fix:
===
As part of the fix the type "ulong" is replaced with "my_off_t", which is
a typedef for "ulonglong".
mysys/mf_iocache.c:
Added debug statement to simulate a scenario where the cache
file's current position is set to >4GB
sql/log.cc:
Replaced the type of `saved_max_binlog_cache_size' from "ulong" to
"my_off_t", which is a type def for "ulonglong".
- Correct the way SHOW EXPLAIN code calculates 'need_order' parameter
for JOIN::print_explain().
The calculation is still an approximation (see the MDEV entry for details;
even EXPLAIN itself is wrong in certain cases), but now it's a better approximation than before.
"ORDER BY" AND "LIMIT BY" CLAUSE
PROBLEM:
When a 'limit' clause is specified in a query along with
group by and order by, the optimizer chooses the wrong index,
thereby examining more rows than required.
However, without the 'limit' clause, the optimizer chooses
the right index.
ANALYSIS:
With respect to the query specified, the range optimizer chooses
the first index as there is a range present (on 'a'). The optimizer
then checks for an index which would give records in sorted
order for the 'group by' clause.
While checking, it chooses the second index (on 'c,b,a') based on
the 'limit' specified and the selectivity of
'quick_condition_rows' (the number of rows present in the range)
in the 'test_if_skip_sort_order' function.
But it fails to consider that an order by clause on a
different column will result in scanning the entire index, and
hence the estimated number of rows calculated above is
wrong (which results in choosing the second index).
FIX:
Do not enforce the 'limit' clause in the call to
'test_if_skip_sort_order' if we are creating a temporary
table. Creation of a temporary table indicates that there will be
more post-processing and hence we will need all the rows.
This fix is backported from 5.6. This problem is fixed in 5.6 as
part of changes for work log #5558
mysql-test/r/subselect.result:
Changes for Bug#11762052 results in the correct number of rows.
sql/sql_select.cc:
Do not pass the actual 'limit' value if 'need_tmp' is true.
Now the partition engine adds underlying tables to the query cache, asks the
underlying tables' engine for permission to cache the query, and returns the result of the query.
Incorrect query cache cleanup in the case of a table registration failure is fixed.
Unified the query cache interface for the myisammrg & partitioned engines.
RBR AND RC
Description: When scanning and locking rows with < or <=, InnoDB locks
the next row even though row based binary logging and read committed
are used.
Solution: In the handler, when the row is identified to fall outside
of the range (as specified in the query predicates), then request the
storage engine to unlock the row (if possible). This is done in
handler::read_range_first() and handler::read_range_next().
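A sketch of the added logic (stand-in helpers; the real code compares
the fetched row against end_range inside the two handler methods):

  int rc= index_next_row(record);
  if (rc == 0 && row_outside_range(record, end_range))
  {
    unlock_row();               /* ask the engine to release the lock */
    rc= HA_ERR_END_OF_FILE;     /* row is past the requested range    */
  }
  return rc;

Whether the engine really drops the lock is up to it; InnoDB can do so
under READ COMMITTED, which is the case the bug report describes.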
COUNT DISTINCT GROUP BY
PROBLEM:
To calculate the final result of count(distinct(select 1))
we call the 'end_send' function instead of 'end_send_group'.
'end_send' cannot be called if we have aggregate functions
that need to be evaluated.
ANALYSIS:
While evaluating a possible loose_index_scan option for
the query, the variable 'is_agg_distinct' is set to 'false'
as the item in the distinct clause is not a field. But we
choose loose_index_scan without taking this into
consideration.
So, while setting the final 'select_function' to evaluate
the result, 'precomputed_group_by' is set to TRUE, as in
this case loose_index_scan is chosen and supposedly we do not
have agg_distinct in the query (which is clearly wrong as we
have one).
As a result, the 'end_send' function is chosen as the final
select_function instead of 'end_send_group'. The difference
between the two is that 'end_send_group' evaluates the
aggregates while 'end_send' does not. Hence the wrong result.
FIX:
The variable 'is_agg_distinct' always represents whether
'loose_index_scan' can be chosen for the aggregate_distinct
functions present in the select.
So, we now check this variable before continuing with the
loose_index_scan option.
sql/opt_range.cc:
Do not continue if is_agg_distinct is not set in case
of agg_distinct functions.
Now when a table is dropped, the statistics on the table are removed
from the statistical tables. If the table is altered in such a way
that a column is dropped or the type of the column is changed, then
the statistics on the column are removed from the table column_stat.
This also triggers removal of the statistics on the indexes that use
this column as a component.
Added procedures that change the names of tables or columns
in the statistical tables.
These procedures are used when tables/columns are renamed.
Also partly re-factored the code that introduced the persistent
statistical tables.
Added test cases to statistics.test to cover the new code.
primary key with innodb tables
The bug was triggered if a single ALTER TABLE statement both
added and dropped indexes and ALTER TABLE failed during drop
(e.g. because the index was needed in a foreign key constraint).
In such cases, the server index information would get out of
sync with InnoDB - the added index would be present inside
InnoDB, but not in the server. This could then lead to InnoDB
error messages and/or server crashes.
The root cause is that new indexes are added before old indexes
are dropped. This means that if ALTER TABLE fails while dropping
indexes, index changes will be reverted in the server but not
inside InnoDB.
This patch fixes the problem by dropping any added indexes
if drop fails (for ALTER TABLE statements that both adds
and drops indexes).
However, this won't work if we added a primary key as this
key might not be possible to drop inside InnoDB. Therefore,
we resort to the copy algorithm if a primary key is added
by an ALTER TABLE statement that also drops an index.
In 5.6 this bug is more properly fixed by the handler interface
changes done in the scope of WL#5534 "Online ALTER".
KEY UPDATES WITH A LIMIT OF 1
Problem: The unsafety warning for statements such as
UPDATE ... WHERE pk=1 LIMIT 1 is thrown when binlog-format
= STATEMENT, despite the fact that such statements are
actually safe. This leads to filling up the disk space
with false warnings.
Solution: This is not a complete fix for the problem, but it
prevents the disks from getting filled up. It should
therefore be regarded as a workaround. In the future it
should be overridden by the server's general suppress/filtering
framework. It should also be noted that another worklog is
supposed to defeat this case's artificial unsafety.
We use a warning suppression mechanism to detect a warning flood,
enable the suppression, and disable it when the average
warnings/second has dropped to acceptable limits.
Activation: The suppression for LIMIT unsafe statements is
activated when the last 50 warnings were logged in less
than 50 seconds.
Suppression: Once activated, this suppression will prevent the
individual warnings from being logged in the error log, but print
a warning for every 50 warnings with the note:
"The last warning was repeated N times in last S seconds"
Noteworthy is the fact that this suppression works only on the
error logs; the warnings seen by the clients remain as
they are (i.e. one warning per unsafe statement).
Deactivation: The suppression will be deactivated once the
average number of warnings/sec has gone down to acceptable limits.
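A self-contained sketch of the activation test (the 50/50 constants
follow the description above; the real implementation lives in
sql/sql_class.cc):

  #include <time.h>

  enum { WINDOW_COUNT= 50, WINDOW_SECONDS= 50 };

  static time_t window_start;
  static unsigned window_warnings;

  /* Returns non-zero when the caller should suppress individual
     warnings and emit only the periodic summary line. */
  int unsafe_warning_flood(time_t now)
  {
    int flood;
    if (window_warnings == 0)
      window_start= now;
    if (++window_warnings < WINDOW_COUNT)
      return 0;
    flood= (now - window_start) < WINDOW_SECONDS;
    window_warnings= 0;             /* start a new 50-warning window */
    return flood;
  }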
sql/sql_class.cc:
Added code to suppress warnings while logging them to the error log.
Problem:
=======
The return value from my_b_write is ignored by `my_b_write_quoted',
`my_b_write_bit', and `Query_log_event::print_query_header'.
Most callers of `my_b_printf' ignore the return value; `log_event.cc'
has many calls to it.
Analysis:
========
`my_b_write' is used to write data into a file. If the write fails it
sets an appropriate error number and error message through a my_error()
function call and sets IO_CACHE::error == -1.
`my_b_printf' function is also used to write data into a file, it
internally invokes my_b_write to do the write operation. Upon
success it returns number of characters written to file and on error
it returns -1 and sets the error through my_error() and also sets
IO_CACHE::error == -1. Most of the event specific print functions,
for example `Create_file_log_event::print', `Execute_load_log_event::print'
etc., make several calls to the above two functions and
do not check the return value after the 'print' call. All the above
mentioned abuse cases deal with the client side.
Fix:
===
As part of the bug fix a check for IO_CACHE::error == -1 has been added at
a very high level, after the call to the 'print' function. There are
a few more places where the return value of "my_b_write" is ignored;
those are mentioned below.
+++ mysys/mf_iocache2.c 2012-06-04 07:03:15 +0000
@@ -430,7 +430,8 @@
memset(buffz, '0', minimum_width - length2);
else
memset(buffz, ' ', minimum_width - length2);
- my_b_write(info, buffz, minimum_width - length2);
+++ sql/log.cc 2012-06-08 09:04:46 +0000
@@ -2388,7 +2388,12 @@
{
end= strxmov(buff, "# administrator command: ", NullS);
buff_len= (ulong) (end - buff);
- my_b_write(&log_file, (uchar*) buff, buff_len);
At these places appropriate return value handlers have been added.
client/mysqlbinlog.cc:
check for IO_CACHE::error == -1 has been added after the call to
the event specific print functions
mysys/mf_iocache2.c:
Added handler to check the written value of `my_b_write'
sql/log.cc:
Added handler to check the written value of `my_b_write'
sql/log_event.cc:
Added error simulation statements in `Create_file_log_event::print'
and `Execute_load_query_log_event::print'
sql/rpl_utility.h:
Removed the extra ';'
Fixes for BUG#11761686 left a flaw that managed to slip away from testing.
Only the effective filtering branch was actually tested, with a regression test
added to rpl_filter_tables_not_exist.
The reason for the failure is destruction of mem-root-allocated memory
too early, at the end of the deferred User-var's do_apply_event().
Fixed by bypassing free_root() in the deferred execution branch.
Deallocation of items created in do_apply_event() is done by the base code
through THD::cleanup_after_query() -> free_items(), which the parent Query
can't miss.
sql/log_event.cc:
Do not call free_root() in the case of a deferred User-var event.
Necessary methods are added to the User-var class; do_apply_event() refined.
sql/log_event.h:
Necessary methods to avoid destroying mem-root-based memory during
User-var applying are defined.
- Fix the year in Monty Program Ab copyrights in the new files.
- Fix permissions handling so that SHOW EXPLAIN's handling is the
same as SHOW PROCESSLIST's.
Print the warning (note):
YEAR(x) is deprecated and will be removed in a future release. Please use YEAR(4) instead
on "CREATE TABLE ... YEAR(x)" or "ALTER TABLE MODIFY ... YEAR(x)", where x != 4.
Several fixes :
* sql-common/client.c
Added a validity check of the fields metadata packet sent
by the server.
Now libmysql will check if the length of the data sent by
the server matches what's expected by the protocol before
using the data.
* client/mysqltest.cc
Fixed the error handling code in mysqltest to avoid sending
new commands when reading the result set failed (and
there is unread data in the pipe).
* sql_common.h + libmysql/libmysql.c + sql-common/client.c
unpack_fields() now generates a proper error when it fails.
Added a new argument to this function to support the error
generation.
* sql/protocol.cc
Added a debug trigger to cause the server to send a NULL
instead of the packet expected by the client, for testing
purposes.
The introduction of a cost based decision on filesort vs index for UPDATE
statements changed the detection of the fact that the index used to scan the
table is being updated. The new design missed the case of index merge,
when there is no single index to check. That worked until a recent
change in InnoDB, after which it went into infinite recursion if an update of
the used index wasn't properly detected.
The fix consists of the 'used key being updated' detection code from 5.1.
sql/sql_update.cc:
Bug#14248833: UPDATE ON INNODB TABLE ENTERS RECURSION
The check for used key being updated is extended to cover the case when
index merge is used.
- Add Monty Program Ab copyright in new files
- Change Apc_target::make_apc_call() to accept a C++-style
functor (instead of C-style function + parameter)
TABLE_LIST::check_single_table made aware of the fact that now, if a table is attached to a merged view, it can be an (unopened) temporary table
(in 5.2 it was always a leaf table, or none in the case of several tables).
Fixed a multiple definition of 'THD::clear_error()' in (at least)
libmysqld.a(lib_sql.o) and libmysqld.a(libfederated_a-ha_federated.o).
Patch provided by Ramil Kalimullin.
mysql.column_stat, mysql.table_stat for the type DECIMAL(12,4).
When cached, the values from these columns are multiplied by a factor of 10^5
and stored as ulong numbers now.
Keep track of how many pending XIDs (transactions that are prepared in
storage engine and written into binlog, but not yet durably committed
on disk in the engine) there are in each binlog.
When the count of one binlog drops to zero, write a new binlog checkpoint
event, telling which is the oldest binlog with pending XIDs.
When doing XA recovery after a crash, check the last binlog checkpoint
event, and scan all binlog files from that point onwards for XIDs that
must be committed if found in prepared state inside engine.
Remove the code in binlog rotation that waits for all prepared XIDs to
be committed before writing a new binlog file (this is no longer necessary
when recovery can scan multiple binlog files).
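A sketch of the per-binlog accounting (stand-in types and helper
names, chosen to mirror the description):

  struct xid_count_per_binlog
  {
    char *binlog_name;            /* the file this count belongs to  */
    long  pending_xids;           /* prepared, not yet durable       */
  };

  extern const char *oldest_binlog_with_pending_xids(void);
  extern void write_binlog_checkpoint_event(const char *oldest_binlog);

  /* Called when an engine durably commits an XID written to the
     binlog that `entry' tracks. */
  void xid_committed(struct xid_count_per_binlog *entry)
  {
    if (--entry->pending_xids == 0)
      write_binlog_checkpoint_event(oldest_binlog_with_pending_xids());
  }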
Add function to replace arbitrary event with dummy event.
Add code which uses this to fix the bug that enabling row_annotate events
on the master breaks slaves which do not request such events.
Add that slaves set a variable @mariadb_slave_capability to inform the
master in a robust way about which events it can, and cannot, handle.
Add tests.
the new file is fully synced to disk and to the binlog index. This fixes a window
where a crash would leave the next server restart unable to detect that a crash
occurred, causing recovery to fail.
We set the correct cmp_context during preparation to avoid changing it later by Item_field::equal_fields_propagator.
(see mysql bugs #57135 and #57692 during merging)
MySQL introduced a class Deferred_log_events. This class keeps a pointer
last_added. The code was keeping this pointer around even after the memory
pointed to was freed, and later comparing the bogus pointer against other
allocated memory. This is illegal, and can randomly produce false equal
comparisons depending on whatever the malloc() subsystem decides to return.
Virtual columns of ENUM and SET data types were not supported properly
in the original patch that introduced virtual columns into MariaDB 5.2.
The problem was that for any virtual column the patch used the
interval_id field of the definition of the column in the frm file as
a reference to the virtual column expression.
The fix stores the optional interval_id of the virtual column in the
extended header of the virtual column expression.
Problem: Some queries with subqueries and a HAVING clause that
consists only of a column not in the select or grouping lists cause
the server to crash.
During parsing, an Item_ref is constructed for the HAVING column. The
name of the column is resolved when JOIN::prepare calls fix_fields()
on its having clause. Since the column is not mentioned in the select
or grouping lists, a ref pointer is not found and a new Item_field is
created instead. The Item_ref is replaced by the Item_field in the
tree of HAVING clauses. Since the tree consists only of this item, the
pointer that is updated is JOIN::having. However,
st_select_lex::having still points to the Item_ref as the root of the
tree of HAVING clauses.
The bug is triggered when doing filesort for create_sort_index(). When
find_all_keys() calls select->cond->walk() it eventually reaches
Item_subselect::walk() where it continues to walk the having clauses
from lex->having. This means that it finds the Item_ref instead of the
new Item_field, and Item_ref::walk() tries to dereference the ref
pointer, which is still null.
The crash is reproducible only in 5.5, but the problem lies latent in
5.1 and trunk as well.
Fix: After calling fix_fields on the having clause in JOIN::prepare(),
set select_lex::having to point to the same item as JOIN::having.
This patch also fixes a bug in 5.1 and 5.5 that is triggered if the
query is executed as a prepared statement. The Item_field is created
in the runtime arena when the query is prepared, and the pointer to
the item is saved by st_select_lex::fix_prepare_information() and
brought back as a dangling pointer when the query is executed, after
the runtime arena has been reclaimed.
Fix: Backport fix from trunk that switches to the permanent arena
before calling Item_ref::fix_fields() in JOIN::prepare().
sql/item.cc:
Set context when creating Item_field.
sql/sql_select.cc:
Switch to permanent arena and update select_lex->having.
- Changed output to be error "error-text" instead of error - error-text
extra/perror.c:
Move my_handler_errors.h into include
include/my_handler_errors.h:
Move my_handler_errors.h into include
mysql-test/r/errors.result:
Updated result
mysql-test/r/innodb_mysql_sync.result:
Updated result
mysql-test/r/myisam-system.result:
Updated result
mysql-test/r/myisampack.result:
Updated result
mysql-test/r/partition_innodb_plugin.result:
Updated result
mysql-test/r/ps_1general.result:
Updated result
mysql-test/r/trigger.result:
Updated result
mysql-test/r/type_bit.result:
Updated result
mysql-test/r/type_bit_innodb.result:
Updated result
mysql-test/r/type_blob.result:
Updated result
mysql-test/suite/archive/archive.result:
Updated result
mysql-test/suite/binlog/r/binlog_index.result:
Updated result
mysql-test/suite/binlog/r/binlog_ioerr.result:
Updated result
mysql-test/suite/csv/csv.result:
Updated result
mysql-test/suite/engines/iuds/r/type_bit_iuds.result:
Updated result
mysql-test/suite/federated/federated_bug_35333.result:
Updated result
mysql-test/suite/innodb/r/innodb-create-options.result:
Updated result
mysql-test/suite/innodb/r/innodb-index.result:
Updated result
mysql-test/suite/innodb/r/innodb-zip.result:
Updated result
mysql-test/suite/innodb/r/innodb.result:
Updated result
mysql-test/suite/innodb/r/innodb_bug13635833.result:
Updated result
mysql-test/suite/innodb/r/innodb_bug21704.result:
Updated result
mysql-test/suite/innodb/r/innodb_bug46000.result:
Updated result
mysql-test/suite/parts/r/partition_bit_innodb.result:
Updated result
mysql-test/suite/parts/r/partition_bit_myisam.result:
Updated result
mysql-test/suite/percona/percona_innodb_fake_changes.result:
Updated result
mysql-test/suite/perfschema/r/misc.result:
Updated result
mysql-test/suite/perfschema/r/privilege.result:
Updated result
mysql-test/suite/rpl/r/rpl_EE_err.result:
Updated result
mysql-test/suite/rpl/r/rpl_binlog_errors.result:
Updated result
mysql-test/suite/rpl/r/rpl_drop_db.result:
Updated result
sql/share/errmsg-utf8.txt:
Removed 'column' from error text that was used in different context
strings/my_vsnprintf.c:
Move my_handler_errors.h into include
Minor cleanups
Changed output of %M to be error "error-text" instead of error - error-text
unittest/mysys/my_vsnprintf-t.c:
Updated error text
Make sure that find_date_time_item() is called before agg_arg_charsets_for_comparison().
Optimize Item_func_conv_charset to avoid conversion if no string result is needed.
On localized Windows versions, Windows uses localized time zone names that contain non-ASCII characters, and these characters appear broken when displayed by clients.
The fix is to declare the system_time_zone variable to have UTF8 encoding and to convert tzname to UTF8.
sql/sql_table.cc:
Added comment
storage/maria/ma_close.c:
Don't store history if it's visible to all.
This fixed the MDEV-306 bug
storage/maria/ma_delete_table.c:
Removed old comment
Delete history state for deleted tables
storage/maria/ma_info.c:
More DBUG_PRINT
storage/maria/ma_open.c:
More DBUG_PRINT
Analysis:
The fix for bug lp:985667 implements the method Item_subselect::no_rows_in_result()
for all main kinds of subqueries. The purpose of this method is to be called from
return_zero_rows() and set Items to some default value in the case when a query
returns no rows. Aggregates and subqueries require special treatment in this case.
Every implementation of Item_subselect::no_rows_in_result() called
Item_subselect::make_const() to set the subquery predicate to its default value
irrespective of where the predicate was located in the query. Once the predicate
was set to a constant it was never executed.
At the same time, the JOIN object of the fake select for UNIONs (the one used for
the final result of the UNION), was set after all subqueries in the union were
executed. Since we set the subquery as constant, it was never executed, and the
corresponding JOIN was never created.
In order to decide whether the result of NOT IN is NULL or FALSE, Item_in_optimizer
needs to check if the subquery result was empty or not. This is where we got the
crash, because subselect_union_engine::no_rows() checks for
unit->fake_select_lex->join->send_records, and the join object was NULL.
Solution:
If a subquery is in the HAVING clause it must be evaluated in order to know its
result, so that we can properly filter the result records. Once subqueries in the
HAVING clause are executed even in the case of no result rows, this specific
crash will be solved, because the UNION will be executed, and its JOIN will be
constructed. Therefore the fix for this crash is to narrow the fix for lp:985667,
and to apply Item_subselect::no_rows_in_result() only when the subquery predicate
is in the SELECT clause.
The class Item_func missed an implementation of the virtual
function update_null_value.
Back-ported the fix for bug 62125 from mysql 5.6 code line.
The test case was also back-ported.
Analysis:
Queries with implicit grouping (there is an aggregate, but no group by)
follow some non-obvious semantics in the case of an empty result set.
Aggregate functions produce some special "natural" value depending on
the function. For instance MIN/MAX return NULL, COUNT returns 0.
The complexity comes from non-aggregate expressions in the select list.
If the non-aggregate expression is a constant, it can be computed, so
we should return its value, however if the expression is non-constant,
and depends on columns from the empty result set, then the only meaningful
value is NULL.
The cause of the wrong result was that for subqueries the optimizer didn't
distinguish between constant and non-constant ones in the case of an
empty result for implicit grouping.
Solution:
In all implementations of Item_subselect::no_rows_in_result() check if the
subquery predicate is constant. If it is constant, do not set it to the
default value for implicit grouping, instead let it be evaluated.
INC_HOST_ERRORS() IS CALLED.
Issue : Sequence of calling inc_host_errors()
and reset_host_errors() required some
changes in order to maintain correct
connection error count.
Solution : Call to reset_host_errors() is shifted
to a location after which no calls to
inc_host_errors() are made.
client/mysqldump.c:
Added LINT_INIT
mysql-test/mysql-test-run.pl:
Disable warning if example engine is not found
mysql-test/suite/rpl/t/rpl_semi_sync.test:
Rpl_semi_sync_master_yes_tx may be different on Windows
sql/sql_plugin.cc:
More DBUG_PRINT
support-files/compiler_warnings.supp:
Disable some innobase warnings
unittest/mysys/my_vsnprintf-t.c:
Fixed test failure on Solaris (typo)
CRASHES INNODB | TRX_STATE_NOT_STARTED
The problem was that if DELETE with subselect caused a
deadlock inside InnoDB, this deadlock was not properly
handled by the SQL layer. This meant that the SQL layer
would try to unlock the row after InnoDB had rolled
back the transaction. This caused an assertion inside
InnoDB.
This patch fixes the problem by checking for errors
reported by SQL_SELECT::skip_record() and not calling
unlock_row() if any errors have been reported.
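A sketch of the corrected pattern (simplified from the read loop;
skip_record() returning a negative value signals an error):

  int rc= select->skip_record(thd);  /* >0: use row, 0: skip, <0: error */
  if (rc < 0)
    goto error;   /* e.g. deadlock: InnoDB has already rolled back,
                     so calling unlock_row() here would assert */
  if (rc == 0)
    table->file->unlock_row();       /* safe: no error was reported */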
Problem
========
Replication breaks in cases where the event length exceeds
the size of the master Dump thread's max_allowed_packet.
This failure occurs because the event length is
more than the total size of max_allowed_packet: adding the
max_event_header length makes it exceed the max_allowed_packet of the Dump thread.
This causes the Dump thread to break replication and throw an error.
That can happen e.g. with row-based replication in an Update_rows event.
Fix
====
The problem is fixed in 2 steps:
1.) The Dump thread's limit to read an event is increased to the upper limit,
i.e. the Dump thread reads whatever gets logged in the binary log.
2.) On the slave side we increase the max_allowed_packet for the
slave's threads (IO/SQL) by raising it to 1GB.
This is done using the new server option (slave_max_allowed_packet);
it is used by the DBA to regulate the max_allowed_packet of the
slave threads (IO/SQL) and facilitates the sending of
large packets from the master to the slave.
This allows the large packets to be received by the slave and applied
successfully.
sql/log_event.cc:
After the fix, the check is made against the new option
slave_max_allowed_packet instead of max_allowed_packet.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
sql/sql_repl.cc:
The dump thread's max_allowed_packet is set to the upper limit
which makes it independent and it now reads whatever gets
logged in the binary log.
The bug caused UNION queries whose non-first select
clauses contained join expressions with degenerate single-table nests
to be rejected as invalid queries.
The bug was introduced into the mysql-5.5 code line by the patch for
bug 33204.
Fixed MDEV-331: last_insert_id() returns a signed number
mysql-test/r/auto_increment.result:
Added test case
mysql-test/t/auto_increment.test:
Added test case
sql/item_func.h:
Changed last_insert_id() to be unsigned.
- Item::get_seconds() now skips decimal arithmetic if decimals is 0. This significantly speeds up from_unixtime() if no fractional part is passed.
- Replaced sprintfs used to format temporal values by hand-coded formatting.
Query1 (original query in the bug report)
BENCHMARK(10000000,DATE_SUB(FROM_UNIXTIME(RAND() * 2147483648), INTERVAL (FLOOR(1 + RAND() * 365)) DAY))
Query2 (Variation of query1 that does not use fractional part in FROM_UNIXTIME parameter)
BENCHMARK(10000000,DATE_SUB(FROM_UNIXTIME(FLOOR(RAND() * 2147483648)), INTERVAL (FLOOR(1 + RAND() * 365)) DAY))
Prior to the patch, the runtimes were (32 bit compilation/AMD machine)
Query1: 41.53 sec
Query2: 23.90 sec
With the patch, the runtimes are
Query1: 32.32 sec (speed up due to removing sprintf)
Query2: 12.06 sec (speed up due to skipping decimal arithmetic)
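A sketch of the fast path from the first bullet (val_int(),
val_decimal() and my_decimal2int() are existing server calls; the
wrapper itself is illustrative and omits NULL handling):

  longlong get_seconds_sketch(Item *arg)
  {
    longlong seconds;
    if (arg->decimals == 0)
      return arg->val_int();      /* skip the my_decimal round trip */
    my_decimal buf, *dec= arg->val_decimal(&buf);
    my_decimal2int(E_DEC_FATAL_ERROR, dec, 0, &seconds);
    return seconds;
  }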
The --debug-no-sync option incorrectly defaulted to ON, disabling sync calls
by default, which can lose data or cause corruption. Also, the code
used fsync() instead of the sometimes more efficient fdatasync().
- Make SHOW EXPLAIN code take into account that an st_select_lex object without joins can be
a full-featured SELECT which was already executed and cleaned up.
Analysis:
When the method JOIN::choose_subquery_plan() decided to apply
the IN-TO-EXISTS strategy, it set the unit and select_lex
uncacheable flags to UNCACHEABLE_DEPENDENT_INJECTED unconditionally.
As a result, even if IN-TO-EXISTS injected non-correlated predicates,
the subquery was still treated as correlated.
Solution:
Set the subquery as correlated only if the injected predicate(s) depend
on the outer query.
- The problem was that create_ref_for_key() would act differently, depending on
whether we're running EXPLAIN or the actual query.
- As the first step, fixed the EXPLAIN printout not to depend on actions in create_ref_for_key().
backport dmitry.shulga@oracle.com-20120209125742-w7hdxv0103ymb8ko from mysql-trunk:
Patch for bug#11764747 (formerly known as 57612): SET GLOBAL READ_ONLY=1 cannot
progress when a table is locked with LOCK TABLES.
The reason for the bug was that mysql server makes a flush of all open tables
during handling of statement 'SET GLOBAL READ_ONLY=1'. Therefore if some of
these tables were locked by "LOCK TABLE ... READ" from a different connection,
then execution of statement 'SET GLOBAL READ_ONLY=1' would be waiting for
the lock for such table even if the table was locked in a compatible read mode.
Flushing of all open tables before setting the read_only system variable
is inherited from the 5.1 implementation, since this was the only possible approach
to ensure that there aren't any pending write operations on open tables.
Starting from version 5.5 and above, such behaviour is guaranteed by the fact
that we acquire the global_read_lock before setting the read_only flag. Since
acquiring the global_read_lock succeeds only when there isn't any
active write operation, we can remove the flushing of open tables from
the processing of SET GLOBAL READ_ONLY=1.
This modification changes the server behavior so that read locks held
by other connections (LOCK TABLE ... READ) no longer will block attempts
to enable read_only.
Analysis:
The crash is a result of Item_cache_temporal::example not being set
(it is NULL). It turns out that the value of Item_cache_temporal
may be set directly by calling Item_cache_temporal::store_packed
without ever setting the "example" of this Item_cache. Therefore
the failing assertion is too narrow.
Solution:
Remove the assert.
In principle we could overwrite this method for Item_cache_temporal,
but it doesn't make sense just for this assert.
Renamed the system variable optimizer_use_stat_tables to use_stat_tables.
This variable now has only 3 possible values:
'never', 'complementary', 'preferably'.
If the server has been launched with
--use-stat-tables='complementary'|'preferably'
then the statictics tables can be employed by the optimizer and by the
ANALYZE command.
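A usage sketch, assuming the variable can also be set at runtime (the
table name t1 is illustrative):

  SET GLOBAL use_stat_tables = 'preferably';
  -- ANALYZE may now collect data into the statistics tables,
  -- and the optimizer may read them
  ANALYZE TABLE t1;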
The constructor for Query_log_event allocated 2 bytes too few for
extra space needed by Query cache. (Not sure if this is reproducible
in practice, as there are often a couple of extra bytes allocated
for unused string zero terminators, but better safe than sorry).
CHEAP SQ: Valgrind warnings "Memory lost" with IN and EXISTS nested subquery, materialization+semijoin
Analysis:
The memory leak was a result of the interaction of semi-join optimization
with early optimization of constant subqueries. The function:
setup_jtbm_semi_joins() created a dummy temporary table "dummy_table"
in order to make some JOIN_TAB objects complete. Normally, such temporary
tables are freed inside JOIN_TAB::cleanup.
However, the inner-most subquery is pre-optimized, which allows the
optimization of the MAX subquery to determine that its WHERE is TRUE,
and thus to compute the result of the MAX during optimization. This
ultimately allows the optimize phase of the outer query to find that
its WHERE clause is FALSE. Once JOIN::optimize finds that the result
set is empty, it sets zero_result_cause, and returns *before* it ever
reaches make_join_statistics(). As a result the query plan has no
JOIN_TABs at all. Since the temporary table is supposed to be cleaned up
via JOIN_TAB::cleanup, this never happens because there is no JOIN_TAB
for this table. Hence we get a memory leak.
Solution:
Whenever there are no JOIN_TABs, iterate over all table references in
JOIN::join_list, and free the ones that contain semi-join temporary
tables.
WHEN KILLING
Suppose there is a query waiting for a lock. If the user kills
this query, the "Got error -1 when reading table" error message
must not be logged in the server log file. Since this is a
user-requested interruption, no spurious error message should be logged
in the server log. This patch removes the error message from
the log.
approved by joh and tatjana
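A sketch of the scenario (the thread id 123 is hypothetical):

  -- connection 2, while connection 1 is waiting for a table lock:
  SHOW PROCESSLIST;   -- locate the id of the waiting query
  KILL QUERY 123;     -- after this patch, no "Got error -1 when
                      -- reading table" message is written to the log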
Fixed by backport of:
------------------------------------------------------------
revno: 3402.50.156
committer: Jon Olav Hauglid <jon.hauglid@oracle.com>
branch nick: mysql-trunk-test
timestamp: Wed 2012-02-08 14:10:23 +0100
message:
Bug#13417754 ASSERT IN ROW_DROP_DATABASE_FOR_MYSQL DURING DROP SCHEMA
This assert could be triggered if an InnoDB table was being moved
to a different database using ALTER TABLE ... RENAME, while this
database concurrently was being dropped by DROP DATABASE.
The reason for the problem was that no metadata lock was taken
on the target database by ALTER TABLE ... RENAME.
DROP DATABASE was therefore not blocked and could remove
the database while ALTER TABLE ... RENAME was executing. This
could cause the assert in InnoDB to be triggered.
This patch fixes the problem by taking an IX metadata lock on
the target database before ALTER TABLE ... RENAME starts
moving a table to a different database.
Note that this problem did not occur with RENAME TABLE which
already takes the correct metadata locks.
Also note that this patch slightly changes the behavior of
ALTER TABLE ... RENAME. Before, the statement would abort and
return an error if a lock on the target table name could not
be taken immediately. With this patch, ALTER TABLE ... RENAME
will instead block and wait until the lock can be taken
(or until we get a lock timeout). This also means that it is
possible to get ER_LOCK_DEADLOCK errors in this situation
since we allow ALTER TABLE ... RENAME to wait and not just
abort immediately.
- Fixed code that was not ready for a major version number > 9
- Fixed test cases that assumed max major version number could be 9
Updated version number for deprecated options (will be removed in a later commit)
VERSION:
Version number 10.0.0
client/mysqlbinlog.cc:
Added support for major version numbers > 9
cmake/mysql_version.cmake:
Added support for version number components that are 0
mysql-test/r/comments.result:
Modified test to handle version number 100000
mysql-test/r/func_system.result:
Modified test to handle version number 100000
mysql-test/r/log_state.result:
Updated deprecated error message
mysql-test/r/sp.result:
Modified test to handle version number 100000
mysql-test/r/subselect4.result:
Updated deprecated error message
mysql-test/r/variables.result:
Updated deprecated error message
mysql-test/suite/rpl/r/rpl_conditional_comments.result:
Modified test to handle version number 100000
mysql-test/suite/rpl/r/rpl_loaddatalocal.result:
Modified test to handle version number 100000
mysql-test/suite/rpl/t/rpl_conditional_comments.test:
Modified test to handle version number 100000
mysql-test/suite/rpl/t/rpl_loaddatalocal.test:
Modified test to handle version number 100000
mysql-test/suite/sys_vars/r/debug_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/engine_condition_pushdown_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/log_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/log_slow_queries_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/multi_range_count_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/rpl_recovery_rank_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/sql_big_selects_func.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/sql_max_join_size_basic.result:
Updated deprecated error message
mysql-test/suite/sys_vars/r/sql_max_join_size_func.result:
Updated deprecated error message
mysql-test/t/comments.test:
Modified test to handle version number 100000
mysql-test/t/file_contents.test:
Modified test to handle version number 100000
mysql-test/t/func_system.test:
Modified test to handle version number 100000
mysql-test/t/parser_not_embedded.test:
Modified test to handle version number 100000
mysql-test/t/sp.test:
Modified test to handle version number 100000
sql/mysqld.cc:
Updated version number for deprecated options (will be removed in a later commit)
sql/slave.cc:
Modified test to handle version number 100000
Better error messages
sql/sql_lex.cc:
Modified test to handle version number 100000 in comment syntax
sql/sys_vars.cc:
Updated version number for deprecated options (will be removed in a later commit)
Analysis:
When a subquery that needs a temp table is executed during
the prepare or optimize phase of the outer query, at the end
of the subquery execution all the JOIN_TABs of the subquery
are replaced by a new JOIN_TAB that selects from the temp table.
However that temp table has no corresponding TABLE_LIST.
Once EXPLAIN execution reaches its last phase, it tries to print
the names of the subquery tables through its TABLE_LISTs, but in
the case of this bug there is no such TABLE_LIST (it is NULL),
hence a crash.
Solution:
The fix is to block subquery evaluation inside
Item_func_like::fix_fields and Item_func_like::select_optimize()
using the Item::is_expensive() test.
Problem
========
Replication breaks if the event length exceeds
the size of the master Dump thread's max_allowed_packet.
This failure occurs because the event length plus the
max_event_header length exceeds the max_allowed_packet of the Dump thread.
This causes the Dump thread to break replication and throw an error.
That can happen e.g. with row-based replication in an Update_rows event.
Fix
====
The problem is fixed in 2 steps:
1.) The Dump thread's limit for reading events is raised to the upper limit,
i.e. the Dump thread reads whatever gets logged in the binary log.
2.) On the slave side we increase the max_allowed_packet for the
slave's threads (IO/SQL) by raising it to 1GB.
This is done using the new server option (slave_max_allowed_packet),
which lets the DBA regulate the max_allowed_packet of the
slave threads (IO/SQL), and facilitates the sending of
large packets from the master to the slave.
This allows the large packets to be received by the slave and applied
successfully.
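A hedged sketch of using the new option on the slave, assuming it is
settable dynamically (the 1GB value is the upper limit mentioned above):

  SET GLOBAL slave_max_allowed_packet = 1073741824;  -- 1GB
  SHOW VARIABLES LIKE 'slave_max_allowed_packet';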
sql/log_event.cc:
After the fix, the packet-size check is based on the new option
slave_max_allowed_packet rather than max_allowed_packet.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value,
i.e. not taking global(max_allowed) into consideration, for the slave's threads.
sql/sql_repl.cc:
The dump thread's max_allowed_packet is set to the upper limit
which makes it independent and it now reads whatever gets
logged in the binary log.
- Adding %M my_sprintf() modifier that prints error number - system-error-text
- Modified mysys, mysql_client and SQL error messages to use %M instead of %d
- Added my_strerror()
Updated handler errors to 5.6 error numbers
Updated text for a few error messages (to match 5.6)
Increased length of command name in error output
extra/comp_err.c:
Added support for %M
include/my_base.h:
Updated handler errors to 5.6 error numbers
include/my_sys.h:
Added my_strerror()
libmysql/errmsg.c:
Updated error messages to use %M
mysql-test/r/errors.result:
Updated result as error messages have changed
mysql-test/r/innodb_mysql_sync.result:
Updated result with text for errno
mysql-test/r/myisam-system.result:
Updated result with text for errno
mysql-test/r/myisam.result:
Updated result as error messages have changed
mysql-test/r/myisampack.result:
Updated result with text for errno
mysql-test/r/mysql.result:
Updated result with text for errno
mysql-test/r/mysql_upgrade.result:
Updated result with text for errno
mysql-test/r/partition_datatype.result:
Updated result as error messages have changed
mysql-test/r/partition_innodb_plugin.result:
Updated result with text for errno
mysql-test/r/ps_1general.result:
Updated result with text for errno
mysql-test/r/trigger.result:
Updated result with text for errno
mysql-test/r/type_bit.result:
Updated result as error messages have changed
mysql-test/r/type_bit_innodb.result:
Updated result as error messages have changed
mysql-test/r/type_blob.result:
Updated result as error messages have changed
mysql-test/suite/archive/archive.result:
Updated result with text for errno
mysql-test/suite/binlog/r/binlog_index.result:
Updated result with text for errno
mysql-test/suite/binlog/r/binlog_ioerr.result:
Updated result with text for errno
mysql-test/suite/csv/csv.result:
Updated result with text for errno
mysql-test/suite/federated/federated_bug_35333.result:
Updated result with text for errno
mysql-test/suite/innodb/r/innodb-create-options.result:
Updated result with text for errno
mysql-test/suite/innodb/r/innodb-index.result:
Updated result with text for errno
mysql-test/suite/innodb/r/innodb-zip.result:
Updated result as error messages have changed
mysql-test/suite/innodb/r/innodb.result:
Updated result with text for errno
mysql-test/suite/innodb/r/innodb_bug21704.result:
Updated result with text for errno
mysql-test/suite/innodb/r/innodb_bug46000.result:
Updated result with text for errno
mysql-test/suite/innodb/r/innodb_bug53591.result:
Updated result as error messages have changed
mysql-test/suite/innodb/r/innodb_corrupt_bit.result:
New error numbers
mysql-test/suite/innodb/r/innodb_prefix_index_liftedlimit.result:
Updated result as error messages have changed
mysql-test/suite/innodb/t/innodb-create-options.test:
Added regexp to avoid system error text
mysql-test/suite/innodb/t/innodb-zip.test:
Added regexp to avoid system error text
mysql-test/suite/maria/maria-recovery2.result:
Updated suppression rule
mysql-test/suite/maria/maria-recovery2.test:
Updated suppression rule
mysql-test/suite/maria/maria.result:
Updated result as error messages have changed
mysql-test/suite/parts/r/partition_bit_innodb.result:
Updated result as error messages have changed
mysql-test/suite/parts/r/partition_bit_myisam.result:
Updated result as error messages have changed
mysql-test/suite/percona/percona_innodb_fake_changes.result:
Updated result with text for errno
mysql-test/suite/perfschema/r/dml_cond_instances.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_events_waits_current.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_events_waits_history.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_events_waits_history_long.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_ews_by_instance.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_ews_by_thread_by_event_name.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_ews_global_by_event_name.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_file_instances.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_file_summary_by_event_name.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_file_summary_by_instance.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_mutex_instances.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_performance_timers.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_rwlock_instances.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/dml_threads.result:
Updated result as error messages have changed
mysql-test/suite/perfschema/r/misc.result:
Updated result with text for errno
mysql-test/suite/perfschema/r/privilege.result:
Updated result with text for errno
mysql-test/suite/rpl/r/rpl_EE_err.result:
Updated result with text for errno
mysql-test/suite/rpl/r/rpl_binlog_errors.result:
Updated result with text for errno
mysql-test/suite/rpl/r/rpl_drop_db.result:
Updated result with text for errno
mysys/errors.c:
Updated error messages to use %M
Changed all errors to use Errcode: consistently
mysys/my_handler_errors.h:
Updated handler errors to 5.6 error numbers
sql/share/errmsg-utf8.txt:
Updated error messages to use %M
sql/sys_vars.cc:
Added error number to ER_EVENT_SET_VAR_ERROR
strings/my_vsnprintf.c:
Added %M my_sprintf() modifier that prints error number - system-error-text
Simplified code
Moved common code to a function
Removed some casts that were not necessary when reading integer/unsigned int stored in longlong
Added my_strerror()
unittest/mysys/my_vsnprintf-t.c:
Added testing of %M
Analysis:
The fix for lp:944706 introduces early subquery optimization.
While a subquery is being optimized some of its predicates may be
removed. In the test case, the EXISTS subquery is constant, and is
evaluated to TRUE. As a result the whole OR is TRUE, and thus the
correlated condition "b = alias1.b" is optimized away. The subquery
becomes non-correlated.
The subquery cache is designed to work only for correlated subqueries.
If constant subquery optimization is disallowed, then the constant
subquery is not evaluated, the subquery remains correlated, and its
execution is cached. As a result execution is fast.
However, when the constant subquery was optimized away, it was neither
cached by the subquery cache, nor was it cached by the internal subquery
caching. The latter was due to the fact that the subquery still appeared
as correlated to the subselect_XYZ_engine::exec methods, and they
re-executed the subquery on each call to Item_subselect::exec.
Solution:
The solution is to update the correlated status of the subquery after it has
been optimized. This status consists of:
- st_select_lex::is_correlated
- Item_subselect::is_correlated
- SELECT_LEX::uncacheable
- SELECT_LEX_UNIT::uncacheable
The status is updated by st_select_lex::update_correlated_cache(), and its
caller st_select_lex::optimize_unflattened_subqueries. The solution relies
on the fact that the optimizer has already called
st_select_lex::update_used_tables() for each subquery. This makes it
possible to efficiently update the correlated status of each subquery
without walking the whole subquery tree.
Notice that this patch is an improvement over MySQL 5.6 and older, where
subqueries are not pre-optimized, and the above analysis is not possible.
Problem: mysqlbinlog exits without any error code in case of a
file write error. This is because calls to the
Log_event::print() method do not return a value, and
thus any errors were being ignored.
Resolution: We resolve this problem by checking for
IO_CACHE::error == -1 after every call to Log_event::print()
and terminating further execution.
client/mysqlbinlog.cc:
- handled error conditions during event->print() calls
- added check for error in end_io_cache()
mysys/my_write.c:
Added debug code to simulate a file write error.
The error returned will be ENOSPC => no space left on the disk.
sql/log_event.cc:
Added debug code to simulate file write error, by reducing the size of io cache.
Analysis:
-------------
If the server is started with limits of MAX_CONNECTIONS and
MAX_USER_CONNECTIONS, then at most MAX_USER_CONNECTIONS connections of any
particular user can be connected to the server, and at most
MAX_CONNECTIONS clients in total.
The server maintains a counter of total CONNECTIONS and of CONNECTIONS
from each particular user.
Here, MAX_CONNECTIONS connections are created to the server. Among these
MAX_CONNECTIONS, connections from a particular user (say USER1) are
also created. The number of connections from USER1 is less than
MAX_USER_CONNECTIONS. After that there is one more connection request from
USER1. Since USER1 can still create connections, as it hasn't reached
MAX_USER_CONNECTIONS, the server increments the per-user CONNECTIONS counter.
As the server already has MAX_CONNECTIONS connections, the next check of the
total CONNECTION count fails. In this case control is returned WITHOUT
decrementing the per-user CONNECTIONS counter. So the per-user CONNECTIONS
counter goes on incrementing for each attempt until current connections
are closed. Because of this, the per-user CONNECTIONS counter reaches
MAX_USER_CONNECTIONS. So, subsequent connections from the USER1 user always
return the MAX_USER_CONNECTION limit error, even when the total connections
to the server are less than MAX_CONNECTIONS.
Fix:
-------------
This issue occurred because the counters were not handled properly in the
server. Changed the code to handle the per-user connection counters properly.
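A sketch of the scenario (the limit values are illustrative):

  SET GLOBAL max_connections = 100;       -- MAX_CONNECTIONS
  SET GLOBAL max_user_connections = 10;   -- MAX_USER_CONNECTIONS
  -- before the fix: with 100 total connections open, every rejected
  -- attempt by USER1 leaked one increment of USER1's per-user counter,
  -- so USER1 eventually got the MAX_USER_CONNECTIONS error even after
  -- other connections were closed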
- In JOIN::exec(), make the having->update_used_tables() call before we've
made the JOIN::cleanup(full=true) call. The latter frees SJ-Materialization
structures, which correlated subquery predicate items attempt to walk afterwards.
- Don't try to produce plans after JOIN::cleanup() has been called, because:
= JOIN::cleanup leaves data structures in partially-cleaned state
= Walking them is hazardous (see this bug), and has funny effects
(See previous commits, "Using join cache" may or may not be shown)
= Changing data structures to be persisted may cause unwanted side effects
- The consequence is that SHOW EXPLAIN will show "Query plan already deleted" when e.g.
reading data after filesort.
Analysis:
The problem in the original MySQL bug is that the range optimizer
performs its analysis in a separate MEM_ROOT object that is freed
after the range optimizer is done. During range analysis get_mm_tree
calls Item_func_like::select_optimize, which in turn evaluates its
right argument. In the test case the right argument is a subquery.
In MySQL, subqueries are optimized lazily, thus the call to val_str
triggers optimization for the subquery. All objects needed by the
subquery plan end up in the temporary MEM_ROOT used by the range
optimizer. When execution ends, the JOIN::cleanup process tries to
clean up objects of the subquery plan, but all these objects are gone
with the temporary MEM_ROOT. The solution for MySQL is to switch the
mem_root.
In MariaDB with the patch for bug lp:944706, all constant subqueries
that may be used by the optimization process are preoptimized. Therefore
Item_func_like::select_optimize only triggers subquery execution, and
the above problem is not present.
The patch however adds a test whether the evaluated right argument of
the LIKE predicate is expensive. This is consistent with our approach
not to evaluate expensive expressions during optimization.
IN THE ERROR LOG
Problem:
Using mysqlbinlog with the --read-from-remote-server option as shown below
prints a message in the error log for each call. This happens in versions
5.5 and above.
mysqlbinlog -uroot -p --read-from-remote-server --host=localhost test
Message in error log file is given below:
120312 10:27:57 [Note] Start binlog_dump to slave_server(0), pos(test, 4)
The problem is that it can fill up the error log if the command is called
very often.
Analysis:
The print function mentioned below is called from the "mysql_binlog_send"
function, which causes the "Start binlog_dump..." string to be printed in
the error log file:
sql_print_information("Start binlog_dump to master_thread_id(%lu)
slave_server(%d)..."
Fix:
A condition has been added so that 'sql_print_information'
is invoked only when the "log_warnings" variable is set to >1;
otherwise the function is not called.
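For example, assuming log_warnings can be changed at runtime:

  SET GLOBAL log_warnings = 2;  -- "Start binlog_dump ..." notes are logged
  SET GLOBAL log_warnings = 1;  -- default: the note is suppressed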
This is a backport of the (unchanged) fix for MySQL bug #11764372, 57197.
Analysis:
When the outer query finishes its main execution and computes GROUP BY,
it needs to construct a new temporary table (and a corresponding JOIN) to
execute the last DISTINCT operation. At this point JOIN::exec calls
JOIN::join_free, which calls JOIN::cleanup -> TMP_TABLE_PARAM::cleanup
for both the outer and the inner JOINs. The call to the inner
TMP_TABLE_PARAM::cleanup sets copy_field = NULL, but not copy_field_end.
The final execution phase that computes the DISTINCT invokes:
evaluate_join_record -> end_write -> copy_funcs
The last function copies the results of all functions into the temp table.
copy_funcs walks over all functions in join->tmp_table_param.items_to_copy.
In this case items_to_copy contains both assignments to user variables.
The process of copying user variables invokes Item_func_set_user_var::check
which in turn re-evaluates the arguments of the user variable assignment.
This in turn triggers re-evaluation of the subquery, and ultimately
copy_field.
However, the previous call to TMP_TABLE_PARAM::cleanup for the subquery
already set copy_field to NULL but not its copy_field_end. This results
in a null pointer access, and a crash.
Fix:
Set copy_field_end and save_copy_field_end to null when deleting
copy fields in TMP_TABLE_PARAM::cleanup().
Analysis:
The optimizer detects an empty result through constant table optimization.
Then it calls return_zero_rows(), which in turn indirectly calls
Item_maxmin_subselect::no_rows_in_result(). The latter method sets "value=0",
however "value" is a pointer to an Item_cache, not just an integer value.
All of the Item_[maxmin | singlerow]_subselect::val_XXX methods do:
if (forced_const)
return value->val_real();
which of course crashes when value is a NULL pointer.
Solution:
When the optimizer discovers an empty result set, set
Item_singlerow_subselect::value to a FALSE constant Item instead of NULL.
Handle 'set read_only=1' in a lighter way than FLUSH TABLES WITH READ LOCK:
for the transactional engines we don't wait for operations on those tables to finish.
per-file comments:
mysql-test/r/read_only_innodb.result
MDEV-136 Non-blocking "set read_only".
test result updated.
mysql-test/t/read_only_innodb.test
MDEV-136 Non-blocking "set read_only".
test case added.
sql/mysql_priv.h
MDEV-136 Non-blocking "set read_only".
The close_cached_tables_set_readonly() declared.
sql/set_var.cc
MDEV-136 Non-blocking "set read_only".
Call close_cached_tables_set_readonly() for the read_only::set_var.
sql/sql_base.cc
MDEV-136 Non-blocking "set read_only".
Parameters added to the close_cached_tables implementation,
close_cached_tables_set_readonly declared.
Prevent blocking on the transactional tables if the
set_readonly_mode is on.
INNODB_AUTOINC_LOCK_MODE=1 AND USING TRIGGER
When an insert stmt like "insert into t values (1),(2),(3)" is
executed, the autoincrement values assigned to these three rows are
expected to be contiguous. In the given lock mode
(innodb_autoinc_lock_mode=1), the auto inc lock will be released
before the end of the statement. So to make the autoincrement
contiguous for a given statement, we need to reserve the auto inc
values at the beginning of the statement.
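A sketch of the expectation (the table definition is illustrative):

  CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE=InnoDB;
  -- with innodb_autoinc_lock_mode=1, the three auto-increment values
  -- reserved for this single statement should be contiguous, e.g. 1,2,3
  INSERT INTO t (v) VALUES (1),(2),(3);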
Modified the fix based on review comment by Svoj.
This is a followup to the fix for Bug#12340997
get_interval_value() was trying to parse the input string,
looking for leading '-' while skipping whitespace.
The macro my_isspace() does not work for utf32 character set,
since my_charset_utf32_general_ci.ctype == NULL.
Solution: convert input to ASCII before parsing,
and use the character set of the returned ASCII string.
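A sketch of the kind of expression affected (the literal is illustrative;
CONVERT forces the interval string into utf32):

  SELECT '2012-06-01' + INTERVAL CONVERT(' -1' USING utf32) DAY;
  -- parsing ' -1' must skip the leading space and find the '-',
  -- which my_isspace() cannot do for utf32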
Problem
========
SQL statements close to the size of max_allowed_packet produce binary
log events larger than max_allowed_packet.
This failure occurs because the event length is
more than the total of max_allowed_packet + the max_event_header
length. Since the event length exceeds this size, the master Dump
thread is unable to send the packet on to the slave.
That can happen e.g. with row-based replication in an Update_rows event.
Fix
====
The problem was fixed by raising the max_allowed_packet for the
slave's threads (IO/SQL) to 1GB.
This is done using the new server option included, which is used to
regulate the max_allowed_packet of the slave threads (IO/SQL).
This allows the large packets to be received by the slave and applied
successfully.
sql/log_event.h:
Added the new option in the log_event.h file.
sql/mysqld.cc:
Added a new option to the server.
sql/slave.cc:
Increasing the session max_allowed_packet to a large value,
i.e. not taking the global max_allowed_packet into consideration, for the slave's threads.
Fixed some mtr test problems
dbug/tests.c:
Fixed compiler warnings
mysql-test/r/handlersocket.result:
Fixed that plugin_license is written
mysql-test/suite/innodb/t/innodb_bug60196.test:
Force sorted results as it was sometimes different on windows
mysql-test/suite/rpl/t/rpl_heartbeat_basic.test:
Prolong test as this failed on windows
mysql-test/t/handlersocket.test:
Fixed that plugin_license is written
plugin/handler_socket/handlersocket/handlersocket.cpp:
Use maria_declare_plugin
plugin/handler_socket/handlersocket/mysql_incl.hpp:
Fixed compiler warning
plugin/handler_socket/libhsclient/auto_addrinfo.hpp:
Fixed compiler warning
sql/handler.h:
Fixed typo
sql/sql_plugin.cc:
Fixed bug that caused the plugin library name to appear twice in the error message
storage/maria/ma_checkpoint.c:
Fixed compiler warning
storage/maria/ma_loghandler.c:
Fixed compiler warning
unittest/mysys/base64-t.c:
Fixed compiler warning
unittest/mysys/bitmap-t.c:
Fixed compiler warning
unittest/mysys/my_malloc-t.c:
Fixed compiler warning
- make make_cond_after_sjm() correctly handle OR clauses where one branch refers to the semi-join table
while the other branch refers to the non-semijoin table.
The cause for this bug is that the method JOIN::get_examined_rows iterates over all
JOIN_TABs of the join assuming they are just a sequence. In the query above, the
innermost subquery is merged into its parent query. When we call
JOIN::get_examined_rows for the second-level subquery, the iteration that
assumes sequential order of join tabs goes outside the join_tab array and calls
the method JOIN_TAB::get_examined_rows on uninitialized memory.
The fix is to iterate over JOIN_TABs in a way that takes into account the nested
semi-join structure of JOIN_TABs. In particular, iterate in the same way as select_describe() does.
Problem: After the fix for Bug#12589870, a new field that
stores the length of the db name was added in the buffer that
stores the query to be executed. Unlike for a plain user
session, replication execution did not allocate the necessary
chunk in the Query_log_event constructor. This caused an
invalid read while accessing this field.
Solution: We fix this problem by allocating a necessary chunk
in the buffer created in the Query_log_event::Query_log_event()
and store the length of database name.
sql/log_event.cc:
Added a new field in the buffer created in the
Query_log_event's constructor and store the length
of database name.
PROBLEM:
Threads end-up in deadlock due to locks acquired as described
below,
con1: Run Query on a table.
It is important that this SELECT must back off while
trying to open t1 and enter wait_for_condition().
The SELECT is then blocked trying to lock mysys_var->mutex,
which is held by con3. The very significant fact here is
that mysys_var->current_mutex will still point to LOCK_open,
even if LOCK_open is no longer held by con1 at this point.
con2: Try dropping table used in con1 or query some table.
It will hold LOCK_open and be blocked trying to lock
kernel_mutex held by con4.
con3: Try killing the query run by con1.
It will hold THD::LOCK_thd_data belonging to con1 while
trying to lock mysys_var->current_mutex belonging to con1.
But current_mutex will point to LOCK_open which is held
by con2.
con4: Get innodb engine status
It will hold kernel_mutex, trying to lock THD::LOCK_thd_data
belonging to con1 which is held by con3.
So while technically only con2, con3 and con4 participate in the
deadlock, con1's mysys_var->current_mutex pointing to LOCK_open
is a vital component of the deadlock.
CYCLE = (THD::LOCK_thd_data -> LOCK_open ->
kernel_mutex -> THD::LOCK_thd_data)
FIX:
LOCK_thd_data has the responsibility of protecting:
1) thd->query, thd->query_length
2) VIO
3) thd->mysys_var (used by KILL statement and shutdown)
4) THD during thread delete.
Among the above responsibilities, 1), 2) and (3,4) appear to be three
independent groups of responsibility. If a different lock
owns responsibility for (3,4), the above-mentioned deadlock cycle
can be avoided. This fix introduces LOCK_thd_kill to handle
responsibility (3,4), which eliminates the deadlock issue.
Note: The problem is not found in 5.5. The introduction of the MDL
subsystem caused metadata locking responsibility to be moved from TDC/TC
to the MDL subsystem. Due to this, the responsibility of LOCK_open is reduced.
As the use of LOCK_open is removed in open_table() and
mysql_rm_table(), the above-mentioned CYCLE does not form.
Revision ID for changes,
open_table() = dlenev@mysql.com-20100727133458-m3ua9oslnx8fbbvz
mysql_rm_table() = jon.hauglid@oracle.com-20101116100012-kxep9txz2fxy3nmw
The patch enables back constant subquery execution during
query optimization after it was disabled during the development
of MWL#89 (cost-based choice of IN-TO-EXISTS vs MATERIALIZATION).
The main idea is that constant subqueries are allowed to be executed
during optimization if their execution is not expensive.
The approach is as follows:
- Constant subqueries are recursively optimized in the beginning of
JOIN::optimize of the outer query. This is done by the new method
JOIN::optimize_constant_subqueries(). This is done so that the cost
of executing these queries can be estimated.
- Optimization of the outer query proceeds normally. During this phase
the optimizer may request execution of non-expensive constant subqueries.
Each place where the optimizer may potentially execute an expensive
expression is guarded with the predicate Item::is_expensive().
- The implementation of Item_subselect::is_expensive has been extended
to use the number of examined rows (estimated by the optimizer) as a
way to determine whether the subquery is expensive or not.
- The new system variable "expensive_subquery_limit" controls how many
examined rows are considered to be not expensive. The default is 100.
In addition, multiple changes were needed to make this solution work
in the light of the changes made by MWL#89. These changes were needed
to fix various crashes and wrong results, and legacy bugs discovered
during development.
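For example (the default of 100 is taken from the text above):

  -- subqueries estimated to examine at most this many rows are
  -- considered not expensive and may be executed during optimization
  SET SESSION expensive_subquery_limit = 100;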
The optimizer chose a less efficient execution plan due to the following
defects of the code:
1. the generic handler function handler::keyread_time did not take into account
that in clustered primary keys the record data is included in each index entry
2. the function make_join_readinfo erroneously decided that an index-only scan
could not be used if the join cache was employed.
Added no additional test case.
Adjusted some of the test results.
Changed HA_EXTRA_NORMAL to HA_EXTRA_NOT_USED (cleaner)
mysql-test/suite/maria/lock.result:
More extensive tests of LOCK TABLE with FLUSH and REPAIR
mysql-test/suite/maria/lock.test:
More extensive tests of LOCK TABLE with FLUSH and REPAIR
sql/sql_admin.cc:
Fix that REPAIR TABLE ... USE_FRM works with LOCK TABLES
sql/sql_base.cc:
Ensure that transactions are closed in ARIA when doing flush
HA_EXTRA_NORMAL -> HA_EXTRA_NOT_USED
Don't call extra many times for a table in close_all_tables_for_name()
Added test if table_list->table as this can happen in error situations
sql/sql_partition.cc:
HA_EXTRA_NORMAL -> HA_EXTRA_NOT_USED
sql/sql_reload.cc:
Fixed comment
sql/sql_table.cc:
HA_EXTRA_NORMAL -> HA_EXTRA_NOT_USED
sql/sql_trigger.cc:
HA_EXTRA_NORMAL -> HA_EXTRA_NOT_USED
sql/sql_truncate.cc:
HA_EXTRA_FORCE_REOPEN -> HA_EXTRA_PREPARE_FOR_DROP for truncate, as this speeds up truncate by not having to flush the cache to disk.
mysql-test/suite/maria/maria-partitioning.result:
New test case
mysql-test/suite/maria/maria-partitioning.test:
New test case
sql/sql_base.cc:
Ignore HA_EXTRA_NORMAL for wait_while_table_is_used()
More DBUG
sql/sql_partition.cc:
Don't use HA_EXTRA_FORCE_REOPEN for wait_while_table_is_used() as the table is opened multiple times (in prep_alter_part_table)
This fixes the assert in Aria where we check if table is opened multiple times if HA_EXTRA_FORCE_REOPEN is issued
- 5.5 was missing calls to ha_extra(HA_PREPARE_FOR_DROP | HA_PREPARE_FOR_RENAME); Lost in merge 5.3 -> 5.5
sql/sql_admin.cc:
Updated arguments for close_all_tables_for_name
sql/sql_base.h:
Updated arguments for close_all_tables_for_name
sql/sql_partition.cc:
Updated arguments for close_all_tables_for_name
sql/sql_table.cc:
Updated arguments for close_all_tables_for_name
Removed test of kill, as we have already called 'ha_extra(HA_PREPARE_FOR_DROP)' and the table may be inconsistent.
sql/sql_trigger.cc:
Updated arguments for close_all_tables_for_name
sql/sql_truncate.cc:
For truncate that is done with drop + recreate, signal that the table will be dropped.
- It turns out there is a case where the join is degenerate, but
join->table_count != 0 && join->tables_list != NULL. We also need to
check whether join->zero_result_cause != NULL.
- There is a slight problem: The code sets
zero_result_cause= "no matching row in const table"
when NOT running EXPLAIN. The result is that SHOW EXPLAIN will show this line while
regular EXPLAIN will not.
INNODB_AUTOINC_LOCK_MODE=1 AND USING TRIGGER
When an insert stmt like "insert into t values (1),(2),(3)" is
executed, the autoincrement values assigned to these three rows are
expected to be contiguous. In the given lock mode
(innodb_autoinc_lock_mode=1), the auto inc lock will be released
before the end of the statement. So to make the autoincrement
contiguous for a given statement, we need to reserve the auto inc
values at the beginning of the statement.
rb://1074 approved by Alexander Nozdrin
There can be cases when the optimizer calls ha_partition::records_in_range
when there are no matching partitions, so the DBUG_ASSERT on
!tot_used_partitions fires.
Fixed by returning 0 instead when no matching partitions are found.
This avoids the crash. records_in_range will then try to find the
biggest used partition, will not find any partition, and
will return 0, meaning no rows can be found.
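A sketch of the situation (the schema and the range are illustrative):

  CREATE TABLE tp (a INT, KEY(a))
  PARTITION BY RANGE (a)
    (PARTITION p0 VALUES LESS THAN (10),
     PARTITION p1 VALUES LESS THAN (20));
  -- the range below prunes away every partition; records_in_range
  -- previously hit the DBUG_ASSERT here, and now estimates 0 rows
  SELECT * FROM tp WHERE a > 100 AND a < 200;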
Patch contributed by Davi Arnaut at twitter.
Issue/Cause:
Issue is one of memory corruption. During the optimization phase, the pattern to be matched in the WHERE
clause is prepared. This is done in the Item_func_concat::val_str() function, which forms the
resultant string (tmp_value) and returns its pointer. In the caller, Item_func_like::fix_fields,
the pattern is made to point to this string (tmp_value). In further processing, tmp_value
gets modified, which causes the pattern to have changed/wrong values.
Fix:
Allocate its own memory location in the caller, copy the value of the resultant string (tmp_value)
into that, and make the pattern point to that. This makes sure no further changes to
tmp_value will affect the pattern.
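The affected construct has this general shape (names are illustrative);
the LIKE pattern is produced by Item_func_concat::val_str() at
fix_fields time:

  SELECT * FROM t1 WHERE name LIKE CONCAT('%', 'abc', '%');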
RESULTS ON IN() & NOT IN() COMP #3
This bug causes a wrong result in mysql-trunk when ICP is used
and bad performance in mysql-5.5 and mysql-trunk.
Using the query from the bug report to explain what happens and what
causes the wrong result from the query when ICP is enabled:
1. The t3 table contains four records. The outer query will read
these and for each of these it will execute the subquery.
2. Before the first execution of the subquery it will be optimized. In
this case the important thing is what happens to the first table t1:
- make_join_select() will call the range optimizer, which decides
that t1 should be accessed using a range scan on the k1 index.
It creates a QUICK_RANGE_SELECT object for this.
- As the last part of optimization, the ICP code pushes the
condition down to the storage engine for table t1 on the k1 index.
This produces the following information in the explain for this table:
2 DEPENDENT SUBQUERY t1 range k1 k1 5 NULL 3 Using index condition; Using filesort
Note the use of filesort.
3. The first execution of the subquery does the following (among other
things) due to the need for sorting:
a. Call create_sort_index() which again will call find_all_keys():
b. find_all_keys() will read the required keys for all qualifying
rows from the storage engine. To do this it checks if it has a
quick-select for the table. It will use the quick-select for
reading records. In this case it will read four records from the
storage engine (based on the range criteria). The storage engine
will evaluate the pushed index condition for each record.
c. At the end of create_sort_index() there is code that cleans up a
lot of stuff on the join tab. One of the things that is cleaned
is the select object. The result of this is that the
quick-select object created in make_join_select is deleted.
4. The second execution of the subquery does the same as the first but
the result is different:
a. Call create_sort_index() which again will call find_all_keys()
(same as for the first execution)
b. find_all_keys() will read the keys from the storage engine. To
do this it checks if it has a quick-select for the table. Now
there is NO quick-select object(!) (since it was deleted in
step 3c). So find_all_keys defaults to reading the table using a
table scan instead. So instead of reading the four relevant records
in the range it reads the entire table (6 records). It then
evaluates the table's condition (and here it goes wrong). Since
the entire condition has been pushed down to the storage engine
using ICP all 6 records qualify. (Note that the storage engine
will not evaluate the pushed index condition in this case since
it was pushed for the k1 index and now we do a table scan
without any index being used).
The result is that here we return six qualifying key values
instead of four due to not evaluating the table's condition.
c. As above.
5. The two last executions of the subquery will also produce wrong results
for the same reason.
Summary: The problem occurs because all but the first execution of the
subquery are done as a table scan without evaluating the table's
condition (which is pushed to the storage engine on a different
index). This is caused by the create_sort_index() function deleting
the quick-select object that should have been used for executing the
subquery as a range scan.
Note that this bug in addition to causing wrong results also can
result in bad performance due to executing the subquery using a table
scan instead of a range scan. This is an issue in MySQL 5.5.
The fix for this problem is to avoid that the Quick-select-object that
the optimizer created is deleted when create_sort_index() is doing
clean-up of the join-tab. This will ensure that the quick-select
object and the corresponding pushed index condition will be available
and used by all following executions of the subquery.
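A hypothetical query of the general shape described above (the actual
query is in the bug report; names here are illustrative). Each row of t3
drives one execution of the dependent subquery, whose table t1 is read by
a range scan on k1 with the condition pushed down, plus a filesort:

  SELECT * FROM t3
  WHERE t3.a IN (SELECT MAX(t1.b) FROM t1
                 WHERE t1.k1 BETWEEN t3.lo AND t3.hi
                 GROUP BY t1.c);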
sql/sql_select.cc:
Fix for Bug#12667154: Change how create_sort_index() cleans up the
join_tab's select and quick-select objects in order to avoid that a
quick-select object created outside of create_sort_index() is deleted.
If we did nothing when resolving a unique table conflict, we should not retry (it led to an infinite loop).
Now we retry (recheck) the unique table check only in the case where we materialized a table.
- Let fix_semijoin_strategies_for_picked_join_order() set
POSITION::prefix_record_count for POSITION records that it copies from
SJ_MATERIALIZATION_INFO::tables.
(These records do not have prefix_record_count set, because they are optimized
as joins-inside-semijoin-nests, without full advance_sj_state() processing).
The not_null_tables() of Item_func_not_all and Item_in_optimizer was inherited from
Item_func by mistake. It made the optimizer think that subquery
predicates with ALL/ANY/IN were null-rejecting. This could trigger invalid
conversions of outer joins into inner joins.
Make SHOW EXPLAIN work for queries that do "Using temporary" and/or "Using filesort"
- Patch#1: Don't lose "Using temporary/filesort" in the SHOW EXPLAIN output.
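Usage sketch (the thread id 123 is hypothetical):

  -- from another connection, while thread 123 runs a query that
  -- uses a temporary table and/or filesort:
  SHOW EXPLAIN FOR 123;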