ENABLE AUDI PLUGIN WHEN DDL
OPERATION HAPPENING
PROBLEM: While unloading the plugin, state is
not checked before it is to be reaped.
This can lead to simultaneous free of
plugin memory by more than one thread.
Multiple deallocation leads to server
crash. In the present bug two threads
deallocate the alog_log plugin.
SOLUTION: A check is added to ensure that only
one thread is unloading the plugin.
NOTE: No mtr test is added as it requires
multiple threads to access critical
section. debug_sync cannot be used in
the current senario because we dont
have access to thread pointer in
some of the plugin functions. IMHO no
test case in the current time frame.
NUMBERS
If a system variable was declared as deprecated without mention of an
alternative, the message would look funny, e.g. for @@delayed_insert_limit:
Warning 1287 '@@delayed_insert_limit' is deprecated and
will be removed in MySQL .
The message was meant to display the version number, but it's not
possible to give one when declaring a system variable.
The fix does two things:
1) The definition of the message
ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT is changed so that it does
not display a version number. I.e. in English the message now reads:
Warning 1287 The syntax '@@delayed_insert_limit' is deprecated and
will be removed in a future version.
2) The message ER_WARN_DEPRECATED_SYNTAX_WITH_VER is discontinued in
favor of ER_WARN_DEPRECATED_SYNTAX for system variables. This change
was already done in versions 5.6 and above as part of wl#5265. This
part is simply back-ported from the worklog.
FAILED IN CHECK_LOCK_AND_ST
Problem:
--------
lock_tables() is supposed to invoke check_lock_and_start_stmt()
for TABLE_LIST which are directly used by top level statement.
TABLE_LIST->prelocking_placeholder is set only for TABLE_LIST
which are used indirectly by stored programs invoked by top
level statement. Hence check_lock_and_start_stmt() should have
TABLE_LIST->prelocking_placeholder==false always, but it is
observed that this assert fails.
The failure is found during RQG test rqg_signal_resignal.
Analysis:
---------
open_tables() invokes open_and_process_routines() where it
finds all the TABLE_LIST that belong to the routine and
adds it to thd->lex->query_tables. During this process if
the open_and_process_routines() fail for some reason,
we are supposed to chop-off all the TABLE_LIST found during
calls to open_and_process_routines(). But, in practice this
is not happening.
thd->lex->query_tables_own_last is supposed to point to a
node in thd->lex->query_tables, which would be a first
TABLE_LIST used indirectly by stored programs invoked by
top level statement. This is found to be not-set correctly
when we plan to chop-off TABLE_LIST's, when
open_and_process_routines() failed.
close_tables_for_reopen() does chop-off all the TABLE_LIST
added after thd->lex->query_table_own_last. This is invoked
upon error in open_and_process_routines(). This call would
not work as expected as thd->lex->query_tables_own_last
is not set, or is not set to correctly.
Further, when open_tables() restarts the process of finding
TABLE_LIST belonging to stored programs, and as the
thd->lex->query_tables_own_last points to in-correct node,
there is possibility of new iteration setting the
thd->lex->query_tables_own_last past some old nodes that
belong to stored programs, added earlier and not removed.
Later when open_tables() completes, lock_tables() ends up
invoking check_lock_and_start_stmt() for TABLE_LIST which
belong to stored programs, which is not expected behavior
and hence we hit the assert
TABLE_LIST->prelocking_placeholder==false.
Due to above behavior, if a user application tries to
execute a SQL statement which invokes some stored function
and if the lock grant on stored function fails due to a
deadlock, then mysqld crashes.
Fix:
----
open_tables() remembers save_query_tables_last which points
to thd-lex->query_tables_last before calls to
open_and_process_routines(). If there is no known
thd->lex->query_tables_own_last set, we are now setting
thd->lex->query_tables_own_last to save_query_tables_last.
This will make sure that the call to close_tables_for_reopen()
will chop-off the list correctly, in other words we now
remove all the nodes added to thd->lex->query_tables, by
previous calls to open_and_process_routines().
Further, it is found that the problem exists starting
from 5.5, due to a code refactoring effort related to
open_tables(). Hence, the fix will be pushed in 5.5, 5.6
and trunk.
Documentation for class Item_outer_ref was wrong:
(*ref) may point to Item_field as well
(see e.g. Item_outer_ref::fix_fields)
So this casting in get_store_key() was wrong:
(*(Item_ref**)((Item_ref*)keyuse->val)->ref)->ref_type()
Additional patch to remove the part_id -> ref_buffer offset.
The partitioning id and the associate record buffer can
be found without having to calculate it.
By initializing it for each used partition, and then reuse
the key-buffer from the queue, it is not needed to have
such map.
The buffer for the current read row from each partition
(m_ordered_rec_buffer) used for sorted reads was
allocated on open and freed when the ha_partition handler
was closed or destroyed.
For tables with many partitions and big records this could
take up too much valuable memory.
Solution is to only allocate the memory when it is needed
and free it when nolonger needed. I.e. allocate it in
index_init and free it in index_end (and to handle failures
also free it on reset, close etc.)
Also only allocating needed memory, according to
partitioning pruning.
Manually tested that it does not use as much memory and
releases it after queries.
MASTER-MASTER AND USING SET USE
Problem:
=======
In a master-master set-up, a master can show a wrong
'SHOW SLAVE STATUS' output.
Requirements:
- master-master
- log_slave_updates
This is caused when using SET user-variables and then using
it to perform writes. From then on the master that performed
the insert will have a SHOW SLAVE STATUS that is wrong and
it will never get updated until a write happens on the other
master. On"Master A" the "exec_master_log_pos" is not
getting updated.
Analysis:
========
Slave receives a "User_var" event from the master and after
applying the event, when "log_slave_updates" option is
enabled the slave tries to write this applied event into
its own binary log. At the time of writing this event the
slave should use the "originating server-id". But in the
above case the sever always logs the "user var events"
by using its global server-id. Due to this in a
"master-master" replication when the event comes back to the
originating server the "User_var_event" doesn't get skipped.
"User_var_events" are context based events and they always
follow with a query event which marks their end of group.
Due to the above mentioned problem with "User_var_event"
logging the "User_var_event" never gets skipped where as
its corresponding "query_event" gets skipped. Hence the
"User_var" event always waits for the next "query event"
and the "Exec_master_log_position" does not get updated
properly.
Fix:
===
`MYSQL_BIN_LOG::write' function is used to write events
into binary log. Within this function a new object for
"User_var_log_event" is created and this new object is used
to write the "User_var" event in the binlog. "User var"
event is inherited from "Log_event". This "Log_event" has
different overloaded constructors. When a "THD" object
is present "Log_event(thd,...)" constructor should be used
to initialise the objects and in the absence of a valid
"THD" object "Log_event()" minimal constructor should be
used. In the above mentioned problem always default minimal
constructor was used which is incorrect. This minimal
constructor is replaced with "Log_event(thd,...)".
sql/log_event.h:
Replaced the default constructor with another constructor
which takes "THD" object as an argument.
When resolving outer fields, Item_field::fix_outer_fields()
creates new Item_refs for each execution of a prepared statement, so
these must be allocated in the runtime memroot. The memroot switching
before resolving JOIN::having causes these to be allocated in the
statement root, leaking memory for each PS execution.
sql/item_subselect.cc:
addon, fix for 11829691, item could be created in
runtime memroot, so we need to use real_item instead.
ROWS THAT ARE EXPECTED
For non range/list partitioned tables (i.e. HASH/KEY):
When prune_partitions finds a multi-range list
(or in this test '<>') for a field of the partition index,
even if it cannot make any use of the multi-range,
it will continue with the next field of the partition index
and use that for pruning (even if it the previous
field could not be used). This results in partitions is
pruned away, leaving partitions that only matches
the last field in the partition index, and will exclude
partitions which might match any previous fields.
Fixed by skipping rest of partitioning key fields/parts
if current key field/part could not be used.
Also notice it is the order of the fields in the CREATE TABLE
statement that triggers this bug, not the order of fields in
primary/unique key or PARTITION BY KEY ().
It must not be the last field in the partitioning expression that
is not equal (or have a non single point range).
I.e. the partitioning index is created with the same field order
as in the CREATE TABLE. And for the bug to appear
the last field must be a single point and some previous field
must be a multi-point range.
SHOW 2012 INSTEAD OF 2011
* Added a new macro to hold the current year :
COPYRIGHT_NOTICE_CURRENT_YEAR
* Modified ORACLE_WELCOME_COPYRIGHT_NOTICE macro
to take the initial year as parameter and pick
current year from the above mentioned macro.
FOREVER MDL LOCK
Analysis:
----------
While granting MDL lock for the lock requests in wait queue,
first the lock is granted to the high priority lock types
and then to the low priority lock types.
MDL Priority Matrix,
+-------------+----+---+---+---+----+-----+
| Locks | | | | | | |
| has Priority| | | | | | |
| over ---> | S | SR| SW| SU| SNW| SNRW|
+-------------+----+---+---+---+----+-----+
| X | + | + | + | + | + | + |
+-------------|----|---|---|---|----|-----|
| SNRW | - | + | + | - | - | - |
+-------------|----|---|---|---|----|-----|
| SNW | - | - | + | - | - | - |
+-------------+----+---+---+---+----+-----+
Here '+' means, Lock priority is higher.
'-' means, Has same priority
In the scenario where,
*. Lock wait queue has requests of type S/SR/SW/SU.
*. And locks of high priority X/SNRW/SNW are requested
continuously.
In this case, while granting lock, always first high priority
lock requests(X/SNRW/SNW) are considered. Low priority
locks(S/SR/SW/SU) will not get chance and they will
wait forever.
In the scenario for which this bug is reported, application
executed many LOCK TABLES ... WRITE statements concurrently.
These statements request SNRW lock. Also there were some
connections trying to execute DML statements requesting SR
lock. Since SNRW lock request has higher priority (and as
they were too many waiting SNRW requests) lock is always
granted to it. So, lock request SR will wait forever, resulting
in DML starvation.
How is this handled in 5.1?
---------------------------
Even in 5.1 we have low priority lock starvation issue.
But, in 5.1 thread locking, system variable
"max_write_lock_count" can be configured to grant
some pending read lock requests. After
"max_write_lock_count" of write lock grants all the low
priority locks are granted.
Why this issue is seen in 5.5/trunk?
---------------------------------
In 5.5/trunk MDL locking, "max_write_lock_count" system
variable exists but not used in MDL, only thread lock uses
it. So no effect of "max_write_lock_count" in MDL locking.
This means that starvation of metadata locks is possible
even if max_write_lock_count is used.
Looks like, customer was using "max_write_lock_count" in
5.1 and when upgraded to 5.5, starvation is seen because
of not having effect of "max_write_lock_count" in MDL.
Fix:
----------
As a fix, support for max_write_lock_count is added to MDL.
To maintain write lock counter per MDL_lock object, new
member "m_hog_lock_count" is added in MDL_lock.
And following logic is added to increment the counter in
function reschedule_waiters,
(reschedule_waiters function is called while thread is
releasing the lock)
- After granting lock request from the wait queue.
- Check if there are any S/SR/SU/SW exists in the wait queue
- If yes then increment the "m_hog_lock_count"
And following logic is added in the same function to
handle pending S/SU/SR/SW locks
- Before granting locks
- Check if max_write_lock_count <= m_hog_lock_count
- If Yes, then try to grant S/SR/SW/SU locks.
(Since all of these has same priority, all locks are
granted together. But some lock grant may fail because
of grant incompatibility)
- Reset m_hog_lock_count if there no low priority lock
requests in wait queue.
- return
Note:
--------------------------
In the lock priority matrix explained above,
though X has priority over the SNW and SNRW. X locks is
taken mostly for RENAME, TRUNCATE, CREATE ... operations.
So lock type X may not be requested in loop continuously
in real world applications, as compared to other lock
request types. So, lock request of type SNW and SNRW are
not starved. So, we can grant all S/SR/SU/SW in one shot,
without considering SNW & SNRW lock request starvation.
ALTER table operations take SU lock first and then
upgrade to SNW if required. All S, SR, SW, SU have same
lock priority. So while granting SU, request of types
SR, SW, S are also granted in one shot. So, lock request
of type SU->SNW in loop will not make other low priority
lock request to starve.
But, when there is request for lock of type SNRW, lock
requests of lower priority types are not granted. And if
SNRW is requested in loop continuously then all
S, SR, SW, SU are starved.
This patch addresses the latter scenario.
When we have S/SR/SW/SU in wait queue and if
there are
- Continuous SNRW lock requests
- OR one or more X and Continuous SNRW lock requests.
- OR one SNW and Continuous SNRW lock requests.
- OR one SNW, one or more X and continuous SNRW lock
requests.
in wait queue then, S/SR/SW/SU lock request are starved.
Backport the fix from 5.6 to 5.1
Base bug number : 11765562
sql/item_strfunc.cc:
In Item_func_export_set::val_str, verify that the size of the end
result is within reasonable bounds.
IS PLACE HOLDER AND USE SERVER-SIDE
Analysis:
LIMIT always takes nonnegative integer constant values.
http://dev.mysql.com/doc/refman/5.6/en/select.html
So parsing of value '5' for LIMIT in SELECT fails.
But, within prepared statement, LIMIT parameters can be
specified using '?' markers. Value for the parameter can
be supplied while executing the prepared statement.
Passing string values, float or double value for LIMIT
works well from CLI. Because, while setting the value
for the parameters from the variable list (added using
SET), if the value is for parameter LIMIT then its
converted to integer value.
But, when prepared statement is executed from the other
interfaces as J connectors, or C applications etc.
The value for the parameters are sent to the server
with execute command. Each item in log has value and
the data TYPE. So, While setting parameter value
from this log, value is set to all the parameters
with the same data type as passed.
But here logic to convert value to integer type
if its for LIMIT parameter is missing.
Because of this,string '5' is set to LIMIT.
And the same is logged into the binlog file too.
Fix:
When executing prepared statement having parameter for
CLI it worked fine, as the value set for the parameter
is converted to integer. And this failed in other
interfaces as J connector,C Applications etc as this
conversion is missing.
So, as a fix added check while setting value for the
parameters. If the parameter is for LIMIT value then
its converted to integer value.
PROBLEM:
mysql provides a feature where in a session which is
idle for a period specified by the wait_timeout variable
(whose value is in seconds), the session is closed
This feature is not present when we use thread pool.
FIX:
This patch implements the interface functions which is
required to implement the wait_timeout functionality
in the thread pool plugin.
'MAX_BINLOG_CACHE_SIZE' ERROR
Problem:
=======
MySQL returns following error in win64.
"ERROR 1197 (HY000): Multi-statement transaction required more than
'max_binlog_cache_size' bytes of storage; increase this mysqld variable
and try again" when user tries to load >4G file even if
max_binlog_cache_size set to maximum value. On Linux everything
works fine.
Analysis:
========
The `max_binlog_cache_size' variable is of type `ulonglong'. This
value is set to `ULONGLONG_MAX' at the time of server start up. The
above value is stored in an intermediate variable named
`saved_max_binlog_cache_size' which is of type `ulong'. In visual
c++ complier the `ulong' type is of 4bytes in size and hence the value
is getting truncated to '4GB' and the cache is not able to grow beyond
4GB size. The same limitation is observed with
"max_binlog_stmt_cache_size" as well. Similar fix has been applied.
Fix:
===
As part of fix the type "ulong" is replaced with "my_off_t" which is of
type "ulonglong".
mysys/mf_iocache.c:
Added debug statement to simulate a scenario where the cache
file's current position is set to >4GB
sql/log.cc:
Replaced the type of `saved_max_binlog_cache_size' from "ulong" to
"my_off_t", which is a type def for "ulonglong".
"ORDER BY" AND "LIMIT BY" CLAUSE
PROBLEM:
When a 'limit' clause is specified in a query along with
group by and order by, optimizer chooses wrong index
there by examining more number of rows than required.
However without the 'limit' clause, optimizer chooses
the right index.
ANALYSIS:
With respect to the query specified, range optimizer chooses
the first index as there is a range present ( on 'a'). Optimizer
then checks for an index which would give records in sorted
order for the 'group by' clause.
While checking chooses the second index (on 'c,b,a') based on
the 'limit' specified and the selectivity of
'quick_condition_rows' (number of rows present in the range)
in 'test_if_skip_sort_order' function.
But, it fails to consider that an order by clause on a
different column will result in scanning the entire index and
hence the estimated number of rows calculated above are
wrong (which results in choosing the second index).
FIX:
Do not enforce the 'limit' clause in the call to
'test_if_skip_sort_order' if we are creating a temporary
table. Creation of temporary table indicates that there would be
more post-processing and hence will need all the rows.
This fix is backported from 5.6. This problem is fixed in 5.6 as
part of changes for work log #5558
mysql-test/r/subselect.result:
Changes for Bug#11762052 results in the correct number of rows.
sql/sql_select.cc:
Do not pass the actual 'limit' value if 'need_tmp' is true.
RBR AND RC
Description: When scanning and locking rows with < or <=, InnoDB locks
the next row even though row based binary logging and read committed
is used.
Solution: In the handler, when the row is identified to fall outside
of the range (as specified in the query predicates), then request the
storage engine to unlock the row (if possible). This is done in
handler::read_range_first() and handler::read_range_next().
COUNT DISTINCT GROUP BY
PROBLEM:
To calculate the final result of the count(distinct(select 1))
we call 'end_send' function instead of 'end_send_group'.
'end_send' cannot be called if we have aggregate functions
that need to be evaluated.
ANALYSIS:
While evaluating for a possible loose_index_scan option for
the query, the variable 'is_agg_distinct' is set to 'false'
as the item in the distinct clause is not a field. But, we
choose loose_index_scan by not taking this into
consideration.
So, while setting the final 'select_function' to evaluate
the result, 'precomputed_group_by' is set to TRUE as in
this case loose_index_scan is chosen and we do not have
agg_distinct in the query (which is clearly wrong as we
have one).
As a result, 'end_send' function is chosen as the final
select_function instead of 'end_send_group'. The difference
between the two being, 'end_send_group' evaluates the
aggregates while 'end_send' does not. Hence the wrong result.
FIX:
The variable 'is_agg_distinct' always represents if
'loose_idnex_scan' can be chosen for aggregate_distinct
functions present in the select.
So, we check for this variable to continue with
loose_index_scan option.
sql/opt_range.cc:
Do not continue if is_agg_distinct is not set in case
of agg_distinct functions.
primary key with innodb tables
The bug was triggered if a single ALTER TABLE statement both
added and dropped indexes and ALTER TABLE failed during drop
(e.g. because the index was needed in a foreign key constraint).
In such cases, the server index information would get out of
sync with InnoDB - the added index would be present inside
InnoDB, but not in the server. This could then lead to InnoDB
error messages and/or server crashes.
The root cause is that new indexes are added before old indexes
are dropped. This means that if ALTER TABLE fails while dropping
indexes, index changes will be reverted in the server but not
inside InnoDB.
This patch fixes the problem by dropping any added indexes
if drop fails (for ALTER TABLE statements that both adds
and drops indexes).
However, this won't work if we added a primary key as this
key might not be possible to drop inside InnoDB. Therefore,
we resort to the copy algorithm if a primary key is added
by an ALTER TABLE statement that also drops an index.
In 5.6 this bug is more properly fixed by the handler interface
changes done in the scope of WL#5534 "Online ALTER".
KEY UPDATES WITH A LIMIT OF 1
Problem: The unsafety warning for statements such as
update...limit1 where pk=1 are thrown when binlog-format
= STATEMENT,despite of the fact that such statements are
actually safe. this leads to filling up of the disk space
with false warnings.
Solution: This is not a complete fix for the problem, but
prevents the disks from getting filled up. This should
therefore be regarded as a workaround. In the future this
should be overriden by server general suppress/filtering
framework. It should also be noted that another worklog is
supposed to defeat this case's artificial unsafety.
We use a warning suppression mechanism to detect warning flood,
enable the suppression, and disable this when the average
warnings/second has reduced to acceptable limits.
Activation: The supression for LIMIT unsafe statements are
activated when the last 50 warnings were logged in less
than 50 seconds.
Supression: Once activated this supression will prevent the
individual warnings to be logged in the error log, but print
the warning for every 50 warnings with the note:
"The last warning was repeated N times in last S seconds"
Noteworthy is the fact that this supression works only on the
error logs and the warnings seen by the clients will remain as
it is (i.e. one warning/ unsafe statement)
Deactivation: The supression will be deactivated once the
average # of warnings/sec have gone down to the acceptable limits.
sql/sql_class.cc:
Added code to supress warning while logging them to error-log.
Problem:
=======
The return value from my_b_write is ignored by: `my_b_write_quoted',
`my_b_write_bit',`Query_log_event::print_query_header'
Most callers of `my_b_printf' ignore the return value. `log_event.cc'
has many calls to it.
Analysis:
========
`my_b_write' is used to write data into a file. If the write fails it
sets appropriate error number and error message through my_error()
function call and sets the IO_CACHE::error == -1.
`my_b_printf' function is also used to write data into a file, it
internally invokes my_b_write to do the write operation. Upon
success it returns number of characters written to file and on error
it returns -1 and sets the error through my_error() and also sets
IO_CACHE::error == -1. Most of the event specific print functions
for example `Create_file_log_event::print', `Execute_load_log_event::print'
etc are the ones which make several calls to the above two functions and
they do not check for the return value after the 'print' call. All the above
mentioned abuse cases deal with the client side.
Fix:
===
As part of bug fix a check for IO_CACHE::error == -1 has been added at
a very high level after the call to the 'print' function. There are
few more places where the return value of "my_b_write" is ignored
those are mentioned below.
+++ mysys/mf_iocache2.c 2012-06-04 07:03:15 +0000
@@ -430,7 +430,8 @@
memset(buffz, '0', minimum_width - length2);
else
memset(buffz, ' ', minimum_width - length2);
- my_b_write(info, buffz, minimum_width - length2);
+++ sql/log.cc 2012-06-08 09:04:46 +0000
@@ -2388,7 +2388,12 @@
{
end= strxmov(buff, "# administrator command: ", NullS);
buff_len= (ulong) (end - buff);
- my_b_write(&log_file, (uchar*) buff, buff_len);
At these places appropriate return value handlers have been added.
client/mysqlbinlog.cc:
check for IO_CACHE::error == -1 has been added after the call to
the event specific print functions
mysys/mf_iocache2.c:
Added handler to check the written value of `my_b_write'
sql/log.cc:
Added handler to check the written value of `my_b_write'
sql/log_event.cc:
Added error simulation statements in `Create_file_log_event::print`
and `Execute_load_query_log_event::print'
sql/rpl_utility.h:
Removed the extra ';'