IN QUERY CACHE CODE
DESCRIPTION:
MySQL Server crashes sporadically when Query Caching is on and
the server has high contention among clients.
ANALYSIS :
Scenario 1:
In Query_cache::move_by_type() when handling RESULT or its related blocks,
Write Lock is acquired on its parent Query block. However the next and prev
pointers are cached in local variables before lock acquisition. In an extremely
high contention scenario there exists a possibility that
Query_cache::append_result_data() is operating on the same query block
and as a consequence might append a new Result block to the end of Result
blocks Linked List of the Query. This would manipulate the next, prev pointers
of the Block being processed in move_by_type(), however the local pointers
still point to previous nodes there by causing Data Corruption leading to crash.
FIX :
Scenario 1:
The next, prev pointers are now accessed only after Lock acquisition in
Query_cache::move_by_type().
ROBUST AGAINST BUGS IN CALLERS".
Both MDL subsystems and Table Definition Cache code assume
that callers ensure that names of objects passed to them are
not longer than NAME_LEN bytes. Unfortunately due to bugs in
callers this assumption might be broken in some cases. As
result we get nasty bugs causing buffer overruns when we
construct MDL key or TDC key from object names.
This patch makes TDC code more robust against such bugs by
ensuring that we always checking size of result buffer when
constructing TDC keys. This doesn't free its callers from
ensuring that both db and table names are shorter than
NAME_LEN bytes. But at least this steps prevents buffer
overruns in case of bug in caller, replacing them with less
harmful behavior.
This is 5.1-only version of patch.
This patch introduces new version of create_table_def_key()
helper function which constructs TDC key without risk of
result buffer overrun. Places in code that construct TDC keys
were changed to use this function.
Also changed rm_temporary_table() and open_new_frm() functions
to avoid use of "unsafe" strmov() and strxmov() functions and
use safer strnxmov() instead.
A buffer large enough to hold the query _plus_ some additional
data is allocated before parsing is started. The additional data
is used by the query cache, and consists of the name of the current
database and a set of flags.
When a packet containing multiple SQL statements is sent to the
server and one of the statements changes the current database
(a "USE <db>" statement), and the name of the new current database
is longer than of the previous, there is not enough space in the
buffer for the new name, and we write out over the buffer boundary.
The fix adds an extra field to store the number of bytes
allocated to the database name in the buffer. If the current
database name changes, and the new name is longer than the
previous one, we refuse to cache the query.
compression protocol.
The loss of connection was caused by a malformed packet
sent by the server in case when query cache was in use.
When storing data in the query cache, the query cache
memory allocation algorithm had a tendency to reduce
the amount of memory block necessary to store a result
set, up to finally storing the entire result set in a single
block. With a significant result set, this memory block
could turn out to be quite large - 30, 40 MB and on.
When such a result set was sent to the client, the entire
memory block was compressed and written to network as a
single network packet. However, the length of the
network packet is limited by 0xFFFFFF (16MB), since
the packet format only allows 3 bytes for packet length.
As a result, a malformed, overly large packet
with truncated length would be sent to the client
and break the client/server protocol.
The solution is, when sending result sets from the query
cache, to ensure that the data is chopped into
network packets of size <= 16MB, so that there
is no corruption of packet length. This solution,
however, has a shortcoming: since the result set
is still stored in the query cache as a single block,
at the time of sending, we've lost boundaries of individual
logical packets (one logical packet = one row of the result
set) and thus can end up sending a truncated logical
packet in a compressed network packet.
As a result, on the client we may require more memory than
max_allowed_packet to keep, both, the truncated
last logical packet, and the compressed next packet.
This never (or in practice never) happens without compression,
since without compression it's very unlikely that
a) a truncated logical packet would remain on the client
when it's time to read the next packet
b) a subsequent logical packet that is being read would be
so large that size-of-new-packet + size-of-old-packet-tail >
max_allowed_packet.
To remedy this issue, we send data in 1MB sized packets,
that's below the current client default of 16MB for
max_allowed_packet, but large enough to ensure there is no
unnecessary overhead from too many syscalls per result set.
This patch introduce a limit on the time the query cache can
block with a lock on SELECTs.
Other operations which causes a change in the table
data will still be blocked.
Implemented the server infrastructure for the fix:
1. Added a function LEX_STRING *thd_query_string(THD) to return
a LEX_STRING structure instead of char *.
This is the function that must be called in innodb instead of
thd_query()
2. Did some encapsulation in THD : aggregated thd_query and
thd_query_length into a LEX_STRING and made accessor and mutator
methods for easy code updating.
3. Updated the server code to use the new methods where applicable.
Early patch submitted for discussion.
It is possible for more than one thread to enter the condition
in query_cache_insert(), but the condition predicate is to
signal one thread each time the cache status changes between
the following states: {NO_FLUSH_IN_PROGRESS,FLUSH_IN_PROGRESS,
TABLE_FLUSH_IN_PROGRESS}
Consider three threads THD1, THD2, THD3
THD2: select ... => Got a writer in ::store_query
THD3: select ... => Got a writer in ::store_query
THD1: flush tables => qc status= FLUSH_IN_PROGRESS;
new writers are blocked.
THD2: select ... => Still got a writer and enters cond in
query_cache_insert
THD3: select ... => Still got a writer and enters cond in
query_cache_insert
THD1: flush tables => finished and signal status change.
THD2: select ... => Wakes up and completes the insert.
THD3: select ... => Happily waiting for better times. Why hurry?
This patch is a refactoring of this lock system. It introduces four new methods:
Query_cache::try_lock()
Query_cache::lock()
Query_cache::lock_and_suspend()
Query_cache::unlock()
This change also deprecates wait_while_table_flush_is_in_progress(). All threads are
queued and put on a conditional wait. On each unlock the queue is signalled. This resolve
the issues with left over threads. To assure that no threads are spending unnecessary
time waiting a signal broadcast is issued every time a lock is taken before a full
cache flush.
The query cache module did not check for the SQL_NO_CACHE keyword before
attempting to query the hash lookup table. This had a small performance impact.
By introducing a check on the query string before obtaining the hash mutex
we can gain some performance if the SQL_NO_CACHE directive is used often.
The problem is that select queries executed concurrently with
a concurrent insert on a MyISAM table could be cached if the
select started after the query cache invalidation but before
the unlock of tables performed by the concurrent insert. This
race could happen because the concurrent insert was failing
to prevent cache of select queries happening at the same time.
The solution is to add a 'uncacheable' status flag to signal
that a concurrent insert is being performed on the table and
that queries executing at the same time shouldn't cache the
results.
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
The problem is that the query cache was storing partial results
if the statement failed when sending the results to the client.
This could cause clients to hang when trying to read the results
from the cache as they would, for example, wait indefinitely for
a eof packet that wasn't saved.
The solution is to always discard the caching of a query that
failed to send its results to the associated client.
The problem is that the query cache stores packets containing
the server status of the time when the cached statement was run.
This might lead to a wrong transaction status in the client side
if a statement is cached during a transaction and is later served
outside a transaction context (and vice-versa).
The solution is to take into account the transaction status when
storing in and serving from the query cache.
The query cache module did not check for the SQL_NO_CACHE keyword before
attempting to query the hash lookup table. This had a small performance impact.
By introducing a check on the query string before obtaining the hash mutex
we can gain some performance if the SQL_NO_CACHE directive is used often.
This patch also fixes bugs 36963 and 35600.
- In many places a view was confused with an anonymous derived
table, i.e. access checking was skipped. Fixed by introducing a
predicate to tell the difference between named and anonymous
derived tables.
- When inserting fields for "SELECT * ", there was no
distinction between base tables and views, where one should be
made. View privileges are checked elsewhere.
if cached query uses many tables
The problem was that query cache would not properly cache
queries which used 256 or more tables but yet would leave
behind query cache blocks pointing to freed (destroyed)
data. Later when invalidating (due to a truncate) query cache
would attempt to grab a lock which resided in the freed data,
leading to hangs or undefined behavior.
This was happening due to a improper return value from the
function responsible for registering the tables used in the
query (so the cache can be invalidated later if one of the
tables is modified). The function expected a return value of
type boolean (char, 8 bits) indicating success (1) or failure
(0) but the number of tables registered (unsigned int, 32 bits)
was being returned instead. This caused the function to return
failure for cases where it had actually succeed because when
a type (unsigned int) is converted to a narrower type (char),
the excess bits on the left are discarded. Thus if the 8
rightmost bits are zero, the return value will be 0 (failure).
The solution is to simply return true (1) only if the number of
registered table is greater than zero and false (0) otherwise.
The initial value of free memory blocks in 0. When the query cache is enabled
a new memory block gets allocated and is assigned number 1. The free memory
block is later split each time query cache memory is allocated for new blocks.
This means that the free memory block counter won't be reduced to zero when
the number of allocated blocks are zero, but rather one. To avoid confusion
this patch changes this behavior so that the free memory block counter is
reset to zero when the query cache is disabled.
Note that when the query cache is enabled and resized the free memory block
counter was still calculated correctly.
pre-locking.
The crash was caused by an implicit assumption in check_table_access() that
table_list parameter is always a part of lex->query_tables.
When iterating over the passed list of tables, check_table_access() used
to stop only when lex->query_tables_last_not_own was reached.
In case of pre-locking, lex->query_tables_last_own is not NULL and points
to some element of lex->query_tables. When the parameter
of check_table_access() was not part of lex->query_tables, loop invariant
could never be violated and a crash would happen when the current table
pointer would point beyond the end of the provided list.
The fix is to change the signature of check_table_access() to also accept
a numeric limit of loop iterations, similarly to check_grant(), and
supply this limit in all places when we want to check access of tables
that are outside lex->query_tables, or just want to check access to one table.
Reseting the query cache by issuing a SET GLOBAL query_cache_size=0 caused the server
to crash if a the server concurrently was saving a new result set to the query cache. The
reason for this was that the invalidation wasn't waiting on the result writers to
release the block level locks on the query cache.
cause ROLLBACK of statement", part 1. Review fixes.
Do not send OK/EOF packets to the client until we reached the end of
the current statement.
This is a consolidation, to keep the functionality that is shared by all
SQL statements in one place in the server.
Currently this functionality includes:
- close_thread_tables()
- log_slow_statement().
After this patch and the subsequent patch for Bug#12713, it shall also include:
- ha_autocommit_or_rollback()
- net_end_statement()
- query_cache_end_of_result().
In future it may also include:
- mysql_reset_thd_for_next_command().