FLUSH TABLES <list> WITH READ LOCK are incompatible" to
be pushed as separate patch.
Replaced thread state name "Waiting for table", which was
used by threads waiting for a metadata lock or table flush,
with a set of names which better reflect types of resources
being waited for.
Also replaced "Table lock" thread state name, which was used
by threads waiting on thr_lock.c table level lock, with more
elaborate "Waiting for table level lock", to make it
more consistent with other thread state names.
Updated test cases and their results according to these
changes.
Fixed sys_vars.query_cache_wlock_invalidate_func test to not
to wait for timeout of wait_condition.inc script.
Apart strict-aliasing warnings, fix the remaining warnings
generated by GCC 4.4.4 -Wall and -Wextra flags.
One major source of warnings was the in-house function my_bcmp
which (unconventionally) took pointers to unsigned characters
as the byte sequences to be compared. Since my_bcmp and bcmp
are deprecated functions whose only difference with memcmp is
the return value, every use of the function is replaced with
memcmp as the special return value wasn't actually being used
by any caller.
There were also various other warnings, mostly due to type
mismatches, missing return values, missing prototypes, dead
code (unreachable) and ignored return values.
In order to allow thread schedulers to be dynamically loaded,
it is necessary to make the following changes to the server:
- Two new service interfaces
- Modifications to InnoDB to inform the thread scheduler of state changes.
- Changes to the VIO subsystem for checking if data is available on a socket.
- Elimination of remains of the old thread pool implementation.
The two new service interfaces introduces are:
my_thread_scheduler
A service interface to register a thread
scheduler.
thd_wait
A service interface to inform thread scheduler
that the thread is about to start waiting.
In addition, the patch adds code that:
- Add a call to thd_wait for table locks in mysys
thd_lock.c by introducing a set function that
can be used to set a callback to be used when
waiting on a lock and resuming from waiting.
- Calling the mysys set function from the server
to set the callbacks correctly.
This patch introduces timeouts for metadata locks.
The timeout is specified in seconds using the new dynamic system
variable "lock_wait_timeout" which has both GLOBAL and SESSION
scopes. Allowed values range from 1 to 31536000 seconds (= 1 year).
The default value is 1 year.
The new server parameter "lock-wait-timeout" can be used to set
the default value parameter upon server startup.
"lock_wait_timeout" applies to all statements that use metadata locks.
These include DML and DDL operations on tables, views, stored procedures
and stored functions. They also include LOCK TABLES, FLUSH TABLES WITH
READ LOCK and HANDLER statements.
The patch also changes thr_lock.c code (table data locks used by MyISAM
and other simplistic engines) to use the same system variable.
InnoDB row locks are unaffected.
One exception to the handling of the "lock_wait_timeout" variable
is delayed inserts. All delayed inserts are executed with a timeout
of 1 year regardless of the setting for the global variable. As the
connection issuing the delayed insert gets no notification of
delayed insert timeouts, we want to avoid unnecessary timeouts.
It's important to note that the timeout value is used for each lock
acquired and that one statement can take more than one lock.
A statement can therefore block for longer than the lock_wait_timeout
value before reporting a timeout error. When lock timeout occurs,
ER_LOCK_WAIT_TIMEOUT is reported.
Test case added to lock_multi.test.
This was a deadlock between ALTER TABLE and another DML statement
(or LOCK TABLES ... READ). ALTER TABLE would wait trying to upgrade
its lock to MDL_EXCLUSIVE and the DML statement would wait trying
to acquire a TL_READ_NO_INSERT table level lock.
This could happen if one connection first acquired a MDL_SHARED_READ
lock on a table. In another connection ALTER TABLE is then started.
ALTER TABLE eventually blocks trying to upgrade to MDL_EXCLUSIVE,
but while holding a TL_WRITE_ALLOW_READ table level lock.
If the first connection then tries to acquire TL_READ_NO_INSERT,
it will block and we have a deadlock since neither connection can
proceed.
This patch fixes the problem by allowing TL_READ_NO_INSERT
locks to be granted if another connection holds TL_WRITE_ALLOW_READ
on the same table. This will allow the DML statement to proceed
such that it eventually can release its MDL lock which in turn
makes ALTER TABLE able to proceed.
Note that TL_READ_NO_INSERT was already partially compatible with
TL_WRITE_ALLOW_READ as the latter would be granted if the former
lock was held. This patch just makes the opposite true as well.
Also note that since ALTER TABLE takes an upgradable MDL lock,
there will be no starvation of ALTER TABLE statements by
statements acquiring TL_READ or TL_READ_NO_INSERT.
Test case added to lock_sync.test.
INFILE".
Attempts to execute an INSERT statement for a MEMORY table which invoked
a trigger or called a stored function which tried to perform LOW_PRIORITY
update on the table being inserted into, resulted in debug servers aborting
due to an assertion failure. On non-debug servers such INSERTs failed with
"Can't update table t1 in stored function/trigger because it is already used
by statement which invoked this stored function/trigger" as expected.
The problem was that in the above scenario TL_WRITE_CONCURRENT_INSERT
is converted to TL_WRITE inside the thr_lock() function since the MEMORY
engine does not support concurrent inserts. This triggered an assertion
which assumed that for the same table, one thread always requests locks with
higher thr_lock_type value first. When TL_WRITE_CONCURRENT_INSERT is
upgraded to TL_WRITE after the locks have been sorted, this is no longer true.
In this case, TL_WRITE was requested after acquiring a TL_WRITE_LOW_PRIORITY
lock on the table, triggering the assert.
This fix solves the problem by adjusting this assert to take this
scenario into account.
An alternative approach to change handler::store_locks() methods for all engines
which do not support concurrent inserts in such way that
TL_WRITE_CONCURRENT_INSERT is upgraded to TL_WRITE there instead,
was considered too intrusive.
Commit on behalf of Dmitry Lenev.
Bug #42147 Concurrent DML and LOCK TABLE ... READ for InnoDB
table cause warnings in errlog
Concurrent execution of LOCK TABLES ... READ statement and DML statements
affecting the same InnoDB table on debug builds of MySQL server might lead
to "Found lock of type 6 that is write and read locked" warnings appearing
in error log.
The problem is that the table-level locking code allows a thread to acquire
TL_READ_NO_INSERT lock on a table even if there is another thread which holds
TL_WRITE_ALLOW_WRITE lock on the same table. At the same time, the locking
code assumes that that such locks are incompatible (for example, see check_locks()).
This doesn't lead to any problems other than warnings in error log for
debug builds of server since for InnoDB tables TL_READ_NO_INSERT type of
lock is only used for LOCK TABLES and for this statement InnoDB also
performs its own table-level locking.
Unfortunately, the table lock compatibility matrix cannot be updated to disallow
TL_READ_NO_INSERT when another thread holds TL_WRITE_ALLOW_WRITE without
causing starvation of LOCK TABLE READ in InnoDB under high write load.
This patch therefore contains no code changes.
The issue will be fixed later when LOCK TABLE READ has been updated
to not use table locks. This bug will therefore be marked as
"To be fixed later".
Code comment in thr_lock.c expanded to clarify the issue and a
test case based on the bug description added to innodb_mysql_lock.test.
Note that a global suppression rule has been added to both MTR v1 and v2
for the "Found lock of type 6 that is write and read locked" warning.
These suppression rules must be removed once this bug is properly fixed.
------------------------------------------------------------
revno: 3035.4.1
committer: Davi Arnaut <Davi.Arnaut@Sun.COM>
branch nick: 39897-6.0
timestamp: Thu 2009-01-15 12:17:57 -0200
message:
Bug#39897: lock_multi fails in pushbuild: timeout waiting for processlist
The problem is that relying on the "Table lock" thread state in
its current position to detect that a thread is waiting on a lock
is race prone. The "Table lock" state change happens before the
thread actually tries to grab a lock on a table.
The solution is to move the "Table lock" state so that its set
only when a thread is actually going to wait for a lock. The state
change happens after the thread fails to grab the lock (because it
is owned by other thread) and proceeds to wait on a condition.
This is considered part of work related to WL#4284 "Transactional
DDL locking"
Warning: this patch contains an incompatible change.
When waiting on a lock in thr_lock.c, the server used to display "Locked"
processlist state. After this patch, the state is "Table lock".
The new state was actually intended to be display since year 2002,
when Monty added it. But up until removal of thd->locked boolean
member, this state was ignored by SHOW PROCESSLIST code.
----------------------------------------------------------
revno: 2630.4.38
committer: Konstantin Osipov <konstantin@mysql.com>
branch nick: mysql-6.0-4144
timestamp: Wed 2008-06-25 22:07:06 +0400
message:
WL#4144 - Lock MERGE engine children.
Committing a version of the patch merged with WL#3726
on behalf of Ingo.
Step #1: Move locking from parent to children.
MERGE children are now left in the query list of tables
after inserted there in open_tables(). So they are locked
by lock_tables() as all other tables are.
The MERGE parent does not store locks any more. It appears
in a MYSQL_LOCK with zero lock data. This is kind of a "dummy"
lock.
All other lock handling is also done directly on the children.
To protect against parent or child modifications during LOCK
TABLES, the children are detached after every statement and
attached before every statement, even under LOCK TABLES.
The children table list is removed from the query list of tables
on every detach and on close of the parent.
Step #2: Move MERGE specific functionality from SQL layer
into table handler.
Functionality moved from SQL layer (mainly sql_base.cc)
to the table handler (ha_myisammrg.cc).
Unnecessary code is removed from the SQL layer.
Step #3: Moved all MERGE specific members from TABLE
to ha_myisammrg.
Moved members from TABLE to ha_myisammrg.
Renamed some mebers.
Fixed comments.
Step #4: Valgrind and coverage testing
Valgrind did not uncover new problems.
Added purecov comments.
Added a new test for DATA/INDEX DIRECTORY options.
Changed handling of ::reset() for non-attached children.
Fixed the merge-big test.
Step #5: Fixed crashes detected during review
Changed detection when to attach/detach.
Added new tests.
Backport also the fix for Bug#44040 "MySQL allows creating a
MERGE table upon VIEWs but crashes when using it"
----------------------------------------------------------
revno: 2630.4.35
committer: Konstantin Osipov <konstantin@mysql.com>
branch nick: mysql-6.0-3726
timestamp: Wed 2008-06-25 16:44:00 +0400
message:
Fix a MyISAM-specific bug in the new implementation of
LOCK TABLES (WL#3726).
If more than one instance of a MyISAM table are open in the
same connection, all of them must share the same status_param.
Otherwise, unlock of a table may lead to lost records.
See also comments in thr_lock.c.
----------------------------------------------------------------------
ChangeSet@1.2571, 2008-04-08 12:30:06+02:00, vvaintroub@wva. +122 -0
Bug#32082 : definition of VOID in my_global.h conflicts with Windows
SDK headers
VOID macro is now removed. Its usage is replaced with void cast.
In some cases, where cast does not make much sense (pthread_*, printf,
hash_delete, my_seek), cast is ommited.
Concurrent execution of statements which require non-table-level
write locks on several instances of the same table (such as
SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML
statement which invokes trigger which tries to update same InnoDB
table directly and through stored function) and statements which
required table-level locks on this table (e.g. LOCK TABLE ... WRITE,
ALTER TABLE, ...) might have resulted in a deadlock.
The problem occured when a thread tried to acquire write lock
(TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was
a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table
and we failed to detect that this thread already had another instance
of write lock on it (so in fact we were trying to acquire recursive
lock) because there was also another thread holding write lock on the
table (also TL_WRITE_ALLOW_WRITE). When the latter thread released
its lock neither the first thread nor the thread trying to acquire
TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write
locked by the first thread) so we ended up with a deadlock.
This patch solves this problem by ensuring that thread which
already has write lock on the table won't wait when it tries
to acquire second write lock on the same table.
Backport from 6.0 to 5.1.
Only those sync points are included, which are used in debug_sync.test.
The Debug Sync Facility allows to place synchronization points
in the code:
open_tables(...)
DEBUG_SYNC(thd, "after_open_tables");
lock_tables(...)
When activated, a sync point can
- Send a signal and/or
- Wait for a signal
Nomenclature:
- signal: A value of a global variable that persists
until overwritten by a new signal. The global
variable can also be seen as a "signal post"
or "flag mast". Then the signal is what is
attached to the "signal post" or "flag mast".
- send a signal: Assign the value (the signal) to the global
variable ("set a flag") and broadcast a
global condition to wake those waiting for
a signal.
- wait for a signal: Loop over waiting for the global condition until
the global value matches the wait-for signal.
Please find more information in the top comment in debug_sync.cc
or in the worklog entry.
upgrading lock, even with low_priority_updates
The problem is that there is no mechanism to control whether a
delayed insert takes a high or low priority lock on a table.
The solution is to modify the delayed insert thread ("handler")
to take into account the global value of low_priority_updates
when taking table locks. The value of low_priority_updates is
retrieved when the insert delayed thread is created and will
remain the same for the duration of the thread.
Dumping information about locks in use by sending a SIGHUP signal
to the server or by invoking the "mysqladmin debug" command may
lead to a server crash in debug builds or to undefined behavior in
production builds.
The problem was that a mutex that protects a lock object (THR_LOCK)
might have been destroyed before the lock object was actually removed
from the list of locks in use, causing a race condition with other
threads iterating over the list. The solution is to destroy the mutex
only after removing lock object from the list.
value" error even though the value was correct): a C function in my_getopt.c
was taking bool* in parameter and was called from C++ sql_plugin.cc,
but on some Mac OS X sizeof(bool) is 1 in C and 4 in C++, giving funny
mismatches. Fixed, all other occurences of bool in C are removed, future
ones are blocked by a "C-bool-catcher" in my_global.h (use my_bool).
The problem is that the Table_locks_waited was incremented only
when the lock request succeed. If a thread waiting for the lock
gets killed or the lock request is aborted, the variable would
not be incremented, leading to inaccurate values in the variable.
The solution is to increment the Table_locks_waited whenever the
lock request is queued. This reflects better the intended behavior
of the variable -- show how many times a lock was waited.
corrupts a MERGE table
Bug 26867 - LOCK TABLES + REPAIR + merge table result in
memory/cpu hogging
Bug 26377 - Deadlock with MERGE and FLUSH TABLE
Bug 25038 - Waiting TRUNCATE
Bug 25700 - merge base tables get corrupted by
optimize/analyze/repair table
Bug 30275 - Merge tables: flush tables or unlock tables
causes server to crash
Bug 19627 - temporary merge table locking
Bug 27660 - Falcon: merge table possible
Bug 30273 - merge tables: Can't lock file (errno: 155)
The problems were:
Bug 26379 - Combination of FLUSH TABLE and REPAIR TABLE
corrupts a MERGE table
1. A thread trying to lock a MERGE table performs busy waiting while
REPAIR TABLE or a similar table administration task is ongoing on
one or more of its MyISAM tables.
2. A thread trying to lock a MERGE table performs busy waiting until all
threads that did REPAIR TABLE or similar table administration tasks
on one or more of its MyISAM tables in LOCK TABLES segments do UNLOCK
TABLES. The difference against problem #1 is that the busy waiting
takes place *after* the administration task. It is terminated by
UNLOCK TABLES only.
3. Two FLUSH TABLES within a LOCK TABLES segment can invalidate the
lock. This does *not* require a MERGE table. The first FLUSH TABLES
can be replaced by any statement that requires other threads to
reopen the table. In 5.0 and 5.1 a single FLUSH TABLES can provoke
the problem.
Bug 26867 - LOCK TABLES + REPAIR + merge table result in
memory/cpu hogging
Trying DML on a MERGE table, which has a child locked and
repaired by another thread, made an infinite loop in the server.
Bug 26377 - Deadlock with MERGE and FLUSH TABLE
Locking a MERGE table and its children in parent-child order
and flushing the child deadlocked the server.
Bug 25038 - Waiting TRUNCATE
Truncating a MERGE child, while the MERGE table was in use,
let the truncate fail instead of waiting for the table to
become free.
Bug 25700 - merge base tables get corrupted by
optimize/analyze/repair table
Repairing a child of an open MERGE table corrupted the child.
It was necessary to FLUSH the child first.
Bug 30275 - Merge tables: flush tables or unlock tables
causes server to crash
Flushing and optimizing locked MERGE children crashed the server.
Bug 19627 - temporary merge table locking
Use of a temporary MERGE table with non-temporary children
could corrupt the children.
Temporary tables are never locked. So we do now prohibit
non-temporary chidlren of a temporary MERGE table.
Bug 27660 - Falcon: merge table possible
It was possible to create a MERGE table with non-MyISAM children.
Bug 30273 - merge tables: Can't lock file (errno: 155)
This was a Windows-only bug. Table administration statements
sometimes failed with "Can't lock file (errno: 155)".
These bugs are fixed by a new implementation of MERGE table open.
When opening a MERGE table in open_tables() we do now add the
child tables to the list of tables to be opened by open_tables()
(the "query_list"). The children are not opened in the handler at
this stage.
After opening the parent, open_tables() opens each child from the
now extended query_list. When the last child is opened, we remove
the children from the query_list again and attach the children to
the parent. This behaves similar to the old open. However it does
not open the MyISAM tables directly, but grabs them from the already
open children.
When closing a MERGE table in close_thread_table() we detach the
children only. Closing of the children is done implicitly because
they are in thd->open_tables.
For more detail see the comment at the top of ha_myisammrg.cc.
Changed from open_ltable() to open_and_lock_tables() in all places
that can be relevant for MERGE tables. The latter can handle tables
added to the list on the fly. When open_ltable() was used in a loop
over a list of tables, the list must be temporarily terminated
after every table for open_and_lock_tables().
table_list->required_type is set to FRMTYPE_TABLE to avoid open of
special tables. Handling of derived tables is suppressed.
These details are handled by the new function
open_n_lock_single_table(), which has nearly the same signature as
open_ltable() and can replace it in most cases.
In reopen_tables() some of the tables open by a thread can be
closed and reopened. When a MERGE child is affected, the parent
must be closed and reopened too. Closing of the parent is forced
before the first child is closed. Reopen happens in the order of
thd->open_tables. MERGE parents do not attach their children
automatically at open. This is done after all tables are reopened.
So all children are open when attaching them.
Special lock handling like mysql_lock_abort() or mysql_lock_remove()
needs to be suppressed for MERGE children or forwarded to the parent.
This depends on the situation. In loops over all open tables one
suppresses child lock handling. When a single table is touched,
forwarding is done.
Behavioral changes:
===================
This patch changes the behavior of temporary MERGE tables.
Temporary MERGE must have temporary children.
The old behavior was wrong. A temporary table is not locked. Hence
even non-temporary children were not locked. See
Bug 19627 - temporary merge table locking.
You cannot change the union list of a non-temporary MERGE table
when LOCK TABLES is in effect. The following does *not* work:
CREATE TABLE m1 ... ENGINE=MRG_MYISAM ...;
LOCK TABLES t1 WRITE, t2 WRITE, m1 WRITE;
ALTER TABLE m1 ... UNION=(t1,t2) ...;
However, you can do this with a temporary MERGE table.
You cannot create a MERGE table with CREATE ... SELECT, neither
as a temporary MERGE table, nor as a non-temporary MERGE table.
CREATE TABLE m1 ... ENGINE=MRG_MYISAM ... SELECT ...;
Gives error message: table is not BASE TABLE.
statement being KILLed".
When statement which was trying to obtain write lock on then table and
which was blocked by existing read lock was killed, concurrent statements
that were trying to obtain read locks on the same table and that were
blocked by the presence of this pending write lock were not woken up and
had to wait until this first read lock goes away.
This problem was caused by the fact that we forgot to wake up threads
which pending requests could have been satisfied after removing lock
request for the killed thread.
The patch solves the problem by waking up those threads in such situation.
Test for this bug will be added to 5.1 only as it has much better
facilities for its implementation. Particularly, by using I_S.PROCESSLIST
and wait_condition.inc script we can wait until thread will be blocked on
certain table lock without relying on unconditional sleep (which usage
increases time needed for test runs and might cause spurious test
failures on slower platforms).
The following type conversions was done:
- Changed byte to uchar
- Changed gptr to uchar*
- Change my_string to char *
- Change my_size_t to size_t
- Change size_s to size_t
Removed declaration of byte, gptr, my_string, my_size_t and size_s.
Following function parameter changes was done:
- All string functions in mysys/strings was changed to use size_t
instead of uint for string lengths.
- All read()/write() functions changed to use size_t (including vio).
- All protocoll functions changed to use size_t instead of uint
- Functions that used a pointer to a string length was changed to use size_t*
- Changed malloc(), free() and related functions from using gptr to use void *
as this requires fewer casts in the code and is more in line with how the
standard functions work.
- Added extra length argument to dirname_part() to return the length of the
created string.
- Changed (at least) following functions to take uchar* as argument:
- db_dump()
- my_net_write()
- net_write_command()
- net_store_data()
- DBUG_DUMP()
- decimal2bin() & bin2decimal()
- Changed my_compress() and my_uncompress() to use size_t. Changed one
argument to my_uncompress() from a pointer to a value as we only return
one value (makes function easier to use).
- Changed type of 'pack_data' argument to packfrm() to avoid casts.
- Changed in readfrm() and writefrom(), ha_discover and handler::discover()
the type for argument 'frmdata' to uchar** to avoid casts.
- Changed most Field functions to use uchar* instead of char* (reduced a lot of
casts).
- Changed field->val_xxx(xxx, new_ptr) to take const pointers.
Other changes:
- Removed a lot of not needed casts
- Added a few new cast required by other changes
- Added some cast to my_multi_malloc() arguments for safety (as string lengths
needs to be uint, not size_t).
- Fixed all calls to hash-get-key functions to use size_t*. (Needed to be done
explicitely as this conflict was often hided by casting the function to
hash_get_key).
- Changed some buffers to memory regions to uchar* to avoid casts.
- Changed some string lengths from uint to size_t.
- Changed field->ptr to be uchar* instead of char*. This allowed us to
get rid of a lot of casts.
- Some changes from true -> TRUE, false -> FALSE, unsigned char -> uchar
- Include zlib.h in some files as we needed declaration of crc32()
- Changed MY_FILE_ERROR to be (size_t) -1.
- Changed many variables to hold the result of my_read() / my_write() to be
size_t. This was needed to properly detect errors (which are
returned as (size_t) -1).
- Removed some very old VMS code
- Changed packfrm()/unpackfrm() to not be depending on uint size
(portability fix)
- Removed windows specific code to restore cursor position as this
causes slowdown on windows and we should not mix read() and pread()
calls anyway as this is not thread safe. Updated function comment to
reflect this. Changed function that depended on original behavior of
my_pwrite() to itself restore the cursor position (one such case).
- Added some missing checking of return value of malloc().
- Changed definition of MOD_PAD_CHAR_TO_FULL_LENGTH to avoid 'long' overflow.
- Changed type of table_def::m_size from my_size_t to ulong to reflect that
m_size is the number of elements in the array, not a string/memory
length.
- Moved THD::max_row_length() to table.cc (as it's not depending on THD).
Inlined max_row_length_blob() into this function.
- More function comments
- Fixed some compiler warnings when compiled without partitions.
- Removed setting of LEX_STRING() arguments in declaration (portability fix).
- Some trivial indentation/variable name changes.
- Some trivial code simplifications:
- Replaced some calls to alloc_root + memcpy to use
strmake_root()/strdup_root().
- Changed some calls from memdup() to strmake() (Safety fix)
- Simpler loops in client-simple.c
Fixed compile-pentium64 scripts
Fixed wrong estimate of update_with_key_prefix in sql-bench
Merge bk-internal.mysql.com:/home/bk/mysql-5.1 into mysql.com:/home/my/mysql-5.1
Fixed unsafe define of uint4korr()
Fixed that --extern works with mysql-test-run.pl
Small trivial cleanups
This also fixes a bug in counting number of rows that are updated when we have many simultanous queries
Move all connection handling and command exectuion main loop from sql_parse.cc to sql_connection.cc
Split handle_one_connection() into reusable sub functions.
Split create_new_thread() into reusable sub functions.
Added thread_scheduler; Preliminary interface code for future thread_handling code.
Use 'my_thread_id' for internal thread id's
Make thr_alarm_kill() to depend on thread_id instead of thread
Make thr_abort_locks_for_thread() depend on thread_id instead of thread
In store_globals(), set my_thread_var->id to be thd->thread_id.
Use my_thread_var->id as basis for my_thread_name()
The above changes makes the connection we have between THD and threads more soft.
Added a lot of DBUG_PRINT() and DBUG_ASSERT() functions
Fixed compiler warnings
Fixed core dumps when running with --debug
Removed setting of signal masks (was never used)
Made event code call pthread_exit() (portability fix)
Fixed that event code doesn't call DBUG_xxx functions before my_thread_init() is called.
Made handling of thread_id and thd->variables.pseudo_thread_id uniform.
Removed one common 'not freed memory' warning from mysqltest
Fixed a couple of usage of not initialized warnings (unlikely cases)
Suppress compiler warnings from bdb and (for the moment) warnings from ndb
This problem could happen when show table status get outdated copy
of TABLE object from table cache.
MyISAM updates state info when external_lock() method is called. Though
I_S does not lock a table to avoid deadlocks. If I_S opens a table which
is in a table cache it will likely get outdated state info copy.
In this case shared state copy is more recent than local copy. This problem
is fixed by correctly restoring myisam state info pointer back to original
value, that is to shared state.
Affects MyISAM only. No good deterministic test case for this fix.
(Mostly in DBUG_PRINT() and unused arguments)
Fixed bug in query cache when used with traceing (--with-debug)
Fixed memory leak in mysqldump
Removed warnings from mysqltest scripts (replaced -- with #)
Addendum fixes after changing the condition variable
for the global read lock.
The stress test suite revealed some deadlocks. Some were
related to the new condition variable (COND_global_read_lock)
and some were general problems with the global read lock.
It is now necessary to signal COND_global_read_lock whenever
COND_refresh is signalled.
We need to wait for the release of a global read lock if one
is set before every operation that requires a write lock.
But we must not wait if we have locked tables by LOCK TABLES.
After setting a global read lock a thread waits until all
write locks are released.
Changes that requires code changes in other code of other storage engines.
(Note that all changes are very straightforward and one should find all issues
by compiling a --debug build and fixing all compiler errors and all
asserts in field.cc while running the test suite),
- New optional handler function introduced: reset()
This is called after every DML statement to make it easy for a handler to
statement specific cleanups.
(The only case it's not called is if force the file to be closed)
- handler::extra(HA_EXTRA_RESET) is removed. Code that was there before
should be moved to handler::reset()
- table->read_set contains a bitmap over all columns that are needed
in the query. read_row() and similar functions only needs to read these
columns
- table->write_set contains a bitmap over all columns that will be updated
in the query. write_row() and update_row() only needs to update these
columns.
The above bitmaps should now be up to date in all context
(including ALTER TABLE, filesort()).
The handler is informed of any changes to the bitmap after
fix_fields() by calling the virtual function
handler::column_bitmaps_signal(). If the handler does caching of
these bitmaps (instead of using table->read_set, table->write_set),
it should redo the caching in this code. as the signal() may be sent
several times, it's probably best to set a variable in the signal
and redo the caching on read_row() / write_row() if the variable was
set.
- Removed the read_set and write_set bitmap objects from the handler class
- Removed all column bit handling functions from the handler class.
(Now one instead uses the normal bitmap functions in my_bitmap.c instead
of handler dedicated bitmap functions)
- field->query_id is removed. One should instead instead check
table->read_set and table->write_set if a field is used in the query.
- handler::extra(HA_EXTRA_RETRIVE_ALL_COLS) and
handler::extra(HA_EXTRA_RETRIEVE_PRIMARY_KEY) are removed. One should now
instead use table->read_set to check for which columns to retrieve.
- If a handler needs to call Field->val() or Field->store() on columns
that are not used in the query, one should install a temporary
all-columns-used map while doing so. For this, we provide the following
functions:
my_bitmap_map *old_map= dbug_tmp_use_all_columns(table, table->read_set);
field->val();
dbug_tmp_restore_column_map(table->read_set, old_map);
and similar for the write map:
my_bitmap_map *old_map= dbug_tmp_use_all_columns(table, table->write_set);
field->val();
dbug_tmp_restore_column_map(table->write_set, old_map);
If this is not done, you will sooner or later hit a DBUG_ASSERT
in the field store() / val() functions.
(For not DBUG binaries, the dbug_tmp_restore_column_map() and
dbug_tmp_restore_column_map() are inline dummy functions and should
be optimized away be the compiler).
- If one needs to temporary set the column map for all binaries (and not
just to avoid the DBUG_ASSERT() in the Field::store() / Field::val()
methods) one should use the functions tmp_use_all_columns() and
tmp_restore_column_map() instead of the above dbug_ variants.
- All 'status' fields in the handler base class (like records,
data_file_length etc) are now stored in a 'stats' struct. This makes
it easier to know what status variables are provided by the base
handler. This requires some trivial variable names in the extra()
function.
- New virtual function handler::records(). This is called to optimize
COUNT(*) if (handler::table_flags() & HA_HAS_RECORDS()) is true.
(stats.records is not supposed to be an exact value. It's only has to
be 'reasonable enough' for the optimizer to be able to choose a good
optimization path).
- Non virtual handler::init() function added for caching of virtual
constants from engine.
- Removed has_transactions() virtual method. Now one should instead return
HA_NO_TRANSACTIONS in table_flags() if the table handler DOES NOT support
transactions.
- The 'xxxx_create_handler()' function now has a MEM_ROOT_root argument
that is to be used with 'new handler_name()' to allocate the handler
in the right area. The xxxx_create_handler() function is also
responsible for any initialization of the object before returning.
For example, one should change:
static handler *myisam_create_handler(TABLE_SHARE *table)
{
return new ha_myisam(table);
}
->
static handler *myisam_create_handler(TABLE_SHARE *table, MEM_ROOT *mem_root)
{
return new (mem_root) ha_myisam(table);
}
- New optional virtual function: use_hidden_primary_key().
This is called in case of an update/delete when
(table_flags() and HA_PRIMARY_KEY_REQUIRED_FOR_DELETE) is defined
but we don't have a primary key. This allows the handler to take precisions
in remembering any hidden primary key to able to update/delete any
found row. The default handler marks all columns to be read.
- handler::table_flags() now returns a ulonglong (to allow for more flags).
- New/changed table_flags()
- HA_HAS_RECORDS Set if ::records() is supported
- HA_NO_TRANSACTIONS Set if engine doesn't support transactions
- HA_PRIMARY_KEY_REQUIRED_FOR_DELETE
Set if we should mark all primary key columns for
read when reading rows as part of a DELETE
statement. If there is no primary key,
all columns are marked for read.
- HA_PARTIAL_COLUMN_READ Set if engine will not read all columns in some
cases (based on table->read_set)
- HA_PRIMARY_KEY_ALLOW_RANDOM_ACCESS
Renamed to HA_PRIMARY_KEY_REQUIRED_FOR_POSITION.
- HA_DUPP_POS Renamed to HA_DUPLICATE_POS
- HA_REQUIRES_KEY_COLUMNS_FOR_DELETE
Set this if we should mark ALL key columns for
read when when reading rows as part of a DELETE
statement. In case of an update we will mark
all keys for read for which key part changed
value.
- HA_STATS_RECORDS_IS_EXACT
Set this if stats.records is exact.
(This saves us some extra records() calls
when optimizing COUNT(*))
- Removed table_flags()
- HA_NOT_EXACT_COUNT Now one should instead use HA_HAS_RECORDS if
handler::records() gives an exact count() and
HA_STATS_RECORDS_IS_EXACT if stats.records is exact.
- HA_READ_RND_SAME Removed (no one supported this one)
- Removed not needed functions ha_retrieve_all_cols() and ha_retrieve_all_pk()
- Renamed handler::dupp_pos to handler::dup_pos
- Removed not used variable handler::sortkey
Upper level handler changes:
- ha_reset() now does some overall checks and calls ::reset()
- ha_table_flags() added. This is a cached version of table_flags(). The
cache is updated on engine creation time and updated on open.
MySQL level changes (not obvious from the above):
- DBUG_ASSERT() added to check that column usage matches what is set
in the column usage bit maps. (This found a LOT of bugs in current
column marking code).
- In 5.1 before, all used columns was marked in read_set and only updated
columns was marked in write_set. Now we only mark columns for which we
need a value in read_set.
- Column bitmaps are created in open_binary_frm() and open_table_from_share().
(Before this was in table.cc)
- handler::table_flags() calls are replaced with handler::ha_table_flags()
- For calling field->val() you must have the corresponding bit set in
table->read_set. For calling field->store() you must have the
corresponding bit set in table->write_set. (There are asserts in
all store()/val() functions to catch wrong usage)
- thd->set_query_id is renamed to thd->mark_used_columns and instead
of setting this to an integer value, this has now the values:
MARK_COLUMNS_NONE, MARK_COLUMNS_READ, MARK_COLUMNS_WRITE
Changed also all variables named 'set_query_id' to mark_used_columns.
- In filesort() we now inform the handler of exactly which columns are needed
doing the sort and choosing the rows.
- The TABLE_SHARE object has a 'all_set' column bitmap one can use
when one needs a column bitmap with all columns set.
(This is used for table->use_all_columns() and other places)
- The TABLE object has 3 column bitmaps:
- def_read_set Default bitmap for columns to be read
- def_write_set Default bitmap for columns to be written
- tmp_set Can be used as a temporary bitmap when needed.
The table object has also two pointer to bitmaps read_set and write_set
that the handler should use to find out which columns are used in which way.
- count() optimization now calls handler::records() instead of using
handler->stats.records (if (table_flags() & HA_HAS_RECORDS) is true).
- Added extra argument to Item::walk() to indicate if we should also
traverse sub queries.
- Added TABLE parameter to cp_buffer_from_ref()
- Don't close tables created with CREATE ... SELECT but keep them in
the table cache. (Faster usage of newly created tables).
New interfaces:
- table->clear_column_bitmaps() to initialize the bitmaps for tables
at start of new statements.
- table->column_bitmaps_set() to set up new column bitmaps and signal
the handler about this.
- table->column_bitmaps_set_no_signal() for some few cases where we need
to setup new column bitmaps but don't signal the handler (as the handler
has already been signaled about these before). Used for the momement
only in opt_range.cc when doing ROR scans.
- table->use_all_columns() to install a bitmap where all columns are marked
as use in the read and the write set.
- table->default_column_bitmaps() to install the normal read and write
column bitmaps, but not signaling the handler about this.
This is mainly used when creating TABLE instances.
- table->mark_columns_needed_for_delete(),
table->mark_columns_needed_for_delete() and
table->mark_columns_needed_for_insert() to allow us to put additional
columns in column usage maps if handler so requires.
(The handler indicates what it neads in handler->table_flags())
- table->prepare_for_position() to allow us to tell handler that it
needs to read primary key parts to be able to store them in
future table->position() calls.
(This replaces the table->file->ha_retrieve_all_pk function)
- table->mark_auto_increment_column() to tell handler are going to update
columns part of any auto_increment key.
- table->mark_columns_used_by_index() to mark all columns that is part of
an index. It will also send extra(HA_EXTRA_KEYREAD) to handler to allow
it to quickly know that it only needs to read colums that are part
of the key. (The handler can also use the column map for detecting this,
but simpler/faster handler can just monitor the extra() call).
- table->mark_columns_used_by_index_no_reset() to in addition to other columns,
also mark all columns that is used by the given key.
- table->restore_column_maps_after_mark_index() to restore to default
column maps after a call to table->mark_columns_used_by_index().
- New item function register_field_in_read_map(), for marking used columns
in table->read_map. Used by filesort() to mark all used columns
- Maintain in TABLE->merge_keys set of all keys that are used in query.
(Simplices some optimization loops)
- Maintain Field->part_of_key_not_clustered which is like Field->part_of_key
but the field in the clustered key is not assumed to be part of all index.
(used in opt_range.cc for faster loops)
- dbug_tmp_use_all_columns(), dbug_tmp_restore_column_map()
tmp_use_all_columns() and tmp_restore_column_map() functions to temporally
mark all columns as usable. The 'dbug_' version is primarily intended
inside a handler when it wants to just call Field:store() & Field::val()
functions, but don't need the column maps set for any other usage.
(ie:: bitmap_is_set() is never called)
- We can't use compare_records() to skip updates for handlers that returns
a partial column set and the read_set doesn't cover all columns in the
write set. The reason for this is that if we have a column marked only for
write we can't in the MySQL level know if the value changed or not.
The reason this worked before was that MySQL marked all to be written
columns as also to be read. The new 'optimal' bitmaps exposed this 'hidden
bug'.
- open_table_from_share() does not anymore setup temporary MEM_ROOT
object as a thread specific variable for the handler. Instead we
send the to-be-used MEMROOT to get_new_handler().
(Simpler, faster code)
Bugs fixed:
- Column marking was not done correctly in a lot of cases.
(ALTER TABLE, when using triggers, auto_increment fields etc)
(Could potentially result in wrong values inserted in table handlers
relying on that the old column maps or field->set_query_id was correct)
Especially when it comes to triggers, there may be cases where the
old code would cause lost/wrong values for NDB and/or InnoDB tables.
- Split thd->options flag OPTION_STATUS_NO_TRANS_UPDATE to two flags:
OPTION_STATUS_NO_TRANS_UPDATE and OPTION_KEEP_LOG.
This allowed me to remove some wrong warnings about:
"Some non-transactional changed tables couldn't be rolled back"
- Fixed handling of INSERT .. SELECT and CREATE ... SELECT that wrongly reset
(thd->options & OPTION_STATUS_NO_TRANS_UPDATE) which caused us to loose
some warnings about
"Some non-transactional changed tables couldn't be rolled back")
- Fixed use of uninitialized memory in ha_ndbcluster.cc::delete_table()
which could cause delete_table to report random failures.
- Fixed core dumps for some tests when running with --debug
- Added missing FN_LIBCHAR in mysql_rm_tmp_tables()
(This has probably caused us to not properly remove temporary files after
crash)
- slow_logs was not properly initialized, which could maybe cause
extra/lost entries in slow log.
- If we get an duplicate row on insert, change column map to read and
write all columns while retrying the operation. This is required by
the definition of REPLACE and also ensures that fields that are only
part of UPDATE are properly handled. This fixed a bug in NDB and
REPLACE where REPLACE wrongly copied some column values from the replaced
row.
- For table handler that doesn't support NULL in keys, we would give an error
when creating a primary key with NULL fields, even after the fields has been
automaticly converted to NOT NULL.
- Creating a primary key on a SPATIAL key, would fail if field was not
declared as NOT NULL.
Cleanups:
- Removed not used condition argument to setup_tables
- Removed not needed item function reset_query_id_processor().
- Field->add_index is removed. Now this is instead maintained in
(field->flags & FIELD_IN_ADD_INDEX)
- Field->fieldnr is removed (use field->field_index instead)
- New argument to filesort() to indicate that it should return a set of
row pointers (not used columns). This allowed me to remove some references
to sql_command in filesort and should also enable us to return column
results in some cases where we couldn't before.
- Changed column bitmap handling in opt_range.cc to be aligned with TABLE
bitmap, which allowed me to use bitmap functions instead of looping over
all fields to create some needed bitmaps. (Faster and smaller code)
- Broke up found too long lines
- Moved some variable declaration at start of function for better code
readability.
- Removed some not used arguments from functions.
(setup_fields(), mysql_prepare_insert_check_table())
- setup_fields() now takes an enum instead of an int for marking columns
usage.
- For internal temporary tables, use handler::write_row(),
handler::delete_row() and handler::update_row() instead of
handler::ha_xxxx() for faster execution.
- Changed some constants to enum's and define's.
- Using separate column read and write sets allows for easier checking
of timestamp field was set by statement.
- Remove calls to free_io_cache() as this is now done automaticly in ha_reset()
- Don't build table->normalized_path as this is now identical to table->path
(after bar's fixes to convert filenames)
- Fixed some missed DBUG_PRINT(.."%lx") to use "0x%lx" to make it easier to
do comparision with the 'convert-dbug-for-diff' tool.
Things left to do in 5.1:
- We wrongly log failed CREATE TABLE ... SELECT in some cases when using
row based logging (as shown by testcase binlog_row_mix_innodb_myisam.result)
Mats has promised to look into this.
- Test that my fix for CREATE TABLE ... SELECT is indeed correct.
(I added several test cases for this, but in this case it's better that
someone else also tests this throughly).
Lars has promosed to do this.
Optimised version of ADD/DROP/REORGANIZE partitions for
non-NDB storage engines.
New syntax to handle REBUILD/OPTIMIZE/ANALYZE/CHECK/REPAIR partitions
Quite a few bug fixes
- CHAR() now returns binary string as default
- CHAR(X*65536+Y*256+Z) is now equal to CHAR(X,Y,Z) independent of the character set for CHAR()
- Test for both ETIMEDOUT and ETIME from pthread_cond_timedwait()
(Some old systems returns ETIME and it's safer to test for both values
than to try to write a wrapper for each old system)
- Fixed new introduced bug in NOT BETWEEN X and X
- Ensure we call commit_by_xid or rollback_by_xid for all engines, even if one engine has failed
- Use octet2hex() for all conversion of string to hex
- Simplify and optimize code