This patch was originally developed by Vladislav Vaintroub.
The main changes are:
* Use TryEnterCriticalSection in os_fast_mutex_trylock().
* Use lightweight condition variables on Windows Vista or later,
but fall back to events on older Windows versions, such as XP.
This patch also fixes the following bugs:
bug# 52102 InnoDB Plugin shows performance drop compared to InnoDB
on Windows
bug# 53204 os_fastmutex_trylock is implemented incorrectly on Windows
rb://363 approved by Inaam Rana
Remove the pure attribute from a function. The function doesn't qualify as
a pure function because it has a side-effect (modifies its parameter). Add
a clarifying comment to another function's declaration.
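The distinction can be illustrated with a small GCC/Clang sketch. The function names below are hypothetical and are not the functions touched by this change; they only show why a function that writes through its parameter must not carry the pure attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: qualifies as pure. It only reads its arguments
and the memory they point to, and has no observable side effects. */
__attribute__((pure))
size_t ex_count_nonzero(const int *v, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (v[i] != 0) {
            count++;
        }
    }

    return count;
}

/* Hypothetical example: must NOT be declared pure, because it writes
through its parameter. With the attribute, the compiler would be
allowed to merge or drop repeated calls. */
size_t ex_take_next(size_t *pos)
{
    assert(pos != NULL);
    return (*pos)++;
}
```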
This change is for performance optimization.
Fixed the performance schema instrumentation interface as follows:
- simplified mysql_unlock_mutex()
- simplified mysql_unlock_rwlock()
- simplified mysql_cond_signal()
- simplified mysql_cond_broadcast()
Changed the get_thread_XXX_locker apis to have one extra parameter,
to provide memory to the instrumentation implementation.
This API change allows the implementation to use memory provided by the caller,
to avoid having to use thread local storage.
Using this extra parameter will be done in a separate fix,
this change is for the interface only.
Adjusted all the code and unit tests accordingly.
------------------------------------------------------------
revno: 3529
revision-id: marko.makela@oracle.com-20100629125518-m3am4ia1ffjr0d0j
parent: jimmy.yang@oracle.com-20100629024137-690sacm5sogruzvb
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Tue 2010-06-29 15:55:18 +0300
message:
Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED
columns
When the server crashes after a record stub has been inserted and
before all its off-page columns have been written, the record will
contain incomplete off-page columns after crash recovery. Such records
may only be accessed at the READ UNCOMMITTED isolation level or when
rolling back a recovered transaction in recv_recovery_rollback_active().
Skip these records at the READ UNCOMMITTED isolation level.
TODO: Add assertions for checking that the above assumptions hold when an
incomplete BLOB is encountered.
btr_rec_copy_externally_stored_field(): Return NULL if the field is
incomplete.
row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.
row_sel_store_mysql_rec(): Return FALSE if not all columns could be
retrieved. Previously this function always returned TRUE. Assert that
the record is not delete-marked.
row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
could be retrieved.
row_search_for_mysql(): Skip records containing incomplete off-page
columns. Assert that the transaction isolation level is READ
UNCOMMITTED.
rb://380 approved by Jimmy Yang
Merge and adjust a forgotten change to fix this bug.
rb://393 approved by Jimmy Yang
------------------------------------------------------------------------
r3794 | marko | 2009-01-07 14:14:53 +0000 (Wed, 07 Jan 2009) | 18 lines
branches/6.0: Allow the minimum length of a multi-byte character to be
up to 4 bytes. (Bug #35391)
dtype_t, dict_col_t: Replace mbminlen:2, mbmaxlen:3 with mbminmaxlen:5.
In this way, the 5 bits can hold two values of 0..4, and the storage size
of the fields will not cross the 64-bit boundary. Encode the values as
DATA_MBMAX * mbmaxlen + mbminlen. Define the auxiliary macros
DB_MBMINLEN(mbminmaxlen), DB_MBMAXLEN(mbminmaxlen), and
DB_MINMAXLEN(mbminlen, mbmaxlen).
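A minimal sketch of the encoding described above. The EX_ names are illustrative stand-ins, not the real definitions, and the radix of 5 is an assumption: the message names DATA_MBMAX without giving its value, but 5 is the smallest base that can hold the values 0..4, and 5 * 5 = 25 combinations fit in 5 bits.

```c
#include <assert.h>

/* Assumption: radix 5, standing in for DATA_MBMAX. */
#define EX_MBMAX 5

/* Illustrative pack/unpack macros (hypothetical names). */
#define EX_MBMINMAXLEN(mbminlen, mbmaxlen) \
    ((mbmaxlen) * EX_MBMAX + (mbminlen))
#define EX_MBMINLEN(mbminmaxlen) ((mbminmaxlen) % EX_MBMAX)
#define EX_MBMAXLEN(mbminmaxlen) ((mbminmaxlen) / EX_MBMAX)

/* Hypothetical column descriptor with the combined 5-bit field. */
struct ex_col {
    unsigned mbminmaxlen:5; /* largest value is 4 * 5 + 4 = 24 */
};

/* Store both lengths in the combined field and return the encoded value. */
unsigned ex_col_set(struct ex_col *col, unsigned mbminlen, unsigned mbmaxlen)
{
    assert(mbminlen < EX_MBMAX);
    assert(mbmaxlen < EX_MBMAX);
    assert(mbminlen <= mbmaxlen);

    col->mbminmaxlen = EX_MBMINMAXLEN(mbminlen, mbmaxlen);
    return col->mbminmaxlen;
}
```

For example, a character set with a minimum of 2 and a maximum of 4 bytes per character would be stored as 4 * 5 + 2 = 22 under these assumptions.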
Try to trim and pad UTF-16 and UTF-32 with spaces as appropriate.
Alexander Barkov suggested the use of cs->cset->fill(cs, buff, len, 0x20).
ha_innobase::store_key_val_for_row() now does that, but the added function
row_mysql_pad_col() does not, because it doesn't have the MySQL TABLE object.
rb://49 approved by Heikki Tuuri
------------------------------------------------------------------------
and clarifies the invariant in dict_table_get_on_id().
In Mar 2007 Marko observed a crash during recovery; the crash resulted from
an UNDO operation on a system table. His solution was to acquire an X lock on
the data dictionary, which in hindsight was overkill. It is unclear what
caused the crash; the current hypothesis is that it was memory corruption.
The X lock causes performance issues when undoing changes due to
rollback during normal operation on regular tables.
Why the change is safe:
======================
The InnoDB code has changed since the original X lock change was made. In the
new code we always lock the data dictionary in X mode during startup when
UNDOing operations on the system tables (this is a given). This ensures that
the crash Marko observed cannot happen as long as all transactions that update
the system tables follow the standard rules by setting the appropriate DICT_OP
flag when writing the log records when they make the changes.
If transactions violate the above mentioned rule then during recovery (at
startup) the rollback code (see trx0roll.c) will not acquire the X lock
and we will see the crash again. This will however be a different bug.
------------------------------------------------------------
revno: 3517
revision-id: vasil.dimov@oracle.com-20100622163043-dc0lxy0byg74viet
parent: marko.makela@oracle.com-20100621095148-8g73k8k68dpj080u
committer: Vasil Dimov <vasil.dimov@oracle.com>
branch nick: mysql-5.1-innodb
timestamp: Tue 2010-06-22 19:30:43 +0300
message:
Fix Bug#47991 InnoDB Dictionary Cache memory usage increases indefinitely
when renaming tables
Allocate the table name using ut_malloc() instead of table->heap because
the latter cannot be freed.
Adjust dict_sys->size calculations all over the code.
Change dict_table_t::name from const char* to char* because we need to
ut_malloc()/ut_free() it.
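The idea can be sketched as follows, using plain malloc()/free() in place of ut_malloc()/ut_free(). The struct and function names are hypothetical, and the dict_sys->size bookkeeping is omitted. Because the name is an individually allocated buffer, each rename can release the previous name instead of leaving it behind in a heap that only grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced table object: the name is owned by the table
and allocated separately, so it is char* rather than const char*. */
struct ex_table {
    char *name;
};

/* Replace the table name. Returns 0 on success, -1 if allocation fails
(in which case the old name is kept). */
int ex_table_rename(struct ex_table *table, const char *new_name)
{
    size_t len;
    char *copy;

    assert(table != NULL);
    assert(new_name != NULL);

    len = strlen(new_name) + 1;
    copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, new_name, len);

    free(table->name); /* the old name no longer accumulates */
    table->name = copy;
    return 0;
}
```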
Reviewed by: Inaam, Marko, Heikki (rb://384)
Approved by: Heikki (rb://384)
------------------------------------------------------------
and innodb_file_format_max two system variables. This also fixes
bug #53654: after 2nd shutdown innodb_file_format_check attains strange
values.
rb://366 approved by Marko
------------------------------------------------------------
revno: 3495
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Wed 2010-06-02 13:37:14 +0300
message:
Bug#53674: InnoDB: Error: unlock row could not find a 4 mode lock on the record
In semi-consistent read, only unlock freshly locked non-matching records.
lock_rec_lock_fast(): Return LOCK_REC_SUCCESS,
LOCK_REC_SUCCESS_CREATED, or LOCK_REC_FAIL instead of TRUE/FALSE.
enum db_err: Add DB_SUCCESS_LOCKED_REC for indicating a successful
operation where a record lock was created.
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock(), lock_rec_enqueue_waiting(),
lock_rec_lock_slow(), lock_rec_lock(), row_ins_set_shared_rec_lock(),
row_ins_set_exclusive_rec_lock(), sel_set_rec_lock(),
row_sel_get_clust_rec_for_mysql(): Return DB_SUCCESS_LOCKED_REC if a
new record lock was created. Adjust callers.
row_unlock_for_mysql(): Correct the function documentation.
row_prebuilt_t::new_rec_locks: Correct the documentation.
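The rule for semi-consistent reads can be sketched as below. The enum and function are hypothetical stand-ins for the result codes named above, not the actual InnoDB definitions; the point is that a two-valued TRUE/FALSE result cannot tell an already existing lock from a freshly created one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three result codes. */
enum ex_lock_result {
    EX_LOCK_REC_FAIL,            /* the fast path did not succeed */
    EX_LOCK_REC_SUCCESS,         /* a suitable lock already existed */
    EX_LOCK_REC_SUCCESS_CREATED  /* a new record lock was created */
};

/* Decide whether a semi-consistent read may release the record lock:
only when the record does not match the search condition AND the lock
was freshly created by this read. */
bool ex_may_unlock(enum ex_lock_result res, bool rec_matches)
{
    assert(res != EX_LOCK_REC_FAIL);
    return !rec_matches && res == EX_LOCK_REC_SUCCESS_CREATED;
}
```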
can now view the content of InnoDB System Tables through the following
information schema tables:
information_schema.INNODB_SYS_TABLES
information_schema.INNODB_SYS_INDEXES
information_schema.INNODB_SYS_COLUMNS
information_schema.INNODB_SYS_FIELDS
information_schema.INNODB_SYS_FOREIGN
information_schema.INNODB_SYS_FOREIGN_COLS
information_schema.INNODB_SYS_TABLESTATS
rb://330 Approved by Marko
------------------------------------------------------------
revno: 3479
revision-id: marko.makela@oracle.com-20100524110439-fazi70rlmt07tzd9
parent: vasil.dimov@oracle.com-20100520133157-42uk5q3pp0vsinac
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Mon 2010-05-24 14:04:39 +0300
message:
Bug#53578: assert on invalid page access, in fil_io()
Store the max_space_id in the data dictionary header in order to avoid
space_id reuse.
DICT_HDR_MIX_ID: Renamed to DICT_HDR_MAX_SPACE_ID, DICT_HDR_MIX_ID_LOW.
dict_hdr_get_new_id(): Return table_id, index_id, space_id or a subset of them.
fil_system_t: Add ibool space_id_reuse_warned.
fil_create_new_single_table_tablespace(): Get the space_id from the caller.
fil_space_create(): Issue a warning if the fil_system->max_assigned_id
is exceeded.
fil_assign_new_space_id(): Return TRUE/FALSE and take a pointer to the
space_id as a parameter. Make the function public.
fil_init(): Initialize all fil_system fields by mem_zalloc(). Remove
explicit initializations of certain fields to 0 or NULL.
TO DO: Enable this in CMake-based builds.
------------------------------------------------------------
revno: 3474
revision-id: marko.makela@oracle.com-20100520104042-ma2nsscqdvwoph8k
parent: marko.makela@oracle.com-20100519081618-h38q02qxuvcowbtk
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Thu 2010-05-20 13:40:42 +0300
message:
Bug#53593: Add some instrumentation to improve Valgrind sensitivity
BUILD/*: Add valgrind_configs=--with-valgrind.
BUILD/*: Remove -USAFEMALLOC from valgrind_flags.
configure.in: Add AC_ARG_WITH(valgrind) and HAVE_VALGRIND.
include/my_sys.h: Define a number of MEM_ wrappers for VALGRIND_ functions.
include/my_sys.h: Make TRASH do MEM_UNDEFINED().
include/m_string.h: Remove unused macro bzero_if_purify(A,B).
_mymalloc(): Declare MEM_UNDEFINED() on the allocated memory.
_myfree(): Declare MEM_NOACCESS() on the freed memory.
storage/innobase/include/univ.i: Enable UNIV_DEBUG_VALGRIND based on
HAVE_VALGRIND rather than HAVE_purify.
Possible things to do:
* In my_global.h, remove the defined(HAVE_purify) condition
from the _WIN32 uint3korr().
* In my_global.h *int*korr(), use | instead of +
in order to keep the Valgrind V bits accurate
* Consider replacing HAVE_purify with HAVE_VALGRIND
* Use VALGRIND_CREATE_BLOCK, VALGRIND_DISCARD in mem_root and similar places
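The second item in the list above can be sketched as follows. The function name is hypothetical (the real macros live in my_global.h), and the little-endian byte order is an assumption. Since the shifted bytes occupy disjoint bit ranges, OR and addition yield the same value; the note's rationale is that OR has no carries, so an undefined input bit cannot appear to affect other result bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 3-byte little-endian load that combines the bytes
with | instead of +. */
uint32_t ex_uint3korr(const unsigned char *p)
{
    assert(p != NULL);

    return (uint32_t) p[0]
        | ((uint32_t) p[1] << 8)
        | ((uint32_t) p[2] << 16);
}
```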
------------------------------------------------------------
revno: 3094
revision-id: vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh
parent: vasil.dimov@oracle.com-20100512173700-byf8xntxjur1hqov
committer: Vasil Dimov <vasil.dimov@oracle.com>
branch nick: mysql-trunk-innodb
timestamp: Thu 2010-05-13 10:46:52 +0300
message:
Followup to Bug#51920, fix binlog.binlog_killed
This is a followup to the fix of
Bug#51920 Innodb connections in row lock wait ignore KILL until lock wait
timeout
In that fix (rb://279), the behavior was changed to honor the interruption
of a trx during lock wait, but the returned error code was still
"lock wait timeout" when it should have been "interrupted".
This change fixes the non-deterministically failing test binlog.binlog_killed,
that failed like this:
binlog.binlog_killed 'stmt' [ fail ]
Test ended at 2010-05-12 11:39:08
CURRENT_TEST: binlog.binlog_killed
mysqltest: At line 208: query 'reap' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 0...
Approved by: Sunny Bains (rb://344)
------------------------------------------------------------
This merge is non-trivial since it has to introduce the DB_INTERRUPTED
error code.
Also revert vasil.dimov@oracle.com-20100408165555-9rpjh24o0sa9ad5y
which adjusted the binlog.binlog_killed test to the new (wrong) behavior.
Post-merge fixes: Remove the MYSQL_VERSION_ID checks, because they only
apply to the InnoDB Plugin. Fix potential race condition accessing
trx->op_info and trx->detailed_error.
------------------------------------------------------------
revno: 3466
revision-id: marko.makela@oracle.com-20100514130815-ym7j7cfu88ro6km4
parent: marko.makela@oracle.com-20100514130228-n3n42nw7ht78k0wn
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: mysql-5.1-innodb2
timestamp: Fri 2010-05-14 16:08:15 +0300
message:
Make the InnoDB FOREIGN KEY parser understand multi-statements. (Bug #48024)
Also make InnoDB think that /*/ only starts a comment. (Bug #53644).
This fixes the bugs in the InnoDB Plugin.
ha_innodb.h: Use trx_query_string() instead of trx_query() when
available (MySQL 5.1.42 or later).
innobase_get_stmt(): New function, to retrieve the currently running
SQL statement.
struct trx_struct: Remove mysql_query_str. Use innobase_get_stmt() instead.
dict_strip_comments(): Add and observe the parameter sql_length. Treat
/*/ as the start of a comment.
dict_create_foreign_constraints(), row_table_add_foreign_constraints():
Add the parameter sql_length.
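A heavily simplified sketch of the two points above: the input is bounded by an explicit length instead of relying on a terminating NUL, and the two characters of a comment opener are skipped before searching for the terminator, so that the '*' of the opener cannot also be read as the start of the terminator. The function name is hypothetical, and quoting and other comment styles are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Copy the first sql_length bytes of sql_string, dropping C-style
// comments. Returns a malloc()ed, NUL-terminated string that the
// caller must free(), or NULL if the allocation fails.
char *ex_strip_comments(const char *sql_string, size_t sql_length)
{
    const char *sptr = sql_string;
    const char *eptr = sql_string + sql_length;
    char *str;
    char *ptr;

    assert(sql_string != NULL);

    str = malloc(sql_length + 1);
    if (str == NULL) {
        return NULL;
    }
    ptr = str;

    while (sptr < eptr) {
        if (eptr - sptr >= 2 && sptr[0] == '/' && sptr[1] == '*') {
            // Skip the whole opener first, then look for the terminator.
            sptr += 2;
            for (;;) {
                if (eptr - sptr < 2) {
                    // Unterminated comment: it swallows the rest.
                    sptr = eptr;
                    break;
                }
                if (sptr[0] == '*' && sptr[1] == '/') {
                    sptr += 2;
                    break;
                }
                sptr++;
            }
            continue;
        }
        *ptr++ = *sptr++;
    }

    *ptr = '\0';
    return str;
}
```

With this sketch, the 8-byte input `a/*/b*/c` is reduced to `ac`, because the comment runs from the opener to the terminator that follows `b`.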
Also make InnoDB think that /*/ only starts a comment. (Bug #53644).
struct trx_struct: Add mysql_query_len.
ha_innodb.cc: Use trx_query_string() instead of trx_query() and
initialize trx->mysql_query_len.
INNOBASE_COPY_STMT(thd, trx): New macro, to initialize
trx->mysql_query_str and trx->mysql_query_len.
dict_strip_comments(): Add and observe the parameter sql_length. Treat
/*/ as the start of a comment.
dict_create_foreign_constraints(), row_table_add_foreign_constraints():
Add the parameter sql_length.
in the code but they have nothing to do with the kernel mutex split code.
Some subsequent commits use the new functions. This patch has been tested
with: ./mtr --suite=innodb with UNIV_DEBUG and UNIV_SYNC_DEBUG enabled.
All tests were successful.
------------------------------------------------------------
revno: 3459
revision-id: marko.makela@oracle.com-20100511105308-grp2t3prh3tqivw0
parent: marko.makela@oracle.com-20100511105012-b2t7wvz6mu6bll74
parent: marko.makela@oracle.com-20100505123901-xjxu93h1xnbkfkq0
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: mysql-5.1-innodb
timestamp: Tue 2010-05-11 13:53:08 +0300
message:
Merge a patch from Facebook to fix Bug #53290
commit e759bc64eb5c5eed4f75677ad67246797d486460
Author: Ryan Mack
Date: 3 days ago
Bugfix for 53290, fast unique index creation fails on duplicate null values
Summary:
Bug in the fast index creation code incorrectly considers null
values to be duplicates during block merging. Innodb policy is that
multiple null values are allowed in a unique index. Null duplicates
were correctly ignored while sorting individual blocks and with slow
index creation.
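The policy described in the summary can be sketched as a duplicate check. The types and names here are hypothetical and do not reflect the actual merge code; the sketch only shows that two keys count as duplicates when every field is equal and none of the fields is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical index field: is_null marks an SQL NULL. */
struct ex_field {
    bool is_null;
    int  value;
};

/* Return true only if the two keys should be reported as duplicates
in a unique index: all fields are equal and no field is NULL. */
bool ex_is_duplicate(const struct ex_field *a, const struct ex_field *b,
                     size_t n_fields)
{
    assert(a != NULL);
    assert(b != NULL);

    for (size_t i = 0; i < n_fields; i++) {
        if (a[i].is_null || b[i].is_null) {
            return false; /* a NULL never makes a duplicate */
        }
        if (a[i].value != b[i].value) {
            return false;
        }
    }

    return true;
}
```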
Test Plan:
mtr, including new test, load dbs using deferred index creation
License:
Copyright (C) 2009-2010 Facebook, Inc. All Rights Reserved.
Dual licensed under BSD license and GPLv2.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY FACEBOOK, INC. ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL FACEBOOK, INC. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
------------------------------------------------------------
revno: 3453.2.1
revision-id: marko.makela@oracle.com-20100505123901-xjxu93h1xnbkfkq0
parent: marko.makela@oracle.com-20100505120555-ukoq1gklpheslrxs
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Wed 2010-05-05 15:39:01 +0300
message:
Merge a contribution from Ryan Mack at Facebook:
Bugfix for 53290, fast unique index creation fails on duplicate null values
Summary:
Bug in the fast index creation code incorrectly considers null
values to be duplicates during block merging. Innodb policy is that
multiple null values are allowed in a unique index. Null duplicates
were correctly ignored while sorting individual blocks and with slow
index creation.
Test Plan:
mtr, including new test, load dbs using deferred index creation
DiffCamp Revision: 110840
Reviewed By: mcallaghan
CC: mcallaghan, mysql-devel@lists
Revert Plan:
OK