This is a manual merge from mysql-5.1-innodb of:
revision-id: vasil.dimov@oracle.com-20100930102618-s9f9ytbytr3eqw9h
committer: Vasil Dimov <vasil.dimov@oracle.com>
timestamp: Thu 2010-09-30 13:26:18 +0300
message:
Fix a potential bug when using __sync_lock_test_and_set()
TYPE __sync_lock_test_and_set (TYPE *ptr, TYPE value, ...)
it is not documented what happens if the two arguments are of different
type like it was before: the first one was lock_word_t (byte) and the
second one was 1 or 0 (int).
Approved by: Marko (via IRC)
and above the general requirement free. We call them
BUF_FLUSH_EXTRA_MARGIN. With multiple buffer pools we may end up keeping
this amount of pages for each buffer pool. This patch, diagnosed and fixed
by Michael, throttles flushing in such cases.
rb://435
bug#54346
Improve the range estimation algorithm.
Previously:
For a given level the algo knows the number of pages in the requested range and the n
With this change:
Same idea, but peek a few (10) of the intermediate pages to get a better estimate of
In the bug report one of the examples has a btree with a snippet of the leaf level li
page1(899 records), page2(1 record), page3(1 record), page4(1 record)
so when trying to estimate, the previous algo, assumed there are average (899+1)/2=45
Fix Bug#53761 RANGE estimation for matched rows may be 200 times different
Improve the range estimation algorithm.
Previously:
For a given level the algo knows the number of pages in the requested range
and the number of records on the leftmost and the rightmost page. Then it
assumes all pages in between contain the average between the two border pages
and multiplies this average number by the number of intermediate pages.
With this change:
Same idea, but peek a few (10) of the intermediate pages to get a better
estimate of the average number of records per page. If there are less than 10
intermediate pages then all of them will be scanned and the result will be
precise, not an estimation.
In the bug report one of the examples has a btree with a snippet of the leaf
level like this:
page1(899 records), page2(1 record), page3(1 record), page4(1 record)
so when trying to estimate, the previous algo, assumed there are average
(899+1)/2=450 records per page which went terribly wrong. With this change
page2 and page3 will be read and the exact number of records will be returned.
Approved by: Sunny (rb://401)
Reduce ibuf_mutex and ibuf_pessimistic_insert_mutex contention further.
Protect ibuf->empty by the insert buffer root page latch, not ibuf_mutex.
ibuf_tree_root_get(): Assert that ibuf_mutex is owned by the
caller. Assert that the stamped page number is correct. Assert that
ibuf->empty agrees with the root page.
ibuf_size_update(): Do not update ibuf->empty.
ibuf_init_at_db_start(): Update ibuf->empty while holding the root page latch.
ibuf_add_free_page(): Return TRUE/FALSE instead of DB_SUCCESS/DB_STRONG_FAIL.
ibuf_remove_free_page(): Release ibuf_pessimistic_insert_mutex as
early as possible.
ibuf_contract_ext(): Rely on a dirty read of ibuf->empty, unless the
server is being shut down. Never acquire ibuf_mutex. Eliminate n_stored.
ibuf_contract_after_insert(): Never acquire ibuf_mutex. Perform dirty
reads of ibuf->size and ibuf->max_size.
ibuf_insert_low(): Only acquire ibuf_mutex for mode==BTR_MODIFY_TREE.
Perform dirty reads of ibuf->size and ibuf->max_size. Update
ibuf->empty while holding the root page latch.
ibuf_delete_rec(): Update ibuf->empty while holding the root page latch.
ibuf_is_empty(): Release ibuf_mutex earlier.
This patch was originally developed by Vladislav Vaintroub.
The main changes are:
* Use TryEnterCriticalSection in os_fast_mutex_trylock().
* Use lightweight condition variables on Vista or later Windows;
but fall back to events on older Windows, such as XP.
This patch also fixes the following bugs:
bug# 52102 InnoDB Plugin shows performance drop compared to InnoDB
on Windows
bug# 53204 os_fastmutex_trylock is implemented incorrectly on Windows
rb://363 approved by Inaam Rana
Remove the pure attribute from a function. The function doesn't qualify as
a pure function because it has a side-effect (modifies its parameter). Add
a clarifying comment to another function's declaration.
This change is for performance optimization.
Fixed the performance schema instrumentation interface as follows:
- simplified mysql_unlock_mutex()
- simplified mysql_unlock_rwlock()
- simplified mysql_cond_signal()
- simplified mysql_cond_broadcast()
Changed the get_thread_XXX_locker apis to have one extra parameter,
to provide memory to the instrumentation implementation.
This API change allows to use memory provided by the caller,
to avoid having to use thread local storage.
Using this extra parameter will be done in a separate fix,
this change is for the interface only.
Adjusted all the code and unit tests accordingly.
------------------------------------------------------------
revno: 3529
revision-id: marko.makela@oracle.com-20100629125518-m3am4ia1ffjr0d0j
parent: jimmy.yang@oracle.com-20100629024137-690sacm5sogruzvb
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Tue 2010-06-29 15:55:18 +0300
message:
Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED
columns
When the server crashes after a record stub has been inserted and
before all its off-page columns have been written, the record will
contain incomplete off-page columns after crash recovery. Such records
may only be accessed at the READ UNCOMMITTED isolation level or when
rolling back a recovered transaction in recv_recovery_rollback_active().
Skip these records at the READ UNCOMMITTED isolation level.
TODO: Add assertions for checking the above assumptions hold when an
incomplete BLOB is encountered.
btr_rec_copy_externally_stored_field(): Return NULL if the field is
incomplete.
row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.
row_sel_store_mysql_rec(): Return FALSE if not all columns could be
retrieved. Previously this function always returned TRUE. Assert that
the record is not delete-marked.
row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
could be retrieved.
row_search_for_mysql(): Skip records containing incomplete off-page
columns. Assert that the transaction isolation level is READ
UNCOMMITTED.
rb://380 approved by Jimmy Yang
Merge and adjust a forgotten change to fix this bug.
rb://393 approved by Jimmy Yang
------------------------------------------------------------------------
r3794 | marko | 2009-01-07 14:14:53 +0000 (Wed, 07 Jan 2009) | 18 lines
branches/6.0: Allow the minimum length of a multi-byte character to be
up to 4 bytes. (Bug #35391)
dtype_t, dict_col_t: Replace mbminlen:2, mbmaxlen:3 with mbminmaxlen:5.
In this way, the 5 bits can hold two values of 0..4, and the storage size
of the fields will not cross the 64-bit boundary. Encode the values as
DATA_MBMAX * mbmaxlen + mbminlen. Define the auxiliary macros
DB_MBMINLEN(mbminmaxlen), DB_MBMAXLEN(mbminmaxlen), and
DB_MINMAXLEN(mbminlen, mbmaxlen).
Try to trim and pad UTF-16 and UTF-32 with spaces as appropriate.
Alexander Barkov suggested the use of cs->cset->fill(cs, buff, len, 0x20).
ha_innobase::store_key_val_for_row() now does that, but the added function
row_mysql_pad_col() does not, because it doesn't have the MySQL TABLE object.
rb://49 approved by Heikki Tuuri
------------------------------------------------------------------------
and clarifies the invariant in dict_table_get_on_id().
In Mar 2007 Marko observed a crash during recovery, the crash resulted from
an UNDO operation on a system table. His solution was to acquire an X lock on
the data dictionary, this in hindsight was an overkill. It is unclear what
caused the crash, current hypothesis is that it was a memory corruption.
The X lock results in performance issues by when undoing changes due to
rollback during normal operation on regular tables.
Why the change is safe:
======================
The InnoDB code has changed since the original X lock change was made. In the
new code we always lock the data dictionary in X mode during startup when
UNDOing operations on the system tables (this is a given). This ensures that
the crash Marko observed cannot happen as long as all transactions that update
the system tables follow the standard rules by setting the appropriate DICT_OP
flag when writing the log records when they make the changes.
If transactions violate the above mentioned rule then during recovery (at
startup) the rollback code (see trx0roll.c) will not acquire the X lock
and we will see the crash again. This will however be a different bug.
------------------------------------------------------------
revno: 3517
revision-id: vasil.dimov@oracle.com-20100622163043-dc0lxy0byg74viet
parent: marko.makela@oracle.com-20100621095148-8g73k8k68dpj080u
committer: Vasil Dimov <vasil.dimov@oracle.com>
branch nick: mysql-5.1-innodb
timestamp: Tue 2010-06-22 19:30:43 +0300
message:
Fix Bug#47991 InnoDB Dictionary Cache memory usage increases indefinitely
when renaming tables
Allocate the table name using ut_malloc() instead of table->heap because
the latter cannot be freed.
Adjust dict_sys->size calculations all over the code.
Change dict_table_t::name from const char* to char* because we need to
ut_malloc()/ut_free() it.
Reviewed by: Inaam, Marko, Heikki (rb://384)
Approved by: Heikki (rb://384)
------------------------------------------------------------
and innodb_file_format_max two system variables. And this also fixes
bug #53654 after 2nd shutdown innodb_file_format_check attains strange
values.
rb://366 approved by Marko
------------------------------------------------------------
revno: 3495
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Wed 2010-06-02 13:37:14 +0300
message:
Bug#53674: InnoDB: Error: unlock row could not find a 4 mode lock on the record
In semi-consistent read, only unlock freshly locked non-matching records.
lock_rec_lock_fast(): Return LOCK_REC_SUCCESS,
LOCK_REC_SUCCESS_CREATED, or LOCK_REC_FAIL instead of TRUE/FALSE.
enum db_err: Add DB_SUCCESS_LOCKED_REC for indicating a successful
operation where a record lock was created.
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock(), lock_rec_enqueue_waiting(),
lock_rec_lock_slow(), lock_rec_lock(), row_ins_set_shared_rec_lock(),
row_ins_set_exclusive_rec_lock(), sel_set_rec_lock(),
row_sel_get_clust_rec_for_mysql(): Return DB_SUCCESS_LOCKED_REC if a
new record lock was created. Adjust callers.
row_unlock_for_mysql(): Correct the function documentation.
row_prebuilt_t::new_rec_locks: Correct the documentation.
can now view the content of InnoDB System Tables through following
information schema tables:
information_schema.INNODB_SYS_TABLES
information_schema.INNODB_SYS_INDEXES
information_schema.INNODB_SYS_COUMNS
information_schema.INNODB_SYS_FIELDS
information_schema.INNODB_SYS_FOREIGN
information_schema.INNODB_SYS_FOREIGN_COLS
information_schema.INNODB_SYS_TABLESTATS
rb://330 Approved by Marko
------------------------------------------------------------
revno: 3479
revision-id: marko.makela@oracle.com-20100524110439-fazi70rlmt07tzd9
parent: vasil.dimov@oracle.com-20100520133157-42uk5q3pp0vsinac
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Mon 2010-05-24 14:04:39 +0300
message:
Bug#53578: assert on invalid page access, in fil_io()
Store the max_space_id in the data dictionary header in order to avoid
space_id reuse.
DICT_HDR_MIX_ID: Renamed to DICT_HDR_MAX_SPACE_ID, DICT_HDR_MIX_ID_LOW.
dict_hdr_get_new_id(): Return table_id, index_id, space_id or a subset of them.
fil_system_t: Add ibool space_id_reuse_warned.
fil_create_new_single_table_tablespace(): Get the space_id from the caller.
fil_space_create(): Issue a warning if the fil_system->max_assigned_id
is exceeded.
fil_assign_new_space_id(): Return TRUE/FALSE and take a pointer to the
space_id as a parameter. Make the function public.
fil_init(): Initialize all fil_system fields by mem_zalloc(). Remove
explicit initializations of certain fields to 0 or NULL.
TO DO: Enable this in CMake-based builds.
------------------------------------------------------------
revno: 3474
revision-id: marko.makela@oracle.com-20100520104042-ma2nsscqdvwoph8k
parent: marko.makela@oracle.com-20100519081618-h38q02qxuvcowbtk
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Thu 2010-05-20 13:40:42 +0300
message:
Bug#53593: Add some instrumentation to improve Valgrind sensitivity
BUILD/*: Add valgrind_configs=--with-valgrind.
BUILD/*: Remove -USAFEMALLOC from valgrind_flags.
configure.in: Add AC_ARG_WITH(valgrind) and HAVE_VALGRIND.
include/my_sys.h: Define a number of MEM_ wrappers for VALGRIND_ functions.
include/my_sys.h: Make TRASH do MEM_UNDEFINED().
include/m_string.h: Remove unused macro bzero_if_purify(A,B).
_mymalloc(): Declare MEM_UNDEFINED() on the allocated memory.
_myfree(): Declare MEM_NOACCESS() on the freed memory.
storage/innobase/include/univ.i: Enable UNIV_DEBUG_VALGRIND based on
HAVE_VALGRIND rather than HAVE_purify.
Possible things to do:
* In my_global.h, remove the defined(HAVE_purify) condition
from the _WIN32 uint3korr().
* In my_global.h *int*korr(), use | instead of +
in order to keep the Valgrind V bits accurate
* Consider replacing HAVE_purify with HAVE_VALGRIND
* Use VALGRIND_CREATE_BLOCK, VALGRIND_DISCARD in mem_root and similar places
Post-merge fixes: Remove the MYSQL_VERSION_ID checks, because they only
apply to the InnoDB Plugin. Fix potential race condition accessing
trx->op_info and trx->detailed_error.
------------------------------------------------------------
revno: 3466
revision-id: marko.makela@oracle.com-20100514130815-ym7j7cfu88ro6km4
parent: marko.makela@oracle.com-20100514130228-n3n42nw7ht78k0wn
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: mysql-5.1-innodb2
timestamp: Fri 2010-05-14 16:08:15 +0300
message:
Make the InnoDB FOREIGN KEY parser understand multi-statements. (Bug #48024)
Also make InnoDB thinks that /*/ only starts a comment. (Bug #53644).
This fixes the bugs in the InnoDB Plugin.
ha_innodb.h: Use trx_query_string() instead of trx_query() when
available (MySQL 5.1.42 or later).
innobase_get_stmt(): New function, to retrieve the currently running
SQL statement.
struct trx_struct: Remove mysql_query_str. Use innobase_get_stmt() instead.
dict_strip_comments(): Add and observe the parameter sql_length. Treat
/*/ as the start of a comment.
dict_create_foreign_constraints(), row_table_add_foreign_constraints():
Add the parameter sql_length.