CHECKSUM TABLE for performance schema tables could cause uninitialized
memory reads.
The root cause is a design flaw in the implementation of
mysql_checksum_table(), which does not honor null fields.
However, fixing this bug in CHECKSUM TABLE is risky, as it can cause the
checksum value to change.
This fix implements a workaround: systematically reset field values
even for null fields, so that the field's memory representation is
always initialized with a known value.
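A minimal, self-contained sketch of the workaround (toy types only; the
actual performance schema setters differ): zero a null field's storage so
that a later byte-wise checksum never reads uninitialized memory.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Toy model of a record field with a null flag and fixed storage.
    struct Field {
      bool is_null;
      unsigned char buf[8];  // read byte-wise by CHECKSUM TABLE
    };

    void set_field(Field &f, const void *value, std::size_t len) {
      if (value == nullptr) {
        f.is_null = true;
        // The workaround: reset the storage even for a null field.
        std::memset(f.buf, 0, sizeof f.buf);
      } else {
        f.is_null = false;
        std::memcpy(f.buf, value, len < sizeof f.buf ? len : sizeof f.buf);
      }
    }

    // CHECKSUM TABLE-style checksum over the raw record bytes.
    std::uint32_t checksum(const Field &f) {
      std::uint32_t sum = 0;
      for (unsigned char b : f.buf) sum = sum * 31 + b;
      return sum;
    }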
Before this fix, the server could crash inside a memcpy when reading data
from the EVENTS_WAITS_CURRENT / HISTORY / HISTORY_LONG tables.
The root cause is that the length used in a memcpy could be corrupted
when another thread wrote data into the wait record being read.
Reading unsafe data is OK, per design choice, and the code does sanitize
the data in general, but it did not sanitize the length given to memcpy.
The fix is to also sanitize the schema name / object name / file name
length when extracting the data to produce a row.
The first part is the functional change; the second is needed as a
compile fix on Windows (header file order).
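A sketch of the length sanitization described above (hypothetical helper;
the server's actual row extraction code differs): a length read from a
wait record that another thread may be rewriting concurrently is clamped
to the destination buffer before it reaches memcpy.

    #include <algorithm>
    #include <cstddef>
    #include <cstring>

    // unsafe_len comes from unsynchronized shared memory and may be
    // garbage; clamp it so memcpy can never overrun the destination.
    void copy_object_name(char *dst, std::size_t dst_size,
                          const char *src, std::size_t unsafe_len) {
      std::size_t len = std::min(unsafe_len, dst_size - 1);  // sanitize
      std::memcpy(dst, src, len);
      dst[len] = '\0';
    }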
| committer: Marc Alff <marc.alff@oracle.com>
| branch nick: mysql-5.5-bugfixing-56521
| timestamp: Thu 2010-09-09 14:28:47 -0600
| message:
| Bug#56521 Assertion failed: (m_state == 2), function allocated_to_free, pfs_lock.h (138)
|
| Before this fix, it was possible to build the server:
| - with the performance schema
| - with a dummy implementation of my_atomic (MY_ATOMIC_MODE_DUMMY).
|
| In this case, the resulting binary would just crash,
| as this configuration is not supported.
|
| This fix enforces that the build will fail with a compilation error in this
| configuration, instead of resulting in a broken binary.
| committer: Tor Didriksen <tor.didriksen@oracle.com>
| branch nick: 5.5-bugfixing-56521
| timestamp: Fri 2010-09-10 11:10:38 +0200
| message:
| Header files should be self-contained
Before this fix, it was possible to build the server:
- with the performance schema
- with a dummy implementation of my_atomic (MY_ATOMIC_MODE_DUMMY).
In this case, the resulting binary would just crash,
as this configuration is not supported.
This fix enforces that the build will fail with a compilation error in this
configuration, instead of resulting in a broken binary.
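One way to enforce the build break (a sketch; the server's actual guard
macros may differ, and WITH_PERFSCHEMA_STORAGE_ENGINE is an assumption
here):

    // Refuse to compile the performance schema against the dummy,
    // non-atomic my_atomic implementation.
    #if defined(WITH_PERFSCHEMA_STORAGE_ENGINE) && defined(MY_ATOMIC_MODE_DUMMY)
    #error MY_ATOMIC_MODE_DUMMY is not supported by the performance schema
    #endif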
In early development of delete buffering, we did allow B-tree pages
to become empty as a result of buffered deletes. That caused fundamental
problems. The fix was to refuse buffering purge operations unless
the page can be guaranteed to be nonempty. Remove an attempt to
cope with empty pages when merging inserts.
ALTER TABLE on a MERGE table could cause a deadlock with two
other connections if we reached a situation where:
1) A connection doing ALTER TABLE can't upgrade to MDL_EXCLUSIVE on the
parent table, but holds TL_READ_NO_INSERT on the child tables.
2) A connection doing DELETE on a child table can't get TL_WRITE on it
since ALTER TABLE holds TL_READ_NO_INSERT.
3) A connection doing SELECT on the parent table can't get TL_READ on
the child tables since TL_WRITE is ahead in the lock queue, but holds
MDL_SHARED_READ on the parent table preventing ALTER TABLE from upgrading.
For regular tables, this deadlock is avoided by having ALTER TABLE
take a MDL_SHARED_NO_WRITE metadata lock on the table. This prevents
DELETE from acquiring MDL_SHARED_WRITE on the table before ALTER TABLE
tries to upgrade to MDL_EXCLUSIVE. In the example above, SELECT would
therefore not be blocked by the pending DELETE as DELETE would not be
able to enter TL_WRITE in the table lock queue.
This patch fixes the problem for merge tables by using the same metadata
lock type for the child tables as for the parent table. The child tables
will therefore be locked with MDL_SHARED_NO_WRITE, preventing DELETE from
acquiring a metadata lock and entering the table lock queue.
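A hypothetical sketch of the fix (toy enum; the real server uses the MDL
subsystem's own types): the child tables simply inherit the parent's
metadata lock type instead of getting a weaker default.

    enum enum_mdl_type {
      MDL_SHARED_READ,
      MDL_SHARED_WRITE,
      MDL_SHARED_NO_WRITE,
      MDL_EXCLUSIVE
    };

    // Before: children were opened with a weaker lock (e.g.
    // MDL_SHARED_READ) even when ALTER TABLE held MDL_SHARED_NO_WRITE
    // on the parent, letting DELETE queue a write lock on a child.
    enum_mdl_type merge_child_mdl_type(enum_mdl_type parent_mdl) {
      return parent_mdl;  // after: same lock type as the parent
    }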
Change in behavior: By taking the same metadata lock for child tables
as for the parent table, LOCK TABLE on the parent table will now also
implicitly lock the child tables. Since LOCK TABLE on the parent table
now takes more than one metadata lock, it is possible for LOCK TABLE
... WRITE on the parent table or child tables to fail with an
ER_LOCK_DEADLOCK error.
Test case added to mdl_sync.test.
Merge.test/.result has been updated to reflect the change to LOCK TABLE.
Remove non applicable licensing files
storage/innobase/COPYING is in the MySQL top level directory and
storage/innobase/COPYING.Sun_Microsystems is not applicable anymore
now that Oracle and Sun are one company.
Problem: trailing spaces were stripped using 8-bit code,
so the truncation result length was incorrect, which led
to an assertion failure.
Fix: use multi-byte safe code.
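An illustrative sketch of the difference, assuming a UCS-2-like encoding
where a space is the two-byte sequence 0x00 0x20 (toy functions, not the
server's charset API):

    #include <cstddef>

    // 8-bit scan: strips lone 0x20 bytes and miscounts the length,
    // leaving a dangling 0x00 from the last multi-byte space.
    std::size_t strip_sp_8bit(const unsigned char *s, std::size_t len) {
      while (len > 0 && s[len - 1] == 0x20) len--;
      return len;
    }

    // Multi-byte safe scan: step back one full character at a time.
    std::size_t strip_sp_ucs2(const unsigned char *s, std::size_t len) {
      while (len >= 2 && s[len - 2] == 0x00 && s[len - 1] == 0x20) len -= 2;
      return len;
    }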
Before this fix, some tests failed due to lack of instrumentation slots
in the performance schema, because the default sizing was too low.
Now that more code has been instrumented, the default sizing has to be adjusted
to match the current instrumentation consumption.
This change:
- increases the number of rwlock classes from 20 to 30,
- increases the number of rwlock and mutex instances to 1 million.
Both changes account for the volume of data instrumented
when the InnoDB storage engine is used (because of the InnoDB buffer pool).
Adjusted the test output accordingly.
------------------------------------------------------------
revno: 3550
revision-id: marko.makela@oracle.com-20100824081003-v4ecy0tga99cpxw2
parent: marko.makela@oracle.com-20100823102854-t1clrojqis2ley36
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Tue 2010-08-24 11:10:03 +0300
message:
Bug#55832: selects crash too easily when innodb_force_recovery>3
dict_update_statistics_low(): Create bogus statistics for those
indexes that cannot be accessed because of the innodb_force_recovery
setting.
ha_innobase::info(): Calculate statistics for each index, even if
innodb_force_recovery is set. Fill in bogus data for those indexes
that are not accessed because of the innodb_force_recovery setting.
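A hypothetical sketch of the "bogus statistics" idea (illustrative struct
and values; InnoDB's actual statistics fields differ): fill deterministic
placeholder values for indexes that cannot be read, instead of leaving the
statistics uninitialized.

    struct IndexStats {
      unsigned long n_diff_key_vals;   // distinct key values
      unsigned long index_size_pages;  // index size in pages
      unsigned long n_leaf_pages;      // leaf pages
    };

    IndexStats bogus_index_stats() {
      // Safe, deterministic values for an inaccessible index.
      return IndexStats{1, 1, 1};
    }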
and above the general requirement free. We call them
BUF_FLUSH_EXTRA_MARGIN. With multiple buffer pools we may end up keeping
this many pages free for each buffer pool. This patch, diagnosed and fixed
by Michael, throttles flushing in such cases.
rb://435
bug#54346
This patch doesn't get rid of the need to acquire the dict_sys->mutex but
reduces the need to keep the mutex locked for the duration of the call
to fsp_get_available_space_in_free_extents() from ha_innobase::info().
rb://390.
The callers should indicate whether the dictionary is locked via
trx->dict_operation_lock_mode == RW_X_LATCH. Checking explicitly
for system tables is unnecessary.
Approved by Marko on IRC.
Issue an error message to the error log when
trx->dict_operation_lock_mode == RW_X_LATCH in
srv_suspend_mysql_thread(). Transactions that modify InnoDB
data dictionary tables must be free of lock waits, because they
must be holding the data dictionary latch in exclusive mode.
Such transactions must not access any tables other than
the data dictionary tables.
The handling of RW_X_LATCH was accidentally added in the InnoDB Plugin,
as a wrong fix of an assertion failure. (Fast index creation was accessing
both data dictionary tables and user tables in the same transaction.)
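A hypothetical sketch of the check (illustrative names and constant): if a
transaction holding the dictionary latch exclusively is about to suspend
on a lock wait, the invariant is broken, so report it to the error log.

    #include <cstdio>

    enum { RW_X_LATCH = 2 };  // illustrative latch mode value

    void srv_suspend_thread_check(int dict_operation_lock_mode) {
      if (dict_operation_lock_mode == RW_X_LATCH) {
        // A dictionary transaction must never wait for a lock.
        std::fprintf(stderr,
                     "InnoDB: Error: a data dictionary transaction"
                     " is waiting for a lock\n");
      }
    }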
revno: 3545
revision-id: marko.makela@oracle.com-20100818110110-zfs0i1vfrccfb4yw
parent: vasil.dimov@oracle.com-20100817193934-1yl7zz2odikxauf8
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Wed 2010-08-18 14:01:10 +0300
message:
Bug#55626: MIN and MAX reading a delete-marked record from secondary index
Remove a bogus debug assertion that triggered the bug.
Add assertions precisely where records must not be delete-marked.
And a comment to clarify when the record is allowed to be delete-marked.
Added InnoDB to the 'default' plugin group, and modified
the autoconf script so the 'default' group is actually
built by default.
(i.e., ./configure.am is now equivalent to ./configure.am --with-plugins=default,
instead of ./configure.am --with-plugins=none).
Fix Bug#53761 RANGE estimation for matched rows may be 200 times different
Improve the range estimation algorithm.
Previously:
For a given level the algo knows the number of pages in the requested range
and the number of records on the leftmost and the rightmost page. Then it
assumes all pages in between contain the average between the two border pages
and multiplies this average number by the number of intermediate pages.
With this change:
Same idea, but peek a few (10) of the intermediate pages to get a better
estimate of the average number of records per page. If there are less than 10
intermediate pages then all of them will be scanned and the result will be
precise, not an estimation.
In the bug report one of the examples has a btree with a snippet of the leaf
level like this:
page1(899 records), page2(1 record), page3(1 record), page4(1 record)
so when trying to estimate, the previous algo, assumed there are average
(899+1)/2=450 records per page which went terribly wrong. With this change
page2 and page3 will be read and the exact number of records will be returned.
Approved by: Sunny (rb://401)
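A self-contained sketch of the improved estimate (the input vector models
the per-page record counts of the range at one B-tree level; in InnoDB
these counts come from actually reading the pages, and the sampling
details may differ):

    #include <cstddef>
    #include <vector>

    long estimate_range_rows(const std::vector<long> &pages) {
      const std::size_t n = pages.size();
      if (n == 0) return 0;
      if (n <= 2) {                      // no intermediate pages
        long sum = 0;
        for (long c : pages) sum += c;
        return sum;
      }
      const std::size_t n_mid = n - 2;   // intermediate pages
      const std::size_t sample = n_mid < 10 ? n_mid : 10;  // peek <= 10
      long sampled = 0;
      for (std::size_t i = 0; i < sample; i++)
        sampled += pages[1 + i];         // read the sampled mid pages
      // Extrapolate the sampled average over all intermediate pages and
      // add the exact border pages. When sample == n_mid the result is
      // exact, not an estimate.
      return pages.front() + pages.back() +
             sampled * (long)n_mid / (long)sample;
    }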
------------------------------------------------------------
revno: 3476
committer: Sunny Bains <Sunny.Bains@Oracle.Com>
branch nick: 5.1-security
timestamp: Thu 2010-08-05 19:18:17 +1000
message:
Fix bug# 55543 - InnoDB Plugin: Signal 6: Assertion failure in file fil/fil0fil.c line 4306
The bug is due to a double delete of a BLOB, once via:
rollback -> btr_cur_pessimistic_delete()
and the second time via purge.
The bug is in row_upd_clust_rec_by_insert(). There we relinquish ownership
of the non-updated BLOB columns in btr_cur_mark_extern_inherited_fields()
before building the row entry that will be inserted and whose contents will
be logged in the UNDO log. However, we fail to later mark the BLOB columns
as INHERITED, which would have prevented a possible rollback from freeing
the original row's non-updated BLOB entries. This is because the condition
that checks for this is inside:
if (node->upd_ext) {}.
node->upd_ext is non-NULL only if a BLOB column was updated and that column
is part of some key ordering (see row_upd_replace()). This results in the
non-updated BLOB columns being deleted during a rollback and subsequently by
purge again.
rb://413
Handle overflow when reading value from SELECT MAX(C) FROM T;
Call ha_innobase::info() after initializing the autoinc value
in ha_innobase::open().
Fixed in both the built-in InnoDB and the InnoDB Plugin.
rb://402
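A minimal sketch of the overflow handling (hypothetical helper; the real
code initializes the handler's autoinc state): saturate at the column
type's maximum instead of wrapping past it.

    #include <cstdint>

    std::uint64_t next_autoinc(std::uint64_t current_max,
                               std::uint64_t col_max_value) {
      if (current_max >= col_max_value)
        return col_max_value;  // saturate instead of overflowing to 0
      return current_max + 1;
    }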
Merge from mysql-5.1-security.
Reduce ibuf_mutex and ibuf_pessimistic_insert_mutex contention further.
Protect ibuf->empty by the insert buffer root page latch, not ibuf_mutex.
ibuf_tree_root_get(): Assert that ibuf_mutex is owned by the
caller. Assert that the stamped page number is correct. Assert that
ibuf->empty agrees with the root page.
ibuf_size_update(): Do not update ibuf->empty.
ibuf_init_at_db_start(): Update ibuf->empty while holding the root page latch.
ibuf_add_free_page(): Return TRUE/FALSE instead of DB_SUCCESS/DB_STRONG_FAIL.
ibuf_remove_free_page(): Release ibuf_pessimistic_insert_mutex as
early as possible.
ibuf_contract_ext(): Rely on a dirty read of ibuf->empty, unless the
server is being shut down. Never acquire ibuf_mutex. Eliminate n_stored.
ibuf_contract_after_insert(): Never acquire ibuf_mutex. Perform dirty
reads of ibuf->size and ibuf->max_size.
ibuf_insert_low(): Only acquire ibuf_mutex for mode==BTR_MODIFY_TREE.
Perform dirty reads of ibuf->size and ibuf->max_size. Update
ibuf->empty while holding the root page latch.
ibuf_delete_rec(): Update ibuf->empty while holding the root page latch.
ibuf_is_empty(): Release ibuf_mutex earlier.
regression in Bug #54914, but it does speed up the execution for
innodb_change_buffering=inserts.
ibuf_add_ops(), ibuf_merge_or_delete_for_page(),
ibuf_delete_for_discarded_space(): Use atomic built-ins instead of
ibuf_mutex, when available (see the sketch after this list).
ibuf_add_free_page(), ibuf_remove_free_page(), ibuf_contract_ext():
Release ibuf_mutex earlier.
ibuf_free_excess_pages(): Release ibuf_mutex before a conditional branch.
ibuf_insert_low(): Release ibuf_mutex before a conditional
branch. Create ibuf_entry before re-acquiring ibuf_mutex. Simplify a
loop to reduce code footprint. Release ibuf_mutex before mtr_commit()
[btr_pcur_close()].
ibuf_is_empty(): Release ibuf_mutex before mtr_commit().
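A sketch of the atomic built-in substitution mentioned above (GCC __sync
built-ins, which InnoDB uses when they are available; the counter name is
illustrative):

    #include <pthread.h>

    static unsigned long n_merged = 0;
    static pthread_mutex_t ibuf_mutex = PTHREAD_MUTEX_INITIALIZER;

    void add_merge_count_locked(unsigned long n) {
      pthread_mutex_lock(&ibuf_mutex);   // old: serialize on ibuf_mutex
      n_merged += n;
      pthread_mutex_unlock(&ibuf_mutex);
    }

    void add_merge_count_atomic(unsigned long n) {
      // new: one atomic read-modify-write, no mutex round trip
      __sync_add_and_fetch(&n_merged, n);
    }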
Currently we do a full validation of the AHI whenever CHECK TABLE is
called on any table. This patch restricts that full check to debug
builds.
bug#55716
rb://423
approved by: Marko
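A sketch of the change (UNIV_DEBUG is InnoDB's debug-build define;
btr_search_validate() stands in for the full adaptive hash index walk):

    bool btr_search_validate();  // full AHI validation (expensive)

    bool innodb_check_table() {
    #ifdef UNIV_DEBUG
      if (!btr_search_validate())  // full check in debug builds only
        return false;
    #endif
      // release builds: only the regular per-table checks run
      return true;
    }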
Reverted the ulong->uint diff
Re-applied the first diff.
The original commit message follows:
enum plugin system variables are ulong internally, not int.
On systems where long is not the same size as int,
this causes problems.
Fixed by correct typecasting. Removed the test from the
experimental list.
The enum system variables were handled inconsistently
as int, unsigned int, and unsigned long in various places.
This caused problems on platforms on which
sizeof(int) != sizeof(long).
Fixed by homogenizing the type of the enum variables
to unsigned int, since it is size-compatible with the C enum
type.
Removed the test from the experimental list.
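A sketch of the bug class (illustrative code, not the server's sysvar
plumbing): writing an enum value through an int-sized pointer into a
ulong-sized slot goes wrong on LP64 platforms.

    static unsigned long sysvar_storage = 0;

    void store_wrong(int value) {
      // On LP64 this writes only 4 of the slot's 8 bytes (and which
      // half depends on endianness): the inconsistency being fixed.
      *reinterpret_cast<int *>(&sysvar_storage) = value;
    }

    void store_right(unsigned int value) {
      sysvar_storage = value;  // well-defined widening assignment
    }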
pages that it wants to flush, then we should honor that value and not
go beyond it in our eagerness to flush the neighbors of the
selected victim.
The problem was that the optimize method of the ARCHIVE storage
engine was not preserving the FRM embedded in the ARZ file when
rewriting the ARZ file for optimization. The ARCHIVE engine stores
the FRM in the ARZ file so it can be transferred from machine to
machine without also copying the FRM -- the engine restores the
embedded FRM during discovery.
The solution is to copy over the FRM when rewriting the ARZ file.
In addition, some initial error checking is performed to ensure
garbage is not copied over.
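A minimal sketch of the FRM preservation (toy I/O; the engine actually
uses its azio stream layer): read the embedded FRM from the old ARZ file
and write it into the new one before the row data, refusing to proceed if
the read fails.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    bool copy_frm(std::FILE *src, std::FILE *dst, std::size_t frm_len) {
      std::vector<unsigned char> frm(frm_len);
      if (std::fread(frm.data(), 1, frm_len, src) != frm_len)
        return false;  // error check: do not copy garbage
      return std::fwrite(frm.data(), 1, frm_len, dst) == frm_len;
    }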