CHECKSUM TABLE for performance schema tables could cause uninitialized
memory reads.
The root cause is a design flaw in the implementation of
mysql_checksum_table(), which do not honor null fields.
However, fixing this bug in CHECKSUM TABLE is risky, as it can cause the
checksum value to change.
This fix implements a work around, to systematically reset fields values
even for null fields, so that the field memory representation is always
initialized with a known value.
Before this fix, the server could crash inside a memcpy when reading data
from the EVENTS_WAITS_CURRENT / HISTORY / HISTORY_LONG tables.
The root cause is that the length used in a memcpy could be corrupted,
when another thread writes data in the wait record being read.
Reading unsafe data is ok, per design choice, and the code does sanitize
the data in general, but did not sanitize the length given to memcpy.
The fix is to also sanitize the schema name / object name / file name
length when extracting the data to produce a row.
Before this fix, it was possible to build the server:
- with the performance schema
- with a dummy implementation of my_atomic (MY_ATOMIC_MODE_DUMMY).
In this case, the resulting binary will just crash,
as this configuration is not supported.
This fix enforces that the build will fail with a compilation error in this
configuration, instead of resulting in a broken binary.
ALTER TABLE on a MERGE table could cause a deadlock with two
other connections if we reached a situation where:
1) A connection doing ALTER TABLE can't upgrade to MDL_EXCLUSIVE on the
parent table, but holds TL_READ_NO_INSERT on the child tables.
2) A connection doing DELETE on a child table can't get TL_WRITE on it
since ALTER TABLE holds TL_READ_NO_INSERT.
3) A connection doing SELECT on the parent table can't get TL_READ on
the child tables since TL_WRITE is ahead in the lock queue, but holds
MDL_SHARED_READ on the parent table preventing ALTER TABLE from upgrading.
For regular tables, this deadlock is avoided by having ALTER TABLE
take a MDL_SHARED_NO_WRITE metadata lock on the table. This prevents
DELETE from acquiring MDL_SHARED_WRITE on the table before ALTER TABLE
tries to upgrade to MDL_EXCLUSIVE. In the example above, SELECT would
therefore not be blocked by the pending DELETE as DELETE would not be
able to enter TL_WRITE in the table lock queue.
This patch fixes the problem for merge tables by using the same metadata
lock type for child tables as for the parent table. The child tables will
in this case therefore be locked with MDL_SHARED_NO_WRITE, preventing
DELETE from acquiring a metadata lock and enter into the table lock queue.
Change in behavior: By taking the same metadata lock for child tables
as for the parent table, LOCK TABLE on the parent table will now also
implicitly lock the child tables. Since LOCK TABLE on the parent table
now takes more than one metadata lock, it is possible for LOCK TABLE
... WRITE on the parent table or child tables to give ER_LOCK_DEADLOCK
error.
Test case added to mdl_sync.test.
Merge.test/.result has been updated to reflect the change to LOCK TABLE.
Handle combined instrument states of ENABLED and/or TIMED:
ENABLED TIMED
1 1 Aggregate stats, increment counter
1 0 Increment counter
0 1 Do nothing
0 0 Do nothing
storage/perfschema/pfs.cc:
Aggregate stats only if state is both ENABLED and TIMED. If ENABLED
but not TIMED, only increment the value counter.
storage/perfschema/pfs_stat.h:
Split aggregate and counter increment into separate
methods for performance.
Problem: trailing spaces were stripped using 8-bit code,
so the truncation result length was incorrect, which led
to an assertion failure.
Fix: using multi-byte safe code.
Before this fix, some tests failed due to lack of instrumentation slots
in the performance schema, because the default sizing was too low.
Now that more code has been instrumented, the default sizing has to be adjusted
to match the current instrumentation consumption.
This change:
- increases the number of rwlock classes from 20 to 30,
- increases the number of rwlock and mutex instances to 1 million.
Both are to account for the volume of data instrumented
when the innodb storage engine is used (because of the innodb buffer pool).
Adjusted the test output accordingly.
Added InnoDB to the 'default' plugin group, and modified
the autoconf script so the 'default' group is actually
built by default.
(i.e ./configure.am == ./configure.am --with-plugins=default ,
instead of being ./configure.am --with-plugins=none )
Improve the range estimation algorithm.
Previously:
For a given level the algo knows the number of pages in the requested range and the n
With this change:
Same idea, but peek a few (10) of the intermediate pages to get a better estimate of
In the bug report one of the examples has a btree with a snippet of the leaf level li
page1(899 records), page2(1 record), page3(1 record), page4(1 record)
so when trying to estimate, the previous algo, assumed there are average (899+1)/2=45
Fix Bug#53761 RANGE estimation for matched rows may be 200 times different
Improve the range estimation algorithm.
Previously:
For a given level the algo knows the number of pages in the requested range
and the number of records on the leftmost and the rightmost page. Then it
assumes all pages in between contain the average between the two border pages
and multiplies this average number by the number of intermediate pages.
With this change:
Same idea, but peek a few (10) of the intermediate pages to get a better
estimate of the average number of records per page. If there are less than 10
intermediate pages then all of them will be scanned and the result will be
precise, not an estimation.
In the bug report one of the examples has a btree with a snippet of the leaf
level like this:
page1(899 records), page2(1 record), page3(1 record), page4(1 record)
so when trying to estimate, the previous algo, assumed there are average
(899+1)/2=450 records per page which went terribly wrong. With this change
page2 and page3 will be read and the exact number of records will be returned.
Approved by: Sunny (rb://401)
Reduce ibuf_mutex and ibuf_pessimistic_insert_mutex contention further.
Protect ibuf->empty by the insert buffer root page latch, not ibuf_mutex.
ibuf_tree_root_get(): Assert that ibuf_mutex is owned by the
caller. Assert that the stamped page number is correct. Assert that
ibuf->empty agrees with the root page.
ibuf_size_update(): Do not update ibuf->empty.
ibuf_init_at_db_start(): Update ibuf->empty while holding the root page latch.
ibuf_add_free_page(): Return TRUE/FALSE instead of DB_SUCCESS/DB_STRONG_FAIL.
ibuf_remove_free_page(): Release ibuf_pessimistic_insert_mutex as
early as possible.
ibuf_contract_ext(): Rely on a dirty read of ibuf->empty, unless the
server is being shut down. Never acquire ibuf_mutex. Eliminate n_stored.
ibuf_contract_after_insert(): Never acquire ibuf_mutex. Perform dirty
reads of ibuf->size and ibuf->max_size.
ibuf_insert_low(): Only acquire ibuf_mutex for mode==BTR_MODIFY_TREE.
Perform dirty reads of ibuf->size and ibuf->max_size. Update
ibuf->empty while holding the root page latch.
ibuf_delete_rec(): Update ibuf->empty while holding the root page latch.
ibuf_is_empty(): Release ibuf_mutex earlier.
regression in Bug #54914, but it does speed up the execution for
innodb_change_buffering=inserts.
ibuf_add_ops(), ibuf_merge_or_delete_for_page(),
ibuf_delete_for_discarded_space(): Use atomic built-ins instead of
ibuf_mutex, when available.
ibuf_add_free_page(), ibuf_remove_free_page(), ibuf_contract_ext():
Release ibuf_mutex earlier.
ibuf_free_excess_pages(): Release ibuf_mutex before a conditional branch.
ibuf_insert_low(): Release ibuf_mutex before a conditional
branch. Create ibuf_entry before re-acquiring ibuf_mutex. Simplify a
loop to reduce code footprint. Release ibuf_mutex before mtr_commit()
[btr_pcur_close()].
ibuf_is_empty(): Release ibuf_mutex before mtr_commit().
Post-merge fix: remove --with-debug=full, it was only used for safemalloc.
BUILD/compile-pentium-mysqlfs-debug:
Remove build script for a feature that is long gone.
Reverted the ulong->uint diff
Re-applied the first diff.
The original commit message follows:
enum plugin system variables are ulong internally, not int.
On systems where long is not the same as an int it causes
problems.
Fixed by correct typecasting. Removed the test from the
experimental list.
The enum system variables were handled inconsistently
as ints, unsigned int and unsigned long on various places.
This caused problems on platforms on which
sizeof(int) != sizeof(long).
Fixed by homogenizing the type of the enum variables
to unsigned int, since it's size compatible with the C enum
type.
Removed the test from the experimental list.
pages that it wants to flush then we should honor that value as in
not going beyond that in our eagerness to flush the neighbors of
the selected victim.