Commit graph

72 commits

Author SHA1 Message Date
Marko Mäkelä
8da33e3a86 MDEV-11349 (1/2) Fix some clang 4.0 warnings
In InnoDB and XtraDB functions that declare pointer parameters as nonnull,
remove nullness checks, because GCC would optimize them away anyway.

Use #ifdef instead of #if when checking for a configuration flag.

Clang says that left shifts of negative values are undefined.
So, use ~0U instead of ~0 in a number of macros.

Some functions that were defined as UNIV_INLINE were declared as
UNIV_INTERN. Consistently use the same type of linkage.

ibuf_merge_or_delete_for_page() could pass bitmap_page=NULL to
buf_page_print(), conflicting with the __attribute__((nonnull)).
2016-11-25 09:09:51 +02:00
Sergei Golubchik
3361aee591 Merge branch '10.0' into 10.1 2016-06-28 22:01:55 +02:00
Sergei Golubchik
a79d46c3a4 Merge branch 'merge-innodb-5.6' into 10.0 2016-06-21 14:58:19 +02:00
Sergei Golubchik
720e04ff67 5.6.31 2016-06-21 14:21:03 +02:00
Jan Lindström
69147040a6 MDEV-9236: Dramatically overallocation of InnoDB buffer pool leads to crash
Part I: Add diagnostics to page allocation if state is not correct
but do not assert if it is incorrect.
2015-12-17 19:45:42 +02:00
Jan Lindström
7e916bb86f MDEV-8588: Assertion failure in file ha_innodb.cc line 21140 if at least one encrypted table exists and encryption service is not available
Analysis: Problem was that in fil_read_first_page we do find that
    table has encryption information and that encryption service
    or used key_id is not available. But, then we just printed
    fatal error message that causes above assertion.

    Fix: When we open single table tablespace if it has encryption
    information (crypt_data) store this crypt data to the table
    structure. When we open a table and we find out that tablespace
    is not available, check has table a encryption information
    and from there is encryption service or used key_id is not available.
    If it is, add additional warning for SQL-layer.
2015-09-04 20:19:45 +03:00
Jan Lindström
a80753594a MDEV-8591: Database page corruption on disk or a failed space, Assertion failure in file buf0buf.cc line 2856 on querying a table using wrong default encryption key
Improved error messaging to show based on original page before
encryption is page maybe encrypted or just corrupted.
2015-08-14 15:49:56 +03:00
Jan Lindström
d7f3d889de MDEV-8272: Encryption performance: Reduce the number of unused memcpy's
Removed memcpy's on cases when page is not encrypted and make sure
we use the correct buffer for reading/writing.
2015-06-09 11:35:21 +03:00
Jan Lindström
f7002c05ae MDEV-8250: InnoDB: Page compressed tables are not compressed and compressed+encrypted tables cause crash
Analysis: Problem is that both encrypted tables and compressed tables use
FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION to store
required metadata. Furhermore, for only compressed tables currently
code skips compression.

Fixes:
- Only encrypted pages store key_version to FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
  no need to fix
- Only compressed pages store compression algorithm to FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION,
  no need to fix as they have different page type FIL_PAGE_PAGE_COMPRESSED
- Compressed and encrypted pages now use a new page type FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED and
  key_version is stored on FIL header offset FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION and compression
  method is stored after FIL header similar way as compressed size, so that first
  FIL_PAGE_COMPRESSED_SIZE is stored followed by FIL_PAGE_COMPRESSION_METHOD
- Fix buf_page_encrypt_before_write function to really compress pages if compression is enabled
- Fix buf_page_decrypt_after_read function to really decompress pages if compression is used
- Small style fixes
2015-06-04 09:47:06 +03:00
Sergei Golubchik
6d06fbbd1d move to storage/innobase 2015-05-04 19:17:21 +02:00
Jan Lindström
0ba9fa35bc InnoDB/XtraDB Encryption cleanup
Step 2:

-- Introduce temporal memory array to buffer pool where to allocate
temporary memory for encryption/compression
-- Rename PAGE_ENCRYPTION -> ENCRYPTION
-- Rename PAGE_ENCRYPTION_KEY -> ENCRYPTION_KEY
-- Rename innodb_default_page_encryption_key -> innodb_default_encryption_key
-- Allow enable/disable encryption for tables by changing
 ENCRYPTION to enum having values DEFAULT, ON, OFF
-- In create table store crypt_data if ENCRYPTION is ON or OFF
-- Do not crypt tablespaces having ENCRYPTION=OFF
-- Store encryption mode to crypt_data and redo-log
2015-04-07 23:44:56 +02:00
Sergei Golubchik
2db62f686e Merge branch '10.0' into 10.1 2015-03-07 13:21:02 +01:00
Sergei Golubchik
6b05688f6d innodb 5.6.23 2015-02-18 17:59:21 +01:00
Jan Lindström
e2e809860e Pass down the information should we encrypt the page at os0file.cc
when page compression and google encryption is used.
2015-02-10 10:21:18 +01:00
Monty
d7d589dc01 Push for testing of encryption 2015-02-10 10:21:17 +01:00
Sergei Golubchik
4b21cd21fe Merge branch '10.0' into merge-wip 2015-01-31 21:48:47 +01:00
Jan Lindström
2877c5ecc2 MDEV-7477: Make innochecksum work on compressed tables
This patch ports the work that facebook has performed
to make innochecksum handle compressed tables.
the basic idea is to use actual innodb-code to perform
checksum verification rather than duplicating in innochecksum.cc.
to make this work, innodb code has been annotated with
lots of #ifndef UNIV_INNOCHECKSUM so that it can be
compiled outside of storage/innobase.

A new testcase is also added that verifies that innochecksum
works on compressed/non-compressed tables.

Merged from commit fabc79d2ea976c4ff5b79bfe913e6bc03ef69d42 
from https://code.google.com/p/google-mysql/

The actual steps to produce this patch are:

    take innochecksum from 5.6.14
    apply changes in innodb from facebook patches needed to make innochecksum compile
    apply changes in innochecksum from facebook patches
    add handcrafted testcase

The referenced facebook patches used are:

    91e25120e7
    847fe76ea5
    1135628a5a
    4dbf7c240c
2015-01-19 12:39:17 +02:00
Sergei Golubchik
853077ad7e Merge branch '10.0' into bb-10.1-merge
Conflicts:
	.bzrignore
	VERSION
	cmake/plugin.cmake
	debian/dist/Debian/control
	debian/dist/Ubuntu/control
	mysql-test/r/join_outer.result
	mysql-test/r/join_outer_jcl6.result
	mysql-test/r/null.result
	mysql-test/r/old-mode.result
	mysql-test/r/union.result
	mysql-test/t/join_outer.test
	mysql-test/t/null.test
	mysql-test/t/old-mode.test
	mysql-test/t/union.test
	packaging/rpm-oel/mysql.spec.in
	scripts/mysql_config.sh
	sql/ha_ndbcluster.cc
	sql/ha_ndbcluster_binlog.cc
	sql/ha_ndbcluster_cond.cc
	sql/item_cmpfunc.h
	sql/lock.cc
	sql/sql_select.cc
	sql/sql_show.cc
	sql/sql_update.cc
	sql/sql_yacc.yy
	storage/innobase/buf/buf0flu.cc
	storage/innobase/fil/fil0fil.cc
	storage/innobase/include/srv0srv.h
	storage/innobase/lock/lock0lock.cc
	storage/tokudb/CMakeLists.txt
	storage/xtradb/buf/buf0flu.cc
	storage/xtradb/fil/fil0fil.cc
	storage/xtradb/include/srv0srv.h
	storage/xtradb/lock/lock0lock.cc
	support-files/mysql.spec.sh
2014-12-02 22:25:16 +01:00
Jan Lindström
a03dd94be8 MDEV-6936: Buffer pool list scan optimization
Merged Facebook commit 617aef9f911d825e9053f3d611d0389e02031225
authored by Inaam Rana to InnoDB storage engine (not XtraDB)
from https://github.com/facebook/mysql-5.6

WL#7047 - Optimize buffer pool list scans and related batch processing

Reduce excessive scanning of pages when doing flush list batches. The
fix is to introduce the concept of "Hazard Pointer", this reduces the
time complexity of the scan from O(n*n) to O.

The concept of hazard pointer is reversed in this work. Academically
hazard pointer is a pointer that the thread working on it will declar
such and as long as that thread is not done no other thread is allowe
do anything with it.

In this WL we declare the pointer as a hazard pointer and then if any
thread attempts to work on it, it is allowed to do so but it has to a
the hazard pointer to the next valid value. We use hazard pointer sol
reverse traversal of lists within a buffer pool instance.

Add an event to control the background flush thread. The background f
thread wait has been converted to an os event timed wait so that it c
signalled by threads that want to kick start a background flush when
buffer pool is running low on free/dirty pages.
2014-11-06 13:17:11 +02:00
Jan Lindström
60e995cfec MDEV-6930: Make innodb_max_dirty_pages_pct my.cnf variable a double
Merged Facebook commit ecff018632c6db49bad73d9233c3cdc9f41430e9
authored by Steaphan Greene from https://github.com/facebook/mysql-5.6

This change is to fix: http://bugs.mysql.com/62534

This makes innodb_max_dirty_pages_pct a double with min,default,max values
0.001, 75, 99.999.

This also makes innodb_max_dirty_pages_pct_lwm and adaptive_flushing_lwm
doubles, as these sysvars are inter-dependent.

Added more to the BUFFER POOL AND MEMORY section of SHOW INNODB STATUS:
Percent pages dirty: X.X
This is all n_dirty_pages / used_pages
Percent all pages dirty: X.X
This is all n_dirty_pages / all-pages
Max dirty pages percent: X.X
This is innodb_max_dirty_pages_pct

Also changed all of buf from 2 to 3 digits of precision (%.2f -> %.3f).
2014-10-25 09:24:39 +03:00
Sergei Golubchik
f62c12b405 Merge 10.0.14 into 10.1 2014-10-15 12:59:13 +02:00
Sergei Golubchik
75796d9ecb InnoDB 5.6.20 2014-09-11 16:42:54 +02:00
Jan Lindström
1016ee9d77 Merge 10.0 -> 10.1 2014-05-24 21:37:21 +03:00
Jan Lindström
d12dbe77e2 MDEV-6246: Merge 10.0.10-FusionIO to 10.1. 2014-05-22 14:24:00 +03:00
Sergei Golubchik
8ee9d19607 innodb 5.6.17 2014-05-07 17:32:23 +02:00
Sergei Golubchik
e2e5d07b28 MDEV-6184 10.0.11 merge
InnoDB 5.6.16
2014-05-06 09:57:39 +02:00
Jan Lindström
96100d6652 Merge: lp:maria/10.0 latest. 2014-03-03 14:27:56 +02:00
Sergei Golubchik
27d45e4696 MDEV-5574 Set AUTO_INCREMENT below max value of column.
Update InnoDB to 5.6.14
Apply MySQL-5.6 hack for MySQL Bug#16434374
Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB)
Fix InnoDB memory leak
2014-02-01 09:33:26 +01:00
Jan Lindström
5e55d1ced5 Changes for Fusion-io multi-threaded flush, page compressed tables and
tables using atomic write/table.

This is work in progress and some parts are at most POC quality.
2013-12-19 14:36:38 +02:00
Michael Widenius
068c61978e Temporary commit of 10.0-merge 2013-03-26 00:03:13 +02:00
Sergei Golubchik
e1f681c99b 10.0-base -> 10.0-monty 2012-10-19 20:38:59 +02:00
Michael Widenius
1d0f70c2f8 Temporary commit of merge of MariaDB 10.0-base and MySQL 5.6 2012-08-01 17:27:34 +03:00
Yasufumi Kinoshita
6058319dba Bug#14251529 : FIX FOR BUG 13704145 CREATES POSSIBLE RACE CONDITION
make buf_read_page_low() to treat DB_TABLESPACE_DELETED error correctly
rb#1129 approved by Inaam
2012-06-29 12:04:44 +09:00
Narayanan Venkateswaran
164080e4a8 WL#6161 Integrating with InnoDB codebase in MySQL 5.5
Changes in the InnoDB codebase required to compile and
integrate the MEB codebase with MySQL 5.5.

@ storage/innobase/btr/btr0btr.c
  Excluded buffer pool usage from MEB build.
 
  buf_pool_from_bpage calls are in buf0buf.ic, and
  the buffer pool functions from that file are
  disabled in MEB.
@ storage/innobase/buf/buf0buf.c
  Disabling more buffer pool functions unused in MEB.
@ storage/innobase/dict/dict0dict.c
  Disabling dict_ind_free that is unused in MEB.
@ storage/innobase/dict/dict0mem.c
  The include

  #include "ha_prototypes.h"

  Was causing conflicts with definitions in my_global.h

  Linking C executable mysqlbackup
  libinnodb.a(dict0mem.c.o): In function `dict_mem_foreign_table_name_lookup_set':
  dict0mem.c:(.text+0x91c): undefined reference to `innobase_get_lower_case_table_names'
  libinnodb.a(dict0mem.c.o): In function `dict_mem_referenced_table_name_lookup_set':
  dict0mem.c:(.text+0x9fc): undefined reference to `innobase_get_lower_case_table_names'
  libinnodb.a(dict0mem.c.o): In function `dict_mem_foreign_table_name_lookup_set':
  dict0mem.c:(.text+0x96e): undefined reference to `innobase_casedn_str'
  libinnodb.a(dict0mem.c.o): In function `dict_mem_referenced_table_name_lookup_set':
  dict0mem.c:(.text+0xa4e): undefined reference to `innobase_casedn_str'
  collect2: ld returned 1 exit status
  make[2]: *** [mysqlbackup] Error 1

  innobase_get_lower_case_table_names
  innobase_casedn_str
  are functions that are part of ha_innodb.cc that is not part of the build
        
  dict_mem_foreign_table_name_lookup_set
  function is not there in the current codebase, meaning we do not use it in MEB.
@ storage/innobase/fil/fil0fil.c
  The srv_fast_shutdown variable is declared in
  srv0srv.c that is not compiled in the
  mysqlbackup codebase.

  This throws an undeclared error.

  From the Manual
  ---------------

  innodb_fast_shutdown
  --------------------

  The InnoDB shutdown mode. The default value is 1
  as of MySQL 3.23.50, which causes a “fast� shutdown
  (the normal type of shutdown). If the value is 0,
  InnoDB does a full purge and an insert buffer merge
  before a shutdown. These operations can take minutes,
  or even hours in extreme cases. If the value is 1,
  InnoDB skips these operations at shutdown.

  This ideally does not matter from mysqlbackup
  @ storage/innobase/ha/ha0ha.c
  In file included from /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/ha/ha0ha.c:34:0:
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/btr0sea.h:286:17: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
  make[2]: *** [CMakeFiles/innodb.dir/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/ha/ha0ha.c.o] Error 1
  make[1]: *** [CMakeFiles/innodb.dir/all] Error 2
  make: *** [all] Error 2

  # include "sync0rw.h" is excluded from hotbackup compilation in dict0dict.h

  This causes extern rw_lock_t*	btr_search_latch_temp; to throw a failure because
  the definition of rw_lock_t is not found.
@ storage/innobase/include/buf0buf.h
  Excluding buffer pool functions that are unused from the
  MEB codebase.
@ storage/innobase/include/buf0buf.ic
  replicated the exclusion of

  #include "buf0flu.h"
  #include "buf0lru.h"
  #include "buf0rea.h"

  by looking at the current codebase in <meb-trunk>/src/innodb
  @ storage/innobase/include/dict0dict.h
  dict_table_x_lock_indexes, dict_table_x_unlock_indexes, dict_table_is_corrupted,
  dict_index_is_corrupted, buf_block_buf_fix_inc_func are unused in MEB and was
  leading to compilation errors and hence excluded.
@ storage/innobase/include/dict0dict.ic
  dict_table_x_lock_indexes, dict_table_x_unlock_indexes, dict_table_is_corrupted,
  dict_index_is_corrupted, buf_block_buf_fix_inc_func are unused in MEB and was
  leading to compilation errors and hence excluded.
@ storage/innobase/include/log0log.h
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/log0log.h: At top level:
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/log0log.h:767:2: error: expected specifier-qualifier-list before Ã¢â  ‚¬Ëœmutex_t’

  mutex_t definitions were excluded as seen from ambient code
  hence excluding definition for log_flush_order_mutex also.
@ storage/innobase/include/os0file.h
  Bug in InnoDB code, create_mode should have been create.
@ storage/innobase/include/srv0srv.h
  In file included from /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/buf/buf0buf.c:50:0:
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/srv0srv.h: At top level:
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/srv0srv.h:120:16: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘srv_use_native_aio’

  srv_use_native_aio - we do not use native aio of the OS anyway from MEB. MEB does not compile
  InnoDB with this option. Hence disabling it.
@ storage/innobase/include/trx0sys.h
  [ 56%] Building C object CMakeFiles/innodb.dir/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c.o
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c: In function ‘trx_sys_read_file_format_id’:
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c:1499:20: error: ‘TRX_SYS_FILE_FORMAT_TAG_MAGIC_N’   undeclared (first use in this function)
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c:1499:20: note: each undeclared identifier is reported only once for  each function it appears in
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c: At top level:
  /home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/include/buf0buf.h:607:1: warning: ‘buf_block_buf_fix_inc_func’ declared ‘static’ but never defined
  make[2]: *** [CMakeFiles/innodb.dir/home/narayanan/mysql-server/mysql-5.5-meb-rel3.8-innodb-integration-1/storage/innobase/trx/trx0sys.c.o] Error 1

  unused calls excluded to enable compilation
@ storage/innobase/mem/mem0dbg.c
    excluding #include "ha_prototypes.h" that lead to definitions in ha_innodb.cc
@ storage/innobase/os/os0file.c
    InnoDB not compiled with aio support from MEB anyway. Hence excluding this from
    the compilation.
@ storage/innobase/page/page0zip.c
  page0zip.c:(.text+0x4e9e): undefined reference to `buf_pool_from_block'
  collect2: ld returned 1 exit status

  buf_pool_from_block defined in buf0buf.ic, most of the file is excluded for compilation of MEB
@ storage/innobase/ut/ut0dbg.c
  excluding #include "ha_prototypes.h" since it leads to definitions in ha_innodb.cc
  innobase_basename(file) is defined in ha_innodb.cc. Hence excluding that also.
@ storage/innobase/ut/ut0ut.c
  cal_tm unused from MEB, was leading to earnings, hence disabling for MEB.
2012-06-07 19:14:26 +05:30
Jimmy Yang
faa3ecdb07 backport Bug #47707 print some progress messages during shutdown of innodb
to mysql-5.5

rb://979 approved by Marko
2012-03-21 11:48:12 +08:00
Marko Mäkelä
4ea57c80b2 Merge mysql-5.1 to mysql-5.5. 2012-02-17 11:52:51 +02:00
Marko Mäkelä
39100cd984 Bug #13651627 Move ut_ad(0) from the beginning to the end of buf_page_print(),
print page dump

buf_page_print(): Remove the ut_ad(0) from the beginning. Add two flags
(enum buf_page_print_flags) that can be bitwise-ORed together:

BUF_PAGE_PRINT_NO_CRASH:
  Do not crash debug builds at the end of buf_page_print().
BUF_PAGE_PRINT_NO_FULL:
  Do not print the full page dump. This can be useful when adding
  diagnostic printout to flushing or to the doublewrite buffer.

trx_sys_doublewrite_init_or_restore_page(): Replace exit(1) with ut_error,
so that we can get a core dump if this extraordinary condition happens.

rb:924 approved by Sunny Bains
2012-02-02 12:31:57 +02:00
Inaam Rana
358a31df43 Bug#11759044 - 51325: DROPPING AN EMPTY INNODB TABLE TAKES A LONG TIME
WITH LARGE BUFFER POOL

(Note: this a backport of revno:3472 from mysql-trunk)

rb://845
approved by: Marko

  When dropping a table (with an .ibd file i.e.: with
  innodb_file_per_table set) we scan entire LRU to invalidate pages from
  that table. This can be painful in case of large buffer pools as we hold
  the buf_pool->mutex for the scan. Note that gravity of the problem does
  not depend on the size of the table. Even with an empty table but a
  large and filled up buffer pool we'll end up scanning a very long LRU
  list.
  
  The fix is to scan flush_list and just remove the blocks belonging to
  the table from the flush_list, marking them as non-dirty. The blocks
  are left in the LRU list for eventual eviction due to aging. The
  flush_list is typically much smaller than the LRU list but for cases
  where it is very long we have the solution of releasing the
  buf_pool->mutex after scanning 1K pages.
  
  buf_page_[set|unset]_sticky(): Use new IO-state BUF_IO_PIN to ensure
  that a block stays in the flush_list and LRU list when we release
  buf_pool->mutex. Previously we have been abusing BUF_IO_READ to achieve
  this.
2011-12-07 09:12:53 -05:00
Marko Mäkelä
2c67d5066d Revert revno:3452.71.32 (Bug#12612184 fix).
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()

The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
2011-10-26 12:23:57 +03:00
Marko Mäkelä
5dc19b9ce1 Merge mysql-5.1 to mysql-5.5. 2011-10-12 09:21:33 +03:00
Inaam Rana
c95fdd936e Revert original fix for Bug 12612184 and the follow up fix for
Bug 12704861.

Bug 12704861 fix was revno: 3504.1.1 (rb://693)
Bug 12612184 fix was revno: 3445.1.10 (rb://678)
2011-09-30 07:02:19 -04:00
unknown
40761a9a73 Merge from mysql-5.1.59-release 2011-09-15 18:48:54 +02:00
Marko Mäkelä
7c20cd1b5d Merge mysql-5.1 to mysql-5.5. 2011-08-29 11:22:43 +03:00
Marko Mäkelä
41f229cd9e Bug#12704861 Corruption after a crash during BLOB update
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.

Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().

btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.

btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.

btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).

btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).

btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.

btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.

btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.

fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.

fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.

fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".

mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.

row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.

row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.

row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.

buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.

There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.

rb:693 approved by Sunny Bains
2011-08-29 11:16:42 +03:00
Marko Mäkelä
f2f4b19678 Bug#12626794 61240: UNUSED FUNCTIONS ... 2011-08-10 14:56:14 +03:00
Inaam Rana
fce189fadb Merge from 5.1 the fix for Bug 12356373 2011-07-19 10:54:59 -04:00
Marko Mäkelä
5b4ceba58d Bug#12612184 Race condition after btr_cur_pessimistic_update()
btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool
adjust. If adjust=TRUE, adjust the cursor position after compressing
the page.

btr_lift_page_up(): Return a pointer to the father page.

BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update().

btr_cur_pessimistic_update(): If *big_rec != NULL and flags &
BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record.
Also, do not release the index tree x-lock if *big_rec != NULL.

btr_cur_mtr_commit_and_start(): Commits and restarts a
mini-transaction so that it will retain an x-lock on index->lock and
the page of the cursor. This is invoked when
btr_cur_pessimistic_update() returns *big_rec != NULL.

In all callers of btr_cur_pessimistic_update() that do not pass
BTR_KEEP_POS_FLAG, assert that *big_rec == NULL.

btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove.

page_rec_get_nth(): Return the nth record on the page (an inverse
function of page_rec_get_n_recs_before()). Refactored from
page_get_middle_rec().

page_get_middle_rec(): Invoke page_rec_get_nth().

page_cur_insert_rec_zip_reorg(): Make use of the page directory
shortcuts in page_rec_get_nth() instead of scanning the whole list of
records.

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update().

row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify()
returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to
commit and start the mini-transaction without releasing the x-locks on
index->lock and the cursor page, and write the big_rec. Releasing the
page latch in mtr_commit() caused a race condition.

row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update(). If it returns a big_rec, invoke
btr_cur_mtr_commit_and_start() in order to commit and start the
mini-transaction without releasing the x-locks on index->lock and the
cursor page, and write the big_rec. Releasing the page latch in
mtr_commit() caused a race condition.

sync_thread_add_level(): Add the parameter ibool relock. When TRUE,
bypass the latching order rules.

rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE
to sync_thread_add_level().

rb:678 approved by Jimmy Yang
2011-06-16 10:27:21 +03:00
Marko Mäkelä
7545f86670 Merge mysql-5.1 to mysql-5.5.
This patch was already pushed to mysql-5.5 by Inaam Rana.
2011-06-23 15:57:25 +03:00
Inaam Rana
9df4c14354 merge from mysql-5.1
Bug 12635227 - 61188: DROP TABLE EXTREMELY SLOW
2011-06-17 16:29:19 -04:00
Marko Mäkelä
f43adb8072 Merge mysql-5.1 to mysql-5.5. 2011-06-16 15:13:24 +03:00