mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-02-01 11:31:51 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	0664d633e4	MDEV-14952 Avoid repeated calls to btr_get_search_latch() btr_cur_search_to_nth_level(), btr_search_guess_on_hash(), btr_pcur_open_with_no_init_func(), row_sel_open_pcur(): Replace the parameter has_search_latch with the ahi_latch (passed as NULL if the caller does not hold the latch). btr_search_update_hash_node_on_insert(), btr_search_update_hash_on_insert(), btr_search_build_page_hash_index(): Add the parameter ahi_latch. btr_search_x_lock(), btr_search_x_unlock(), btr_search_s_lock(), btr_search_s_unlock(): Remove.	2018-01-15 19:51:09 +02:00
Alexander Barkov	835cbbcc7b	Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3 TODO: enable MDEV-13049 optimization for 10.3	2017-10-30 20:47:39 +04:00
Alexander Barkov	30e7d6709f	Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext	2017-10-18 14:11:55 +04:00
Marko Mäkelä	a0df2693a2	Remove an unused parameter btr_store_big_rec_extern_fields(): Remove the unused parameter 'upd' that was added in https://github.com/mysql/mysql-server/commit/ce0a1e85e24e48b8171f767b44330d and merged to MariaDB 10.2.2 in commit `2e814d4702`.	2017-10-11 12:00:52 +03:00
Marko Mäkelä	a4948dafcd	MDEV-11369 Instant ADD COLUMN for InnoDB For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables. This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN. Some usability limitations will be addressed in subsequent work: MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE The format of the clustered index (PRIMARY KEY) is changed as follows: (1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX. (2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked. (3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert. (4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.) We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC. This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively. When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index. UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record. len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT. dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value(). dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column. dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name(). dict_index_t::n_core_fields: The original number of fields. For secondary indexes and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields. dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable). dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not initialized yet from the clustered index root page. dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value(). dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN. dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back. dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant(). dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN. dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN. prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. Dictionary objects will only be added to cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to cache. dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and if needed, set can_be_evicted. dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point). dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns(). pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns(). row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns(). create_index_dict(): Replaces row_merge_create_index_graph(). innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs. btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary. dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables. dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE. dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant(). row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE. commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.) PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This is repurposing the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT. page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION. page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION. page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION. rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields. rec_offs_make_valid(): Add the parameter 'leaf'. rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present. dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple(). rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields. cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records. trx_t::in_rollback: Make the field available in non-debug builds. trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written. btr_index_rec_validate(): Account for instant ADD COLUMN row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index. row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case. dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' be deleted. btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed. row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low(). rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them. rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields. btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and adaptive hash index for the 'default row'. row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t. rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'. REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG \| REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record. rec_comp_status_t: An enum of the status bit values. rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().	2017-10-06 09:50:10 +03:00
Alexander Barkov	765347384a	Merge remote-tracking branch 'origin/10.2' into bb-10.2-ext	2017-06-15 15:27:11 +04:00
Marko Mäkelä	70505dd45b	Merge 10.1 into 10.2	2017-05-22 09:46:51 +03:00
Marko Mäkelä	13a350ac29	Merge 10.0 into 10.1	2017-05-19 12:29:37 +03:00
Marko Mäkelä	9f89b94ba6	MDEV-12358 Work around what looks like a bug in GCC 7.1.0 The parameter thr of the function btr_cur_optimistic_insert() is not declared as nonnull, but GCC 7.1.0 with -O3 is wrongly optimizing away the first part of the condition UNIV_UNLIKELY(thr && thr_get_trx(thr)->fake_changes) when the function is being called by row_merge_insert_index_tuples() with thr==NULL. The fake_changes is an XtraDB addition. This GCC bug only appears to have an impact on XtraDB, not InnoDB. We work around the problem by not attempting to dereference thr when both BTR_NO_LOCKING_FLAG and BTR_NO_UNDO_LOG_FLAG are set in the flags. Probably BTR_NO_LOCKING_FLAG alone should suffice. btr_cur_optimistic_insert(), btr_cur_pessimistic_insert(), btr_cur_pessimistic_update(): Correct comments that disagree with usage and with nonnull attributes. No other parameter than thr can actually be NULL. row_ins_duplicate_error_in_clust(): Remove an unused parameter. innobase_is_fake_change(): Unused function; remove. ibuf_insert_low(), row_log_table_apply(), row_log_apply(), row_undo_mod_clust_low(): Because we will be passing BTR_NO_LOCKING_FLAG \| BTR_NO_UNDO_LOG_FLAG in the flags, the trx->fake_changes flag will be treated as false, which is the right thing to do at these low-level operations (change buffer merge, ALTER TABLE…LOCK=NONE, or ROLLBACK). This might be fixing actual XtraDB bugs. Other callers that pass these two flags are also passing thr=NULL, implying fake_changes=false. (Some callers in ROLLBACK are passing BTR_NO_LOCKING_FLAG and a nonnull thr. In these callers, fake_changes better be false, to avoid corruption.)	2017-05-17 16:09:22 +03:00
Marko Mäkelä	7c767a30a7	MDEV-10139 Support for InnoDB SEQUENCE objects We introduce a NO_ROLLBACK flag for InnoDB tables. This flag only works for tables that have a single index. Apart from undo logging, this flag will also prevent locking and the assignment of DB_ROW_ID or DB_TRX_ID, and imply READ UNCOMMITTED isolation. It is assumed that the SQL layer is guaranteeing mutual exclusion. After the initial insert of the single record during CREATE SEQUENCE, InnoDB will be updating the single record in-place. This is crash-safe thanks to the redo log. (That is, after a crash after CREATE SEQUENCE was committed, the effect of sequence operations will be observable fully or not at all.) When it comes to the durability of the updates of SEQUENCE in InnoDB, there is a clear analogy to MDEV-6076 Persistent AUTO_INCREMENT. The updates would be made persistent by the InnoDB redo log flush at transaction commit or rollback (or XA PREPARE), provided that innodb_log_flush_at_trx_commit=1. Similar to AUTO_INCREMENT, it is possible that the update of a SEQUENCE in a middle of transaction becomes durable before the COMMIT/ROLLBACK of the transaction, in case the InnoDB redo log is being flushed as a result of the a commit or rollback of some other transaction, or as a result of a redo log checkpoint that can be initiated at any time by operations that are writing redo log. dict_table_t::no_rollback(): Check if the table does not support rollback. BTR_NO_ROLLBACK: Logging and locking flags for no_rollback() tables. DICT_TF_BITS: Add the NO_ROLLBACK flag. row_ins_step(): Assign 0 to DB_ROW_ID and DB_TRX_ID, and skip any locking for no-rollback tables. There will be only a single row in no-rollback tables (or there must be a proper PRIMARY KEY). row_search_mvcc(): Execute the READ UNCOMMITTED code path for no-rollback tables. ha_innobase::external_lock(), ha_innobase::store_lock(): Block CREATE/DROP SEQUENCE in innodb_read_only mode. This probably has no effect for CREATE SEQUENCE, because already ha_innobase::create() should have been called (and refused) before external_lock() or store_lock() is called. ha_innobase::store_lock(): For CREATE SEQUENCE, do not acquire any InnoDB locks, even though TL_WRITE is being requested. (This is just a performance optimization.) innobase_copy_frm_flags_from_create_info(), row_drop_table_for_mysql(): Disable persistent statistics for no_rollback tables.	2017-04-07 19:12:40 +04:00
Marko Mäkelä	4e1116b2c6	MDEV-12271 Port MySQL 8.0 Bug#23150562 REMOVE UNIV_MUST_NOT_INLINE AND UNIV_NONINL Also, remove empty .ic files that were not removed by my MySQL commit. Problem: InnoDB used to support a compilation mode that allowed to choose whether the function definitions in .ic files are to be inlined or not. This stopped making sense when InnoDB moved to C++ in MySQL 5.6 (and ha_innodb.cc started to #include .ic files), and more so in MySQL 5.7 when inline methods and functions were introduced in .h files. Solution: Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from all files, assuming that the symbols are never defined. Remove the files fut0fut.cc and ut0byte.cc which only mattered when UNIV_NONINL was defined.	2017-03-17 12:42:07 +02:00
Marko Mäkelä	89d80c1b0b	Fix many -Wconversion warnings. Define my_thread_id as an unsigned type, to avoid mismatch with ulonglong. Change some parameters to this type. Use size_t in a few more places. Declare many flag constants as unsigned to avoid sign mismatch when shifting bits or applying the unary ~ operator. When applying the unary ~ operator to enum constants, explictly cast the result to an unsigned type, because enum constants can be treated as signed. In InnoDB, change the source code line number parameters from ulint to unsigned type. Also, make some InnoDB functions return a narrower type (unsigned or uint32_t instead of ulint; bool instead of ibool).	2017-03-07 19:07:27 +02:00
Marko Mäkelä	27b9989d31	MDEV-12121 Introduce build option WITH_INNODB_AHI to disable innodb_adaptive_hash_index The InnoDB adaptive hash index is sometimes degrading the performance of InnoDB, and it is sometimes disabled to get more consistent performance. We should have a compile-time option to disable the adaptive hash index. Let us introduce two options: OPTION(WITH_INNODB_AHI "Include innodb_adaptive_hash_index" ON) OPTION(WITH_INNODB_ROOT_GUESS "Cache index root block descriptors" ON) where WITH_INNODB_AHI always implies WITH_INNODB_ROOT_GUESS. As part of this change, the misleadingly named function trx_search_latch_release_if_reserved(trx) will be replaced with the macro trx_assert_no_search_latch(trx) that will be empty unless BTR_CUR_HASH_ADAPT is defined (cmake -DWITH_INNODB_AHI=ON). We will also remove the unused column INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_TIMEOUT. In MariaDB Server 10.1, it used to reflect the value of trx_t::search_latch_timeout which could be adjusted during row_search_for_mysql(). In 10.2, there is no such field. Other than the removal of the unused column TRX_ADAPTIVE_HASH_TIMEOUT, this is an almost non-functional change to the server when using the default build options. Some tests are adjusted so that they will work with both -DWITH_INNODB_AHI=ON and -DWITH_INNODB_AHI=OFF. The test innodb.innodb_monitor has been renamed to innodb.monitor in order to track MySQL 5.7, and the duplicate tests sys_vars.innodb_monitor_* are removed.	2017-03-03 16:55:50 +02:00
Marko Mäkelä	63574f1275	MDEV-11690 Remove UNIV_HOTBACKUP The InnoDB source code contains quite a few references to a closed-source hot backup tool which was originally called InnoDB Hot Backup (ibbackup) and later incorporated in MySQL Enterprise Backup. The open source backup tool XtraBackup uses the full database for recovery. So, the references to UNIV_HOTBACKUP are only cluttering the source code.	2016-12-30 16:05:42 +02:00
Marko Mäkelä	8777458a6e	MDEV-6076 Persistent AUTO_INCREMENT for InnoDB This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with the notable difference that the file format changes are limited to repurposing a previously unused data field in B-tree pages. For persistent InnoDB tables, write the last used AUTO_INCREMENT value to the root page of the clustered index, in the previously unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC. Unlike some other previously unused InnoDB data fields, this one was actually always zero-initialized, at least since MySQL 3.23.49. The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the root page. The SX latch will allow concurrent read access to the root page. (The field PAGE_ROOT_AUTO_INC will only be read on the first-time call to ha_innobase::open() from the SQL layer. The PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so read/write races are not possible.) During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level function btr_cur_search_to_nth_level(), adding no extra page access. [Adaptive hash index lookup will be disabled during INSERT.] If some rare UPDATE modifies an AUTO_INCREMENT column, the PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in ha_innobase::update_row(). When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC field. During ALTER TABLE, the initial AUTO_INCREMENT value will be copied from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will update PAGE_ROOT_AUTO_INC in real time. innodb_col_no(): Determine the dict_table_t::cols[] element index corresponding to a Field of a non-virtual column. (The MySQL 5.7 implementation of virtual columns breaks the 1:1 relationship between Field::field_index and dict_table_t::cols[]. Virtual columns are omitted from dict_table_t::cols[]. Therefore, we must translate the field_index of AUTO_INCREMENT columns into an index of dict_table_t::cols[].) Upgrade from old data files: By default, the AUTO_INCREMENT sequence in old data files would appear to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain the value 0 in each clustered index page. In new data files, PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain any AUTO_INCREMENT column. For backward compatibility, we use the old method of SELECT MAX(auto_increment_column) for initializing the sequence. btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format data file. btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc() that will resort to reading MAX(auto_increment_column) for data files that did not use AUTO_INCREMENT yet. It was manually tested that during the execution of innodb.autoinc_persist the compatibility logic is not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty clustered index root pages). initialize_auto_increment(): Replaces ha_innobase::innobase_initialize_autoinc(). This initializes the AUTO_INCREMENT metadata. Only called from ha_innobase::open(). ha_innobase::info_low(): Do not try to lazily initialize dict_table_t::autoinc. It must already have been initialized by ha_innobase::open() or ha_innobase::create(). Note: The adjustments to class ha_innopart were not tested, because the source code (native InnoDB partitioning) is not being compiled.	2016-12-16 09:19:19 +02:00
Marko Mäkelä	c868acdf65	MDEV-11487 Revert InnoDB internal temporary tables from WL#7682 WL#7682 in MySQL 5.7 introduced the possibility to create light-weight temporary tables in InnoDB. These are called 'intrinsic temporary tables' in InnoDB, and in MySQL 5.7, they can be created by the optimizer for sorting or buffering data in query processing. In MariaDB 10.2, the optimizer temporary tables cannot be created in InnoDB, so we should remove the dead code and related data structures.	2016-12-09 12:05:07 +02:00
Jan Lindström	2e814d4702	Merge InnoDB 5.7 from mysql-5.7.9. Contains also MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7 The failure happened because 5.7 has changed the signature of the bool handler::primary_key_is_clustered() const virtual function ("const" was added). InnoDB was using the old signature which caused the function not to be used. MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7 Fixed mutexing problem on lock_trx_handle_wait. Note that rpl_parallel and rpl_optimistic_parallel tests still fail. MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan) Reason: incorrect merge MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan) Reason: incorrect merge	2016-09-02 13:22:28 +03:00
Sergei Golubchik	3361aee591	Merge branch '10.0' into 10.1	2016-06-28 22:01:55 +02:00
Sergei Golubchik	a79d46c3a4	Merge branch 'merge-innodb-5.6' into 10.0	2016-06-21 14:58:19 +02:00
Sergei Golubchik	720e04ff67	5.6.31	2016-06-21 14:21:03 +02:00
Jan Lindström	7e916bb86f	MDEV-8588: Assertion failure in file ha_innodb.cc line 21140 if at least one encrypted table exists and encryption service is not available Analysis: Problem was that in fil_read_first_page we do find that table has encryption information and that encryption service or used key_id is not available. But, then we just printed fatal error message that causes above assertion. Fix: When we open single table tablespace if it has encryption information (crypt_data) store this crypt data to the table structure. When we open a table and we find out that tablespace is not available, check has table a encryption information and from there is encryption service or used key_id is not available. If it is, add additional warning for SQL-layer.	2015-09-04 20:19:45 +03:00
Sergei Golubchik	6d06fbbd1d	move to storage/innobase	2015-05-04 19:17:21 +02:00
Jan Lindström	3486135bb5	MDEV-6928: Add trx pointer to struct mtr_t Merge Facebook commit 25295d003cb0c17aa8fb756523923c77250b3294 authored by Steaphan Greene from https://github.com/facebook/mysql-5.6 This adds a pointer to the trx to each mtr. This allows the trx to be accessed in parts of the code where it was otherwise not available. This is needed later.	2014-10-24 22:26:31 +03:00
Sergei Golubchik	2023fac281	innodb-5.6.19	2014-08-06 20:05:10 +02:00
Sergei Golubchik	15ee97214d	InnoDB 5.6.15 merge. update test results	2014-02-26 19:36:33 +01:00
Sergei Golubchik	27d45e4696	MDEV-5574 Set AUTO_INCREMENT below max value of column. Update InnoDB to 5.6.14 Apply MySQL-5.6 hack for MySQL Bug#16434374 Move Aria-only HA_RTREE_INDEX from my_base.h to maria_def.h (breaks an assert in InnoDB) Fix InnoDB memory leak	2014-02-01 09:33:26 +01:00
Michael Widenius	068c61978e	Temporary commit of 10.0-merge	2013-03-26 00:03:13 +02:00
Sergei Golubchik	ab83952f29	10.0-base merge	2013-01-31 09:48:19 +01:00
Sergei Golubchik	aca8e7ed6b	5.3 merge	2013-01-15 19:07:46 +01:00
Marko Mäkelä	6bbe24e9a0	Bug#14636528 INNODB CHANGE BUFFERING IS NOT ENTIRELY CRASH-SAFE Delete-mark change buffer records when resorting to a pessimistic delete from the change buffer B-tree. Skip delete-marked records in the change buffer merge and when estimating whether an operation can be buffered. Without this fix, we could try to apply the same buffered changes multiple times if the server was killed at the right moment. In MySQL 5.5 and later: ibuf_get_volume_buffered_count_func(): Ignore delete-marked (already processed) records. ibuf_delete_rec(): Add a crash point before optimistic delete. If the optimistic delete fails, flag the record processed before mtr_commit(). ibuf_merge_or_delete_for_page(): Ignore delete-marked (already processed) records. Backport to 5.1: Rename btr_cur_del_unmark_for_ibuf() to btr_cur_set_deleted_flag_for_ibuf() and add a parameter. rb:1307 approved by Jimmy Yang	2012-09-19 22:35:38 +03:00
Michael Widenius	1d0f70c2f8	Temporary commit of merge of MariaDB 10.0-base and MySQL 5.6	2012-08-01 17:27:34 +03:00
Marko Mäkelä	4ea57c80b2	Merge mysql-5.1 to mysql-5.5.	2012-02-17 11:52:51 +02:00
Marko Mäkelä	7a48b3df7f	Bug#13340047 LATCHING ORDER VIOLATION IN IBUF_SET_ENTRY_COUNTER() - cleanup Remove btr_cur_t::ibuf_cnt. The only dependence on cursor->ibuf_cnt was ibuf_set_entry_counter(). That code path was removed in the original bug fix (marko.makela@oracle.com-20111107072802-dgwagejlpub0rjkd). rb:819 approved by Jimmy Yang	2011-11-22 13:14:55 +02:00
Marko Mäkelä	2c67d5066d	Revert revno:3452.71.32 (Bug#12612184 fix). Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE() The fix introduced potentially more severe crash recovery problems than the bug causes. Revert the fix for now.	2011-10-26 12:23:57 +03:00
Marko Mäkelä	91b5e9352a	Revert most of revno 3560.9.1 (Bug#12704861) This was an attempt to address problems with the Bug#12612184 fix. Even with this follow-up fix, crash recovery can be broken. Let us fix the bug later.	2011-10-26 11:44:28 +03:00
Inaam Rana	c95fdd936e	Revert original fix for Bug 12612184 and the follow up fix for Bug 12704861. Bug 12704861 fix was revno: 3504.1.1 (rb://693) Bug 12612184 fix was revno: 3445.1.10 (rb://678)	2011-09-30 07:02:19 -04:00
Marko Mäkelä	7c20cd1b5d	Merge mysql-5.1 to mysql-5.5.	2011-08-29 11:22:43 +03:00
Marko Mäkelä	41f229cd9e	Bug#12704861 Corruption after a crash during BLOB update The fix of Bug#12612184 broke crash recovery. When a record that contains off-page columns (BLOBs) is updated, we must first write redo log about the BLOB page writes, and only after that write the redo log about the B-tree changes. The buggy fix would log the B-tree changes first, meaning that after recovery, we could end up having a record that contains a null BLOB pointer. Because we will be redo logging the writes off the off-page columns before the B-tree changes, we must make sure that the pages chosen for the off-page columns are free both before and after the B-tree changes. In this way, the worst thing that can happen in crash recovery is that the BLOBs are written to free pages, but the B-tree changes are not applied. The BLOB pages would correctly remain free in this case. To achieve this, we must allocate the BLOB pages in the mini-transaction of the B-tree operation. A further quirk is that BLOB pages are allocated from the same file segment as leaf pages. Because of this, we must temporarily "hide" any leaf pages that were freed during the B-tree operation by "fake allocating" them prior to writing the BLOBs, and freeing them again before the mtr_commit() of the B-tree operation, in btr_mark_freed_leaves(). btr_cur_mtr_commit_and_start(): Remove this faulty function that was introduced in the Bug#12612184 fix. The problem that this function was trying to address was that when we did mtr_commit() the BLOB writes before the mtr_commit() of the update, the new BLOB pages could have overwritten clustered index B-tree leaf pages that were freed during the update. If recovery applied the redo log of the BLOB writes but did not see the log of the record update, the index tree would be corrupted. The correct solution is to make the freed clustered index pages unavailable to the BLOB allocation. This function is also a likely culprit of InnoDB hangs that were observed when testing the Bug#12612184 fix. btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs, or freed (nonfree=FALSE) before committing the mini-transaction. btr_freed_leaves_validate(): A debug function for checking that all clustered index leaf pages that have been marked free in the mini-transaction are consistent (have not been zeroed out). btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the number of the allocated page, or FIL_NULL if out of space. Add the parameter "mtr_t* init_mtr" for specifying the mini-transaction where the page should be initialized, or if this is a "fake allocation" (init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE). btr_page_alloc(): Add the parameter init_mtr, allowing the page to be initialized and X-latched in a different mini-transaction than the one that is used for the allocation. Invoke btr_page_alloc_low(). If a clustered index leaf page was previously freed in mtr, remove it from the memo of previously freed pages. btr_page_free(): Assert that the page is a B-tree page and it has been X-latched by the mini-transaction. If the freed page was a leaf page of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to the mini-transaction. btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr, which is NULL (old behaviour in inserts) and the same as local_mtr in updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it instead of the mini-transaction that is used for writing the BLOBs. fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page(). Allocate the specified page from a partially free extent. fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the parameter "mtr_t* init_mtr" for specifying the mini-transaction where the page should be initialized, or NULL if this is a "fake allocation" that prevents the reuse of a previously freed B-tree page for BLOB storage. If init_mtr==NULL, try harder to reallocate the specified page and assert that it succeeded. fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for specifying the mini-transaction where the page should be initialized. Do not allow init_mtr == NULL, because this function is never to be used for "fake allocations". mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag mtr->freed_clust_leaf for quickly determining if any MTR_MEMO_FREE_CLUST_LEAF operations have been posted. row_ins_index_entry_low(): When columns are being made off-page in insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages. row_build(): Correct a comment, and add a debug assertion that a record that contains NULL BLOB pointers must be a fresh insert. row_upd_clust_rec(): When columns are being moved off-page, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages. buf_reset_check_index_page_at_flush(): Remove. The function fsp_init_file_page_low() already sets bpage->check_index_page_at_flush=FALSE. There is a known issue in tablespace extension. If the request to allocate a BLOB page leads to the tablespace being extended, crash recovery could see BLOB writes to pages that are off the tablespace file bounds. This should trigger an assertion failure in fil_io() at crash recovery. The safe thing would be to write redo log about the tablespace extension to the mini-transaction of the BLOB write, not to the mini-transaction of the record update. However, there is no redo log record for file extension in the current redo log format. rb:693 approved by Sunny Bains	2011-08-29 11:16:42 +03:00
Marko Mäkelä	5b4ceba58d	Bug#12612184 Race condition after btr_cur_pessimistic_update() btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool adjust. If adjust=TRUE, adjust the cursor position after compressing the page. btr_lift_page_up(): Return a pointer to the father page. BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update(). btr_cur_pessimistic_update(): If big_rec != NULL and flags & BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record. Also, do not release the index tree x-lock if big_rec != NULL. btr_cur_mtr_commit_and_start(): Commits and restarts a mini-transaction so that it will retain an x-lock on index->lock and the page of the cursor. This is invoked when btr_cur_pessimistic_update() returns big_rec != NULL. In all callers of btr_cur_pessimistic_update() that do not pass BTR_KEEP_POS_FLAG, assert that big_rec == NULL. btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove. page_rec_get_nth(): Return the nth record on the page (an inverse function of page_rec_get_n_recs_before()). Refactored from page_get_middle_rec(). page_get_middle_rec(): Invoke page_rec_get_nth(). page_cur_insert_rec_zip_reorg(): Make use of the page directory shortcuts in page_rec_get_nth() instead of scanning the whole list of records. row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to btr_cur_pessimistic_update(). row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify() returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to commit and start the mini-transaction without releasing the x-locks on index->lock and the cursor page, and write the big_rec. Releasing the page latch in mtr_commit() caused a race condition. row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to btr_cur_pessimistic_update(). If it returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to commit and start the mini-transaction without releasing the x-locks on index->lock and the cursor page, and write the big_rec. Releasing the page latch in mtr_commit() caused a race condition. sync_thread_add_level(): Add the parameter ibool relock. When TRUE, bypass the latching order rules. rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE to sync_thread_add_level(). rb:678 approved by Jimmy Yang	2011-06-16 10:27:21 +03:00
Marko Mäkelä	3400e0be0a	Merge mysql-5.1 to mysql-5.5.	2011-06-16 12:07:49 +03:00
Jimmy Yang	9cd4d49840	Fix Bug#30423 "InnoDBs treatment of NULL in index stats causes bad "rows examined" estimates". This change implements "innodb_stats_method" with options of "nulls_equal", "nulls_unequal" and "null_ignored". rb://553 approved by Marko	2011-01-14 09:02:28 -08:00
Marko Mäkelä	ee486208cd	Merge mysql-5.1-innodb to mysql-5.5-innodb.	2011-02-02 15:58:01 +02:00
Jimmy Yang	5a0138e250	Merge from mysql-5.1-innodb to mysql-5.5-innodb	2011-01-14 23:24:47 -08:00
Marko Mäkelä	c9951393a8	Merge mysql-5.1-innodb to mysql-5.5-innodb.	2010-12-21 14:05:10 +02:00
Marko Mäkelä	a5352c8970	Merge mysql-5.1-innodb to mysql-5.5-innodb.	2010-11-03 11:25:14 +02:00
Marko Mäkelä	6dfc85f0f7	Merge Bug #56680 from mysql-5.1. Additional fixes in 5.5: ibuf_set_del_mark(): Add diagnostics when setting a buffered delete-mark fails. ibuf_delete(): Correct a misleading comment about non-found records. rec_print(): Add a const qualifier to the index parameter. Bug #56680 wrong InnoDB results from a case-insensitive covering index row_search_for_mysql(): When a secondary index record might not be visible in the current transaction's read view and we consult the clustered index and optionally some undo log records, return the relevant columns of the clustered index record to MySQL instead of the secondary index record. ibuf_insert_to_index_page_low(): New function, refactored from ibuf_insert_to_index_page(). ibuf_insert_to_index_page(): When we are inserting a record in place of a delete-marked record and some fields of the record differ, update that record just like row_ins_sec_index_entry_by_modify() would do. btr_cur_update_alloc_zip(): Make the function public. mysql_row_templ_t: Add clust_rec_field_no. row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the flag rec_clust, for returning data at clust_rec_field_no instead of rec_field_no. Resurrect the debug assertion that the record not be marked for deletion. (Bug #55626) [UNIV_DEBUG \|\| UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(), buf_flush_page_try(): Implement innodb_change_buffering_debug=1 for evicting pages from the buffer pool, so that change buffering will be attempted more frequently.	2010-10-19 09:35:14 +03:00
Georgi Kodinov	083a647e6a	merge from 5.5-merge	2010-09-02 16:57:59 +03:00
Vasil Dimov	7f62ec7b38	Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the n With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of In the bug report one of the examples has a btree with a snippet of the leaf level li page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=45 Fix Bug#53761 RANGE estimation for matched rows may be 200 times different Improve the range estimation algorithm. Previously: For a given level the algo knows the number of pages in the requested range and the number of records on the leftmost and the rightmost page. Then it assumes all pages in between contain the average between the two border pages and multiplies this average number by the number of intermediate pages. With this change: Same idea, but peek a few (10) of the intermediate pages to get a better estimate of the average number of records per page. If there are less than 10 intermediate pages then all of them will be scanned and the result will be precise, not an estimation. In the bug report one of the examples has a btree with a snippet of the leaf level like this: page1(899 records), page2(1 record), page3(1 record), page4(1 record) so when trying to estimate, the previous algo, assumed there are average (899+1)/2=450 records per page which went terribly wrong. With this change page2 and page3 will be read and the exact number of records will be returned. Approved by: Sunny (rb://401)	2010-08-16 17:23:29 +03:00
Sunny Bains	3c4d4e0a25	Merge from -c3476 mysql-5.1-security. ------------------------------------------------------------ revno: 3476 committer: Sunny Bains <Sunny.Bains@Oracle.Com> branch nick: 5.1-security timestamp: Thu 2010-08-05 19:18:17 +1000 message: Fix bug# 55543 - InnoDB Plugin: Signal 6: Assertion failure in file fil/fil0fil.c line 4306 The bug is due to a double delete of a BLOB, once via: rollback -> btr_cur_pessimistic_delete() and the second time via purge. The bug is in row_upd_clust_rec_by_insert(). There we relinquish ownership of the non-updated BLOB columns in btr_cur_mark_extern_inherited_fields() before building the row entry that will be inserted and whose contents will be logged in the UNDO log. However, we don't set the BLOB column later to INHERITED so that a possible rollback will not free the original row's non-updated BLOB entries. This is because the condition that checks for that is in : if (node->upd_ext) {}. node->upd_ext is non-NULL only if a BLOB column was updated and that column is part of some key ordering (see row_upd_replace()). This results in the non-update BLOB columns being deleted during a rollback and subsequently by purge again. rb://413	2010-08-16 12:05:49 +10:00
Marko Mäkelä	7c718cdb01	Merge Bug#54358 fix from mysql-5.1-innodb: ------------------------------------------------------------ revno: 3529 revision-id: marko.makela@oracle.com-20100629125518-m3am4ia1ffjr0d0j parent: jimmy.yang@oracle.com-20100629024137-690sacm5sogruzvb committer: Marko Mäkelä <marko.makela@oracle.com> branch nick: 5.1-innodb timestamp: Tue 2010-06-29 15:55:18 +0300 message: Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED columns When the server crashes after a record stub has been inserted and before all its off-page columns have been written, the record will contain incomplete off-page columns after crash recovery. Such records may only be accessed at the READ UNCOMMITTED isolation level or when rolling back a recovered transaction in recv_recovery_rollback_active(). Skip these records at the READ UNCOMMITTED isolation level. TODO: Add assertions for checking the above assumptions hold when an incomplete BLOB is encountered. btr_rec_copy_externally_stored_field(): Return NULL if the field is incomplete. row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs. row_sel_store_mysql_rec(): Return FALSE if not all columns could be retrieved. Previously this function always returned TRUE. Assert that the record is not delete-marked. row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns could be retrieved. row_search_for_mysql(): Skip records containing incomplete off-page columns. Assert that the transaction isolation level is READ UNCOMMITTED. rb://380 approved by Jimmy Yang	2010-06-29 16:19:07 +03:00

1 2

60 commits