mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 11:01:52 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	d266036292	MDEV-12266: Remove lookups when dropping indexes btr_free_root_check(), btr_free_if_exists(): Replace page_id, page_size with fil_space_t* and the root page number. btr_free_but_not_root(): Take fil_space_t* as a parameter, or NULL if the operation is not going to be redo-logged. btr_free(): Pass space=NULL to btr_free_but_not_root().	2018-03-30 18:26:52 +03:00
Marko Mäkelä	4cad42392a	MDEV-12266: Change dict_table_t::space to fil_space_t* InnoDB always keeps all tablespaces in the fil_system cache. The fil_system.LRU is only for closing file handles; the fil_space_t and fil_node_t for all data files will remain in main memory. Between startup to shutdown, they can only be created and removed by DDL statements. Therefore, we can let dict_table_t::space point directly to the fil_space_t. dict_table_t::space_id: A numeric tablespace ID for the corner cases where we do not have a tablespace. The most prominent examples are ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file. There are a few functional differences; most notably: (1) DROP TABLE will delete matching .ibd and .cfg files, even if they were not attached to the data dictionary. (2) Some error messages will report file names instead of numeric IDs. There still are many functions that use numeric tablespace IDs instead of fil_space_t, and many functions could be converted to fil_space_t member functions. Also, Tablespace and Datafile should be merged with fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use fil_space_t& instead of a numeric ID, and after moving to a single buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to fil_space_t::page_hash. FilSpace: Remove. Only few calls to fil_space_acquire() will remain, and gradually they should be removed. mtr_t::set_named_space_id(ulint): Renamed from set_named_space(), to prevent accidental calls to this slower function. Very few callers remain. fseg_create(), fsp_reserve_free_extents(): Take fil_space_t as a parameter instead of a space_id. fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(), fil_name_write_rename(), fil_rename_tablespace(). Mariabackup passes the parameter log=false; InnoDB passes log=true. dict_mem_table_create(): Take fil_space_t* instead of space_id as parameter. dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter 'status' with 'bool cached'. dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name. fil_ibd_open(): Return the tablespace. fil_space_t::set_imported(): Replaces fil_space_set_imported(). truncate_t: Change many member function parameters to fil_space_t*, and remove page_size parameters. row_truncate_prepare(): Merge to its only caller. row_drop_table_from_cache(): Assert that the table is persistent. dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL if the tablespace has been discarded. row_import_update_discarded_flag(): Remove a constant parameter.	2018-03-29 22:02:05 +03:00
Marko Mäkelä	604fea1ad6	MDEV-12266: Remove dict_index_t::space We can rely on the dict_table_t::space. All indexes of a table object are always in the same tablespace. (For fulltext indexes, the data is located in auxiliary tables, and these will continue to have their own table objects, separate from the main table.)	2018-03-29 20:47:37 +03:00
Sergey Vojtovich	131d9a5d0c	Allocate lock_sys statically There is only one lock_sys. Allocate it statically in order to avoid dereferencing a pointer whenever accessing it. Also, align some members to their own cache line in order to avoid false sharing. lock_sys_t::create(): The deferred constructor. lock_sys_t::close(): The early destructor.	2018-02-23 08:18:18 +02:00
Eugene Kosov	af2d260b37	merge btr_page_get_level_low() and btr_page_get_level()	2018-02-16 21:44:51 +03:00
Marko Mäkelä	32170f8c6d	Add page_has_prev(), page_has_next(), page_has_siblings() Until now, InnoDB inefficiently compared the aligned fields FIL_PAGE_PREV, FIL_PAGE_NEXT to the byte-order-agnostic value FIL_NULL.	2018-02-08 22:34:21 +02:00
Marko Mäkelä	7660d8c94e	Remove dict_table_t::is_clust() Replace all occurrences of the is_clust() method with is_primary(), because that is what is actually meant. (Also the change buffer tree would count as a clustered index.)	2018-02-08 12:18:07 +02:00
Alexander Barkov	835cbbcc7b	Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3 TODO: enable MDEV-13049 optimization for 10.3	2017-10-30 20:47:39 +04:00
Marko Mäkelä	9dfe84d5de	Remove a bogus page_is_root() debug assertion on btr_create() failure The predicate page_is_root() would not hold if btr_create() fails before the root page is fully initialized. Move the debug assertion from btr_free_root_invalidate() to its other caller, btr_free_if_exists(). In that caller, we actually already checked for page_is_root().	2017-10-27 19:01:24 +03:00
Jan Lindström	a4fa940bad	MDEV-11336: Enable defragmentation on 10.2 when tests pass Problem was that we could take page latches on different order than wat is entitled with SX-lock. To follow the latching order defined in WL#6326, acquire index->lock X-latch. This entitles us to acquire page latches in any order for the index. btr0btr.cc Document latch rules before and after MariaDB 10.2.2 sync0rw.cc Document latch compatibility rules better. btr_defragment_merge_pages Fix parameter value. btr_defragment_thread Acquire X-lock to dict_index_t::lock before restoring cursor position and continuing defragmentation. ha_innobase::optimize Restore defragment feature. Testing Add GIS-index and FT-index to table being defragmented. Defragmentation is not done to GIS-indexes and FT auxiliary tables.	2017-10-12 12:56:20 +03:00
Marko Mäkelä	3edd007cea	btr_free_root(): Relax a too strict debug assertion When btr_create() invokes btr_free_root() after running out of space, fseg_create() would have acquired an SX-latch on the root page, not an X-latch. Relax the debug assertion in btr_free_root() accordingly. (In this case, SX-latch and X-latch are equivalent. During the CREATE operation there should be MDL_EXCLUSIVE and dict_operation_lock X-latch preventing concurrent access to the index. Normally the purpose of the SX-latch is to allow concurrent reads of the root page while certain fields in the root page are updated in place.)	2017-10-09 14:08:46 +03:00
Marko Mäkelä	a4948dafcd	MDEV-11369 Instant ADD COLUMN for InnoDB For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables. This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN. Some usability limitations will be addressed in subsequent work: MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE The format of the clustered index (PRIMARY KEY) is changed as follows: (1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX. (2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked. (3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert. (4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.) We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC. This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively. When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index. UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record. len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT. dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value(). dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column. dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name(). dict_index_t::n_core_fields: The original number of fields. For secondary indexes and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields. dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable). dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not initialized yet from the clustered index root page. dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value(). dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN. dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back. dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant(). dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN. dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN. prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. Dictionary objects will only be added to cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to cache. dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and if needed, set can_be_evicted. dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point). dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns(). pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns(). row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns(). create_index_dict(): Replaces row_merge_create_index_graph(). innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs. btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary. dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables. dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE. dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant(). row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE. commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.) PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This is repurposing the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT. page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION. page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION. page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION. rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields. rec_offs_make_valid(): Add the parameter 'leaf'. rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present. dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple(). rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields. cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records. trx_t::in_rollback: Make the field available in non-debug builds. trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written. btr_index_rec_validate(): Account for instant ADD COLUMN row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index. row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case. dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' be deleted. btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed. row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low(). rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them. rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields. btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and adaptive hash index for the 'default row'. row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t. rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'. REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG \| REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record. rec_comp_status_t: An enum of the status bit values. rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().	2017-10-06 09:50:10 +03:00
Marko Mäkelä	48192f963a	Add the parameter bool leaf to rec_get_offsets() This should affect debug builds only. Debug builds will check that the status bits of ROW_FORMAT!=REDUNDANT records match the is_leaf parameter. The only observable change to non-debug should be the addition of the is_leaf parameter to the function rec_copy_prefix_to_dtuple(), and the removal of some calls to update the adaptive hash index (it is only built for the leaf pages). This change should have been made in MySQL 5.0.3, instead of introducing the status flags in the ROW_FORMAT=COMPACT record header.	2017-09-20 16:53:34 +03:00
Marko Mäkelä	6b687a0fde	Introduce page_rec_is_leaf() and clean up page0page.h Define some page accessor functions inline in page0page.h, reducing code duplication in page0page.ic. Use page_rec_is_leaf() instead of page_is_leaf() where possible.	2017-09-20 08:42:44 +03:00
Marko Mäkelä	e52dd13c2e	Code clean-up related to MDEV-13167 xdes_get_descriptor_const(): New function, to get read-only access to the allocation descriptor. fseg_page_is_free(): Only acquire a shared latch on the tablespace, not an exclusive latch. Calculate the descriptor page address before acquiring the tablespace latch. If the page number is out of bounds, return without fetching any page. Access only one descriptor page. fsp_page_is_free(), fsp_page_is_free_func(): Remove. Use fseg_page_is_free() instead. fsp_init_file_page(): Move the debug parameter into a separate function. btr_validate_level(): Remove the unused variable "seg".	2017-08-23 09:47:50 +03:00
Marko Mäkelä	56ff6f1b0b	InnoDB: Remove dead code for DATA_POINT and DATA_VAR_POINT The POINT data type is being treated just like any other geometry data type in InnoDB. The fixed-length data type DATA_POINT had been introduced in WL#6942 based on a misunderstanding and without appropriate review. Because of fundamental design problems (such as a DEFAULT POINT(0 0) value secretly introduced by InnoDB), the code was disabled in Oracle Bug#20415831 fix. This patch removes the dead code and definitions that were left behind by the Oracle Bug#20415831 patch.	2017-07-03 12:17:10 +03:00
Marko Mäkelä	615b1f4189	Merge 10.1 into 10.2 innodb.table_flags: Adjust the test case. Due to the MDEV-12873 fix in 10.2, the corrupted flags for table test.td would be converted, and a tablespace flag mismatch will occur when trying to open the file.	2017-06-15 14:35:51 +03:00
Marko Mäkelä	58f87a41bd	Remove some fields from dict_table_t dict_table_t::thd: Remove. This was only used by btr_root_block_get() for reporting decryption failures, and it was only assigned by ha_innobase::open(), and never cleared. This could mean that if a connection is closed, the pointer would become stale, and the server could crash while trying to report the error. It could also mean that an error is being reported to the wrong client. It is better to use current_thd in this case, even though it could mean that if the code is invoked from an InnoDB background operation, there would be no connection to which to send the error message. Remove dict_table_t::crypt_data and dict_table_t::page_0_read. These fields were never read. fil_open_single_table_tablespace(): Remove the parameter "table".	2017-06-15 12:41:02 +03:00
Marko Mäkelä	2d8fdfbde5	Merge 10.1 into 10.2 Replace have_innodb_zip.inc with innodb_page_size_small.inc.	2017-06-08 12:45:08 +03:00
Marko Mäkelä	fbeb9489cd	Cleanup of MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M The doublewrite buffer pages must fit in the first InnoDB system tablespace data file. The checks that were added in the initial patch (commit `112b21da37`) were at too high level and did not cover all cases. innodb.log_data_file_size: Test all innodb_page_size combinations. fsp_header_init(): Never return an error. Move the change buffer creation to the only caller that needs to do it. btr_create(): Clean up the logic. Remove the error log messages. buf_dblwr_create(): Try to return an error on non-fatal failure. Check that the first data file is big enough for creating the doublewrite buffers. buf_dblwr_process(): Check if the doublewrite buffer is available. Display the message only if it is available. recv_recovery_from_checkpoint_start_func(): Remove a redundant message about FIL_PAGE_FILE_FLUSH_LSN mismatch when crash recovery has already been initiated. fil_report_invalid_page_access(): Simplify the message. fseg_create_general(): Do not emit messages to the error log. innobase_init(): Revert the changes. trx_rseg_create(): Refactor (no functional change).	2017-06-08 11:55:47 +03:00
Jan Lindström	112b21da37	MDEV-12600: crash during install_db with innodb_page_size=32K and ibdata1=3M; Problem was that all doublewrite buffer pages must fit to first system datafile. Ported commit 27a34df7882b1f8ed283f22bf83e8bfc523cbfde Author: Shaohua Wang <shaohua.wang@oracle.com> Date: Wed Aug 12 15:55:19 2015 +0800 BUG#21551464 - SEGFAULT WHILE INITIALIZING DATABASE WHEN INNODB_DATA_FILE SIZE IS SMALL To 10.1 (with extended error printout). btr_create(): If ibuf header page allocation fails report error and return FIL_NULL. Similarly if root page allocation fails return a error. dict_build_table_def_step: If fsp_header_init fails return error code. fsp_header_init: returns true if header initialization succeeds and false if not. fseg_create_general: report error if segment or page allocation fails. innobase_init: If first datafile is smaller than 3M and could not contain all doublewrite buffer pages report error and fail to initialize InnoDB plugin. row_truncate_table_for_mysql: report error if fsp header init fails. srv_init_abort: New function to report database initialization errors. srv_undo_tablespaces_init, innobase_start_or_create_for_mysql: If database initialization fails report error and abort. trx_rseg_create: If segment header creation fails return.	2017-06-01 14:07:48 +03:00
Marko Mäkelä	021d636551	Fix some integer type mismatch. Use uint32_t for the encryption key_id. When filling unsigned integer values into INFORMATION_SCHEMA tables, use the method Field::store(longlong, bool unsigned) instead of using Field::store(double). Fix also some miscellanous type mismatch related to ulint (size_t).	2017-05-10 12:45:46 +03:00
Marko Mäkelä	b3939a35aa	MDEV-12720 recovery fails with "Generic error" for ROW_FORMAT=compressed This bug was introduced in the fix of MDEV-12123, which invoked page_zip_write_header() in the wrong way. page_zip_write_header(): Assert that the length is not zero, to be compatible with page_zip_parse_write_header(). btr_root_raise_and_insert(): Update the uncompressed page and then invoke page_zip_write_header() with the correct length.	2017-05-09 11:41:35 +03:00
Marko Mäkelä	c22ef4df26	MDEV-12253 post-merge fix: Use accessors for dict_table_t::file_unreadable	2017-05-06 15:54:31 +03:00
Marko Mäkelä	f9cc391863	Merge 10.1 into 10.2 This only merges MDEV-12253, adapting it to MDEV-12602 which is already present in 10.2 but not yet in the 10.1 revision that is being merged. TODO: Error handling in crash recovery needs to be improved. If a page cannot be decrypted (or read), we should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages. Currently, the test encryption.innodb-redo-badkey randomly fails like this (the last messages are from cmake -DWITH_ASAN): 2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994 2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1 2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption 2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown... i================================================================= ==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0 #0 0x736750 in operator delete(void) (/mariadb/server/build/sql/mysqld+0x736750) #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4 #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17 #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3 #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2 #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2 #6 0x197e5e6 in innobase_init(void) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3	2017-05-05 10:38:53 +03:00
Jan Lindström	765a43605a	MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.	2017-04-26 15:19:16 +03:00
Marko Mäkelä	5684aa220c	MDEV-12488 Remove type mismatch in InnoDB printf-like calls Alias the InnoDB ulint and lint data types to size_t and ssize_t, which are the standard names for the machine-word-width data types. Correspondingly, define ULINTPF as "%zu" and introduce ULINTPFx as "%zx". In this way, better compiler warnings for type mismatch are possible. Furthermore, use PRIu64 for that 64-bit format, and define the feature macro __STDC_FORMAT_MACROS to enable it on Red Hat systems. Fix some errors in error messages, and replace some error messages with assertions. Most notably, an IMPORT TABLESPACE error message in InnoDB was displaying the number of columns instead of the mismatching flags.	2017-04-21 18:03:15 +03:00
Marko Mäkelä	d0ef1aaf61	MDEV-12123 Page contains nonzero PAGE_MAX_TRX_ID When MDEV-6076 repurposed the field PAGE_MAX_TRX_ID, it was assumed that the field always was 0 in the clustered index of old data files. This was not the case in IMPORT TABLESPACE (introduced in MySQL 5.6 and MariaDB 10.0), which is writing the transaction ID to all index pages, including clustered index pages. This means that on a data file that was at some point of its life IMPORTed to an InnoDB instance, MariaDB 10.2.4 or later could interpret the transaction ID as a persistent AUTO_INCREMENT value. This also means that future changes that repurpose PAGE_MAX_TRX_ID in the clustered index may cause trouble with files that were imported at some point of their life. There is a separate minor issue that InnoDB is writing PAGE_MAX_TRX_ID to every secondary index page, even though it is only needed on leaf pages. From now on we will write PAGE_MAX_TRX_ID as 0 to non-leaf pages, just to be able to keep stricter debug assertions. btr_root_raise_and_insert(): Reset the PAGE_MAX_TRX_ID field on non-root pages of the clustered index, and on the no-longer-leaf root page of secondary indexes. AbstractCallback::is_root_page(): Remove. Use page_is_root() instead. PageConverter::update_index_page(): Reset the PAGE_MAX_TRX_ID to 0 on other pages than the clustered index root page or secondary index leaf pages.	2017-04-19 07:59:24 +03:00
Marko Mäkelä	16b2b1eae5	Use page_is_leaf() where applicable	2017-04-17 03:50:10 +03:00
Marko Mäkelä	bb81828938	Use page_is_leaf() where applicable.	2017-04-17 03:20:34 +03:00
Marko Mäkelä	97acc4a1c3	MDEV-12270 Port MySQL 8.0 Bug#21141390 REMOVE UNUSED FUNCTIONS AND CONVERT GLOBAL SYMBOLS TO STATIC InnoDB defines some functions that are not called at all. Other functions are called, but only from the same compilation unit. Remove some function declarations and definitions, and add 'static' keywords. Some symbols must be kept for separately compiled tools, such as innochecksum.	2017-03-17 12:48:50 +02:00
Marko Mäkelä	4e1116b2c6	MDEV-12271 Port MySQL 8.0 Bug#23150562 REMOVE UNIV_MUST_NOT_INLINE AND UNIV_NONINL Also, remove empty .ic files that were not removed by my MySQL commit. Problem: InnoDB used to support a compilation mode that allowed to choose whether the function definitions in .ic files are to be inlined or not. This stopped making sense when InnoDB moved to C++ in MySQL 5.6 (and ha_innodb.cc started to #include .ic files), and more so in MySQL 5.7 when inline methods and functions were introduced in .h files. Solution: Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from all files, assuming that the symbols are never defined. Remove the files fut0fut.cc and ut0byte.cc which only mattered when UNIV_NONINL was defined.	2017-03-17 12:42:07 +02:00
Marko Mäkelä	89d80c1b0b	Fix many -Wconversion warnings. Define my_thread_id as an unsigned type, to avoid mismatch with ulonglong. Change some parameters to this type. Use size_t in a few more places. Declare many flag constants as unsigned to avoid sign mismatch when shifting bits or applying the unary ~ operator. When applying the unary ~ operator to enum constants, explictly cast the result to an unsigned type, because enum constants can be treated as signed. In InnoDB, change the source code line number parameters from ulint to unsigned type. Also, make some InnoDB functions return a narrower type (unsigned or uint32_t instead of ulint; bool instead of ibool).	2017-03-07 19:07:27 +02:00
Marko Mäkelä	7cf97ed4ee	MDEV-11816 Disallow CREATE TEMPORARY TABLE…ROW_FORMAT=COMPRESSED MySQL 5.7 allows temporary tables to be created in ROW_FORMAT=COMPRESSED. The usefulness of this is questionable. WL#7899 in MySQL 8.0.0 prevents the creation of such compressed tables, so that all InnoDB temporary tables will be located inside the predefined InnoDB temporary tablespace. Pick up and adjust some tests from MySQL 5.7 and 8.0. dict_tf_to_fsp_flags(): Remove the parameter is_temp. fsp_flags_init(): Remove the parameter is_temporary. row_mysql_drop_temp_tables(): Remove. There cannot be any temporary tables in InnoDB. (This never removed #sql* tables in the datadir which were created by DDL.) dict_table_t::dir_path_of_temp_table: Remove. create_table_info_t::m_temp_path: Remove. create_table_info_t::create_options_are_invalid(): Do not allow ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE for temporary tables. create_table_info_t::innobase_table_flags(): Do not unnecessarily prevent CREATE TEMPORARY TABLE with SPATIAL INDEX. (MySQL 5.7 does allow this.) fil_space_belongs_in_lru(): The only FIL_TYPE_TEMPORARY tablespace is never subjected to closing least-recently-used files.	2017-01-18 08:42:57 +02:00
Marko Mäkelä	494e4b99a4	Remove MYSQL_TABLESPACES. MySQL 5.7 introduced partial support for user-created shared tablespaces (for example, import and export are not supported). MariaDB Server does not support tablespaces at this point of time. Let us remove most InnoDB code and data structures that is related to shared tablespaces.	2017-01-18 08:30:43 +02:00
Marko Mäkelä	70c11485d2	Remove MYSQL_ENCRYPTION. MariaDB will likely never support MySQL-style encryption for InnoDB, because we cannot link with the Oracle encryption plugin. This is preparation for merging MDEV-11623.	2017-01-18 08:30:42 +02:00
Marko Mäkelä	63574f1275	MDEV-11690 Remove UNIV_HOTBACKUP The InnoDB source code contains quite a few references to a closed-source hot backup tool which was originally called InnoDB Hot Backup (ibbackup) and later incorporated in MySQL Enterprise Backup. The open source backup tool XtraBackup uses the full database for recovery. So, the references to UNIV_HOTBACKUP are only cluttering the source code.	2016-12-30 16:05:42 +02:00
Marko Mäkelä	7bcae22bf1	Merge branch 'bb-10.2-mdev-6076' into 10.2	2016-12-29 15:05:04 +02:00
Sergei Golubchik	4a5d25c338	Merge branch '10.1' into 10.2	2016-12-29 13:23:18 +01:00
Marko Mäkelä	8375a2c1ce	MDEV-11585 Hard-code the shared InnoDB temporary tablespace ID at -1 MySQL 5.7 supports only one shared temporary tablespace. MariaDB 10.2 does not support any other shared InnoDB tablespaces than the two predefined tablespaces: the persistent InnoDB system tablespace (default file name ibdata1) and the temporary tablespace (default file name ibtmp1). InnoDB is unnecessarily allocating a tablespace ID for the predefined temporary tablespace on every startup, and it is in several places testing whether a tablespace ID matches this dynamically generated ID. We should use a compile-time constant to reduce code size and to avoid unnecessary updates to the DICT_HDR page at every startup. Using a hard-coded tablespace ID will should make it easier to remove the TEMPORARY flag from FSP_SPACE_FLAGS in MDEV-11202.	2016-12-19 16:24:10 +02:00
Marko Mäkelä	c64edc6b83	MDEV-6076: Preserve PAGE_ROOT_AUTO_INC when emptying pages. Thanks to Zhangyuan from Alibaba for pointing out this bug. btr_page_empty(): When a clustered index root page is emptied, preserve PAGE_ROOT_AUTO_INC. This would occur during a page split. page_create_empty(): Preserve PAGE_ROOT_AUTO_INC when a clustered index root page becomes empty. Use a faster method for writing the field. page_zip_copy_recs(): Reset PAGE_MAX_TRX_ID when copying clustered index pages. We must clear the field when the root page was a leaf page and it is being split, so that PAGE_MAX_TRX_ID will continue to be 0 in clustered index non-root pages. page_create_zip(): Add debug assertions for validating PAGE_MAX_TRX_ID and PAGE_ROOT_AUTO_INC.	2016-12-16 10:26:41 +02:00
Marko Mäkelä	8777458a6e	MDEV-6076 Persistent AUTO_INCREMENT for InnoDB This should be functionally equivalent to WL#6204 in MySQL 8.0.0, with the notable difference that the file format changes are limited to repurposing a previously unused data field in B-tree pages. For persistent InnoDB tables, write the last used AUTO_INCREMENT value to the root page of the clustered index, in the previously unused (0) PAGE_MAX_TRX_ID field, now aliased as PAGE_ROOT_AUTO_INC. Unlike some other previously unused InnoDB data fields, this one was actually always zero-initialized, at least since MySQL 3.23.49. The writes to PAGE_ROOT_AUTO_INC are protected by SX or X latch on the root page. The SX latch will allow concurrent read access to the root page. (The field PAGE_ROOT_AUTO_INC will only be read on the first-time call to ha_innobase::open() from the SQL layer. The PAGE_ROOT_AUTO_INC can only be updated when executing SQL, so read/write races are not possible.) During INSERT, the PAGE_ROOT_AUTO_INC is updated by the low-level function btr_cur_search_to_nth_level(), adding no extra page access. [Adaptive hash index lookup will be disabled during INSERT.] If some rare UPDATE modifies an AUTO_INCREMENT column, the PAGE_ROOT_AUTO_INC will be adjusted in a separate mini-transaction in ha_innobase::update_row(). When a page is reorganized, we have to preserve the PAGE_ROOT_AUTO_INC field. During ALTER TABLE, the initial AUTO_INCREMENT value will be copied from the table. ALGORITHM=COPY and online log apply in LOCK=NONE will update PAGE_ROOT_AUTO_INC in real time. innodb_col_no(): Determine the dict_table_t::cols[] element index corresponding to a Field of a non-virtual column. (The MySQL 5.7 implementation of virtual columns breaks the 1:1 relationship between Field::field_index and dict_table_t::cols[]. Virtual columns are omitted from dict_table_t::cols[]. Therefore, we must translate the field_index of AUTO_INCREMENT columns into an index of dict_table_t::cols[].) Upgrade from old data files: By default, the AUTO_INCREMENT sequence in old data files would appear to be reset, because PAGE_MAX_TRX_ID or PAGE_ROOT_AUTO_INC would contain the value 0 in each clustered index page. In new data files, PAGE_ROOT_AUTO_INC can only be 0 if the table is empty or does not contain any AUTO_INCREMENT column. For backward compatibility, we use the old method of SELECT MAX(auto_increment_column) for initializing the sequence. btr_read_autoinc(): Read the AUTO_INCREMENT sequence from a new-format data file. btr_read_autoinc_with_fallback(): A variant of btr_read_autoinc() that will resort to reading MAX(auto_increment_column) for data files that did not use AUTO_INCREMENT yet. It was manually tested that during the execution of innodb.autoinc_persist the compatibility logic is not activated (for new files, PAGE_ROOT_AUTO_INC is never 0 in nonempty clustered index root pages). initialize_auto_increment(): Replaces ha_innobase::innobase_initialize_autoinc(). This initializes the AUTO_INCREMENT metadata. Only called from ha_innobase::open(). ha_innobase::info_low(): Do not try to lazily initialize dict_table_t::autoinc. It must already have been initialized by ha_innobase::open() or ha_innobase::create(). Note: The adjustments to class ha_innopart were not tested, because the source code (native InnoDB partitioning) is not being compiled.	2016-12-16 09:19:19 +02:00
Marko Mäkelä	c868acdf65	MDEV-11487 Revert InnoDB internal temporary tables from WL#7682 WL#7682 in MySQL 5.7 introduced the possibility to create light-weight temporary tables in InnoDB. These are called 'intrinsic temporary tables' in InnoDB, and in MySQL 5.7, they can be created by the optimizer for sorting or buffering data in query processing. In MariaDB 10.2, the optimizer temporary tables cannot be created in InnoDB, so we should remove the dead code and related data structures.	2016-12-09 12:05:07 +02:00
Jan Lindström	2bedc3978b	MDEV-9931: InnoDB reads first page of every .ibd file at startup Analysis: By design InnoDB was reading first page of every .ibd file at startup to find out is tablespace encrypted or not. This is because tablespace could have been encrypted always, not encrypted newer or encrypted based on configuration and this information can be find realible only from first page of .ibd file. Fix: Do not read first page of every .ibd file at startup. Instead whenever tablespace is first time accedded we will read the first page to find necessary information about tablespace encryption status. TODO: Add support for SYS_TABLEOPTIONS where all table options encryption information included will be stored.	2016-09-22 16:38:24 +03:00
Jan Lindström	fec844aca8	Merge InnoDB 5.7 from mysql-5.7.14. Contains also: MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE \|\| m_lock_type != 2' failed. (branch bb-10.2-jan) Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan) enable tests that were fixed in MDEV-10549 MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan) fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables	2016-09-08 15:49:03 +03:00
Jan Lindström	2e814d4702	Merge InnoDB 5.7 from mysql-5.7.9. Contains also MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7 The failure happened because 5.7 has changed the signature of the bool handler::primary_key_is_clustered() const virtual function ("const" was added). InnoDB was using the old signature which caused the function not to be used. MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7 Fixed mutexing problem on lock_trx_handle_wait. Note that rpl_parallel and rpl_optimistic_parallel tests still fail. MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan) Reason: incorrect merge MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan) Reason: incorrect merge	2016-09-02 13:22:28 +03:00
Sergei Golubchik	3361aee591	Merge branch '10.0' into 10.1	2016-06-28 22:01:55 +02:00
Sergei Golubchik	720e04ff67	5.6.31	2016-06-21 14:21:03 +02:00
Jan Lindström	36ca65b73b	MDEV-9559: Server without encryption configs crashes if selecting from an implicitly encrypted table There was two problems. Firstly, if page in ibuf is encrypted but decrypt failed we should not allow InnoDB to start because this means that system tablespace is encrypted and not usable. Secondly, if page decrypt is detected we should return false from buf_page_decrypt_after_read.	2016-02-17 12:32:07 +02:00
Sergei Golubchik	a2bcee626d	Merge branch '10.0' into 10.1	2015-12-21 21:24:22 +01:00

1 2

73 commits