buf_flush_remove(): Disable the output for now, because we
certainly do not want this after every page flush on shutdown.
It must be rate-limited somehow. There already is a timeout
extension for waiting for the page cleaner to exit in
logs_empty_and_mark_files_at_shutdown().
log_write_up_to(): Use correct format.
srv_purge_should_exit(): Move the timeout extension to the
appropriate place, from one of the callers.
Use the systemd EXTEND_TIMEOUT_USEC notification to advise systemd of progress.
Move towards progress-based measures rather than purely time-based measures.
Progress is reported at numerous shutdown/startup locations, including
(see the sketch after this list):
* For innodb_fast_shutdown=0, trx_roll_must_shutdown(), for rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge))
* For purging history, srv_do_purge
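A minimal sketch of the notification idea, assuming the server runs as a
systemd service of Type=notify with libsystemd linked in. The helper name
and message layout are illustrative only, not the actual server code:

    // Sketch: ask systemd to extend its stop timeout while reporting progress.
    // report_progress() is a hypothetical helper; the real code rate-limits
    // these notifications and derives the numbers from actual work done.
    #include <systemd/sd-daemon.h>
    #include <cstdio>

    static void report_progress(const char *task, unsigned long done,
                                unsigned long total)
    {
        char msg[128];
        // Request 30 more seconds of timeout; STATUS is shown by systemctl.
        snprintf(msg, sizeof msg,
                 "STATUS=%s: %lu/%lu\nEXTEND_TIMEOUT_USEC=30000000",
                 task, done, total);
        sd_notify(0, msg);
    }

    int main()
    {
        report_progress("rolling back incomplete transactions", 10, 500);
    }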
Thanks Marko for feedback and suggestions.
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup and shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.
dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.
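A simplified sketch of the revised relationship (the two member names are
from the message; everything else is abridged):

    // Sketch only: the real definitions live in fil0fil.h and dict0mem.h.
    struct fil_space_t {
        unsigned id;            // numeric tablespace ID
        // ... file nodes, flags, size, pending operation counters ...
    };

    struct dict_table_t {
        fil_space_t* space;     // direct pointer to the cached tablespace
        unsigned     space_id;  // kept for the corner cases with no
                                // tablespace, e.g. DISCARD TABLESPACE
                                // or a missing/corrupted file
        // ...
    };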
There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.
There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.
FilSpace: Remove. Only a few calls to fil_space_acquire() will remain,
and gradually they should be removed.
mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.
fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.
fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.
dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.
dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.
dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.
fil_ibd_open(): Return the tablespace.
fil_space_t::set_imported(): Replaces fil_space_set_imported().
truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.
row_truncate_prepare(): Merge to its only caller.
row_drop_table_from_cache(): Assert that the table is persistent.
dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.
row_import_update_discarded_flag(): Remove a constant parameter.
buf_flush_page_cleaner_coordinator(): Signal the worker threads
to exit while waiting for them to exit. Apparently, signals are
sometimes lost, causing shutdown to occasionally hang when
multiple page cleaners (and buffer pool instances) are used,
that is, when innodb_buffer_pool_size is at least 1 GiB.
buf_flush_page_cleaner_close(): Merge with the only caller.
fil_space_t::atomic_write_supported: Always set this flag for
TEMPORARY TABLESPACE and during IMPORT TABLESPACE. The page
writes during these operations are by definition not crash-safe
because they are not written to the redo log.
fil_space_t::use_doublewrite(): Determine if doublewrite should
be used.
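A plausible sketch of the predicate; the real member layout and the global
doublewrite setting are simplified here:

    // Sketch: doublewrite is skipped when the page writes are not
    // crash-critical, i.e. atomic_write_supported is set (temporary
    // tablespace, IMPORT TABLESPACE), or doublewrite is disabled globally.
    struct fil_space_t {
        bool atomic_write_supported;
        bool use_doublewrite(bool doublewrite_enabled) const
        {
            return doublewrite_enabled && !atomic_write_supported;
        }
    };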
buf_dblwr_update(): Add assertions, and let the caller check whether
doublewrite buffering is desired.
buf_flush_write_block_low(): Disable the doublewrite buffer for
the temporary tablespace and for IMPORT TABLESPACE.
fil_space_set_imported(), fil_node_open_file(), fil_space_create():
Initialize or revise the space->atomic_write_supported flag.
buf_page_io_complete(), buf_flush_write_complete(): Add the parameter
dblwr, to indicate whether doublewrite was used for writes.
buf_dblwr_sync_datafiles(): Remove an unnecessary flush of
persistent tablespaces when flushing temporary tablespaces.
(Move the call to buf_dblwr_flush_buffered_writes().)
Handle string lengths as size_t, consistently (almost always:))
Change function prototypes to accept size_t where ulong or uint were
used in the past. Change local/member variables to size_t
when appropriate.
This fix excludes rocksdb, spider, sphinx and connect for now.
While the bug was reported as a regression of
MDEV-11025 Make number of page cleaner threads variable dynamic
in MariaDB Server 10.3, the code that MariaDB Server 10.2
inherited from MySQL 5.7.4 (WL#6642) looks prone to similar errors.
pc_flush_slot(): If there is no work to do, reset the is_requested
signal, to avoid potential busy-waiting in
buf_flush_page_cleaner_worker(). If the coordinator thread has shut
down, avoid resetting the is_requested event, to avoid a potential
hang at shutdown if there are multiple worker threads.
Silence the error log output that was introduced in MySQL 5.7
(MariaDB 10.2.2) if log_warnings=2 or less.
We should still figure out what these messages really indicate
and how to solve the problems.
pc_sleep_if_needed(): Add a parameter for the current time,
so that there will be fewer successive calls to ut_time_ms()
with no I/O between them.
buf_flush_page_cleaner_coordinator(): Exit the first loop
whenever shutdown has been requested. At the start of the loop,
call ut_time_ms() only once. Do not display the message if
log_warnings=2 or less.
Running mysqld with innodb-buffer-pool-instances > 1 hangs on startup.
On startup, the wrong variable was being used to detect the number of
page cleaner threads. As a result, no threads were actually started,
and subsequent code waits forever for the threads to start.
Fixed by using page_cleaner->n_workers, which holds the number of page
cleaner threads actually running (0 at startup), instead of
srv_n_page_cleaners, which holds the number of requested page cleaner
threads (4 by default).
Also, MDEV-14317 When ALTER TABLE is aborted, do not write garbage pages to data files
As pointed out by Shaohua Wang, the merge of MDEV-13328 from
MariaDB 10.1 (based on MySQL 5.6) to 10.2 (based on 5.7)
was performed incorrectly.
Let us always pass a non-NULL FlushObserver* when writing
to data files is desired.
FlushObserver::is_partial_flush(): Check if this is a bulk-load
(partial flush of the tablespace).
FlushObserver::is_interrupted(): Check for interrupt status.
buf_LRU_flush_or_remove_pages(): Instead of trx_t*, take
FlushObserver* as a parameter.
buf_flush_or_remove_pages(): Remove the parameters flush, trx.
If observer!=NULL, write out the data pages. Use the new predicate
observer->is_partial_flush() to distinguish a partial tablespace flush
(after bulk-loading) from a full tablespace flush (export).
Return a bool (whether all pages were removed from the flush_list).
buf_flush_dirty_pages(): Remove the parameter trx.
FlushObserver::flush(): Never discard unwritten changes.
We do not want to risk corrupting the system tablespace
or .ibd files that are not part of a table-rebuilding ALTER.
New test cases
innodb-page-cleaners
Modified test cases
innodb_page_cleaners_basic
New function buf_flush_set_page_cleaner_thread_cnt
Increase or decrease the number of page cleaner worker threads.
When the count increases, this function determines from the current
and requested amounts how many new threads should be created.
When the count decreases, this function records the requested
number of threads and uses the is_requested event to signal the
workers. We then wait until all new threads have started, until the
old threads that should exit have signalled is_finished, or until
shutdown has marked that the page cleaner should finish.
buf_flush_page_cleaner_worker
Store the current thread id and thread_no and then signal the
is_finished event. If the number of page cleaner threads in use
decreases, we shut down those threads whose thread_no is greater
than or equal to the configured number of page cleaners - 1
(note that there will always be a page cleaner coordinator).
Before exiting, we signal is_finished.
New function innodb_page_cleaners_threads_update
Update function for innodb-page-cleaners system variable.
innobase_start_or_create_for_mysql
If more than one page cleaner thread is configured, we use the new
function buf_flush_set_page_cleaner_thread_cnt to start the
requested number of threads (minus one for the coordinator).
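A rough sketch of the resize-and-wait idea behind
buf_flush_set_page_cleaner_thread_cnt, using standard C++ primitives
instead of InnoDB's os_event API; the names and structure are hypothetical:

    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Hypothetical stand-in for the page cleaner coordinator state.
    struct pool {
        std::mutex m;
        std::condition_variable cv;
        unsigned target = 0;     // requested number of workers
        unsigned running = 0;    // workers currently alive
        std::vector<std::thread> threads;

        void worker(unsigned thread_no) {
            { std::lock_guard<std::mutex> g(m); ++running; cv.notify_all(); }
            for (;;) {
                std::unique_lock<std::mutex> lk(m);
                if (thread_no >= target) break; // surplus worker: exit
                cv.wait(lk);                    // wait for work or a resize
            }
            { std::lock_guard<std::mutex> g(m); --running; cv.notify_all(); }
        }

        // Counterpart of the new function: start any missing workers, wake
        // everyone so surplus workers can exit, and wait until the running
        // count matches the new target.
        void set_thread_cnt(unsigned new_cnt) {
            std::unique_lock<std::mutex> lk(m);
            for (unsigned i = target; i < new_cnt; i++)
                threads.emplace_back(&pool::worker, this, i);
            target = new_cnt;
            cv.notify_all();
            cv.wait(lk, [&] { return running == target; });
        }
    };

    int main() {
        pool p;
        p.set_thread_cnt(4);   // grow to 4 workers
        p.set_thread_cnt(2);   // shrink: workers 2 and 3 exit
        p.set_thread_cnt(0);   // shutdown: all workers exit
        for (auto &t : p.threads) t.join();
    }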
Some innobase/xtrabackup changes from 10.1 are null-merged,
in particular using os_set_file_size to extend tablespaces in the
server or in mariabackup.
They require a non-trivial amount of additional work in 10.2, due to
innobase differences between 10.1 and 10.2.
In MariaDB Server 10.1, this problem manifests itself only as
a debug assertion failure in page_zip_decompress() when an insert
requires a page to be decompressed.
In MariaDB 10.1, the encryption of InnoDB data files repurposes the
previously unused field FILE_FLUSH_LSN for an encryption key version.
This field was only used in the first page of each file of the system
tablespace. For ROW_FORMAT=COMPRESSED tables, the field was always
written as 0 until encryption was implemented.
There is no bug in the encryption, because the buffer pool blocks will
not be written to files. Instead, copies of the blocks will be encrypted.
In these encrypted copies, the key version field will be updated before
the buffer is written to the file. The field in the buffer pool is
basically garbage that does not really matter.
Already in MariaDB 10.0, the memset() calls to reset this unused field
in buf_flush_update_zip_checksum() and buf_flush_write_block_low()
are unnecessary, because fsp_init_file_page_low() would guarantee that
the field is always 0 in the buffer pool (unless 10.1 encryption is
used).
Removing the unnecessary memset() calls makes page_zip_decompress()
happy and will prevent a SPATIAL INDEX corruption bug in
MariaDB Server 10.2. In MySQL 5.7.5, as part of WL#6968, the same
field was repurposed for an R-tree split sequence number (SSN) and
these memset() were removed. (Because of the repurposing, MariaDB
encryption is not available for tables that contain SPATIAL INDEX.)
For InnoDB tables, adding, dropping and reordering columns has
required a rebuild of the table and all its indexes. Since MySQL 5.6
(and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing
concurrent modification of the tables.
This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT
and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously,
with only minor changes performed to the table structure. The counter
innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS
is incremented whenever a table rebuild operation is converted into
an instant ADD COLUMN operation.
ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.
Some usability limitations will be addressed in subsequent work:
MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY
and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE
The format of the clustered index (PRIMARY KEY) is changed as follows:
(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT,
and a new field PAGE_INSTANT will contain the original number of fields
in the clustered index ('core' fields).
If instant ADD COLUMN has not been used or the table becomes empty,
or the very first instant ADD COLUMN operation is rolled back,
the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset
to 0 and FIL_PAGE_INDEX.
(2) A special 'default row' record is inserted into the leftmost leaf,
between the page infimum and the first user record. This record is
distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the
same format as records that contain values for the instantly added
columns. This 'default row' always has the same number of fields as
the clustered index according to the table definition. The values of
'core' fields are to be ignored. For other fields, the 'default row'
will contain the default values as they were during the ALTER TABLE
statement. (If the column default values are changed later, those
values will only be stored in the .frm file. The 'default row' will
contain the original evaluated values, which must be the same for
every row.) The 'default row' must be completely hidden from
higher-level access routines. Assertions have been added to ensure
that no 'default row' is ever present in the adaptive hash index
or in locked records. The 'default row' is never delete-marked.
(3) In clustered index leaf page records, the number of fields must
reside between the number of 'core' fields (dict_index_t::n_core_fields
introduced in this work) and dict_index_t::n_fields. If the number
of fields is less than dict_index_t::n_fields, the missing fields
are replaced with the column value of the 'default row'.
Note: The number of fields in the record may shrink if some of the
last instantly added columns are updated to the value that is
in the 'default row'. The function btr_cur_trim() implements this
'compression' on update and rollback; dtuple::trim() implements it
on insert.
(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new
status value REC_STATUS_COLUMNS_ADDED will indicate the presence of
a new record header that will encode n_fields-n_core_fields-1 in
1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header
always explicitly encodes the number of fields.)
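For illustration only, one plausible way to encode a small count in 1 or 2
bytes; the actual on-disk header layout is defined by the record header code
and is not reproduced here:

    // Sketch: values below 0x80 take one byte; larger values take two,
    // with the high bit of the first byte acting as a length flag.
    #include <cstdint>
    #include <cstddef>

    static size_t encode_n_add(uint16_t n_add, uint8_t *out)
    {
        if (n_add < 0x80) {
            out[0] = uint8_t(n_add);
            return 1;
        }
        out[0] = uint8_t(0x80 | (n_add >> 8));
        out[1] = uint8_t(n_add);
        return 2;
    }

    static uint16_t decode_n_add(const uint8_t *in, size_t *len)
    {
        if (!(in[0] & 0x80)) { *len = 1; return in[0]; }
        *len = 2;
        return uint16_t(((in[0] & 0x7F) << 8) | in[1]);
    }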
We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for
covering the insert of the 'default row' record when instant ADD COLUMN
is used for the first time. Subsequent instant ADD COLUMN can use
TRX_UNDO_UPD_EXIST_REC.
This is joint work with Vin Chen (陈福荣) from Tencent. The design
that was discussed in April 2017 would not have allowed import or
export of data files, because instead of the 'default row' it would
have introduced a data dictionary table. The test
rpl.rpl_alter_instant is exactly as contributed in pull request #408.
The test innodb.instant_alter is based on a contributed test.
The redo log record format changes for ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPACT are as contributed. (With this change present,
crash recovery from MariaDB 10.3.1 will fail in spectacular ways!)
Also the semantics of higher-level redo log records that modify the
PAGE_INSTANT field is changed. The redo log format version identifier
was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1.
Everything else has been rewritten by me. Thanks to Elena Stepanova,
the code has been tested extensively.
When rolling back an instant ADD COLUMN operation, we must empty the
PAGE_FREE list after deleting or shortening the 'default row' record,
by calling either btr_page_empty() or btr_page_reorganize(). We must
know the size of each entry in the PAGE_FREE list. If rollback left a
freed copy of the 'default row' in the PAGE_FREE list, we would be
unable to determine its size (if it is in ROW_FORMAT=COMPACT or
ROW_FORMAT=DYNAMIC) because it would contain more fields than the
rolled-back definition of the clustered index.
UNIV_SQL_DEFAULT: A new special constant that designates an instantly
added column that is not present in the clustered index record.
len_is_stored(): Check if a length is an actual length. There are
two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL.
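A minimal sketch of the predicate, with illustrative numeric values for the
two magic constants (the real values are defined in the InnoDB headers):

    // Sketch: a stored length is anything other than the two magic values.
    static const unsigned UNIV_SQL_NULL    = 0xFFFFFFFFU; // SQL NULL
    static const unsigned UNIV_SQL_DEFAULT = 0xFFFFFFFEU; // instantly added,
                                                           // value not stored
    inline bool len_is_stored(unsigned len)
    {
        return len != UNIV_SQL_NULL && len != UNIV_SQL_DEFAULT;
    }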
dict_col_t::def_val: The 'default row' value of the column. If the
column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.
dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(),
instant_value().
dict_col_t::remove_instant(): Remove the 'instant ADD' status of
a column.
dict_col_t::name(const dict_table_t& table): Replaces
dict_table_get_col_name().
dict_index_t::n_core_fields: The original number of fields.
For secondary indexes and if instant ADD COLUMN has not been used,
this will be equal to dict_index_t::n_fields.
dict_index_t::n_core_null_bytes: Number of bytes needed to
represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).
dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that
n_core_null_bytes was not initialized yet from the clustered index
root page.
dict_index_t: Add the accessors is_instant(), is_clust(),
get_n_nullable(), instant_field_value().
dict_index_t::instant_add_field(): Adjust clustered index metadata
for instant ADD COLUMN.
dict_index_t::remove_instant(): Remove the 'instant ADD' status
of a clustered index when the table becomes empty, or the very first
instant ADD COLUMN operation is rolled back.
dict_table_t: Add the accessors is_instant(), is_temporary(),
supports_instant().
dict_table_t::instant_add_column(): Adjust metadata for
instant ADD COLUMN.
dict_table_t::rollback_instant(): Adjust metadata on the rollback
of instant ADD COLUMN.
prepare_inplace_alter_table_dict(): First create the ctx->new_table,
and only then decide if the table really needs to be rebuilt.
We must split the creation of table or index metadata from the
creation of the dictionary table records and the creation of
the data. In this way, we can transform a table-rebuilding operation
into an instant ADD COLUMN operation. Dictionary objects will only
be added to cache when table rebuilding or index creation is needed.
The ctx->instant_table will never be added to cache.
dict_table_t::add_to_cache(): Modified and renamed from
dict_table_add_to_cache(). Do not modify the table metadata.
Let the callers invoke dict_table_add_system_columns() and if needed,
set can_be_evicted.
dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the
system columns (which will now exist in the dict_table_t object
already at this point).
dict_create_table_step(): Expect the callers to invoke
dict_table_add_system_columns().
pars_create_table(): Before creating the table creation execution
graph, invoke dict_table_add_system_columns().
row_create_table_for_mysql(): Expect all callers to invoke
dict_table_add_system_columns().
create_index_dict(): Replaces row_merge_create_index_graph().
innodb_update_n_cols(): Renamed from innobase_update_n_virtual().
Call my_error() if an error occurs.
btr_cur_instant_init(), btr_cur_instant_init_low(),
btr_cur_instant_root_init():
Load additional metadata from the clustered index and set
dict_index_t::n_core_null_bytes. This is invoked
when table metadata is first loaded into the data dictionary.
dict_boot(): Initialize n_core_null_bytes for the four hard-coded
dictionary tables.
dict_create_index_step(): Initialize n_core_null_bytes. This is
executed as part of CREATE TABLE.
dict_index_build_internal_clust(): Initialize n_core_null_bytes to
NO_CORE_NULL_BYTES if table->supports_instant().
row_create_index_for_mysql(): Initialize n_core_null_bytes for
CREATE TEMPORARY TABLE.
commit_cache_norebuild(): Call the code to rename or enlarge columns
in the cache only if instant ADD COLUMN is not being used.
(Instant ADD COLUMN would copy all column metadata from
instant_table to old_table, including the names and lengths.)
PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields.
This is repurposing the 16-bit field PAGE_DIRECTION, of which only the
least significant 3 bits were used. The original byte containing
PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B.
page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT.
page_ptr_get_direction(), page_get_direction(),
page_ptr_set_direction(): Accessors for PAGE_DIRECTION.
page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.
page_direction_increment(): Increment PAGE_N_DIRECTION
and set PAGE_DIRECTION.
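An illustrative sketch of how a 13-bit PAGE_INSTANT value can share the
former 16-bit field with the 3 direction bits; offsets, byte order and
helper names are simplified and not the actual accessors:

    #include <cstdint>

    static const unsigned DIRECTION_BITS = 3;   // least significant 3 bits

    inline uint16_t get_direction(uint16_t field)
    { return field & ((1U << DIRECTION_BITS) - 1); }

    inline uint16_t get_instant(uint16_t field)     // upper 13 bits
    { return field >> DIRECTION_BITS; }

    inline uint16_t set_instant(uint16_t field, uint16_t n_core_fields)
    { return uint16_t((n_core_fields << DIRECTION_BITS) | get_direction(field)); }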
rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes,
and assume that heap_no is always set.
Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records,
even if the record contains fewer fields.
rec_offs_make_valid(): Add the parameter 'leaf'.
rec_copy_prefix_to_dtuple(): Assert that the tuple is only built
on the core fields. Instant ADD COLUMN only applies to the
clustered index, and we should never build a search key that has
more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR.
All these columns are always present.
dict_index_build_data_tuple(): Remove assertions that would be
duplicated in rec_copy_prefix_to_dtuple().
rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose
number of fields is between n_core_fields and n_fields.
cmp_rec_rec_with_match(): Implement the comparison between two
MIN_REC_FLAG records.
trx_t::in_rollback: Make the field available in non-debug builds.
trx_start_for_ddl_low(): Remove dangerous error-tolerance.
A dictionary transaction must be flagged as such before it has generated
any undo log records. This is because trx_undo_assign_undo() will mark
the transaction as a dictionary transaction in the undo log header
right before the very first undo log record is being written.
btr_index_rec_validate(): Account for instant ADD COLUMN
row_undo_ins_remove_clust_rec(): On the rollback of an insert into
SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the
last column from the table and the clustered index.
row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(),
trx_undo_update_rec_get_update(): Handle the 'default row'
as a special case.
dtuple_t::trim(index): Omit a redundant suffix of an index tuple right
before insert or update. After instant ADD COLUMN, if the last fields
of a clustered index tuple match the 'default row', there is no
need to store them. While trimming the entry, we must hold a page latch,
so that the table cannot be emptied and the 'default row' be deleted.
btr_cur_optimistic_update(), btr_cur_pessimistic_update(),
row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low():
Invoke dtuple_t::trim() if needed.
row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling
row_ins_clust_index_entry_low().
rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number
of fields to be between n_core_fields and n_fields. Do not support
infimum,supremum. They are never supposed to be stored in dtuple_t,
because page creation nowadays uses a lower-level method for initializing
them.
rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the
number of fields.
btr_cur_trim(): In an update, trim the index entry as needed. For the
'default row', handle rollback specially. For user records, omit
fields that match the 'default row'.
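A simplified sketch of the trimming idea, with hypothetical types; the real
code compares fields using the index definition and never trims below the
core fields:

    #include <vector>

    struct field_t {
        int v;
        bool operator==(const field_t& o) const { return v == o.v; }
    };

    // Drop trailing fields that match the 'default row', but keep at least
    // n_core_fields so the record never shrinks below the original format.
    static void trim(std::vector<field_t>& entry,
                     const std::vector<field_t>& default_row,
                     size_t n_core_fields)
    {
        while (entry.size() > n_core_fields
               && entry.size() <= default_row.size()
               && entry.back() == default_row[entry.size() - 1])
            entry.pop_back();
    }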
btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
Skip locking and adaptive hash index for the 'default row'.
row_log_table_apply_convert_mrec(): Replace 'default row' values if needed.
In the temporary file that is applied by row_log_table_apply(),
we must identify whether the records contain the extra header for
instantly added columns. For now, we will allocate an additional byte
for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table
has been subject to instant ADD COLUMN. The ROW_T_DELETE records are
fine, as they will be converted and will only contain 'core' columns
(PRIMARY KEY and some system columns) that are converted from dtuple_t.
rec_get_converted_size_temp(), rec_init_offsets_temp(),
rec_convert_dtuple_to_temp(): Add the parameter 'status'.
REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED:
An info_bits constant for distinguishing the 'default row' record.
rec_comp_status_t: An enum of the status bit values.
rec_leaf_format: An enum that replaces the bool parameter of
rec_init_offsets_comp_ordinary().
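A sketch of the constants involved; the numeric values are illustrative, and
only the relationship stated above (REC_INFO_DEFAULT_ROW combining the two
flags) is taken from the message:

    // Illustrative values; the low bits of the status byte hold the status
    // and the high bits hold the info flags, so the two can be combined.
    enum rec_comp_status_t {
        REC_STATUS_ORDINARY      = 0,
        REC_STATUS_NODE_PTR      = 1,
        REC_STATUS_INFIMUM       = 2,
        REC_STATUS_SUPREMUM      = 3,
        REC_STATUS_COLUMNS_ADDED = 4
    };

    static const unsigned REC_INFO_MIN_REC_FLAG = 0x10;
    static const unsigned REC_INFO_DEFAULT_ROW
        = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED;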
buf_page_io_complete(): Do not test bpage for NULL, because
it is declared (and always passed) as nonnull.
buf_flush_batch(): Remove the constant local variable count=0.
fil_ibd_load(): Use magic comment to suppress -Wimplicit-fallthrough.
ut_stage_alter_t::inc(ulint): Disable references to an unused parameter.
lock_queue_validate(), sync_array_find_thread(), rbt_check_ordering():
Define only in debug builds.
buf_flush_page_cleaner_coordinator: In the first loop, use an
appropriate termination condition, waiting for !recv_writer_thread_active.
logs_empty_and_mark_files_at_shutdown(): Signal recv_sys->flush_start
in case the recv_writer_thread was never started, or
buf_flush_page_cleaner_coordinator failed to notice its termination.
innobase_start_or_create_for_mysql(): Remove a redundant, unreachable
condition, and properly release resources when aborting startup due to
recv_sys->found_corrupt_log.
buf_flush_page_cleaner_coordinator(): Signal the thread creator
that the error log output regarding setpriority() has been issued.
innobase_start_or_create_for_mysql(): Wait for
buf_flush_page_cleaner_coordinator() to completely start up.
This prevents sporadic failures of tests that search the server
error log for InnoDB redo log recovery messages.
buf_flush_init_for_writing(): Reset the FIL_PAGE_TYPE
of the TRX_SYS page to the canonical value
FIL_PAGE_TYPE_TRX_SYS instead of FIL_PAGE_TYPE_UNKNOWN.
When the btr_search_latch was split into an array of latches
in MySQL 5.7.8 as part of the Oracle Bug#20985298 fix, the "caching"
of the latch across storage engine API calls was removed, and
the field trx->has_search_latch would only be set during a short
time frame in the execution of row_search_mvcc(), which was
formerly called row_search_for_mysql().
This means that the column
INFORMATION_SCHEMA.INNODB_TRX.TRX_ADAPTIVE_HASH_LATCHED will always
report 0. That column cannot be removed in MariaDB 10.2, but it
can be removed in future releases.
trx_t::has_search_latch: Remove.
trx_assert_no_search_latch(): Remove.
row_sel_try_search_shortcut_for_mysql(): Remove a redundant condition
on trx->has_search_latch (it was always true).
sync_check_iterate(): Make the parameter const.
sync_check_functor_t: Make the operator() const, and remove result()
and the virtual destructor. There is no need to have mutable state
in the functors.
sync_checker<bool>: Replaces dict_sync_check and btrsea_sync_check.
sync_check: Replaces btrsea_sync_check.
dict_sync_check: Instantiated from sync_checker.
sync_allowed_latches: Use std::find() directly on the array.
Remove the std::vector.
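A sketch of the shape of this change; the latch levels and the actual check
are simplified, and only the const, stateless functor structure is the point:

    // Sketch: one template replaces the two near-duplicate checker classes.
    enum latch_level_t { SYNC_SEARCH_SYS, SYNC_DICT, SYNC_OTHER };

    struct sync_check_functor_t {
        // const operator(), no mutable result() state, no virtual destructor
        virtual bool operator()(latch_level_t) const = 0;
    };

    template<bool check_dict>
    struct sync_checker : public sync_check_functor_t {
        bool operator()(latch_level_t level) const override
        {
            return level == SYNC_SEARCH_SYS
                   || (check_dict && level == SYNC_DICT);
        }
    };

    typedef sync_checker<false> sync_check;       // replaces btrsea_sync_check
    typedef sync_checker<true>  dict_sync_check;  // instantiated from sync_checker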
TrxInInnoDB::enter(), TrxInInnoDB::exit(): Remove obviously redundant
debug assertions on trx->in_depth, and use equality comparison against 0
because it could be more efficient on some architectures.
When a slow shutdown is performed soon after spawning some work for
background threads that can create or commit transactions, it is possible
that new transactions are started or committed after the purge has finished.
This violates the specification of innodb_fast_shutdown=0, namely that
the purge must be completed. (None of the history of the recent transactions
would be purged.)
Also, it is possible that the purge threads would exit in slow shutdown
while there exist active transactions, such as recovered incomplete
transactions that are being rolled back. Thus, the slow shutdown could
fail to purge some undo log that becomes purgeable after the transaction
commit or rollback.
srv_undo_sources: A flag that indicates whether undo log can still
be generated, whether by background threads or by user SQL.
Even when this flag is clear, active transactions that already exist
in the system may be committed or rolled back.
innodb_shutdown(): Renamed from innobase_shutdown_for_mysql().
Do not return an error code; the operation never fails.
Clear the srv_undo_sources flag, and also ensure that the background
DROP TABLE queue is empty.
srv_purge_should_exit(): Do not allow the purge to exit if
srv_undo_sources are active or the background DROP TABLE queue is not
empty, or in slow shutdown, if any active transactions exist
(and are being rolled back).
srv_purge_coordinator_thread(): Remove some previous workarounds
for this bug.
innobase_start_or_create_for_mysql(): Set buf_page_cleaner_is_active
and srv_dict_stats_thread_active directly. Set srv_undo_sources before
starting the purge subsystem, to prevent immediate shutdown of the purge.
Create dict_stats_thread and fts_optimize_thread immediately
after setting srv_undo_sources, so that shutdown can use this flag to
determine if these subsystems were started.
dict_stats_shutdown(): Shut down dict_stats_thread. Backported from 10.2.
srv_shutdown_table_bg_threads(): Remove (unused).
In commit 360a4a0372
some debug assertions were introduced to the page flushing code
in XtraDB. Add these assertions to InnoDB as well, and adjust
the InnoDB shutdown so that these assertions will not fail.
logs_empty_and_mark_files_at_shutdown(): Advance
srv_shutdown_state from the first phase SRV_SHUTDOWN_CLEANUP
only after no page-dirtying activity is possible
(well, except by srv_master_do_shutdown_tasks(), which will be
fixed separately in MDEV-12052).
rotate_thread_t::should_shutdown(): Already exit the key rotation
threads at the first phase of shutdown (SRV_SHUTDOWN_CLEANUP).
page_cleaner_sleep_if_needed(): Do not sleep during shutdown.
This change is originally from XtraDB.
Significantly reduce the amount of InnoDB, XtraDB and Mariabackup
code changes by defining pfs_os_file_t as something that is
transparently compatible with os_file_t.
This merge reverts commit 6ca4f693c1ce472e2b1bf7392607c2d1124b4293
from current 5.6.36 innodb.
Bug #23481444 OPTIMISER CALL ROW_SEARCH_MVCC() AND READ THE
INDEX APPLIED BY UNCOMMITTED ROW
Problem:
========
row_search_for_mysql() does a whole-table traversal for a range query
even though the end range is passed. The whole-table traversal happens
when the record is not within the transaction read view.
Solution:
=========
If the traversal in row_search_mvcc() exceeds 100 records and no ICP
is involved, convert the last InnoDB record of the page to MySQL
format and compare it with the end range. If it is out of range,
InnoDB can avoid the whole-table traversal. The code needed a little
refactoring to make it compile.
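A rough sketch of the early-out idea, with plain integers standing in for
InnoDB records converted to MySQL format; the 100-record threshold is as
described above, everything else is hypothetical:

    #include <cstdio>
    #include <vector>

    static size_t scan_range(const std::vector<int>& page, int end_range,
                             bool using_icp)
    {
        size_t n_traversed = 0;
        for (size_t i = 0; i < page.size(); i++) {
            if (!using_icp && n_traversed > 100 && page.back() > end_range) {
                // The last record of the page is already beyond the end
                // range: the rest of the scan cannot match, stop early.
                break;
            }
            ++n_traversed;
            // ... visibility check and row processing would happen here ...
        }
        return n_traversed;
    }

    int main()
    {
        std::vector<int> page(1000);
        for (int i = 0; i < 1000; i++) page[i] = i;
        printf("%zu\n", scan_range(page, 50, false)); // stops early
    }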
Reviewed-by: Jimmy Yang <jimmy.yang@oracle.com>
Reviewed-by: Knut Hatlen <knut.hatlen@oracle.com>
Reviewed-by: Dmitry Shulga <dmitry.shulga@oracle.com>
RB: 14660
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.
TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):
2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
#0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
#1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
#2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
#3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
#4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
#5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
#6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.
fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.
fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.
Adjust a number of places accordingly, and remove some redundant
tablespace lookups.
The following parts of this fix differ from the 10.2 version of this fix:
buf_page_get_corrupt(): Add a tablespace parameter.
In 10.2, we already had a two-phase process of freeing fil_space objects
(first, fil_space_detach(), then release fil_system->mutex, and finally
free the fil_space and fil_node objects).
fil_space_free_and_mutex_exit(): Renamed from fil_space_free().
Detach the tablespace from the fil_system cache, release the
fil_system->mutex, and then wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread.
During the wait, future calls to fil_space_acquire_for_io() will
not find this tablespace, and the count can only be decremented to 0,
at which point it is safe to free the objects.
fil_node_free_part1(), fil_node_free_part2(): Refactored from
fil_node_free().
This fixes a regression caused by MDEV-12428.
When we introduced a variant of fil_space_acquire() that could
increment space->n_pending_ops after space->stop_new_ops was set,
the logic of fil_check_pending_operations() was broken.
fil_space_t::n_pending_ios: A new field to track read or write
access from the buffer pool routines immediately before a block
write or after a block read in the file system.
fil_space_acquire_for_io(), fil_space_release_for_io(): Similar
to fil_space_acquire_silent() and fil_space_release(), but
modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.
fil_space_free_low(): Wait for space->n_pending_ios to reach 0,
to avoid accessing freed data in a concurrent thread. Future
calls to fil_space_acquire_for_io() will not find this tablespace,
because it will already have been detached from fil_system.
Adjust a number of places accordingly, and remove some redundant
tablespace lookups.
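A simplified sketch of the reference counting; the mutex handling and the
tablespace lookup are abridged, and the member names follow the message:

    #include <mutex>

    struct fil_space_t {
        unsigned n_pending_ios = 0;  // buffer-pool I/O references
        unsigned n_pending_ops = 0;  // other pending operations
        bool     stop_new_ops  = false;
    };

    static std::mutex fil_system_mutex;

    // Unlike fil_space_acquire(), this succeeds even after stop_new_ops is
    // set, because the block I/O has already been initiated by the buffer
    // pool and must be allowed to complete.
    fil_space_t* space_acquire_for_io(fil_space_t* space)
    {
        std::lock_guard<std::mutex> g(fil_system_mutex);
        if (space) ++space->n_pending_ios;
        return space;
    }

    void space_release_for_io(fil_space_t* space)
    {
        std::lock_guard<std::mutex> g(fil_system_mutex);
        --space->n_pending_ios;
    }

    // fil_space_free_low() waits for n_pending_ios to drain; the space has
    // already been detached, so no new references can appear.

    int main() { fil_space_t s; space_release_for_io(space_acquire_for_io(&s)); }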
FIXME: buf_page_check_corrupt() should take a tablespace from
fil_space_acquire_for_io() as a parameter. This will be done
in the 10.1 version of this patch and merged from there.
That depends on MDEV-12253, which has not been merged from 10.1 yet.
The problem was that bpage was referenced after it had already been
freed from the LRU. Fixed by adding a new variable, encrypted, that is
passed down to buf_page_check_corrupt() and used in
buf_page_get_gen() to stop processing the page read.
This patch should also address following test failures and
bugs:
MDEV-12419: IMPORT should not look up tablespace in
PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails
sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression
failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing
and replaced them with dict_table_t::file_unreadable. The table's ibd
file is missing if fil_get_space(space_id) returns NULL, and encrypted
if not. Removed the dict_table_t::is_corrupted field.
Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(),
buf_page_decrypt_after_read(), buf_page_encrypt_before_write(),
buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().
Added test cases where an encrypted page could be read while doing
redo log crash recovery. Also added a test case for row-compressed
BLOBs.
btr_cur_open_at_index_side_func(),
btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is
NULL.
buf_page_get_zip(): Issue error if page read fails.
buf_page_get_gen(): Use dberr_t for error detection and
do not reference bpage after we have freed it.
buf_mark_space_corrupt(): Remove bpage from the LRU also when
it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if the page has been
read and is not corrupted,
DB_PAGE_CORRUPTED if the page is corrupted based on the checksum check,
DB_DECRYPTION_FAILED if the post-encryption checksum matches but the
normal page checksum does not match after decryption. In the read
case only DB_SUCCESS is possible.
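A sketch of the decision order described above; the dberr_t values exist in
the real code, while the function body here only illustrates the contract:

    enum dberr_t { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

    // still_encrypted: the post-encryption checksum matched, i.e. the page
    // looks like valid ciphertext but could not be decrypted correctly.
    // checksum_ok: the normal page checksum matched after decryption.
    dberr_t check_corrupt_sketch(bool checksum_ok, bool still_encrypted)
    {
        if (checksum_ok)     return DB_SUCCESS;
        if (still_encrypted) return DB_DECRYPTION_FAILED;
        return DB_PAGE_CORRUPTED;
    }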
buf_page_io_complete(): Use dberr_t for error handling.
buf_flush_write_block_low(),
buf_read_ahead_random(),
buf_read_page_async(),
buf_read_ahead_linear(),
buf_read_ibuf_merge_pages(),
buf_read_recv_pages(),
fil_aio_wait():
Issue error if page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is
NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable(),
which return true if the tablespace exists and the pages read from the
tablespace are neither corrupted nor failed to decrypt.
Removed buf_page_t::key_version. After page decryption, the
key version is not removed from the page frame. For unencrypted
pages, the old key_version is removed at buf_page_encrypt_before_write().
dict_stats_update_transient_for_index(),
dict_stats_update_transient()
Do not continue if table decryption failed or table
is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function
to avoid code duplication.
fil_parse_write_crypt_data():
Check that the key read from the redo log entry is found in the
encryption plugin, and if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t as
tablespace is not available during import.
Fixed error code on innodb.innodb test.
Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown
into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test.
Reduced unnecessary complexity in some long-running tests.
Removed fil_inc_pending_ops(), fil_decr_pending_ops(),
fil_get_first_space(), fil_get_next_space(),
fil_get_first_space_safe(), fil_get_next_space_safe()
functions.
fil_space_verify_crypt_checksum(): Fixed a bug found using ASAN
where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly
accessed for row-compressed tables. Fixed an out-of-page-frame
bug for row-compressed tables in fil_space_verify_crypt_checksum(),
also found using ASAN; an incorrect function had been called for
compressed tables.
Added new tests for discard, rename table and drop (we should allow them
even when page decryption fails). Alter table rename is not allowed.
Added a test for restart with innodb-force-recovery=1 when a page read
during redo recovery cannot be decrypted. Added a test for a corrupted
table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
are corrupted.
Adjusted the test case innodb_bug14147491 so that it no longer expects
a crash. Instead, the table is just mostly unusable.
fil0fil.h: fil_space_acquire_low is not a visible function, and
fil_space_acquire and fil_space_acquire_silent are inline functions.
The FilSpace class uses fil_space_acquire_low directly.
recv_apply_hashed_log_recs() does not return anything.
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed.
fil_decompress_page(): Remove unsafe use of fil_space_get_by_id().
buf_flush_write_block_low(): Acquire the tablespace reference once,
and pass it to lower-level functions. This is only a start; further
calls may be removed later.