mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-01-31 02:51:44 +01:00

Author	SHA1	Message	Date
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	c0ac0b8860	Update FSF address	2019-05-11 19:25:02 +03:00
Marko Mäkelä	b6f4cccd19	Merge 10.2 into 10.3	2019-05-03 20:14:09 +03:00
Marko Mäkelä	ce195987c3	MDEV-19385: Inconsistent definition of dtuple_get_nth_v_field() The accessor dtuple_get_nth_v_field() was defined differently between debug and release builds in MySQL 5.7.8 in mysql/mysql-server@c47e1751b7 and a debug assertion to document or enforce the questionable assumption tuple->v_fields == &tuple->fields[tuple->n_fields] was missing. This was apparently no problem until MDEV-11369 introduced instant ADD COLUMN to MariaDB Server 10.3. With that work present, in one test case, trx_undo_report_insert_virtual() could in release builds fetch the wrong value for a virtual column. We replace many of the dtuple_t accessors with const-preserving inline functions, and fix missing or misleadingly applied const qualifiers accordingly.	2019-05-03 20:02:50 +03:00
Marko Mäkelä	fd58bb71e2	Merge 10.2 into 10.3	2018-11-19 18:45:53 +02:00
Marko Mäkelä	ff88e4bb8a	Remove many redundant #include from InnoDB	2018-11-19 11:42:14 +02:00
Marko Mäkelä	7830fb7f45	Merge 10.2 into 10.3	2018-08-28 12:22:56 +03:00
Marko Mäkelä	9a815401c6	MDEV-17043 Purge of indexed virtual columns may cause hang on table-rebuilding DDL When a table is renamed to an internal #sql2 or #sql-ib name during a table-rebuilding DDL operation such as OPTIMIZE TABLE or ALTER TABLE, and shortly after that a purge operation in an index on virtual columns is attempted, the operation could fail, but purge would fail to release the table reference. innodb_acquire_mdl(): Release the reference if the table name is not valid for acquiring a meta-data lock (MDL). innodb_find_table_for_vc(): Add a debug assertion if the table name is not valid. This code path is for DML execution. The table should have a valid name for executing DML, and furthermore a MDL will prevent the table from being renamed. row_vers_build_clust_v_col(): Add a debug assertion that both indexes must belong to the same table.	2018-08-23 13:11:11 +03:00
Marko Mäkelä	93b6552182	Merge 10.2 into 10.3	2018-07-26 09:19:52 +03:00
Thirunarayanan Balathandayuthapani	de85355436	MDEV-16713 Hangs server with repeating log entry At most one transaction can be active at a time for temporary tables. There is no need to check previous version of record for the temporary tables.	2018-07-25 13:56:39 +05:30
Marko Mäkelä	f418661efa	Merge 10.2 into 10.3	2018-07-23 18:56:52 +03:00
Marko Mäkelä	c5ba13dda0	MDEV-15855 cleanup: Privatize purge_vcol_info_t Declare all fields of purge_vcol_info_t private, and add accessor functions.	2018-07-23 18:31:42 +03:00
Marko Mäkelä	934d5f95d3	Merge 10.2 into 10.3	2018-07-06 22:18:35 +03:00
Thirunarayanan Balathandayuthapani	8b0d4cff07	MDEV-15855 Deadlock between purge thread and DDL statement Problem: Truncate operation holds MDL on the table (t1) and tries to acquire InnoDB dict_operation_lock. Purge holds dict_operation_lock and tries to acquire MDL on the table (t1) to evaluate virtual column expressions for indexed virtual columns. It leads to deadlock of purge and truncate table (DDL). Solution: If purge tries to acquire MDL on the table then it should do the following: i) Purge should release all innodb latches (including dict_operation_lock) before acquiring metadata lock on the table. ii) After acquiring metadata lock on the table, it should check whether the table was dropped or renamed. If the table is dropped then purge should ignore the undo log record. If the table is renamed then it should release the old MDL and acquire MDL on the new name. iii) Once purge acquires MDL, it should use the SQL table handle for all the remaining virtual index for the purge record. purge_node_t: Introduce new virtual column information to know whether the MDL was acquired successfully. This is joint work with Marko Mäkelä.	2018-07-06 17:13:53 +03:00
Marko Mäkelä	1748a31ae8	MDEV-16675 Unnecessary explicit lock acquisition during UPDATE or DELETE In InnoDB, an INSERT will not create an explicit lock object. Instead, the inserted record is initially implicitly locked by the transaction that wrote its trx_t::id to the hidden system column DB_TRX_ID. (Other transactions would check if DB_TRX_ID is referring to a transaction that has not been committed.) If a record was inserted in the current transaction, it would be implicitly locked by that transaction. Only if some other transaction is requesting access to the record, the implicit lock should be converted to an explicit one, so that the waits-for graph can be constructed for detecting deadlocks and lock wait timeouts. Before this fix, InnoDB would convert implicit locks to explicit ones, even if no conflict exists. lock_rec_convert_impl_to_expl(): Return whether caller_trx already holds an explicit lock that covers the record. row_vers_impl_x_locked_low(): Avoid a lookup if the record matches caller_trx->id. lock_trx_has_expl_x_lock(): Renamed from lock_trx_has_rec_x_lock(). row_upd_clust_step(): In a debug assertion, check for implicit lock before invoking lock_trx_has_expl_x_lock(). rw_trx_hash_t::find(): Make do_ref_count a mandatory parameter. Assert that trx_id is not 0 (the caller should check it). trx_sys_t::is_registered(): Only invoke find() if id != 0. trx_sys_t::find(): Add the optional parameter do_ref_count. lock_rec_queue_validate(): Avoid lookup for trx_id == 0.	2018-07-03 15:10:06 +03:00
Sergei Golubchik	36e59752e7	Merge branch '10.2' into 10.3	2018-06-30 16:39:20 +02:00
Monty	ab19466656	MDEV-15114 ASAN heap-use-after-free in mem_heap_dup or dfield_data_is_binary_equal The bug was that innobase_get_computed_value() trashed record[0] and data in Field_blob::value. Fixed by using a record on the heap for innobase_get_computed_value(). Reviewer: Marko Mäkelä	2018-06-19 16:23:34 +03:00
Marko Mäkelä	c57e9835ff	Replace dict_col_is_virtual(col) with col->is_virtual()	2018-05-12 22:12:12 +03:00
Marko Mäkelä	2b27ac8282	Fix many -Wunused-parameter Remove unused InnoDB function parameters and functions. i_s_sys_virtual_fill_table(): Do not allocate heap memory. mtr_is_block_fix(): Replace with mtr_memo_contains(). mtr_is_page_fix(): Replace with mtr_memo_contains_page().	2018-05-01 16:52:19 +03:00
Marko Mäkelä	97e51d24cb	MDEV-13697 DB_TRX_ID is not always reset The rollback of the modification of a pre-existing record should involve a purge-like operation. Before MDEV-12288 the only purge-like operation was the removal of a delete-marked record. After MDEV-12288, any rollback of updating an existing record must reset the DB_TRX_ID column when it is no longer visible in the purge read view. row_vers_must_preserve_del_marked(): Remove. It is cleaner to perform the check directly in row0umod.cc. row_trx_id_offset(): Auxiliary function to retrieve the byte offset of DB_TRX_ID in a clustered index leaf page record. row_undo_mod_must_purge(): Determine if a record should be purged. row_undo_mod_clust(): For temporary tables, skip the purge checks. When rolling back an update so that the original record was not delete-marked, reset DB_TRX_ID if the history is no longer visible.	2018-04-15 14:51:26 +03:00
Marko Mäkelä	fb335b48b5	Allocate purge_sys statically There is only one purge_sys. Allocate it statically in order to avoid dereferencing a pointer whenever accessing it. Also, align some members to their own cache line in order to avoid false sharing. purge_sys_t::create(): The deferred constructor. purge_sys_t::close(): The early destructor. undo::Truncate::create(): The deferred constructor. Because purge_sys.undo_trunc is constructed before the start-up parameters are parsed, the normal constructor would copy a wrong value of srv_purge_rseg_truncate_frequency. TrxUndoRsegsIterator: Do not forward-declare an inline constructor, because the static construction of purge_sys.rseg_iter would not have access to it.	2018-02-22 09:30:41 +02:00
Marko Mäkelä	b006d2ead4	Merge bb-10.2-ext into 10.3	2018-02-15 10:22:03 +02:00
Marko Mäkelä	44314c768f	MDEV-15165 InnoDB purge for index on virtual column is trying to access an incomplete record The algorithm change is based on a MySQL 8.0 fix for BUG #26818787: ASSERTION: DATA0DATA.IC:430:TUPLE by Krzysztof Kapuścik `ee606e62bb` If a record had been inserted in place of a delete-marked purgeable record by modifying that record, and purge was accessing that record before the off-page columns were written, row_build_index_entry() would have returned NULL, causing a crash. row_vers_non_virtual_fields_equal(): Check whether all non-virtual fields of an index are equal. Replaces row_vers_non_vc_match(). A more complex version of this function was called row_vers_non_vc_index_entry_match() in the MySQL 8.0 fix. row_vers_impl_x_locked_low(): This change is not directly related to the reported problem, but apparently to the removal of the function row_vers_non_vc_match(). This function checks if a secondary index record was modified by a transaction that has not been committed yet. For comparing the non-virtual columns, construct a secondary index tuple from the table row. row_vers_vc_matches_cluster(): Replace row_vers_non_vc_match() with code that is equivalent to the row_vers_non_vc_index_entry_match() in the MySQL 8.0 fix. Also, deduplicate some code by using goto.	2018-02-01 18:53:41 +02:00
Marko Mäkelä	29240b50e3	Correct a comment about incomplete records The comment that I made in commit `06299dddd4` is inaccurate. Replace the comment, and make the assertion debug-only, because I cannot remember any reports of it ever failing in these 10 years.	2018-02-01 18:53:41 +02:00
Sergey Vojtovich	bc7a1dc1fb	MDEV-15104 - Optimise MVCC snapshot With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became slightly higher. That is, instead of copying an array we now have to iterate LF_HASH. All this is done under trx_sys.mutex protection. This patch moves MVCC snapshot out of trx_sys.mutex. Clean-ups: Removed MVCC: doesn't make too much sense to keep it in a separate class anymore. Refactored ReadView so that it now calls register()/deregister() routines (it was vice versa before). ReadView doesn't have friends anymore. :( Even less trx_sys.mutex references.	2018-01-31 20:13:34 +04:00
Sergey Vojtovich	55277e8840	MDEV-15059 - Misc small InnoDB scalability fixes Form better trx_sys API.	2018-01-26 10:25:33 +04:00
Marko Mäkelä	f8882cce93	Replace trx_sys_t* trx_sys with trx_sys_t trx_sys There is only one transaction system object in InnoDB. Allocate the storage for it at link time, not at runtime. lock_rec_fetch_page(): Use the correct fetch mode BUF_GET. Pages may never be deallocated from a tablespace while record locks are pointing to them.	2018-01-20 16:10:36 +04:00
Sergey Vojtovich	7078203389	MDEV-14756 - Remove trx_sys_t::rw_trx_list Use atomic operations when accessing trx_sys_t::max_trx_id. We can't yet move trx_sys_t::get_new_trx_id() out of mutex because it must be updated atomically along with trx_sys_t::rw_trx_ids.	2018-01-20 16:10:35 +04:00
Sergey Vojtovich	0ca2ea1a65	MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx reference counter was updated under mutex and read without any protection. This is both slow and unsafe. Use atomic operations for reference counter accesses.	2018-01-11 12:30:53 +04:00
Sergey Vojtovich	380069c235	MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is run for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add().
Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().	2018-01-11 12:30:53 +04:00
Marko Mäkelä	34841d2305	Merge bb-10.2-ext into 10.3	2017-12-12 09:57:17 +02:00
Marko Mäkelä	1e6ac94451	Correct the comment of row_vers_impl_x_locked()	2017-12-11 13:56:36 +02:00
Marko Mäkelä	a4948dafcd	MDEV-11369 Instant ADD COLUMN for InnoDB For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables. This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN. Some usability limitations will be addressed in subsequent work: MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE The format of the clustered index (PRIMARY KEY) is changed as follows: (1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX. (2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. 
(If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked. (3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert. (4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.) We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC. This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. 
The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively. When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index. UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record. len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT. dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value(). dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column. dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name(). dict_index_t::n_core_fields: The original number of fields. For secondary indexes and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields. dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable). 
dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not initialized yet from the clustered index root page. dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value(). dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN. dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back. dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant(). dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN. dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN. prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. Dictionary objects will only be added to cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to cache. dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and if needed, set can_be_evicted. dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point). dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns(). pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns(). row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns(). create_index_dict(): Replaces row_merge_create_index_graph(). 
innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs. btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary. dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables. dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE. dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant(). row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE. commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.) PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This is repurposing the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT. page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION. page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION. page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION. rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields. rec_offs_make_valid(): Add the parameter 'leaf'. rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. 
Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present. dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple(). rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields. cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records. trx_t::in_rollback: Make the field available in non-debug builds. trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written. btr_index_rec_validate(): Account for instant ADD COLUMN row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index. row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case. dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' be deleted. btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed. row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low(). 
rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them. rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields. btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and adaptive hash index for the 'default row'. row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t. rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'. REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG \| REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record. rec_comp_status_t: An enum of the status bit values. rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().	2017-10-06 09:50:10 +03:00
Marko Mäkelä	e3d44f5d62	Merge bb-10.2-ext into 10.3	2017-09-21 08:12:19 +03:00
Marko Mäkelä	48192f963a	Add the parameter bool leaf to rec_get_offsets() This should affect debug builds only. Debug builds will check that the status bits of ROW_FORMAT!=REDUNDANT records match the is_leaf parameter. The only observable change to non-debug should be the addition of the is_leaf parameter to the function rec_copy_prefix_to_dtuple(), and the removal of some calls to update the adaptive hash index (it is only built for the leaf pages). This change should have been made in MySQL 5.0.3, instead of introducing the status flags in the ROW_FORMAT=COMPACT record header.	2017-09-20 16:53:34 +03:00
Marko Mäkelä	3c09f148f3	MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present in the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0.
When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err	2017-07-07 13:08:48 +03:00
Marko Mäkelä	4e1116b2c6	MDEV-12271 Port MySQL 8.0 Bug#23150562 REMOVE UNIV_MUST_NOT_INLINE AND UNIV_NONINL Also, remove empty .ic files that were not removed by my MySQL commit. Problem: InnoDB used to support a compilation mode that allowed to choose whether the function definitions in .ic files are to be inlined or not. This stopped making sense when InnoDB moved to C++ in MySQL 5.6 (and ha_innodb.cc started to #include .ic files), and more so in MySQL 5.7 when inline methods and functions were introduced in .h files. Solution: Remove all references to UNIV_NONINL and UNIV_MUST_NOT_INLINE from all files, assuming that the symbols are never defined. Remove the files fut0fut.cc and ut0byte.cc which only mattered when UNIV_NONINL was defined.	2017-03-17 12:42:07 +02:00
Marko Mäkelä	8780b89529	MDEV-11831 Make InnoDB mini-transaction memo checks stricter InnoDB keeps track of buffer-fixed buf_block_t or acquired rw_lock_t within a mini-transaction. There are some memo_contains assertions in the code that document when certain blocks or rw_locks must be held. But, these assertions only check the mini-transaction memo, not the fact whether the rw_lock_t are actually being held by the caller. btr_pcur_store_position(): Remove #ifdef, and assert that the block is always buffer-fixed. rtr_pcur_getnext_from_path(), rtr_pcur_open_low(), ibuf_rec_get_page_no_func(), ibuf_rec_get_space_func(), ibuf_rec_get_info_func(), ibuf_rec_get_op_type_func(), ibuf_build_entry_from_ibuf_rec_func(), ibuf_rec_get_volume_func(), ibuf_get_merge_page_nos_func(), ibuf_get_volume_buffered_count_func(), ibuf_get_entry_counter_low_func(), page_set_ssn_id(), row_vers_old_has_index_entry(), row_vers_build_for_consistent_read(), row_vers_build_for_semi_consistent_read(), trx_undo_prev_version_build(): Make use of mtr_memo_contains_page_flagged(). mtr_t::memo_contains(): Take a const memo. Assert rw_lock_own(). FindPage, FlaggedCheck: Assert rw_lock_own_flagged().	2017-01-18 14:57:10 +02:00
Sergei Golubchik	1cae1af6f9	MDEV-5800 InnoDB support for indexed vcols * remove old 5.2+ InnoDB support for virtual columns * enable corresponding parts of the innodb-5.7 sources * copy corresponding test cases from 5.7 * copy detailed Alter_inplace_info::HA_ALTER_FLAGS flags from 5.7 - and more detailed detection of changes in fill_alter_inplace_info() * more "innodb compatibility hooks" in sql_class.cc to - create/destroy/reset a THD (used by background purge threads) - find a prelocked table by name - open a table (from a background purge thread) * different from 5.7: - new service thread "thd_destructor_proxy" to make sure all THDs are destroyed at the correct point in time during the server shutdown - proper opening/closing of tables for vcol evaluations in + FK checks (use already opened prelocked tables) + purge threads (open the table, MDLock it, add it to tdc, close when not needed) - cache open tables in vc_templ - avoid unnecessary allocations, reuse table->record[0] and table->s->default_values - not needed in 5.7, because it overcalculates: + tell the server to calculate vcols for an on-going inline ADD INDEX + calculate vcols for correct error messages * update other engines (mroonga/tokudb) accordingly	2016-12-12 20:27:42 +01:00
Jan Lindström	fec844aca8	Merge InnoDB 5.7 from mysql-5.7.14. Contains also: MDEV-10549 mysqld: sql/handler.cc:2692: int handler::ha_index_first(uchar*): Assertion `table_share->tmp_table != NO_TMP_TABLE || m_lock_type != 2' failed. (branch bb-10.2-jan) Unlike MySQL, InnoDB still uses THR_LOCK in MariaDB MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan) enable tests that were fixed in MDEV-10549 MDEV-10548 Some of the debug sync waits do not work with InnoDB 5.7 (branch bb-10.2-jan) fix main.innodb_mysql_sync - re-enable online alter for partitioned innodb tables	2016-09-08 15:49:03 +03:00
Jan Lindström	2e814d4702	Merge InnoDB 5.7 from mysql-5.7.9. Contains also MDEV-10547: Test multi_update_innodb fails with InnoDB 5.7 The failure happened because 5.7 has changed the signature of the bool handler::primary_key_is_clustered() const virtual function ("const" was added). InnoDB was using the old signature which caused the function not to be used. MDEV-10550: Parallel replication lock waits/deadlock handling does not work with InnoDB 5.7 Fixed mutexing problem on lock_trx_handle_wait. Note that rpl_parallel and rpl_optimistic_parallel tests still fail. MDEV-10156 : Group commit tests fail on 10.2 InnoDB (branch bb-10.2-jan) Reason: incorrect merge MDEV-10550: Parallel replication can't sync with master in InnoDB 5.7 (branch bb-10.2-jan) Reason: incorrect merge	2016-09-02 13:22:28 +03:00
Sergei Golubchik	8ee9d19607	innodb 5.6.17	2014-05-07 17:32:23 +02:00
Michael Widenius	068c61978e	Temporary commit of 10.0-merge	2013-03-26 00:03:13 +02:00
Sergei Golubchik	e1f681c99b	10.0-base -> 10.0-monty	2012-10-19 20:38:59 +02:00
Michael Widenius	1d0f70c2f8	Temporary commit of merge of MariaDB 10.0-base and MySQL 5.6	2012-08-01 17:27:34 +03:00

46 commits