mariadb/storage/innobase/include/page0cur.inl

203 lines
6.2 KiB
Text
Raw Normal View History

/*****************************************************************************
Copyright (c) 1994, 2014, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2015, 2022, MariaDB Corporation.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
2019-05-11 19:25:02 +03:00
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA
*****************************************************************************/
/********************************************************************//**
@file include/page0cur.ic
The page cursor
Created 10/4/1994 Heikki Tuuri
*************************************************************************/
#ifdef UNIV_DEBUG
/*********************************************************//**
Gets pointer to the page frame where the cursor is positioned.
@return page */
UNIV_INLINE
page_t*
page_cur_get_page(
/*==============*/
page_cur_t* cur) /*!< in: page cursor */
{
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
return page_align(page_cur_get_rec(cur));
}
/*********************************************************//**
Gets pointer to the buffer block where the cursor is positioned.
@return page */
UNIV_INLINE
buf_block_t*
page_cur_get_block(
/*===============*/
page_cur_t* cur) /*!< in: page cursor */
{
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
ut_ad(cur);
ut_ad(!cur->rec || page_align(cur->rec) == cur->block->page.frame);
return cur->block;
}
/*********************************************************//**
Gets pointer to the page frame where the cursor is positioned.
@return page */
UNIV_INLINE
page_zip_des_t*
page_cur_get_page_zip(
/*==================*/
page_cur_t* cur) /*!< in: page cursor */
{
return(buf_block_get_page_zip(page_cur_get_block(cur)));
}
MDEV-21136 InnoDB's records_in_range estimates can be way off Get rid of BTR_ESTIMATE and btr_cur_t::path_arr. Before the fix btr_estimate_n_rows_in_range_low() used two btr_cur_search_to_nth_level() calls to create two arrays of tree path, the array per border. And then it tried to estimate the number of rows diving level-by-level with the array elements. As the path pages are unlatched during the arrays iterating, the tree could be modified, the estimation function called itself until the number of attempts exceed. After the fix the estimation happens during search process. Roughly, the algorithm is the following. Dive in the left page, then if there are pages between left and right ones, read a few pages to the right, if the right page is reached, fetch it and count the exact number of rows, otherwise count the estimated number of rows, and fetch the right page. The latching order corresponds to WL#6326 rules, i.e.: (2.1) [same as (1.1)]: Page latches must be acquired in descending order of tree level. (2.2) When acquiring a node pointer page latch at level L, we must hold the left sibling page latch (at level L) or some ancestor latch (at level>L). When we dive to the level down, the parent page is unlatched only after the the current level page is latched. When we estimate the number of rows on some level, we latch the left border, then fetch the next page, and then fetch the next page unlatching the previous page after the current page is latched until the right border is reached. I.e. the left sibling is always latched when we acquire page latch on the same level. When we reach the right border, the current page is unlatched, and then the right border is latched. Following to (2.2) rule, we can do this because the right border's parent is latched.
2022-06-06 14:31:19 +03:00
/* Gets the record where the cursor is positioned.
@param cur page cursor
@return record */
UNIV_INLINE
MDEV-21136 InnoDB's records_in_range estimates can be way off Get rid of BTR_ESTIMATE and btr_cur_t::path_arr. Before the fix btr_estimate_n_rows_in_range_low() used two btr_cur_search_to_nth_level() calls to create two arrays of tree path, the array per border. And then it tried to estimate the number of rows diving level-by-level with the array elements. As the path pages are unlatched during the arrays iterating, the tree could be modified, the estimation function called itself until the number of attempts exceed. After the fix the estimation happens during search process. Roughly, the algorithm is the following. Dive in the left page, then if there are pages between left and right ones, read a few pages to the right, if the right page is reached, fetch it and count the exact number of rows, otherwise count the estimated number of rows, and fetch the right page. The latching order corresponds to WL#6326 rules, i.e.: (2.1) [same as (1.1)]: Page latches must be acquired in descending order of tree level. (2.2) When acquiring a node pointer page latch at level L, we must hold the left sibling page latch (at level L) or some ancestor latch (at level>L). When we dive to the level down, the parent page is unlatched only after the the current level page is latched. When we estimate the number of rows on some level, we latch the left border, then fetch the next page, and then fetch the next page unlatching the previous page after the current page is latched until the right border is reached. I.e. the left sibling is always latched when we acquire page latch on the same level. When we reach the right border, the current page is unlatched, and then the right border is latched. Following to (2.2) rule, we can do this because the right border's parent is latched.
2022-06-06 14:31:19 +03:00
rec_t *page_cur_get_rec(const page_cur_t *cur)
{
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
ut_ad(cur);
ut_ad(!cur->rec || page_align(cur->rec) == cur->block->page.frame);
return cur->rec;
}
#endif /* UNIV_DEBUG */
/*********************************************************//**
Sets the cursor object to point before the first user record
on the page. */
UNIV_INLINE
void
page_cur_set_before_first(
/*======================*/
const buf_block_t* block, /*!< in: index page */
page_cur_t* cur) /*!< in: cursor */
{
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
cur->block = const_cast<buf_block_t*>(block);
cur->rec = page_get_infimum_rec(buf_block_get_frame(cur->block));
}
/*********************************************************//**
Sets the cursor object to point after the last user record on
the page. */
UNIV_INLINE
void
page_cur_set_after_last(
/*====================*/
const buf_block_t* block, /*!< in: index page */
page_cur_t* cur) /*!< in: cursor */
{
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
cur->block = const_cast<buf_block_t*>(block);
cur->rec = page_get_supremum_rec(buf_block_get_frame(cur->block));
}
/*********************************************************//**
Returns TRUE if the cursor is before first user record on page.
@return TRUE if at start */
UNIV_INLINE
ibool
page_cur_is_before_first(
/*=====================*/
const page_cur_t* cur) /*!< in: cursor */
{
ut_ad(cur);
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
ut_ad(page_align(cur->rec) == cur->block->page.frame);
return(page_rec_is_infimum(cur->rec));
}
/*********************************************************//**
Returns TRUE if the cursor is after last user record.
@return TRUE if at end */
UNIV_INLINE
ibool
page_cur_is_after_last(
/*===================*/
const page_cur_t* cur) /*!< in: cursor */
{
ut_ad(cur);
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
ut_ad(page_align(cur->rec) == cur->block->page.frame);
return(page_rec_is_supremum(cur->rec));
}
/**********************************************************//**
Positions the cursor on the given record. */
UNIV_INLINE
void
page_cur_position(
/*==============*/
const rec_t* rec, /*!< in: record on a page */
const buf_block_t* block, /*!< in: buffer block containing
the record */
page_cur_t* cur) /*!< out: page cursor */
{
ut_ad(rec && block && cur);
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
ut_ad(page_align(rec) == block->page.frame);
cur->rec = (rec_t*) rec;
cur->block = (buf_block_t*) block;
}
/***********************************************************//**
Inserts a record next to page cursor. Returns pointer to inserted record if
succeed, i.e., enough space available, NULL otherwise. The cursor stays at
the same logical position, but the physical position may change if it is
pointing to a compressed page that was reorganized.
IMPORTANT: The caller will have to update IBUF_BITMAP_FREE
if this is a compressed leaf page in a secondary index.
This has to be done either within the same mini-transaction,
or by invoking ibuf_reset_free_bits() before mtr_commit().
@return pointer to record if succeed, NULL otherwise */
UNIV_INLINE
rec_t*
page_cur_tuple_insert(
/*==================*/
page_cur_t* cursor, /*!< in/out: a page cursor */
const dtuple_t* tuple, /*!< in: pointer to a data tuple */
rec_offs** offsets,/*!< out: offsets on *rec */
2013-03-26 00:03:13 +02:00
mem_heap_t** heap, /*!< in/out: pointer to memory heap, or NULL */
ulint n_ext, /*!< in: number of externally stored columns */
mtr_t* mtr) /*!< in/out: mini-transaction */
{
MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
ulint size = rec_get_converted_size(cursor->index, tuple, n_ext);
2013-03-26 00:03:13 +02:00
if (!*heap) {
*heap = mem_heap_create(size
+ (4 + REC_OFFS_HEADER_SIZE
+ dtuple_get_n_fields(tuple))
* sizeof **offsets);
}
MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
rec_t* rec = rec_convert_dtuple_to_rec(
static_cast<byte*>(mem_heap_alloc(*heap, size)),
cursor->index, tuple, n_ext);
MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
*offsets = rec_get_offsets(rec, cursor->index, *offsets,
MDEV-27058: Reduce the size of buf_block_t and buf_page_t buf_page_t::frame: Moved from buf_block_t::frame. All 'thin' buf_page_t describing compressed-only ROW_FORMAT=COMPRESSED pages will have frame=nullptr, while all 'fat' buf_block_t will have a non-null frame pointing to aligned innodb_page_size bytes. This eliminates the need for separate states for BUF_BLOCK_FILE_PAGE and BUF_BLOCK_ZIP_PAGE. buf_page_t::lock: Moved from buf_block_t::lock. That is, all block descriptors will have a page latch. The IO_PIN state that was used for discarding or creating the uncompressed page frame of a ROW_FORMAT=COMPRESSED block is replaced by a combination of read-fix and page X-latch. page_zip_des_t::fix: Replaces state_, buf_fix_count_, io_fix_, status of buf_page_t with a single std::atomic<uint32_t>. All modifications will use store(), fetch_add(), fetch_sub(). This space was previously wasted to alignment on 64-bit systems. We will use the following encoding that combines a state (partly read-fix or write-fix) and a buffer-fix count: buf_page_t::NOT_USED=0 (previously BUF_BLOCK_NOT_USED) buf_page_t::MEMORY=1 (previously BUF_BLOCK_MEMORY) buf_page_t::REMOVE_HASH=2 (previously BUF_BLOCK_REMOVE_HASH) buf_page_t::FREED=3 + fix: pages marked as freed in the file buf_page_t::UNFIXED=1U<<29 + fix: normal pages buf_page_t::IBUF_EXIST=2U<<29 + fix: normal pages; may need ibuf merge buf_page_t::REINIT=3U<<29 + fix: reinitialized pages (skip doublewrite) buf_page_t::READ_FIX=4U<<29 + fix: read-fixed pages (also X-latched) buf_page_t::WRITE_FIX=5U<<29 + fix: write-fixed pages (also U-latched) buf_page_t::WRITE_FIX_IBUF=6U<<29 + fix: write-fixed; may have ibuf buf_page_t::WRITE_FIX_REINIT=7U<<29 + fix: write-fixed (no doublewrite) buf_page_t::write_complete(): Change WRITE_FIX or WRITE_FIX_REINIT to UNFIXED, and WRITE_FIX_IBUF to IBUF_EXIST, before releasing the U-latch. buf_page_t::read_complete(): Renamed from buf_page_read_complete(). Change READ_FIX to UNFIXED or IBUF_EXIST, before releasing the X-latch. buf_page_t::can_relocate(): If the page latch is being held or waited for, or the block is buffer-fixed or io-fixed, return false. (The condition on the page latch is new.) Outside buf_page_get_gen(), buf_page_get_low() and buf_page_free(), we will acquire the page latch before fix(), and unfix() before unlocking. buf_page_t::flush(): Replaces buf_flush_page(). Optimize the handling of FREED pages. buf_pool_t::release_freed_page(): Assume that buf_pool.mutex is held by the caller. buf_page_t::is_read_fixed(), buf_page_t::is_write_fixed(): New predicates. buf_page_get_low(): Ignore guesses that are read-fixed because they may not yet be registered in buf_pool.page_hash and buf_pool.LRU. buf_page_optimistic_get(): Acquire latch before buffer-fixing. buf_page_make_young(): Leave read-fixed blocks alone, because they might not be registered in buf_pool.LRU yet. recv_sys_t::recover_deferred(), recv_sys_t::recover_low(): Possibly fix MDEV-26326, by holding a page X-latch instead of only buffer-fixing the page.
2021-11-16 19:55:06 +02:00
page_is_leaf(cursor->block->page.frame)
MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
? cursor->index->n_core_fields : 0,
ULINT_UNDEFINED, heap);
MDEV-15662 Instant DROP COLUMN or changing the order of columns Allow ADD COLUMN anywhere in a table, not only adding as the last column. Allow instant DROP COLUMN and instant changing the order of columns. The added columns will always be added last in clustered index records. In new records, instantly dropped columns will be stored as NULL or empty when possible. Information about dropped and reordered columns will be written in a metadata BLOB (mblob), which is stored before the first 'user' field in the hidden metadata record at the start of the clustered index. The presence of mblob is indicated by setting the delete-mark flag in the metadata record. The metadata BLOB stores the number of clustered index fields, followed by an array of column information for each field. For dropped columns, we store the NOT NULL flag, the fixed length, and for variable-length columns, whether the maximum length exceeded 255 bytes. For non-dropped columns, we store the column position. Unlike with MDEV-11369, when a table becomes empty, it cannot be converted back to the canonical format. The reason for this is that other threads may hold cached objects such as row_prebuilt_t::ins_node that could refer to dropped or reordered index fields. For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, we must store the n_core_null_bytes in the root page, so that the chain of node pointer records can be followed in order to reach the leftmost leaf page where the metadata record is located. If the mblob is present, we will zero-initialize the strings "infimum" and "supremum" in the root page, and use the last byte of "supremum" for storing the number of null bytes (which are allocated but useless on node pointer pages). This is necessary for btr_cur_instant_init_metadata() to be able to navigate to the mblob. If the PRIMARY KEY contains any variable-length column and some nullable columns were instantly dropped, the dict_index_t::n_nullable in the data dictionary could be smaller than it actually is in the non-leaf pages. Because of this, the non-leaf pages could use more bytes for the null flags than the data dictionary expects, and we could be reading the lengths of the variable-length columns from the wrong offset, and thus reading the child page number from wrong place. This is the result of two design mistakes that involve unnecessary storage of data: First, it is nonsense to store any data fields for the leftmost node pointer records, because the comparisons would be resolved by the MIN_REC_FLAG alone. Second, there cannot be any null fields in the clustered index node pointer fields, but we nevertheless reserve space for all the null flags. Limitations (future work): MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists MDEV-17468 Avoid table rebuild on operations on generated columns MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large btr_page_reorganize_low(): Preserve any metadata in the root page. Call lock_move_reorganize_page() only after restoring the "infimum" and "supremum" records, to avoid a memcmp() assertion failure. dict_col_t::DROPPED: Magic value for dict_col_t::ind. dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant(). Do not assert that the column was instantly added, because we sometimes call this unconditionally for all columns. Convert an instantly added column to a "core column". The old name remove_instant() could be mistaken to refer to "instant DROP COLUMN". dict_col_t::is_added(): Rename from dict_col_t::is_instant(). dtype_t::metadata_blob_init(): Initialize the mblob data type. dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(), upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits refer to a metadata record. dict_table_t::instant: Metadata about dropped or reordered columns. dict_table_t::prepare_instant(): Prepare ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE. innobase_instant_try() will pass this to dict_table_t::instant_column(). On rollback, dict_table_t::rollback_instant() will be called. dict_table_t::instant_column(): Renamed from instant_add_column(). Add the parameter col_map so that columns can be reordered. Copy and adjust v_cols[] as well. dict_table_t::find(): Find an old column based on a new column number. dict_table_t::serialise_columns(), dict_table_t::deserialise_columns(): Convert the mblob. dict_index_t::instant_metadata(): Create the metadata record for instant ALTER TABLE. Invoke dict_table_t::serialise_columns(). dict_index_t::reconstruct_fields(): Invoked by dict_table_t::deserialise_columns(). dict_index_t::clear_instant_alter(): Move the fields for the dropped columns to the end, and sort the surviving index fields in ascending order of column position. ha_innobase::check_if_supported_inplace_alter(): Do not allow adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.) instant_alter_column_possible(): Add a parameter for InnoDB table, to check for additional conditions, such as the maximum number of index fields. ha_innobase_inplace_ctx::first_alter_pos: The first column whose position is affected by instant ADD, DROP, or changing the order of columns. innobase_build_col_map(): Skip added virtual columns. prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol. Remove some unnecessary code. Note that the call to innodb_base_col_setup() should be executed later. commit_try_norebuild(): If ctx->is_instant(), let the virtual columns be added or dropped by innobase_instant_try(). innobase_instant_try(): Fill in a zero default value for the hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459). If any columns were dropped or reordered (or added not last), delete any SYS_COLUMNS records for the following columns, and insert SYS_COLUMNS records for all subsequent stored columns as well as for all virtual columns. If any virtual column is dropped, rewrite all virtual column metadata. Use a shortcut only for adding virtual columns. This is because innobase_drop_virtual_try() assumes that the dropped virtual columns still exist in ctx->old_table. innodb_update_cols(): Renamed from innodb_update_n_cols(). innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change the return type to bool, and invoke my_error() when detecting an error. innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS. Refactored from innobase_add_one_virtual() and innobase_instant_add_col(). innobase_instant_add_col(): Replace the parameter dfield with type. innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS and all columns from SYS_VIRTUAL. innobase_add_virtual_try(), innobase_drop_virtual_try(): Let the caller invoke innodb_update_cols(). innobase_rename_column_try(): Skip dropped columns. commit_cache_norebuild(): Update table->fts->doc_col. dict_mem_table_col_rename_low(): Skip dropped columns. trx_undo_rec_get_partial_row(): Skip dropped columns. trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly. trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields. Log metadata records consistently. Apparently, the first fields of a clustered index may be updated in an update_undo vector when the index is ID_IND of SYS_FOREIGN, as part of renaming the table during ALTER TABLE. Normally, updates of the PRIMARY KEY should be logged as delete-mark and an insert. row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec(): Use trx_undo_metadata. row_undo_mod_clust_low(): On metadata rollback, roll back the root page too. row_undo_mod_clust(): Relax an assertion. The delete-mark flag was repurposed for ALTER TABLE metadata records. row_rec_to_index_entry_impl(): Add the template parameter mblob and the optional parameter info_bits for specifying the desired new info bits. For the metadata tuple, allow conversion between the original format (ADD COLUMN only) and the generic format (with hidden BLOB). Add the optional parameter "pad" to determine whether the tuple should be padded to the index fields (on ALTER TABLE it should), or whether it should remain at its original size (on rollback). row_build_index_entry_low(): Clean up the code, removing redundant variables and conditions. For instantly dropped columns, generate a dummy value that is NULL, the empty string, or a fixed length of NUL bytes, depending on the type of the dropped column. row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY of a record that contained a dropped column whose value was stored externally, we will be inserting a dummy NULL or empty string value to the field of the dropped column. The externally stored column would eventually be dropped when purge removes the delete-marked record for the old PRIMARY KEY value. btr_index_rec_validate(): Recognize the metadata record. btr_discard_only_page_on_level(): Preserve the generic instant ALTER TABLE metadata. btr_set_instant(): Replaces page_set_instant(). This sets a clustered index root page to the appropriate format, or upgrades from the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format. btr_cur_instant_init_low(): Read and validate the metadata BLOB page before reconstructing the dictionary information based on it. btr_cur_instant_init_metadata(): Do not read any lengths from the metadata record header before reading the BLOB. At this point, we would not actually know how many nullable fields the metadata record contains. btr_cur_instant_root_init(): Initialize n_core_null_bytes in one of two possible ways. btr_cur_trim(): Handle the mblob record. row_metadata_to_tuple(): Convert a metadata record to a data tuple, based on the new info_bits of the metadata record. btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed. Invoke dtuple_convert_big_rec() for metadata records if the record is too large, or if the mblob is not yet marked as externally stored. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): When the last user record is deleted, do not delete the generic instant ALTER TABLE metadata record. Only delete MDEV-11369 instant ADD COLUMN metadata records. btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size. btr_pcur_store_position(): Allow a logically empty page to contain a metadata record for generic ALTER TABLE. REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW. This is for the old instant ADD COLUMN (MDEV-11369) only. REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record, with additional information for dropped or reordered columns. rec_info_bits_valid(): Remove. The only case when this would fail is when the record is the generic ALTER TABLE metadata record. rec_is_alter_metadata(): Check if a record is the metadata record for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function must not be invoked on node pointer records, because the delete-mark flag in those records may be set (it is garbage), and then a debug assertion could fail because index->is_instant() does not necessarily hold. rec_is_add_metadata(): Check if a record is MDEV-11369 ADD COLUMN metadata record (not more generic instant ALTER TABLE). rec_get_converted_size_comp_prefix_low(): Assume that the metadata field will be stored externally. In dtuple_convert_big_rec() during the rec_get_converted_size() call, it would not be there yet. rec_get_converted_size_comp(): Replace status,fields,n_fields with tuple. rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(), rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>. With mblob=true, process a record with a metadata BLOB. rec_copy_prefix_to_buf(): Assert that no fields beyond the key and system columns are being copied. Exclude the metadata BLOB field. rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple into a record. row_upd_index_replace_metadata(): Apply an update vector to an alter_metadata tuple. row_log_allocate(): Replace dict_index_t::is_instant() with a more appropriate condition that ignores dict_table_t::instant. Only a table on which the MDEV-11369 ADD COLUMN was performed can "lose its instantness" when it becomes empty. After instant DROP COLUMN or reordering columns, we cannot simply convert the table to the canonical format, because the data dictionary cache and all possibly existing references to it from other client connection threads would have to be adjusted. row_quiesce_write_index_fields(): Do not crash when the table contains an instantly dropped column. Thanks to Thirunarayanan Balathandayuthapani for discussing the design and implementing an initial prototype of this. Thanks to Matthias Leich for testing.
2018-10-19 16:49:54 +03:00
ut_ad(size == rec_offs_size(*offsets));
if (is_buf_block_get_page_zip(cursor->block)) {
MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
rec = page_cur_insert_rec_zip(cursor, rec, *offsets, mtr);
} else {
MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks btr_cur_t: Zero-initialize all fields in the default constructor. btr_cur_t::index: Remove; it duplicated page_cur.index. Many functions: Remove arguments that were duplicating page_cur_t::index and page_cur_t::block. page_cur_open_level(), btr_pcur_open_level(): Replaces btr_cur_open_at_index_side() for dict_stats_analyze_index(). At the end, release all latches except the dict_index_t::lock and the buf_page_t::lock on the requested page. dict_stats_analyze_index(): Rely on mtr_t::rollback_to_savepoint() to release all uninteresting page latches. btr_search_guess_on_hash(): Simplify the logic, and invoke mtr_t::rollback_to_savepoint(). We will use plain C++ std::vector<mtr_memo_slot_t> for mtr_t::m_memo. In this way, we can avoid setting mtr_memo_slot_t::object to nullptr and instead just remove garbage from m_memo. mtr_t::rollback_to_savepoint(): Shrink the vector. We will be needing this in dict_stats_analyze_index(), where we will release page latches and only retain the index->lock in mtr_t::m_memo. mtr_t::release_last_page(): Release the last acquired page latch. Replaces btr_leaf_page_release(). mtr_t::release(const buf_block_t&): Release a single page latch. Used in btr_pcur_move_backward_from_page(). mtr_t::memo_release(): Replaced with mtr_t::release(). mtr_t::upgrade_buffer_fix(): Acquire a latch for a buffer-fixed page. This replaces the double bookkeeping in btr_cur_t::open_leaf(). Reviewed by: Vladislav Lesin
2022-11-17 08:19:01 +02:00
rec = page_cur_insert_rec_low(cursor, rec, *offsets, mtr);
}
2013-03-26 00:03:13 +02:00
ut_ad(!rec || !cmp_dtuple_rec(tuple, rec, *offsets));
return(rec);
}