mariadb/storage/maria/ma_pagecache.h
unknown 18bc7b695a WL#3072 - Maria Recovery
* to honour WAL we now force the whole log when flushing a bitmap page.
* ability to intentionally crash in various places for recovery testing
* bugfix (dirty pages list found in checkpoint record was ignored)
* smaller checkpoint record
* misc small cleanups and comments


mysql-test/include/maria_empty_logs.inc:
  maria-purge.test creates ~11 logs, remove them all
mysql-test/r/maria-recovery-bitmap.result:
  result is good; without the _ma_bitmap_get_log_address() call,
  we got
  check   error   Bitmap at 0 has pages reserved outside of data file length
mysql-test/r/maria-recovery.result:
  result update
mysql-test/t/maria-recovery-bitmap.test:
  enable test of "bitmap-flush should flush whole log otherwise
  corrupted data file (bitmap ahead of data pages)".
mysql-test/t/maria-recovery.test:
  test of checkpoint
sql/sql_table.cc:
  comment
storage/maria/ha_maria.cc:
  _ma_reenable_logging_for_table() now includes file->trn=0.
  At the end of repair() we don't need to re-enable logging, it is
  done already by caller (like copy_data_between_tables()); it sounds
  strange that this function could decide to re-enable, it should be
  up to caller who knows what other operations it plans. Removing this
  line led to assertion failure in maria_lock_database(F_UNLCK), fixed
  by removing the assertion: maria_lock_database()
  is here called in a context where F_UNLCK does not make the
  table visible to others so assertion is excessive, and external_lock()
  is already designed to honour the asserted condition.
  Ability to crash at the end of bulk insert when indices
  have been enabled.
storage/maria/ma_bitmap.c:
  Better use pagecache_file_init() than set pagecache callbacks directly;
  and a new function to set those callbacks for bitmap so that we can
  reuse it.
  _ma_bitmap_get_log_address() is a pagecache get_log_address callback
  which causes the whole log to be flushed when a bitmap page
  is flushed by the page cache. This was required by WAL.
storage/maria/ma_blockrec.c:
  get_log_address pagecache callback for data (non bitmap) pages:
  just reads the LSN from the page's content, like was hard-coded
  before in ma_pagecache.c.
storage/maria/ma_blockrec.h:
  functions which need to be exported
storage/maria/ma_check.c:
  create_new_data_handle() can be static.
  Ability to crash after rebuilding the index in OPTIMIZE,
  in REPAIR. my_lock() implemented already.
storage/maria/ma_checkpoint.c:
  As MARIA_SHARE* is now accessible to pagecache_collect_changed_blocks_LSN(),
  we don't need to store kfile/dfile descriptors in checkpoint record,
  2-byte-id of the table plus one byte to say if this is data or index
  file is enough. So we go from 4+4 bytes per table down to 2+1.
storage/maria/ma_commit.c:
  removing duplicate functions (see _ma_tmp_disable_logging_for_table())
storage/maria/ma_extra.c:
  Monty fixed
storage/maria/ma_key_recover.c:
  comment
storage/maria/ma_locking.c:
  Sometimes other code does funny things with maria_lock_database(),
  like ha_maria::repair() calling it at start and end without going
  through ha_maria::external_lock(). So it happens that maria_lock_database()
  is called with now_transactional!=born_transactional.
storage/maria/ma_loghandler.c:
  update to new prototype
storage/maria/ma_open.c:
  set_data|index_pagecache_callbacks() need to be exported as
  they are now called when disabling/enabling transactionality.
storage/maria/ma_pagecache.c:
  Removing PAGE_LSN_OFFSET, as much of the code relies on it being
  0 anyway (let's not give impression we can just change this constant).
  When flushing a page to disk, call the get_log_address callback to
  know up to which LSN the log should be flushed.
  As we now can access MARIA_SHARE* we can know share->id and store
  it into the checkpoint record; we thus go from 4 bytes per dirty page
  to 2+1.
storage/maria/ma_pagecache.h:
  get_log_address callback
storage/maria/ma_panic.c:
  No reason to reset pagecache callbacks in HA_PANIC_READ:
  all we do is reopen files if they were closed; callbacks should
  be in place already as 'info' exists; we just want to modify
  the file descriptors, not the full PAGECACHE_FILE structure.
  If we open data file and it was closed, share->bitmap.file needs
  to be set.
  Note that the modified code is disabled anyway.
storage/maria/ma_recovery.c:
  Checkpoint record does not contain kfile/dfile descriptors anymore
  so code can be simplified. Hash key in all_dirty_pages is 
  not made from file_descriptor & pageno anymore, but
  index_or_data & table-short-id & pageno.
  If a table's create_rename_lsn is higher than record's LSN,
  we skip the table and don't fail if it's corrupted (because the LSNs
  say that we don't have to look at this table).
  If a table is skipped (for example due to create_rename_lsn),
  its UNDOs still cause undo_lsn to advance; this is so that if later
  we notice the transaction has to rollback we fail (as table should
  not be skipped in this case).
  Fixing a bug: the dirty_pages list was never used, because
  the LSN below which it was used was the minimum rec_lsn of dirty pages!
  It is now the min(checkpoint_start_log_horizon, min(trn's rec_lsn)).
  When we disable/reenable transactionality, we modify pagecache
  callbacks (needed for example for get_log_address: changing
  share->page_type is not enough anymore).
storage/maria/ma_write.c:
  'records' and 'checksum' are protected: they are updated under
  log's mutex in write-hooks when UNDO is written.
storage/maria/maria_chk.c:
  remove use of duplicate functions.
storage/maria/maria_def.h:
  set_data|index_pagecache_callbacks() need to be exported;
  _ma_reenable_logging_for_table() changes to a real function.
storage/maria/unittest/ma_pagecache_consist.c:
  new prototype
storage/maria/unittest/ma_pagecache_single.c:
  new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
  new prototype
2007-12-30 21:32:07 +01:00

310 lines
14 KiB
C

/* Copyright (C) 2006 MySQL AB
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
/* Page cache variable structures */
#ifndef _ma_pagecache_h
#define _ma_pagecache_h
C_MODE_START
#include "ma_loghandler_lsn.h"
#include <m_string.h>
#include <hash.h>
/* Type of the page */
enum pagecache_page_type
{
/*
Used only for control page type changing during debugging. This define
should only be using when using DBUG.
*/
PAGECACHE_EMPTY_PAGE,
/* the page does not contain LSN */
PAGECACHE_PLAIN_PAGE,
/* the page contain LSN (maria tablespace page) */
PAGECACHE_LSN_PAGE,
/* Page type used when scanning file and we don't care about the type */
PAGECACHE_READ_UNKNOWN_PAGE
};
/*
This enum describe lock status changing. every type of page cache will
interpret WRITE/READ lock as it need.
*/
enum pagecache_page_lock
{
PAGECACHE_LOCK_LEFT_UNLOCKED, /* free -> free */
PAGECACHE_LOCK_LEFT_READLOCKED, /* read -> read */
PAGECACHE_LOCK_LEFT_WRITELOCKED, /* write -> write */
PAGECACHE_LOCK_READ, /* free -> read */
PAGECACHE_LOCK_WRITE, /* free -> write */
PAGECACHE_LOCK_READ_UNLOCK, /* read -> free */
PAGECACHE_LOCK_WRITE_UNLOCK, /* write -> free */
PAGECACHE_LOCK_WRITE_TO_READ /* write -> read */
};
/*
This enum describe pin status changing
*/
enum pagecache_page_pin
{
PAGECACHE_PIN_LEFT_PINNED, /* pinned -> pinned */
PAGECACHE_PIN_LEFT_UNPINNED, /* unpinned -> unpinned */
PAGECACHE_PIN, /* unpinned -> pinned */
PAGECACHE_UNPIN /* pinned -> unpinned */
};
/* How to write the page */
enum pagecache_write_mode
{
/* do not write immediately, i.e. it will be dirty page */
PAGECACHE_WRITE_DELAY,
/* page already is in the file. (key cache insert analogue) */
PAGECACHE_WRITE_DONE
};
/* page number for maria */
typedef uint32 pgcache_page_no_t;
/* file descriptor for Maria */
typedef struct st_pagecache_file
{
File file;
/** Cannot be NULL */
my_bool (*read_callback)(uchar *page, pgcache_page_no_t offset,
uchar *data);
/** Cannot be NULL */
my_bool (*write_callback)(uchar *page, pgcache_page_no_t offset,
uchar *data);
/** Can be NULL */
TRANSLOG_ADDRESS (*get_log_address_callback)
(uchar *page, pgcache_page_no_t offset, uchar *data);
uchar *callback_data;
} PAGECACHE_FILE;
/* declare structures that is used by st_pagecache */
struct st_pagecache_block_link;
typedef struct st_pagecache_block_link PAGECACHE_BLOCK_LINK;
struct st_pagecache_page;
typedef struct st_pagecache_page PAGECACHE_PAGE;
struct st_pagecache_hash_link;
typedef struct st_pagecache_hash_link PAGECACHE_HASH_LINK;
#include <wqueue.h>
#define PAGECACHE_CHANGED_BLOCKS_HASH 128 /* must be power of 2 */
#define PAGECACHE_PRIORITY_LOW 0
#define PAGECACHE_PRIORITY_DEFAULT 3
#define PAGECACHE_PRIORITY_HIGH 6
/*
The page cache structure
It also contains read-only statistics parameters.
*/
typedef struct st_pagecache
{
size_t mem_size; /* specified size of the cache memory */
ulong min_warm_blocks; /* min number of warm blocks; */
ulong age_threshold; /* age threshold for hot blocks */
ulonglong time; /* total number of block link operations */
ulong hash_entries; /* max number of entries in the hash table */
long hash_links; /* max number of hash links */
long hash_links_used; /* number of hash links taken from free links pool */
long disk_blocks; /* max number of blocks in the cache */
ulong blocks_used; /* maximum number of concurrently used blocks */
ulong blocks_unused; /* number of currently unused blocks */
ulong blocks_changed; /* number of currently dirty blocks */
ulong warm_blocks; /* number of blocks in warm sub-chain */
ulong cnt_for_resize_op; /* counter to block resize operation */
ulong blocks_available; /* number of blocks available in the LRU chain */
long blocks; /* max number of blocks in the cache */
uint32 block_size; /* size of the page buffer of a cache block */
PAGECACHE_HASH_LINK **hash_root;/* arr. of entries into hash table buckets */
PAGECACHE_HASH_LINK *hash_link_root;/* memory for hash table links */
PAGECACHE_HASH_LINK *free_hash_list;/* list of free hash links */
PAGECACHE_BLOCK_LINK *free_block_list;/* list of free blocks */
PAGECACHE_BLOCK_LINK *block_root;/* memory for block links */
uchar HUGE_PTR *block_mem; /* memory for block buffers */
PAGECACHE_BLOCK_LINK *used_last;/* ptr to the last block of the LRU chain */
PAGECACHE_BLOCK_LINK *used_ins;/* ptr to the insertion block in LRU chain */
pthread_mutex_t cache_lock; /* to lock access to the cache structure */
WQUEUE resize_queue; /* threads waiting during resize operation */
WQUEUE waiting_for_hash_link;/* waiting for a free hash link */
WQUEUE waiting_for_block; /* requests waiting for a free block */
/* hash for dirty file bl.*/
PAGECACHE_BLOCK_LINK *changed_blocks[PAGECACHE_CHANGED_BLOCKS_HASH];
/* hash for other file bl.*/
PAGECACHE_BLOCK_LINK *file_blocks[PAGECACHE_CHANGED_BLOCKS_HASH];
/*
The following variables are and variables used to hold parameters for
initializing the key cache.
*/
ulonglong param_buff_size; /* size the memory allocated for the cache */
ulong param_block_size; /* size of the blocks in the key cache */
ulong param_division_limit; /* min. percentage of warm blocks */
ulong param_age_threshold; /* determines when hot block is downgraded */
/* Statistics variables. These are reset in reset_pagecache_counters(). */
ulong global_blocks_changed; /* number of currently dirty blocks */
ulonglong global_cache_w_requests;/* number of write requests (write hits) */
ulonglong global_cache_write; /* number of writes from cache to files */
ulonglong global_cache_r_requests;/* number of read requests (read hits) */
ulonglong global_cache_read; /* number of reads from files to cache */
uint shift; /* block size = 2 ^ shift */
myf readwrite_flags; /* Flags to pread/pwrite() */
myf org_readwrite_flags; /* Flags to pread/pwrite() at init */
my_bool inited;
my_bool resize_in_flush; /* true during flush of resize operation */
my_bool can_be_used; /* usage of cache for read/write is allowed */
my_bool in_init; /* Set to 1 in MySQL during init/resize */
HASH files_in_flush; /**< files in flush_pagecache_blocks_int() */
} PAGECACHE;
/** @brief Return values for PAGECACHE_FLUSH_FILTER */
enum pagecache_flush_filter_result
{
FLUSH_FILTER_SKIP_TRY_NEXT= 0,/**< skip page and move on to next one */
FLUSH_FILTER_OK, /**< flush page and move on to next one */
FLUSH_FILTER_SKIP_ALL /**< skip page and all next ones */
};
/** @brief a filter function type for flush_pagecache_blocks_with_filter() */
typedef enum pagecache_flush_filter_result
(*PAGECACHE_FLUSH_FILTER)(enum pagecache_page_type type, pgcache_page_no_t page,
LSN rec_lsn, void *arg);
/* The default key cache */
extern PAGECACHE dflt_pagecache_var, *dflt_pagecache;
extern ulong init_pagecache(PAGECACHE *pagecache, size_t use_mem,
uint division_limit, uint age_threshold,
uint block_size, myf my_read_flags);
extern ulong resize_pagecache(PAGECACHE *pagecache,
size_t use_mem, uint division_limit,
uint age_threshold);
extern void change_pagecache_param(PAGECACHE *pagecache, uint division_limit,
uint age_threshold);
extern uchar *pagecache_read(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
uint level,
uchar *buff,
enum pagecache_page_type type,
enum pagecache_page_lock lock,
PAGECACHE_BLOCK_LINK **link);
#define pagecache_write(P,F,N,L,B,T,O,I,M,K,R) \
pagecache_write_part(P,F,N,L,B,T,O,I,M,K,R,0,(P)->block_size)
#define pagecache_inject(P,F,N,L,B,T,O,I,K,R) \
pagecache_write_part(P,F,N,L,B,T,O,I,PAGECACHE_WRITE_DONE, \
K,R,0,(P)->block_size)
extern my_bool pagecache_write_part(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
uint level,
uchar *buff,
enum pagecache_page_type type,
enum pagecache_page_lock lock,
enum pagecache_page_pin pin,
enum pagecache_write_mode write_mode,
PAGECACHE_BLOCK_LINK **link,
LSN first_REDO_LSN_for_page,
uint offset,
uint size);
extern void pagecache_unlock(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
enum pagecache_page_lock lock,
enum pagecache_page_pin pin,
LSN first_REDO_LSN_for_page,
LSN lsn, my_bool was_changed);
extern void pagecache_unlock_by_link(PAGECACHE *pagecache,
PAGECACHE_BLOCK_LINK *block,
enum pagecache_page_lock lock,
enum pagecache_page_pin pin,
LSN first_REDO_LSN_for_page,
LSN lsn, my_bool was_changed);
extern void pagecache_unpin(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
LSN lsn);
extern void pagecache_unpin_by_link(PAGECACHE *pagecache,
PAGECACHE_BLOCK_LINK *link,
LSN lsn);
/* Results of flush operation (bit field in fact) */
/* The flush is done. */
#define PCFLUSH_OK 0
/* There was errors during the flush process. */
#define PCFLUSH_ERROR 1
/* Pinned blocks was met and skipped. */
#define PCFLUSH_PINNED 2
/* PCFLUSH_ERROR and PCFLUSH_PINNED. */
#define PCFLUSH_PINNED_AND_ERROR (PCFLUSH_ERROR|PCFLUSH_PINNED)
#define pagecache_file_init(F,RC,WC,GLC,D) \
do{ \
(F).read_callback= (RC); (F).write_callback= (WC); \
(F).get_log_address_callback= (GLC); (F).callback_data= (uchar*)(D); \
} while(0)
#define flush_pagecache_blocks(A,B,C) \
flush_pagecache_blocks_with_filter(A,B,C,NULL,NULL)
extern int flush_pagecache_blocks_with_filter(PAGECACHE *keycache,
PAGECACHE_FILE *file,
enum flush_type type,
PAGECACHE_FLUSH_FILTER filter,
void *filter_arg);
extern my_bool pagecache_delete(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
enum pagecache_page_lock lock,
my_bool flush);
extern my_bool pagecache_delete_pages(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
uint page_count,
enum pagecache_page_lock lock,
my_bool flush);
extern void end_pagecache(PAGECACHE *keycache, my_bool cleanup);
extern my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
LEX_STRING *str,
LSN *min_lsn);
extern int reset_pagecache_counters(const char *name, PAGECACHE *pagecache);
extern uchar *pagecache_block_link_to_buffer(PAGECACHE_BLOCK_LINK *block);
/* Functions to handle multiple key caches */
extern my_bool multi_pagecache_init(void);
extern void multi_pagecache_free(void);
extern PAGECACHE *multi_pagecache_search(uchar *key, uint length,
PAGECACHE *def);
extern my_bool multi_pagecache_set(const uchar *key, uint length,
PAGECACHE *pagecache);
extern void multi_pagecache_change(PAGECACHE *old_data,
PAGECACHE *new_data);
extern int reset_pagecache_counters(const char *name,
PAGECACHE *pagecache);
C_MODE_END
#endif /* _keycache_h */