mariadb/storage/maria/ma_pagecache.h
unknown cec8ac3e07 WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)


BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
  Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
  Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
  compile Checkpoint module
storage/maria/ha_maria.cc:
  When ha_maria starts, do a recovery from last checkpoint.
  Take a checkpoint when that recovery has ended and when ha_maria
  shuts down cleanly.
storage/maria/ma_blockrec.c:
  * even if my_sync() fails we have to my_close() (otherwise we leak
  a descriptor)
  * UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
  as promised in the old comment; it gives us skipping during the
  UNDO phase.
storage/maria/ma_check.c:
  All REDOs before create_rename_lsn are ignored by Recovery. So
  create_rename_lsn must be set only after all data/index has been
  flushed and forced to disk. We thus move write_log_record_for_repair()
  to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
  Checkpoint module.
storage/maria/ma_checkpoint.h:
  optional argument if caller wants a thread to periodically take
  checkpoints and flush dirty pages.
storage/maria/ma_create.c:
  * no need to init some vars as the initial bzero(share) takes care of this.
  * update to new function's name
  * even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
  Checkpoint reads share->last_version under intern_lock, so we make
  maria_extra() update it under intern_lock. THR_LOCK_maria still needed
  because of _ma_test_if_reopen().
storage/maria/ma_init.c:
  destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
  * UNDO_ROW_PURGE gone (see ma_blockrec.c)
  * we need to remember the LSN of the LOGREC_FILE_ID for a share,
  because this LSN is needed into the checkpoint record (Recovery wants
  to know the validity domain of an id->name mapping)
  * translog_get_horizon_no_lock() needed for Checkpoint
  * comment about failing assertion (Sanja knows)
  * translog_init_reader_data() thought that translog_read_record_header_scan()
  returns 0 in case of error, but 0 just means "0-length header".
  * translog_assign_id_to_share() now needs the MARIA_HA because
  LOGREC_FILE_ID uses a log-write hook.
  * Verify that (de)assignment of share->id happens only under intern_lock,
  as Checkpoint reads this id with intern_lock.
  * translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
  a real LSN.
storage/maria/ma_loghandler.h:
  prototype updates
storage/maria/ma_open.c:
  no need to initialize "res"
storage/maria/ma_pagecache.c:
  When taking a checkpoint, we don't need to know the maximum rec_lsn
  of dirty pages; this LSN was intended to be used in the two-checkpoint
  rule, but last_checkpoint_lsn is as good.
  4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
  of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
  new prototype
storage/maria/ma_recovery.c:
  * added replaying of REDO_RENAME_TABLE
  * UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
  * Recovery from the last checkpoint record now possible
  * In new_table() we skip the table if the id->name mapping is older than
  create_rename_lsn (mapping dates from lsn_of_file_id).
  * in get_MARIA_HA_from_REDO_record() we skip the record
  if the id->name mapping is newer than the record (can happen if processing
  a record which is before the checkpoint record).
  * parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
  new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
  * equivalent of ma_test1's --test-undo added (named -u here).
  * -t=1 now stops right after creating the table, so that
  we can test undoing of INSERTs with duplicate keys (which tests the
  CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
  Result of testing undoing of INSERTs with duplicate keys; there are
  some differences in maria_chk -dvv but they are normal (removing
  records does not shrink data/index file, does not put back the
  "analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
  Test undoing of INSERTs with duplicate keys, using ma_test2;
  when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
  CLR_END; we abort after that, and test that CLR_END causes recovery
  to jump over UNDO_INSERT.
storage/maria/ma_write.c:
  comment
storage/maria/maria_chk.c:
  comment
storage/maria/maria_def.h:
  * a new bit in MARIA_SHARE::in_checkpoint, used to build a list
  of unique shares during Checkpoint.
  * MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
  for this share; needed to know to which LSN domain the mappings
  found in the Checkpoint record apply (new mappings should not apply
  to old REDOs).
storage/maria/trnman.c:
  * small changes to how trnman_collect_transactions() fills its buffer;
  it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00

267 lines
12 KiB
C

/* Copyright (C) 2006 MySQL AB
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
/* Page cache variable structures */
#ifndef _ma_pagecache_h
#define _ma_pagecache_h
C_MODE_START
#include "ma_loghandler_lsn.h"
#include <m_string.h>
/* Type of the page */
enum pagecache_page_type
{
/*
Used only for control page type changing during debugging. This define
should only be using when using DBUG.
*/
PAGECACHE_EMPTY_PAGE,
/* the page does not contain LSN */
PAGECACHE_PLAIN_PAGE,
/* the page contain LSN (maria tablespace page) */
PAGECACHE_LSN_PAGE,
/* Page type used when scanning file and we don't care about the type */
PAGECACHE_READ_UNKNOWN_PAGE
};
/*
This enum describe lock status changing. every type of page cache will
interpret WRITE/READ lock as it need.
*/
enum pagecache_page_lock
{
PAGECACHE_LOCK_LEFT_UNLOCKED, /* free -> free */
PAGECACHE_LOCK_LEFT_READLOCKED, /* read -> read */
PAGECACHE_LOCK_LEFT_WRITELOCKED, /* write -> write */
PAGECACHE_LOCK_READ, /* free -> read */
PAGECACHE_LOCK_WRITE, /* free -> write */
PAGECACHE_LOCK_READ_UNLOCK, /* read -> free */
PAGECACHE_LOCK_WRITE_UNLOCK, /* write -> free */
PAGECACHE_LOCK_WRITE_TO_READ /* write -> read */
};
/*
This enum describe pin status changing
*/
enum pagecache_page_pin
{
PAGECACHE_PIN_LEFT_PINNED, /* pinned -> pinned */
PAGECACHE_PIN_LEFT_UNPINNED, /* unpinned -> unpinned */
PAGECACHE_PIN, /* unpinned -> pinned */
PAGECACHE_UNPIN /* pinned -> unpinned */
};
/* How to write the page */
enum pagecache_write_mode
{
/* do not write immediately, i.e. it will be dirty page */
PAGECACHE_WRITE_DELAY,
/* page already is in the file. (key cache insert analogue) */
PAGECACHE_WRITE_DONE
};
typedef void *PAGECACHE_PAGE_LINK;
/* file descriptor for Maria */
typedef struct st_pagecache_file
{
File file;
} PAGECACHE_FILE;
/* page number for maria */
typedef uint32 pgcache_page_no_t;
/* declare structures that is used by st_pagecache */
struct st_pagecache_block_link;
typedef struct st_pagecache_block_link PAGECACHE_BLOCK_LINK;
struct st_pagecache_page;
typedef struct st_pagecache_page PAGECACHE_PAGE;
struct st_pagecache_hash_link;
typedef struct st_pagecache_hash_link PAGECACHE_HASH_LINK;
#include <wqueue.h>
typedef my_bool (*pagecache_disk_read_validator)(uchar *page, uchar *data);
#define PAGECACHE_CHANGED_BLOCKS_HASH 128 /* must be power of 2 */
/*
The page cache structure
It also contains read-only statistics parameters.
*/
typedef struct st_pagecache
{
my_bool inited;
my_bool resize_in_flush; /* true during flush of resize operation */
my_bool can_be_used; /* usage of cache for read/write is allowed */
uint shift; /* block size = 2 ^ shift */
size_t mem_size; /* specified size of the cache memory */
uint32 block_size; /* size of the page buffer of a cache block */
ulong min_warm_blocks; /* min number of warm blocks; */
ulong age_threshold; /* age threshold for hot blocks */
ulonglong time; /* total number of block link operations */
uint hash_entries; /* max number of entries in the hash table */
int hash_links; /* max number of hash links */
int hash_links_used; /* number of hash links taken from free links pool */
int disk_blocks; /* max number of blocks in the cache */
ulong blocks_used; /* maximum number of concurrently used blocks */
ulong blocks_unused; /* number of currently unused blocks */
ulong blocks_changed; /* number of currently dirty blocks */
ulong warm_blocks; /* number of blocks in warm sub-chain */
ulong cnt_for_resize_op; /* counter to block resize operation */
ulong blocks_available; /* number of blocks available in the LRU chain */
PAGECACHE_HASH_LINK **hash_root;/* arr. of entries into hash table buckets */
PAGECACHE_HASH_LINK *hash_link_root;/* memory for hash table links */
PAGECACHE_HASH_LINK *free_hash_list;/* list of free hash links */
PAGECACHE_BLOCK_LINK *free_block_list;/* list of free blocks */
PAGECACHE_BLOCK_LINK *block_root;/* memory for block links */
uchar HUGE_PTR *block_mem; /* memory for block buffers */
PAGECACHE_BLOCK_LINK *used_last;/* ptr to the last block of the LRU chain */
PAGECACHE_BLOCK_LINK *used_ins;/* ptr to the insertion block in LRU chain */
pthread_mutex_t cache_lock; /* to lock access to the cache structure */
WQUEUE resize_queue; /* threads waiting during resize operation */
WQUEUE waiting_for_hash_link;/* waiting for a free hash link */
WQUEUE waiting_for_block; /* requests waiting for a free block */
/* hash for dirty file bl.*/
PAGECACHE_BLOCK_LINK *changed_blocks[PAGECACHE_CHANGED_BLOCKS_HASH];
/* hash for other file bl.*/
PAGECACHE_BLOCK_LINK *file_blocks[PAGECACHE_CHANGED_BLOCKS_HASH];
/*
The following variables are and variables used to hold parameters for
initializing the key cache.
*/
ulonglong param_buff_size; /* size the memory allocated for the cache */
ulong param_block_size; /* size of the blocks in the key cache */
ulong param_division_limit; /* min. percentage of warm blocks */
ulong param_age_threshold; /* determines when hot block is downgraded */
/* Statistics variables. These are reset in reset_pagecache_counters(). */
ulong global_blocks_changed; /* number of currently dirty blocks */
ulonglong global_cache_w_requests;/* number of write requests (write hits) */
ulonglong global_cache_write; /* number of writes from cache to files */
ulonglong global_cache_r_requests;/* number of read requests (read hits) */
ulonglong global_cache_read; /* number of reads from files to cache */
int blocks; /* max number of blocks in the cache */
my_bool in_init; /* Set to 1 in MySQL during init/resize */
} PAGECACHE;
/* The default key cache */
extern PAGECACHE dflt_pagecache_var, *dflt_pagecache;
extern int init_pagecache(PAGECACHE *pagecache, size_t use_mem,
uint division_limit, uint age_threshold,
uint block_size);
extern int resize_pagecache(PAGECACHE *pagecache,
size_t use_mem, uint division_limit,
uint age_threshold);
extern void change_pagecache_param(PAGECACHE *pagecache, uint division_limit,
uint age_threshold);
#define pagecache_read(P,F,N,L,B,T,K,I) \
pagecache_valid_read(P,F,N,L,B,T,K,I,0,0)
extern uchar *pagecache_valid_read(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
uint level,
uchar *buff,
enum pagecache_page_type type,
enum pagecache_page_lock lock,
PAGECACHE_PAGE_LINK *link,
pagecache_disk_read_validator validator,
uchar* validator_data);
#define pagecache_write(P,F,N,L,B,T,O,I,M,K) \
pagecache_write_part(P,F,N,L,B,T,O,I,M,K,0,(P)->block_size,0,0)
#define pagecache_inject(P,F,N,L,B,T,O,I,K,V,D) \
pagecache_write_part(P,F,N,L,B,T,O,I,PAGECACHE_WRITE_DONE, \
K,0,(P)->block_size,V,D)
extern my_bool pagecache_write_part(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
uint level,
uchar *buff,
enum pagecache_page_type type,
enum pagecache_page_lock lock,
enum pagecache_page_pin pin,
enum pagecache_write_mode write_mode,
PAGECACHE_PAGE_LINK *link,
uint offset,
uint size,
pagecache_disk_read_validator validator,
uchar* validator_data);
extern void pagecache_unlock(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
enum pagecache_page_lock lock,
enum pagecache_page_pin pin,
LSN first_REDO_LSN_for_page,
LSN lsn);
extern void pagecache_unlock_by_link(PAGECACHE *pagecache,
PAGECACHE_PAGE_LINK *link,
enum pagecache_page_lock lock,
enum pagecache_page_pin pin,
LSN first_REDO_LSN_for_page,
LSN lsn);
extern void pagecache_unpin(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
LSN lsn);
extern void pagecache_unpin_by_link(PAGECACHE *pagecache,
PAGECACHE_PAGE_LINK *link,
LSN lsn);
extern int flush_pagecache_blocks(PAGECACHE *keycache,
PAGECACHE_FILE *file,
enum flush_type type);
extern my_bool pagecache_delete(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
enum pagecache_page_lock lock,
my_bool flush);
extern my_bool pagecache_delete_pages(PAGECACHE *pagecache,
PAGECACHE_FILE *file,
pgcache_page_no_t pageno,
uint page_count,
enum pagecache_page_lock lock,
my_bool flush);
extern void end_pagecache(PAGECACHE *keycache, my_bool cleanup);
extern my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
LEX_STRING *str,
LSN *min_lsn);
extern int reset_pagecache_counters(const char *name, PAGECACHE *pagecache);
/* Functions to handle multiple key caches */
extern my_bool multi_pagecache_init(void);
extern void multi_pagecache_free(void);
extern PAGECACHE *multi_pagecache_search(uchar *key, uint length,
PAGECACHE *def);
extern my_bool multi_pagecache_set(const uchar *key, uint length,
PAGECACHE *pagecache);
extern void multi_pagecache_change(PAGECACHE *old_data,
PAGECACHE *new_data);
extern int reset_pagecache_counters(const char *name,
PAGECACHE *pagecache);
C_MODE_END
#endif /* _keycache_h */