mariadb/storage/maria/ma_loghandler.h
unknown d0b9387b88 WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.


storage/maria/ha_maria.cc:
  question for Monty
storage/maria/ma_blockrec.c:
  * log in-write hooks (write_hook_for_redo() etc) move from
  ma_loghandler.c to here; this is natural: the hooks are coupled
  to their callers (functions in ma_blockrec.c).
  * translog_write_record() now has a new argument "hook_arg";
  using it to pass down to write_hook_for_clr_end() the transaction's
  previous_undo_lsn and the type of the being undone record, and also
  to pass down to all UNDOs the live checksum variation caused by the
  operation.
  * If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
  and in CLR_END the checksum variation ("delta") caused by the
  operation. For example if a DELETE caused the table's live checksum
  to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
  the value 333 (456-123).
  * Instead of hard-coded "1" as length of the place where we store
  the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
  use macros clr_type_store and clr_type_korr.
  * write_block_record() has a new parameter 'old_record_checksum'
  which is the pre-computed checksum of old_record; that value is used
  to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
  * In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
  the row's checksum is already computed.
  * _ma_update_block_record2() now expect the new row's checksum into
  cur_row.checksum (was already true) and the old row's checksum into
  new_row.checksum (that's new). Its two callers, maria_update() and
  _ma_apply_undo_row_update(), honour this.
  * When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
  up the checksum delta from the log record. It is then used to update
  the table's live checksum when writing CLR_END, and saves us a
  computation of record.
storage/maria/ma_blockrec.h:
  in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
  more straightforward size of buffer
storage/maria/ma_checkpoint.c:
  <= is enough
storage/maria/ma_commit.c:
  new prototype of translog_write_record()
storage/maria/ma_create.c:
  new prototype of translog_write_record()
storage/maria/ma_delete.c:
  The row's checksum must be computed before calling(*delete_record)(),
  not after, because it must be known inside _ma_delete_block_record()
  (to update the table's live checksum when writing UNDO_ROW_DELETE).
  If deleting from a transactional table, live checksum was already updated
  when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
  @todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
  new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
  * in-write hooks move to ma_blockrec.c.
  * translog_write_record() gets a new argument 'hook_arg' which is
  passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
  for those hooks, because when those hooks are called, 'parts' has
  possibly been mangled (like with LSN compression) and is so
  unpredictable.
  * fix for compiler warning (unused buffer_start when compiling without
  debug support)
  * Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
  and CLR_END, but only if the table has live checksum, these records
  are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
  length is X if no live checksum and X+4 otherwise).
  * add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
  table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
  DELETE and REDO_DELETE_ALL and CLR_END.
  * Bugfix: when reading a record in translog_read_record(), it happened
  that "length" became negative, because the function assumed that
  the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
  * Instead of hard-coded "1" and "4", use symbols and macros
  to store/retrieve the type of record which the CLR_END corresponds
  to, and the checksum variation caused by the operation which logs the
  record
  * translog_write_record() gets a new argument 'hook_arg' which is
  passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
  for those hooks, because when those hooks are called, 'parts' has
  possibly been mangled (like with LSN compression) and is so
  unpredictable.
storage/maria/ma_open.c:
  fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
  <= is enough
storage/maria/ma_recovery.c:
  * print the time that each recovery phase (REDO/UNDO/flush) took;
  this is enabled only when recovering from ha_maria. Is it printed
  n seconds with a fractional part of one digit (like 123.4 seconds).
  * In the REDO phase, update the table's live checksum by using
  the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
  Update it too when seeing REDO_DELETE_ALL.
  * In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
  not have live checksum then reading the record's header (as done by
  the master loop of run_undo_phase()) is enough; otherwise we
  do a translog_read_record() to have the checksum delta ready
  for _ma_apply_undo_row_insert().
  * When at the end of the REDO phase we notice that there is an unfinished
  group of REDOs, don't assert in debug binaries, as I verified that it
  can happen in real life (with kill -9)
  * removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
  new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
  Change in output of ma_test_recovery: now all live checksums of
  original tables equal those of tables recreated by the REDO phase
  and those of tables fixed by the UNDO phase. I.e. recovery of
  the live checksum looks like working (which was after all the only
  goal of this changeset).
  I checked by hand that it's not just all live checksums which are
  now 0 and that's why they match. They are the old values like
  3757530372. maria.test has hard-coded checksum values in its result
  file so checks this too.
storage/maria/ma_update.c:
  * It's useless to put up HA_STATE_CHANGED in 'key_changed',
  as we put up HA_STATE_CHANGED in info->update anyway.
  * We need to compute the old and new rows' checksum before calling
  (*update_record)(), as checksum delta must be known when logging
  UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
  some functions change the 'newrec' record (at least _ma_check_unique()
  does) so we cannot move the checksum computation too early in the
  function.
storage/maria/ma_write.c:
  If inserting into a transactional table, live's checksum was
  already updated when writing UNDO_ROW_INSERT. The multiplication
  is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
  new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
  new prototype of translog_write_record()
storage/myisam/sort.c:
  fix for compiler warnings in pushbuild (write_merge_key* functions
  didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00

378 lines
12 KiB
C

/* Copyright (C) 2007 MySQL AB & Sanja Belkin
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
#ifndef _ma_loghandler_h
#define _ma_loghandler_h
/* transaction log default cache size (TODO: make it global variable) */
#define TRANSLOG_PAGECACHE_SIZE 1024*1024*2
/* transaction log default file size (TODO: make it global variable) */
#define TRANSLOG_FILE_SIZE 1024*1024*1024
/* transaction log default flags (TODO: make it global variable) */
#define TRANSLOG_DEFAULT_FLAGS 0
/* Transaction log flags */
#define TRANSLOG_PAGE_CRC 1
#define TRANSLOG_SECTOR_PROTECTION (1<<1)
#define TRANSLOG_RECORD_CRC (1<<2)
#define TRANSLOG_FLAGS_NUM ((TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION | \
TRANSLOG_RECORD_CRC) + 1)
#define RECHEADER_READ_ERROR -1
#define RECHEADER_READ_EOF -2
/*
Page size in transaction log
It should be Power of 2 and multiple of DISK_DRIVE_SECTOR_SIZE
(DISK_DRIVE_SECTOR_SIZE * 2^N)
*/
#define TRANSLOG_PAGE_SIZE (8*1024)
#include "ma_loghandler_lsn.h"
#include "trnman_public.h"
/* short transaction ID type */
typedef uint16 SHORT_TRANSACTION_ID;
struct st_maria_info;
/* Changing one of the "SIZE" below will break backward-compatibility! */
/* Length of CRC at end of pages */
#define CRC_LENGTH 4
/* Size of file id in logs */
#define FILEID_STORE_SIZE 2
/* Size of page reference in log */
#define PAGE_STORE_SIZE ROW_EXTENT_PAGE_SIZE
/* Size of page ranges in log */
#define PAGERANGE_STORE_SIZE ROW_EXTENT_COUNT_SIZE
#define DIRPOS_STORE_SIZE 1
#define CLR_TYPE_STORE_SIZE 1
/* If table has live checksum we store its changes in UNDOs */
#define HA_CHECKSUM_STORE_SIZE 4
/* Store methods to match the above sizes */
#define fileid_store(T,A) int2store(T,A)
#define page_store(T,A) int5store(T,A)
#define dirpos_store(T,A) ((*(uchar*) (T)) = A)
#define pagerange_store(T,A) int2store(T,A)
#define clr_type_store(T,A) ((*(uchar*) (T)) = A)
#define ha_checksum_store(T,A) int4store(T,A)
#define fileid_korr(P) uint2korr(P)
#define page_korr(P) uint5korr(P)
#define dirpos_korr(P) ((P)[0])
#define pagerange_korr(P) uint2korr(P)
#define clr_type_korr(P) ((P)[0])
#define ha_checksum_korr(P) uint4korr(P)
/*
Length of disk drive sector size (we assume that writing it
to disk is atomic operation)
*/
#define DISK_DRIVE_SECTOR_SIZE 512
/*
Number of empty entries we need to have in LEX_STRING for
translog_write_record()
*/
#define LOG_INTERNAL_PARTS 1
/* position reserved in an array of parts of a log record */
#define TRANSLOG_INTERNAL_PARTS 2
/* types of records in the transaction log */
/* Todo: Set numbers for these when we have all entries figured out */
enum translog_record_type
{
LOGREC_RESERVED_FOR_CHUNKS23= 0,
LOGREC_REDO_INSERT_ROW_HEAD,
LOGREC_REDO_INSERT_ROW_TAIL,
LOGREC_REDO_INSERT_ROW_BLOB,
LOGREC_REDO_INSERT_ROW_BLOBS,
LOGREC_REDO_PURGE_ROW_HEAD,
LOGREC_REDO_PURGE_ROW_TAIL,
LOGREC_REDO_PURGE_BLOCKS,
LOGREC_REDO_DELETE_ROW,
LOGREC_REDO_UPDATE_ROW_HEAD,
LOGREC_REDO_INDEX,
LOGREC_REDO_UNDELETE_ROW,
LOGREC_CLR_END,
LOGREC_PURGE_END,
LOGREC_UNDO_ROW_INSERT,
LOGREC_UNDO_ROW_DELETE,
LOGREC_UNDO_ROW_UPDATE,
LOGREC_UNDO_KEY_INSERT,
LOGREC_UNDO_KEY_DELETE,
LOGREC_PREPARE,
LOGREC_PREPARE_WITH_UNDO_PURGE,
LOGREC_COMMIT,
LOGREC_COMMIT_WITH_UNDO_PURGE,
LOGREC_CHECKPOINT,
LOGREC_REDO_CREATE_TABLE,
LOGREC_REDO_RENAME_TABLE,
LOGREC_REDO_DROP_TABLE,
LOGREC_REDO_DELETE_ALL,
LOGREC_REDO_REPAIR_TABLE,
LOGREC_FILE_ID,
LOGREC_LONG_TRANSACTION_ID,
LOGREC_RESERVED_FUTURE_EXTENSION= 63
};
#define LOGREC_NUMBER_OF_TYPES 64 /* Maximum, can't be extended */
/* Size of log file; One log file is restricted to 4G */
typedef uint32 translog_size_t;
#define TRANSLOG_RECORD_HEADER_MAX_SIZE 1024
typedef struct st_translog_group_descriptor
{
TRANSLOG_ADDRESS addr;
uint8 num;
} TRANSLOG_GROUP;
typedef struct st_translog_header_buffer
{
/* LSN of the read record */
LSN lsn;
/* array of groups descriptors, can be used only if groups_no > 0 */
TRANSLOG_GROUP *groups;
/* short transaction ID or 0 if it has no sense for the record */
SHORT_TRANSACTION_ID short_trid;
/*
The Record length in buffer (including read header, but excluding
hidden part of record (type, short TrID, length)
*/
translog_size_t record_length;
/*
Buffer for write decoded header of the record (depend on the record
type)
*/
uchar header[TRANSLOG_RECORD_HEADER_MAX_SIZE];
/* number of groups listed in */
uint groups_no;
/* in multi-group number of chunk0 pages (valid only if groups_no > 0) */
uint chunk0_pages;
/* type of the read record */
enum translog_record_type type;
/* chunk 0 data address (valid only if groups_no > 0) */
TRANSLOG_ADDRESS chunk0_data_addr;
/*
Real compressed LSN(s) size economy (<number of LSN(s)>*7 - <real_size>)
*/
int16 compressed_LSN_economy;
/* short transaction ID or 0 if it has no sense for the record */
uint16 non_header_data_start_offset;
/* non read body data length in this first chunk */
uint16 non_header_data_len;
/* chunk 0 data size (valid only if groups_no > 0) */
uint16 chunk0_data_len;
} TRANSLOG_HEADER_BUFFER;
typedef struct st_translog_scanner_data
{
uchar buffer[TRANSLOG_PAGE_SIZE]; /* buffer for page content */
TRANSLOG_ADDRESS page_addr; /* current page address */
/* end of the log which we saw last time */
TRANSLOG_ADDRESS horizon;
TRANSLOG_ADDRESS last_file_page; /* Last page on in this file */
uchar *page; /* page content pointer */
/* direct link on the current page or NULL if not supported/requested */
PAGECACHE_BLOCK_LINK *direct_link;
/* offset of the chunk in the page */
translog_size_t page_offset;
/* set horizon only once at init */
my_bool fixed_horizon;
/* try to get direct link on the page if it is possible */
my_bool use_direct_link;
} TRANSLOG_SCANNER_DATA;
struct st_translog_reader_data
{
TRANSLOG_HEADER_BUFFER header; /* Header */
TRANSLOG_SCANNER_DATA scanner; /* chunks scanner */
translog_size_t body_offset; /* current chunk body offset */
/* data offset from the record beginning */
translog_size_t current_offset;
/* number of bytes read in header */
uint16 read_header;
uint16 chunk_size; /* current chunk size */
uint current_group; /* current group */
uint current_chunk; /* current chunk in the group */
my_bool eor; /* end of the record */
};
struct st_transaction;
C_MODE_START
/* Records types for unittests */
#define LOGREC_FIXED_RECORD_0LSN_EXAMPLE 1
#define LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE 2
#define LOGREC_FIXED_RECORD_1LSN_EXAMPLE 3
#define LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE 4
#define LOGREC_FIXED_RECORD_2LSN_EXAMPLE 5
#define LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE 6
extern void example_loghandler_init();
extern my_bool translog_init(const char *directory, uint32 log_file_max_size,
uint32 server_version, uint32 server_id,
PAGECACHE *pagecache, uint flags);
extern my_bool
translog_write_record(LSN *lsn, enum translog_record_type type,
struct st_transaction *trn,
struct st_maria_info *tbl_info,
translog_size_t rec_len, uint part_no,
LEX_STRING *parts_data, uchar *store_share_id,
void *hook_arg);
extern void translog_destroy();
extern int translog_read_record_header(LSN lsn, TRANSLOG_HEADER_BUFFER *buff);
extern void translog_free_record_header(TRANSLOG_HEADER_BUFFER *buff);
extern translog_size_t translog_read_record(LSN lsn,
translog_size_t offset,
translog_size_t length,
uchar *buffer,
struct st_translog_reader_data
*data);
extern my_bool translog_flush(LSN lsn);
extern my_bool translog_init_scanner(LSN lsn,
my_bool fixed_horizon,
struct st_translog_scanner_data *scanner,
my_bool use_direct_link);
extern void translog_destroy_scanner(TRANSLOG_SCANNER_DATA *scanner);
extern int translog_read_next_record_header(TRANSLOG_SCANNER_DATA *scanner,
TRANSLOG_HEADER_BUFFER *buff);
extern LSN translog_get_file_max_lsn_stored(uint32 file);
extern my_bool translog_purge(TRANSLOG_ADDRESS low);
extern my_bool translog_is_file(uint file_no);
extern my_bool translog_lock();
extern my_bool translog_unlock();
extern void translog_lock_assert_owner();
extern TRANSLOG_ADDRESS translog_get_horizon();
extern TRANSLOG_ADDRESS translog_get_horizon_no_lock();
extern int translog_assign_id_to_share(struct st_maria_info *tbl_info,
struct st_transaction *trn);
extern void translog_deassign_id_from_share(struct st_maria_share *share);
extern void
translog_assign_id_to_share_from_recovery(struct st_maria_share *share,
uint16 id);
extern my_bool translog_inited;
/*
all the rest added because of recovery; should we make
ma_loghandler_for_recovery.h ?
*/
#define SHARE_ID_MAX 65535 /* array's size */
extern LSN translog_first_lsn_in_log();
extern LSN translog_first_theoretical_lsn();
extern LSN translog_next_LSN(TRANSLOG_ADDRESS addr, TRANSLOG_ADDRESS horizon);
/* record parts descriptor */
struct st_translog_parts
{
/* full record length */
translog_size_t record_length;
/* full record length with chunk headers */
translog_size_t total_record_length;
/* current part index */
uint current;
/* total number of elements in parts */
uint elements;
/* array of parts (LEX_STRING) */
LEX_STRING *parts;
};
typedef my_bool(*prewrite_rec_hook) (enum translog_record_type type,
TRN *trn, struct st_maria_info *tbl_info,
void *hook_arg);
typedef my_bool(*inwrite_rec_hook) (enum translog_record_type type,
TRN *trn, struct st_maria_info *tbl_info,
LSN *lsn, void *hook_arg);
typedef uint16(*read_rec_hook) (enum translog_record_type type,
uint16 read_length, uchar *read_buff,
uchar *decoded_buff);
/* record classes */
enum record_class
{
LOGRECTYPE_NOT_ALLOWED,
LOGRECTYPE_VARIABLE_LENGTH,
LOGRECTYPE_PSEUDOFIXEDLENGTH,
LOGRECTYPE_FIXEDLENGTH
};
/* C++ can't bear that a variable's name is "class" */
#ifndef __cplusplus
enum enum_record_in_group {
LOGREC_NOT_LAST_IN_GROUP= 0, LOGREC_LAST_IN_GROUP, LOGREC_IS_GROUP_ITSELF
};
/*
Descriptor of log record type
Note: Don't reorder because of constructs later...
*/
typedef struct st_log_record_type_descriptor
{
/* internal class of the record */
enum record_class class;
/*
length for fixed-size record, pseudo-fixed record
length with uncompressed LSNs
*/
uint16 fixed_length;
/* how much record body (belonged to headers too) read with headers */
uint16 read_header_len;
/* HOOK for writing the record called before lock */
prewrite_rec_hook prewrite_hook;
/* HOOK for writing the record called when LSN is known, inside lock */
inwrite_rec_hook inwrite_hook;
/* HOOK for reading headers */
read_rec_hook read_hook;
/*
For pseudo fixed records number of compressed LSNs followed by
system header
*/
int16 compressed_LSN;
/* the rest is for maria_read_log & Recovery */
/** @brief for debug error messages or "maria_read_log" command-line tool */
const char *name;
enum enum_record_in_group record_in_group;
/* a function to execute when we see the record during the REDO phase */
int (*record_execute_in_redo_phase)(const TRANSLOG_HEADER_BUFFER *);
/* a function to execute when we see the record during the UNDO phase */
int (*record_execute_in_undo_phase)(const TRANSLOG_HEADER_BUFFER *, TRN *);
} LOG_DESC;
extern LOG_DESC log_record_type_descriptor[LOGREC_NUMBER_OF_TYPES];
#endif
C_MODE_END
#endif