2007-06-04 13:07:18 +02:00
|
|
|
/* Copyright (C) 2007 MySQL AB & Sanja Belkin
|
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
|
|
|
the Free Software Foundation; version 2 of the License.
|
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program; if not, write to the Free Software
|
|
|
|
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
#include "maria_def.h"
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
#include "ma_blockrec.h" /* for some constants and in-write hooks */
|
|
|
|
#include "trnman.h" /* for access to members of TRN */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/**
|
|
|
|
@file
|
|
|
|
@brief Module which writes and reads to a transaction log
|
|
|
|
|
|
|
|
@todo LOG: in functions where the log's lock is required, a
|
|
|
|
translog_assert_owner() could be added.
|
|
|
|
*/
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* number of opened log files in the pagecache (should be at least 2) */
|
2007-02-02 08:41:32 +01:00
|
|
|
#define OPENED_FILES_NUM 3
|
|
|
|
|
|
|
|
/* records buffer size (should be LOG_PAGE_SIZE * n) */
|
|
|
|
#define TRANSLOG_WRITE_BUFFER (1024*1024)
|
|
|
|
/* min chunk length */
|
|
|
|
#define TRANSLOG_MIN_CHUNK 3
|
|
|
|
/*
|
|
|
|
Number of buffers used by loghandler
|
|
|
|
|
|
|
|
Should be at least 4, because one thread can block up to 2 buffers in
|
|
|
|
normal circumstances (less then half of one and full other, or just
|
|
|
|
switched one and other), But if we met end of the file in the middle and
|
2007-02-19 22:01:27 +01:00
|
|
|
have to switch buffer it will be 3. + 1 buffer for flushing/writing.
|
|
|
|
We have a bigger number here for higher concurrency.
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
#define TRANSLOG_BUFFERS_NO 5
|
2007-02-19 22:01:27 +01:00
|
|
|
/* number of bytes (+ header) which can be unused on first page in sequence */
|
2007-02-02 08:41:32 +01:00
|
|
|
#define TRANSLOG_MINCHUNK_CONTENT 1
|
|
|
|
/* version of log file */
|
2007-02-19 22:01:27 +01:00
|
|
|
#define TRANSLOG_VERSION_ID 10000 /* 1.00.00 */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
#define TRANSLOG_PAGE_FLAGS 6 /* transaction log page flags offset */
|
|
|
|
|
|
|
|
/* QQ: For temporary debugging */
|
2007-02-02 08:41:32 +01:00
|
|
|
#define UNRECOVERABLE_ERROR(E) \
|
|
|
|
do { \
|
|
|
|
DBUG_PRINT("error", E); \
|
|
|
|
printf E; \
|
|
|
|
putchar('\n'); \
|
|
|
|
} while(0);
|
|
|
|
|
2007-06-15 00:45:55 +02:00
|
|
|
/* Maximum length of compressed LSNs (the worst case of whole LSN storing) */
|
|
|
|
#define COMPRESSED_LSN_MAX_STORE_SIZE (2 + LSN_STORE_SIZE)
|
|
|
|
#define MAX_NUMBER_OF_LSNS_PER_RECORD 2
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/* log write buffer descriptor */
|
|
|
|
struct st_translog_buffer
|
|
|
|
{
|
|
|
|
LSN last_lsn;
|
|
|
|
/* This buffer offset in the file */
|
|
|
|
TRANSLOG_ADDRESS offset;
|
2007-08-13 14:17:49 +02:00
|
|
|
/*
|
|
|
|
Next buffer offset in the file (it is not always offset + size,
|
|
|
|
in case of flush by LSN it can be offset + size - TRANSLOG_PAGE_SIZE)
|
|
|
|
*/
|
|
|
|
TRANSLOG_ADDRESS next_buffer_offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
How much written (or will be written when copy_to_buffer_in_progress
|
|
|
|
become 0) to this buffer
|
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_size_t size;
|
|
|
|
/* File handler for this buffer */
|
2007-02-02 08:41:32 +01:00
|
|
|
File file;
|
|
|
|
/* Threads which are waiting for buffer filling/freeing */
|
|
|
|
WQUEUE waiting_filling_buffer;
|
|
|
|
/* Number of record which are in copy progress */
|
2007-02-19 22:01:27 +01:00
|
|
|
uint copy_to_buffer_in_progress;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* list of waiting buffer ready threads */
|
|
|
|
struct st_my_thread_var *waiting_flush;
|
|
|
|
struct st_translog_buffer *overlay;
|
|
|
|
#ifndef DBUG_OFF
|
2007-02-19 22:01:27 +01:00
|
|
|
uint buffer_no;
|
2007-02-02 08:41:32 +01:00
|
|
|
#endif
|
2007-02-19 22:01:27 +01:00
|
|
|
/* lock for the buffer. Current buffer also lock the handler */
|
|
|
|
pthread_mutex_t mutex;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* IO cache for current log */
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar buffer[TRANSLOG_WRITE_BUFFER];
|
2007-02-02 08:41:32 +01:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
|
|
struct st_buffer_cursor
|
|
|
|
{
|
|
|
|
/* pointer on the buffer */
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *ptr;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* current buffer */
|
|
|
|
struct st_translog_buffer *buffer;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* current page fill */
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 current_page_fill;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* how many times we finish this page to write it */
|
|
|
|
uint16 write_counter;
|
|
|
|
/* previous write offset */
|
|
|
|
uint16 previous_offset;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Number of current buffer */
|
2007-02-02 08:41:32 +01:00
|
|
|
uint8 buffer_no;
|
|
|
|
my_bool chaser, protected;
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
|
|
struct st_translog_descriptor
|
|
|
|
{
|
|
|
|
/* *** Parameters of the log handler *** */
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Page cache for the log reads */
|
|
|
|
PAGECACHE *pagecache;
|
|
|
|
/* Flags */
|
|
|
|
uint flags;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* max size of one log size (for new logs creation) */
|
|
|
|
uint32 log_file_max_size;
|
|
|
|
/* server version */
|
|
|
|
uint32 server_version;
|
|
|
|
/* server ID */
|
|
|
|
uint32 server_id;
|
|
|
|
/* Loghandler's buffer capacity in case of chunk 2 filling */
|
|
|
|
uint32 buffer_capacity_chunk_2;
|
|
|
|
/* Half of the buffer capacity in case of chunk 2 filling */
|
|
|
|
uint32 half_buffer_capacity_chunk_2;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Page overhead calculated by flags */
|
|
|
|
uint16 page_overhead;
|
|
|
|
/* Page capacity calculated by flags (TRANSLOG_PAGE_SIZE-page_overhead-1) */
|
|
|
|
uint16 page_capacity_chunk_2;
|
|
|
|
/* Directory to store files */
|
|
|
|
char directory[FN_REFLEN];
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* *** Current state of the log handler *** */
|
|
|
|
/* Current and (OPENED_FILES_NUM-1) last logs number in page cache */
|
|
|
|
File log_file_num[OPENED_FILES_NUM];
|
2007-02-19 22:01:27 +01:00
|
|
|
File directory_fd;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* buffers for log writing */
|
|
|
|
struct st_translog_buffer buffers[TRANSLOG_BUFFERS_NO];
|
|
|
|
/*
|
|
|
|
horizon - visible end of the log (here is absolute end of the log:
|
|
|
|
position where next chunk can start
|
|
|
|
*/
|
|
|
|
TRANSLOG_ADDRESS horizon;
|
|
|
|
/* horizon buffer cursor */
|
|
|
|
struct st_buffer_cursor bc;
|
2007-08-19 21:27:43 +02:00
|
|
|
/* maximum LSN of the current (not finished) file */
|
|
|
|
LSN max_lsn;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* Last flushed LSN */
|
|
|
|
LSN flushed;
|
2007-08-13 14:17:49 +02:00
|
|
|
/* Last LSN sent to the disk (but maybe not written yet) */
|
2007-02-02 08:41:32 +01:00
|
|
|
LSN sent_to_file;
|
2007-08-31 08:22:15 +02:00
|
|
|
/* All what is after this address is not sent to disk yet */
|
2007-08-13 14:17:49 +02:00
|
|
|
TRANSLOG_ADDRESS in_buffers_only;
|
2007-02-02 08:41:32 +01:00
|
|
|
pthread_mutex_t sent_to_file_lock;
|
2007-09-21 12:48:57 +02:00
|
|
|
pthread_mutex_t log_flush_lock;
|
2007-08-19 21:27:43 +02:00
|
|
|
|
|
|
|
/* Protects changing of headers of finished files (max_lsn) */
|
|
|
|
pthread_mutex_t file_header_lock;
|
|
|
|
|
|
|
|
/*
|
|
|
|
Sorted array (with protection) of files where we started writing process
|
|
|
|
and so we can't give last LSN yet
|
|
|
|
*/
|
|
|
|
pthread_mutex_t unfinished_files_lock;
|
|
|
|
DYNAMIC_ARRAY unfinished_files;
|
2007-08-27 20:55:39 +02:00
|
|
|
|
|
|
|
/* Purger data: minimum file in the log (or 0 if unknown) */
|
|
|
|
uint32 min_file_number;
|
|
|
|
/* Protect purger from many calls and it's data */
|
|
|
|
pthread_mutex_t purger_lock;
|
|
|
|
/* last low water mark checked */
|
|
|
|
LSN last_lsn_checked;
|
2007-02-02 08:41:32 +01:00
|
|
|
};
|
|
|
|
|
|
|
|
static struct st_translog_descriptor log_descriptor;
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Marker for end of log */
|
2007-07-02 19:45:15 +02:00
|
|
|
static uchar end_of_log= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
my_bool translog_inited= 0;
|
2007-06-04 13:07:18 +02:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/* chunk types */
|
2007-02-19 22:01:27 +01:00
|
|
|
#define TRANSLOG_CHUNK_LSN 0x00 /* 0 chunk refer as LSN (head or tail */
|
|
|
|
#define TRANSLOG_CHUNK_FIXED (1 << 6) /* 1 (pseudo)fixed record (also LSN) */
|
|
|
|
#define TRANSLOG_CHUNK_NOHDR (2 << 6) /* 2 no head chunk (till page end) */
|
|
|
|
#define TRANSLOG_CHUNK_LNGTH (3 << 6) /* 3 chunk with chunk length */
|
|
|
|
#define TRANSLOG_CHUNK_TYPE (3 << 6) /* Mask to get chunk type */
|
2007-02-02 08:41:32 +01:00
|
|
|
#define TRANSLOG_REC_TYPE 0x3F /* Mask to get record type */
|
|
|
|
|
|
|
|
/* compressed (relative) LSN constants */
|
2007-02-19 22:01:27 +01:00
|
|
|
#define TRANSLOG_CLSN_LEN_BITS 0xC0 /* Mask to get compressed LSN length */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
#include <my_atomic.h>
|
|
|
|
/* an array that maps id of a MARIA_SHARE to this MARIA_SHARE */
|
|
|
|
static MARIA_SHARE **id_to_share= NULL;
|
|
|
|
/* lock for id_to_share */
|
|
|
|
static my_atomic_rwlock_t LOCK_id_to_share;
|
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
static my_bool translog_page_validator(uchar *page_addr, uchar* data_ptr);
|
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
/*
|
|
|
|
Initialize log_record_type_descriptors
|
|
|
|
|
|
|
|
NOTE that after first public Maria release, these can NOT be changed
|
|
|
|
*/
|
|
|
|
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
LOG_DESC log_record_type_descriptor[LOGREC_NUMBER_OF_TYPES];
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
2007-09-04 21:52:32 +02:00
|
|
|
|
|
|
|
#ifndef DBUG_OFF
|
|
|
|
/**
|
|
|
|
@brief check the description table validity
|
|
|
|
|
|
|
|
@param num how many records should be filled
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void check_translog_description_table(int num)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
DBUG_ENTER("check_translog_description_table");
|
|
|
|
DBUG_PRINT("enter", ("last record: %d", num));
|
|
|
|
DBUG_ASSERT(num > 0);
|
|
|
|
/* last is reserved for extending the table */
|
|
|
|
DBUG_ASSERT(num < LOGREC_NUMBER_OF_TYPES - 1);
|
|
|
|
DBUG_PRINT("info", ("records number: OK"));
|
|
|
|
DBUG_PRINT("info",
|
|
|
|
("record type: %d class: %d fixed: %u header: %u LSNs: %u "
|
|
|
|
"name: %s",
|
|
|
|
0,
|
|
|
|
log_record_type_descriptor[0].class,
|
|
|
|
(uint)log_record_type_descriptor[0].fixed_length,
|
|
|
|
(uint)log_record_type_descriptor[0].read_header_len,
|
|
|
|
(uint)log_record_type_descriptor[0].compressed_LSN,
|
|
|
|
log_record_type_descriptor[0].name));
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[0].class == LOGRECTYPE_NOT_ALLOWED);
|
|
|
|
DBUG_PRINT("info", ("record type 0: OK"));
|
|
|
|
for (i= 1; i <= num; i++)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info",
|
|
|
|
("record type: %d class: %d fixed: %u header: %u LSNs: %u "
|
|
|
|
"name: %s",
|
|
|
|
i, log_record_type_descriptor[i].class,
|
|
|
|
(uint)log_record_type_descriptor[i].fixed_length,
|
|
|
|
(uint)log_record_type_descriptor[i].read_header_len,
|
|
|
|
(uint)log_record_type_descriptor[i].compressed_LSN,
|
|
|
|
log_record_type_descriptor[i].name));
|
|
|
|
switch (log_record_type_descriptor[i].class) {
|
|
|
|
case LOGRECTYPE_NOT_ALLOWED:
|
|
|
|
DBUG_ASSERT(0);
|
|
|
|
break;
|
|
|
|
case LOGRECTYPE_VARIABLE_LENGTH:
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].fixed_length == 0);
|
|
|
|
DBUG_ASSERT((log_record_type_descriptor[i].compressed_LSN == 0) ||
|
|
|
|
((log_record_type_descriptor[i].compressed_LSN == 1) &&
|
|
|
|
(log_record_type_descriptor[i].read_header_len >=
|
|
|
|
LSN_STORE_SIZE)) ||
|
|
|
|
((log_record_type_descriptor[i].compressed_LSN == 2) &&
|
|
|
|
(log_record_type_descriptor[i].read_header_len >=
|
|
|
|
LSN_STORE_SIZE * 2)));
|
|
|
|
break;
|
|
|
|
case LOGRECTYPE_PSEUDOFIXEDLENGTH:
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].fixed_length ==
|
|
|
|
log_record_type_descriptor[i].read_header_len);
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].compressed_LSN > 0);
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].compressed_LSN <= 2);
|
|
|
|
break;
|
|
|
|
case LOGRECTYPE_FIXEDLENGTH:
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].fixed_length ==
|
|
|
|
log_record_type_descriptor[i].read_header_len);
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].compressed_LSN == 0);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
DBUG_ASSERT(0);
|
|
|
|
}
|
|
|
|
DBUG_PRINT("info", ("record type %d: OK", i));
|
|
|
|
}
|
|
|
|
DBUG_PRINT("info", ("All filled records are OK"));
|
|
|
|
for (i= num + 1; i < LOGREC_NUMBER_OF_TYPES; i++)
|
|
|
|
{
|
|
|
|
DBUG_ASSERT(log_record_type_descriptor[i].class == LOGRECTYPE_NOT_ALLOWED);
|
|
|
|
DBUG_PRINT("info", ("record type %d: OK", i));
|
|
|
|
}
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2007-06-15 00:45:55 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_FIXED_RECORD_0LSN_EXAMPLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_FIXEDLENGTH, 6, 6, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"fixed0example", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
2007-06-15 00:45:55 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 9, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"variable0example", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
2007-06-15 00:45:55 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_FIXED_RECORD_1LSN_EXAMPLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 7, 7, NULL, NULL, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"fixed1example", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
2007-06-15 00:45:55 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 12, NULL, NULL, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"variable1example", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
2007-06-15 00:45:55 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_FIXED_RECORD_2LSN_EXAMPLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 23, 23, NULL, NULL, NULL, 2,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"fixed2example", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
2007-06-15 00:45:55 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 19, NULL, NULL, NULL, 2,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"variable2example", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
2007-06-15 00:45:55 +02:00
|
|
|
|
|
|
|
|
|
|
|
void example_loghandler_init()
|
|
|
|
{
|
2007-09-04 21:52:32 +02:00
|
|
|
int i;
|
2007-06-15 00:45:55 +02:00
|
|
|
log_record_type_descriptor[LOGREC_FIXED_RECORD_0LSN_EXAMPLE]=
|
|
|
|
INIT_LOGREC_FIXED_RECORD_0LSN_EXAMPLE;
|
|
|
|
log_record_type_descriptor[LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE]=
|
|
|
|
INIT_LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE;
|
|
|
|
log_record_type_descriptor[LOGREC_FIXED_RECORD_1LSN_EXAMPLE]=
|
|
|
|
INIT_LOGREC_FIXED_RECORD_1LSN_EXAMPLE;
|
|
|
|
log_record_type_descriptor[LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE]=
|
|
|
|
INIT_LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE;
|
|
|
|
log_record_type_descriptor[LOGREC_FIXED_RECORD_2LSN_EXAMPLE]=
|
|
|
|
INIT_LOGREC_FIXED_RECORD_2LSN_EXAMPLE;
|
|
|
|
log_record_type_descriptor[LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE]=
|
|
|
|
INIT_LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE;
|
2007-09-04 21:52:32 +02:00
|
|
|
for (i= LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE + 1;
|
|
|
|
i < LOGREC_NUMBER_OF_TYPES;
|
|
|
|
i++)
|
|
|
|
log_record_type_descriptor[i].class= LOGRECTYPE_NOT_ALLOWED;
|
|
|
|
DBUG_EXECUTE("info",
|
|
|
|
check_translog_description_table(LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE););
|
2007-06-15 00:45:55 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_RESERVED_FOR_CHUNKS23=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_NOT_ALLOWED, 0, 0, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"reserved", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL };
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_HEAD=
|
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
2007-06-11 16:29:53 +02:00
|
|
|
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE, NULL,
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_insert_row_head", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_TAIL=
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
|
|
|
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE, NULL,
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_insert_row_tail", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
/** @todo RECOVERY BUG unused, remove? */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_BLOB=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_insert_row_blob", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
/** @todo RECOVERY BUG handle it in recovery */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
/*QQQ:TODO:header???*/
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_BLOBS=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, FILEID_STORE_SIZE, NULL,
|
|
|
|
write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_insert_row_blobs", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_PURGE_ROW_HEAD=
|
|
|
|
{LOGRECTYPE_FIXEDLENGTH,
|
|
|
|
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
|
|
|
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_purge_row_head", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_PURGE_ROW_TAIL=
|
|
|
|
{LOGRECTYPE_FIXEDLENGTH,
|
|
|
|
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
|
|
|
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_purge_row_tail", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_PURGE_BLOCKS=
|
2007-08-29 08:03:10 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
|
|
|
FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE,
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_purge_blocks", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
/* not yet used; for when we have versioning */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_REDO_DELETE_ROW=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_FIXEDLENGTH, 16, 16, NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_delete_row", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
/** @todo RECOVERY BUG unused, remove? */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_REDO_UPDATE_ROW_HEAD=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 9, NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_update_row_head", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_INDEX=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 9, NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_index", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_UNDELETE_ROW=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_FIXEDLENGTH, 16, 16, NULL, write_hook_for_redo, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_undelete_row", LOGREC_NOT_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_CLR_END=
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, LSN_STORE_SIZE + FILEID_STORE_SIZE +
|
|
|
|
CLR_TYPE_STORE_SIZE, NULL, write_hook_for_clr_end, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"clr_end", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_PURGE_END=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 5, 5, NULL, NULL, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"purge_end", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_UNDO_ROW_INSERT=
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LSN_STORE_SIZE + FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
NULL, write_hook_for_undo_row_insert, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"undo_row_insert", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_UNDO_ROW_DELETE=
|
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
|
|
|
LSN_STORE_SIZE + FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
NULL, write_hook_for_undo_row_delete, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"undo_row_delete", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_UNDO_ROW_UPDATE=
|
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
|
|
|
LSN_STORE_SIZE + FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
NULL, write_hook_for_undo_row_update, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"undo_row_update", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_UNDO_KEY_INSERT=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 10, NULL, write_hook_for_undo, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"undo_key_insert", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_UNDO_KEY_DELETE=
|
WL#3072 Maria Recovery
misc fixes of execution of UNDOs in the UNDO phase:
- into the CLR_END, store the LSN of the _previous_ UNDO (we debated
what was best, so far we're going with "previous"; later we can change
to "current" if needed), and store the type of record which is being
undone (needed to know how to update state.records when we see the
CLR_END during the REDO phase).
- declaring all UNDOs and CLR_END as "compressed"
- when executing an UNDO in the UNDO phase, state.records is updated
as a hook when writing CLR_END (needed for "recovery of the state"),
and so is trn->undo_lsn (needed for when we have checkpoints).
- bugfix (execution of UNDO_ROW_DELETE didn't store the correct checksum
into the re-inserted row, maria_chk -r thus threw the row away).
- modifications of ma_test1: where to stop is now driven by --testflag;
--test-undo just tells how to stop (flush data, flush log, nothing).
- ma_test_recovery: testing of the UNDO phase, more testing of the
REDO phase, identification of a bug.
storage/maria/ma_blockrec.c:
- bugfix: execution of UNDO_ROW_DELETE didn't store the correct
checksum into the row (leading to "maria_chk -r" eliminating the
re-inserted row, net effect was that rollback appeared to have
rolled back no deletion). Reason was that write_block_record() used
info->cur_row.checksum, while "row" can be != &info->cur_row
(case of UNDO_ROW_DELETE). After fixing this, problems with
_ma_update_block_record() appeared; indeed checksum was computed
by allocate_and_write_block_record() while _ma_update_block_record()
directly calls write_block_record(). Solution is to compute checksum
in write_block_record() instead.
- when executing an UNDO, we now pass the LSN of the _previous_ UNDO
to block_format functions. This LSN can be 0 (if the being-executed UNDO
was the transaction's first UNDO), so "undo_lsn==0" cannot work
anymore to indicate "this is not UNDO work". Using undo_lsn==LSN_ERROR
instead (this is an impossible LSN).
- store into CLR_END the type of log record which was undone
(INSERT/UPDATE/DELETE); needed for Recovery to know if/how it has
to update state.records if it sees this CLR_END in the REDO phase.
- when writing the CLR_END in _ma_apply_undo_row_insert(),
the place to store file's id is log_data+LSN_STORE_SIZE.
- in _ma_apply_undo_row_insert(), the records-- is moved
to a hook when writing the CLR_END (this way it is under log's mutex
which is needed for "recovery of the state")
storage/maria/ma_loghandler.c:
- all UNDOs, and CLR_END, start with the LSN of another UNDO; so
we can declare them "compressed".
- write_hook_for_clr_end() to set trn->undo_lsn (to the previous
UNDO's LSN) under log's lock (like UNDOs set trn->undo_lsn under log's
lock), and also update, if appropriate, state.records.
- reset share->id to 0 when deassigning; not useful for now but
sounds logical.
storage/maria/ma_recovery.c:
- if no table is found for a REDO, it's not an error; for an UNDO, it is
- in the REDO phase, when we see a CLR_END we must update trn->undo_lsn
and sometimes state.records.
- in the UNDO phase, when we execute an UNDO_ROW_INSERT:
* update trn->undo_lsn only after executing the record
* store the _previous_ undo_lsn into the CLR_END
- at the end of the REDO phase, when we recreate TRN objects, they
have already their long id in the log (either via a
LOGREC_LONG_TRANSACTION_ID, or in a checkpoint record), don't write
a new, useless LOGREC_LONG_TRANSACTION_ID for them.
storage/maria/ma_test1.c:
* where to stop execution is now driven by --testflag and not --test-undo
(ma_test2 already has --testflag for the same purpose). This allows
us to do a clean stop (with commit) at any point.
* --test-undo=# tells how to abort (flush all pages (which implies
flushing log) or only log or nothing); all such "ways of crashing"
are tested in ma_test_recovery
storage/maria/ma_test_recovery:
* Testing execution of UNDOs, with and without BLOBs.
* Testing idempotency of REDOs.
* See @todo for a probable bug with BLOBs.
* maria_chk -rq instead of -r, as with -q it nicely stops on any
problem in the data file (like the checksum bug see comment of
ma_blockrec.c).
* Testing if log was written by UNDO phase (often expected),
not written by REDO phase (always expected).
* Less output on the screen, compares with expected output in the end.
* some shell thingies like "set --" and $# are courtesy of
Danny and Pekka.
storage/maria/maria_read_log.c:
when only displaying the records, don't do an UNDO phase
storage/maria/ma_test_recovery.expected:
This is the expected output of a great part of ma_test_recovery.
ma_test_recovery compares its output to the expected output
and tells if different.
If we look at this file it mentions differences in checksum
(normal, it's not recovered yet) and in records count
(getting a correct records' count when recovery starts on an
already existing table, like when testing rollback,
is coded but not yet pushed).
2007-09-06 16:04:36 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 15, NULL, write_hook_for_undo, NULL, 1,
|
|
|
|
"undo_key_delete", LOGREC_LAST_IN_GROUP, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_PREPARE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"prepare", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_PREPARE_WITH_UNDO_PURGE=
|
2007-09-04 21:52:32 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, LSN_STORE_SIZE, NULL, NULL, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"prepare_with_undo_purge", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_COMMIT=
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
{LOGRECTYPE_FIXEDLENGTH, 0, 0, NULL,
|
|
|
|
NULL, NULL, 0, "commit", LOGREC_IS_GROUP_ITSELF, NULL,
|
|
|
|
NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_COMMIT_WITH_UNDO_PURGE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 5, 5, NULL, NULL, NULL, 1,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"commit_with_undo_purge", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_CHECKPOINT=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"checkpoint", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_CREATE_TABLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 1 + 2, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_create_table", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_RENAME_TABLE=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_rename_table", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_DROP_TABLE=
|
2007-07-30 15:05:43 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_drop_table", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
static LOG_DESC INIT_LOGREC_REDO_DELETE_ALL=
|
|
|
|
{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE, FILEID_STORE_SIZE,
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
NULL, write_hook_for_redo_delete_all, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_delete_all", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_REDO_REPAIR_TABLE=
|
|
|
|
{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE + 4, FILEID_STORE_SIZE + 4,
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"redo_repair_table", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_FILE_ID=
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
{LOGRECTYPE_VARIABLE_LENGTH, 0, 2, NULL, write_hook_for_file_id, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"file_id", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
|
|
|
static LOG_DESC INIT_LOGREC_LONG_TRANSACTION_ID=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{LOGRECTYPE_FIXEDLENGTH, 6, 6, NULL, NULL, NULL, 0,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
"long_transaction_id", LOGREC_IS_GROUP_ITSELF, NULL, NULL};
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
const myf log_write_flags= MY_WME | MY_NABP | MY_WAIT_IF_FULL;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
2007-06-15 00:45:55 +02:00
|
|
|
static void loghandler_init()
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-09-04 21:52:32 +02:00
|
|
|
int i;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
log_record_type_descriptor[LOGREC_RESERVED_FOR_CHUNKS23]=
|
|
|
|
INIT_LOGREC_RESERVED_FOR_CHUNKS23;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_INSERT_ROW_HEAD]=
|
|
|
|
INIT_LOGREC_REDO_INSERT_ROW_HEAD;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_INSERT_ROW_TAIL]=
|
|
|
|
INIT_LOGREC_REDO_INSERT_ROW_TAIL;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_INSERT_ROW_BLOB]=
|
|
|
|
INIT_LOGREC_REDO_INSERT_ROW_BLOB;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_INSERT_ROW_BLOBS]=
|
|
|
|
INIT_LOGREC_REDO_INSERT_ROW_BLOBS;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_PURGE_ROW_HEAD]=
|
|
|
|
INIT_LOGREC_REDO_PURGE_ROW_HEAD;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_PURGE_ROW_TAIL]=
|
|
|
|
INIT_LOGREC_REDO_PURGE_ROW_TAIL;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_PURGE_BLOCKS]=
|
|
|
|
INIT_LOGREC_REDO_PURGE_BLOCKS;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_DELETE_ROW]=
|
|
|
|
INIT_LOGREC_REDO_DELETE_ROW;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_UPDATE_ROW_HEAD]=
|
|
|
|
INIT_LOGREC_REDO_UPDATE_ROW_HEAD;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_INDEX]=
|
|
|
|
INIT_LOGREC_REDO_INDEX;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_UNDELETE_ROW]=
|
|
|
|
INIT_LOGREC_REDO_UNDELETE_ROW;
|
|
|
|
log_record_type_descriptor[LOGREC_CLR_END]=
|
|
|
|
INIT_LOGREC_CLR_END;
|
|
|
|
log_record_type_descriptor[LOGREC_PURGE_END]=
|
|
|
|
INIT_LOGREC_PURGE_END;
|
|
|
|
log_record_type_descriptor[LOGREC_UNDO_ROW_INSERT]=
|
|
|
|
INIT_LOGREC_UNDO_ROW_INSERT;
|
|
|
|
log_record_type_descriptor[LOGREC_UNDO_ROW_DELETE]=
|
|
|
|
INIT_LOGREC_UNDO_ROW_DELETE;
|
|
|
|
log_record_type_descriptor[LOGREC_UNDO_ROW_UPDATE]=
|
|
|
|
INIT_LOGREC_UNDO_ROW_UPDATE;
|
|
|
|
log_record_type_descriptor[LOGREC_UNDO_KEY_INSERT]=
|
|
|
|
INIT_LOGREC_UNDO_KEY_INSERT;
|
|
|
|
log_record_type_descriptor[LOGREC_UNDO_KEY_DELETE]=
|
|
|
|
INIT_LOGREC_UNDO_KEY_DELETE;
|
|
|
|
log_record_type_descriptor[LOGREC_PREPARE]=
|
|
|
|
INIT_LOGREC_PREPARE;
|
|
|
|
log_record_type_descriptor[LOGREC_PREPARE_WITH_UNDO_PURGE]=
|
|
|
|
INIT_LOGREC_PREPARE_WITH_UNDO_PURGE;
|
|
|
|
log_record_type_descriptor[LOGREC_COMMIT]=
|
|
|
|
INIT_LOGREC_COMMIT;
|
|
|
|
log_record_type_descriptor[LOGREC_COMMIT_WITH_UNDO_PURGE]=
|
|
|
|
INIT_LOGREC_COMMIT_WITH_UNDO_PURGE;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
log_record_type_descriptor[LOGREC_CHECKPOINT]=
|
|
|
|
INIT_LOGREC_CHECKPOINT;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
log_record_type_descriptor[LOGREC_REDO_CREATE_TABLE]=
|
|
|
|
INIT_LOGREC_REDO_CREATE_TABLE;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_RENAME_TABLE]=
|
|
|
|
INIT_LOGREC_REDO_RENAME_TABLE;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_DROP_TABLE]=
|
|
|
|
INIT_LOGREC_REDO_DROP_TABLE;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
log_record_type_descriptor[LOGREC_REDO_DELETE_ALL]=
|
|
|
|
INIT_LOGREC_REDO_DELETE_ALL;
|
|
|
|
log_record_type_descriptor[LOGREC_REDO_REPAIR_TABLE]=
|
|
|
|
INIT_LOGREC_REDO_REPAIR_TABLE;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
log_record_type_descriptor[LOGREC_FILE_ID]=
|
|
|
|
INIT_LOGREC_FILE_ID;
|
|
|
|
log_record_type_descriptor[LOGREC_LONG_TRANSACTION_ID]=
|
|
|
|
INIT_LOGREC_LONG_TRANSACTION_ID;
|
2007-09-04 21:52:32 +02:00
|
|
|
for (i= LOGREC_LONG_TRANSACTION_ID + 1;
|
|
|
|
i < LOGREC_NUMBER_OF_TYPES;
|
|
|
|
i++)
|
|
|
|
log_record_type_descriptor[i].class= LOGRECTYPE_NOT_ALLOWED;
|
|
|
|
DBUG_EXECUTE("info",
|
|
|
|
check_translog_description_table(LOGREC_LONG_TRANSACTION_ID););
|
2007-02-02 08:41:32 +01:00
|
|
|
};
|
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* all possible flags page overheads */
|
|
|
|
static uint page_overhead[TRANSLOG_FLAGS_NUM];
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
typedef struct st_translog_validator_data
|
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS *addr;
|
|
|
|
my_bool was_recovered;
|
|
|
|
} TRANSLOG_VALIDATOR_DATA;
|
|
|
|
|
|
|
|
|
|
|
|
const char *maria_data_root;
|
|
|
|
|
|
|
|
|
2007-02-21 22:30:58 +01:00
|
|
|
/*
|
|
|
|
Check cursor/buffer consistence
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_check_cursor
|
|
|
|
cursor cursor which will be checked
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef DBUG_OFF
|
|
|
|
static void translog_check_cursor(struct st_buffer_cursor *cursor)
|
|
|
|
{
|
|
|
|
DBUG_ASSERT(cursor->chaser ||
|
|
|
|
((ulong) (cursor->ptr - cursor->buffer->buffer) ==
|
|
|
|
cursor->buffer->size));
|
|
|
|
DBUG_ASSERT(cursor->buffer->buffer_no == cursor->buffer_no);
|
|
|
|
DBUG_ASSERT((cursor->ptr -cursor->buffer->buffer) %TRANSLOG_PAGE_SIZE ==
|
|
|
|
cursor->current_page_fill % TRANSLOG_PAGE_SIZE);
|
|
|
|
DBUG_ASSERT(cursor->current_page_fill <= TRANSLOG_PAGE_SIZE);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
Get file name of the log by log number
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_filename_by_fileno()
|
|
|
|
file_no Number of the log we want to open
|
|
|
|
path Pointer to buffer where file name will be
|
2007-09-27 16:41:21 +02:00
|
|
|
stored (must be FN_REFLEN bytes at least)
|
2007-02-02 08:41:32 +01:00
|
|
|
RETURN
|
|
|
|
pointer to path
|
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static char *translog_filename_by_fileno(uint32 file_no, char *path)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-09-27 16:41:21 +02:00
|
|
|
char buff[11], *end;
|
|
|
|
uint length;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_filename_by_fileno");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(file_no <= 0xfffffff);
|
2007-09-27 16:41:21 +02:00
|
|
|
|
|
|
|
/* log_descriptor.directory is already formated */
|
|
|
|
end= strxmov(path, log_descriptor.directory, "maria_log.0000000", NullS);
|
|
|
|
length= (uint) (int10_to_str(file_no, buff, 10) - buff);
|
|
|
|
strmov(end-length+1, buff);
|
|
|
|
|
2007-09-27 20:45:45 +02:00
|
|
|
DBUG_PRINT("info", ("Path: '%s' path: 0x%lx", path, (ulong) path));
|
2007-09-27 16:41:21 +02:00
|
|
|
DBUG_RETURN(path);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Open log file with given number without cache
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
open_logfile_by_number_no_cache()
|
|
|
|
file_no Number of the log we want to open
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
-1 error
|
|
|
|
# file descriptor number
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static File open_logfile_by_number_no_cache(uint32 file_no)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
File file;
|
|
|
|
char path[FN_REFLEN];
|
|
|
|
DBUG_ENTER("open_logfile_by_number_no_cache");
|
|
|
|
|
2007-04-25 23:19:56 +02:00
|
|
|
/* TODO: add O_DIRECT to open flags (when buffer is aligned) */
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/* TODO: use my_create() */
|
2007-02-19 22:01:27 +01:00
|
|
|
if ((file= my_open(translog_filename_by_fileno(file_no, path),
|
|
|
|
O_CREAT | O_BINARY | O_RDWR,
|
2007-02-02 08:41:32 +01:00
|
|
|
MYF(MY_WME))) < 0)
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("Error %d during opening file '%s'", errno, path));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_RETURN(-1);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("File: '%s' handler: %d", path, file));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(file);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Write log file page header in the just opened new log file
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_file_header();
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
NOTES
|
|
|
|
First page is just a marker page; We don't store any real log data in it.
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
RETURN
|
|
|
|
0 OK
|
|
|
|
1 ERROR
|
|
|
|
*/
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
uchar NEAR maria_trans_file_magic[]=
|
|
|
|
{ (uchar) 254, (uchar) 254, (uchar) 11, '\001', 'M', 'A', 'R', 'I', 'A',
|
|
|
|
'L', 'O', 'G' };
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static my_bool translog_write_file_header()
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
ulonglong timestamp;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar page_buff[TRANSLOG_PAGE_SIZE], *page= page_buff;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_file_header");
|
|
|
|
|
|
|
|
/* file tag */
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(page, maria_trans_file_magic, sizeof(maria_trans_file_magic));
|
|
|
|
page+= sizeof(maria_trans_file_magic);
|
2007-02-02 08:41:32 +01:00
|
|
|
/* timestamp */
|
|
|
|
timestamp= my_getsystime();
|
2007-02-19 22:01:27 +01:00
|
|
|
int8store(page, timestamp);
|
|
|
|
page+= 8;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* maria version */
|
2007-02-19 22:01:27 +01:00
|
|
|
int4store(page, TRANSLOG_VERSION_ID);
|
|
|
|
page+= 4;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* mysql version (MYSQL_VERSION_ID) */
|
2007-02-19 22:01:27 +01:00
|
|
|
int4store(page, log_descriptor.server_version);
|
|
|
|
page+= 4;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* server ID */
|
2007-02-19 22:01:27 +01:00
|
|
|
int4store(page, log_descriptor.server_id);
|
|
|
|
page+= 4;
|
|
|
|
/* loghandler page_size/DISK_DRIVE_SECTOR_SIZE */
|
|
|
|
int2store(page, TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE);
|
|
|
|
page+= 2;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* file number */
|
2007-02-19 22:01:27 +01:00
|
|
|
int3store(page, LSN_FILE_NO(log_descriptor.horizon));
|
|
|
|
page+= 3;
|
2007-08-19 21:27:43 +02:00
|
|
|
/*
|
|
|
|
Here should be max lsn storing for current file (which is LSN_IPOSSIBLE):
|
|
|
|
lsn_store(page, LSN_IPOSSIBLE);
|
|
|
|
page+= LSN_STORE_SIZE;
|
|
|
|
But it is zeros so we can rely on bzero() in this case
|
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
bzero(page, sizeof(page_buff) - (page- page_buff));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_RETURN(my_pwrite(log_descriptor.log_file_num[0], page_buff,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
sizeof(page_buff), 0, log_write_flags) != 0);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
/*
|
|
|
|
@brief write the new LSN on the given file header
|
|
|
|
|
|
|
|
@param file The file descriptor
|
|
|
|
@param lsn That LSN which should be written
|
|
|
|
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_max_lsn_to_header(File file, LSN lsn)
|
|
|
|
{
|
|
|
|
uchar lsn_buff[LSN_STORE_SIZE];
|
|
|
|
DBUG_ENTER("translog_max_lsn_to_header");
|
|
|
|
DBUG_PRINT("enter", ("File descriptor: %ld "
|
|
|
|
"lsn: (%lu,0x%lx)",
|
|
|
|
(long) file,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(lsn)));
|
2007-08-19 21:27:43 +02:00
|
|
|
|
|
|
|
lsn_store(lsn_buff, lsn);
|
|
|
|
|
|
|
|
DBUG_RETURN(my_pwrite(file, lsn_buff,
|
|
|
|
LSN_STORE_SIZE,
|
|
|
|
(sizeof(maria_trans_file_magic) +
|
|
|
|
8 + 4 + 4 + 4 + 2 + 3),
|
|
|
|
log_write_flags) != 0 ||
|
|
|
|
my_sync(file, MYF(MY_WME)) != 0);
|
|
|
|
}
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-06-25 09:07:46 +02:00
|
|
|
/*
|
|
|
|
Information from transaction log file header
|
|
|
|
*/
|
|
|
|
|
|
|
|
typedef struct st_loghandler_file_info
|
|
|
|
{
|
2007-08-19 21:27:43 +02:00
|
|
|
/*
|
2007-08-31 08:22:15 +02:00
|
|
|
LSN_IMPOSSIBLE for current file and max LSN which parts stored in the
|
2007-08-19 21:27:43 +02:00
|
|
|
file for all other (finished) files.
|
|
|
|
*/
|
|
|
|
LSN max_lsn;
|
2007-06-25 09:07:46 +02:00
|
|
|
ulonglong timestamp; /* Time stamp */
|
|
|
|
ulong maria_version; /* Version of maria loghandler */
|
|
|
|
ulong mysql_versiob; /* Version of mysql server */
|
|
|
|
ulong server_id; /* Server ID */
|
|
|
|
uint page_size; /* Loghandler page size */
|
|
|
|
uint file_number; /* Number of the file (from the file header) */
|
|
|
|
} LOGHANDLER_FILE_INFO;
|
|
|
|
|
|
|
|
/*
|
2007-08-19 21:27:43 +02:00
|
|
|
@brief Read hander file information from loghandler file
|
2007-06-25 09:07:46 +02:00
|
|
|
|
|
|
|
@param desc header information descriptor to be filled with information
|
2007-08-19 21:27:43 +02:00
|
|
|
@param file file descriptor to read
|
2007-06-25 09:07:46 +02:00
|
|
|
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
*/
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
#define LOG_HEADER_DATA_SIZE (sizeof(maria_trans_file_magic) + \
|
|
|
|
8 + 4 + 4 + 4 + 2 + 3 + \
|
|
|
|
LSN_STORE_SIZE)
|
|
|
|
|
|
|
|
my_bool translog_read_file_header(LOGHANDLER_FILE_INFO *desc, File file)
|
2007-06-25 09:07:46 +02:00
|
|
|
{
|
2007-08-19 21:27:43 +02:00
|
|
|
uchar page_buff[LOG_HEADER_DATA_SIZE], *ptr;
|
2007-06-25 09:07:46 +02:00
|
|
|
DBUG_ENTER("translog_read_file_header");
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
if (my_pread(file, page_buff,
|
2007-06-25 09:07:46 +02:00
|
|
|
sizeof(page_buff), 0, MYF(MY_FNABP | MY_WME)))
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("log read fail error: %d", my_errno));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
ptr= page_buff + sizeof(maria_trans_file_magic);
|
|
|
|
desc->timestamp= uint8korr(ptr);
|
|
|
|
ptr+= 8;
|
|
|
|
desc->maria_version= uint4korr(ptr);
|
|
|
|
ptr+= 4;
|
|
|
|
desc->mysql_versiob= uint4korr(ptr);
|
|
|
|
ptr+= 4;
|
|
|
|
desc->server_id= uint4korr(ptr);
|
2007-08-19 21:27:43 +02:00
|
|
|
ptr+= 4;
|
2007-06-25 09:07:46 +02:00
|
|
|
desc->page_size= uint2korr(ptr);
|
|
|
|
ptr+= 2;
|
|
|
|
desc->file_number= uint3korr(ptr);
|
2007-08-19 21:27:43 +02:00
|
|
|
ptr+=3;
|
|
|
|
desc->max_lsn= lsn_korr(ptr);
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
@brief set the lsn to the files from_file - to_file if it is greater
|
|
|
|
then written in the file
|
|
|
|
|
|
|
|
@param from_file first file number (min)
|
|
|
|
@param to_file last file number (max)
|
|
|
|
@param lsn the lsn for writing
|
|
|
|
@param is_locked true if current thread locked the log handler
|
|
|
|
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
*/
|
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
static my_bool translog_set_lsn_for_files(uint32 from_file, uint32 to_file,
|
2007-08-19 21:27:43 +02:00
|
|
|
LSN lsn, my_bool is_locked)
|
|
|
|
{
|
2007-08-27 20:55:39 +02:00
|
|
|
uint32 file;
|
2007-08-19 21:27:43 +02:00
|
|
|
DBUG_ENTER("translog_set_lsn_for_files");
|
|
|
|
DBUG_PRINT("enter", ("From: %lu to: %lu lsn: (%lu,0x%lx) locked: %d",
|
2007-08-27 20:55:39 +02:00
|
|
|
(ulong) from_file, (ulong) to_file,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(lsn),
|
2007-08-19 21:27:43 +02:00
|
|
|
is_locked));
|
|
|
|
DBUG_ASSERT(from_file <= to_file);
|
|
|
|
DBUG_ASSERT(from_file > 0); /* we have not file 0 */
|
|
|
|
|
|
|
|
/* Checks the current file (not finished yet file) */
|
|
|
|
if (!is_locked)
|
|
|
|
translog_lock();
|
2007-08-27 20:55:39 +02:00
|
|
|
if (to_file == (uint32) LSN_FILE_NO(log_descriptor.horizon))
|
2007-08-19 21:27:43 +02:00
|
|
|
{
|
|
|
|
if (likely(cmp_translog_addr(lsn, log_descriptor.max_lsn) > 0))
|
|
|
|
log_descriptor.max_lsn= lsn;
|
|
|
|
to_file--;
|
|
|
|
}
|
|
|
|
if (!is_locked)
|
|
|
|
translog_unlock();
|
|
|
|
|
|
|
|
/* Checks finished files if they are */
|
|
|
|
pthread_mutex_lock(&log_descriptor.file_header_lock);
|
|
|
|
for (file= from_file; file <= to_file; file++)
|
|
|
|
{
|
|
|
|
LOGHANDLER_FILE_INFO info;
|
|
|
|
File fd= open_logfile_by_number_no_cache(file);
|
|
|
|
if (fd < 0 ||
|
|
|
|
translog_read_file_header(&info, fd) ||
|
|
|
|
(cmp_translog_addr(lsn, info.max_lsn) > 0 &&
|
|
|
|
translog_max_lsn_to_header(fd, lsn)))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&log_descriptor.file_header_lock);
|
|
|
|
|
2007-06-25 09:07:46 +02:00
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
/* descriptor of file in unfinished_files */
|
|
|
|
struct st_file_counter
|
|
|
|
{
|
2007-08-27 20:55:39 +02:00
|
|
|
uint32 file; /* file number */
|
|
|
|
uint32 counter; /* counter for started writes */
|
2007-08-19 21:27:43 +02:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
@brief mark file "in progress" (for multi-group records)
|
|
|
|
|
|
|
|
@param file log file number
|
|
|
|
*/
|
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
static void translog_mark_file_unfinished(uint32 file)
|
2007-08-19 21:27:43 +02:00
|
|
|
{
|
|
|
|
int place, i;
|
|
|
|
struct st_file_counter fc, *fc_ptr;
|
|
|
|
fc.file= file; fc.counter= 1;
|
|
|
|
|
|
|
|
DBUG_ENTER("translog_mark_file_unfinished");
|
2007-08-27 20:55:39 +02:00
|
|
|
DBUG_PRINT("enter", ("file: %lu", (ulong) file));
|
2007-08-19 21:27:43 +02:00
|
|
|
|
|
|
|
pthread_mutex_lock(&log_descriptor.unfinished_files_lock);
|
|
|
|
|
|
|
|
if (log_descriptor.unfinished_files.elements == 0)
|
|
|
|
{
|
|
|
|
insert_dynamic(&log_descriptor.unfinished_files, (uchar*) &fc);
|
|
|
|
DBUG_PRINT("info", ("The first element inserted"));
|
|
|
|
goto end;
|
|
|
|
}
|
|
|
|
|
2007-08-31 08:22:15 +02:00
|
|
|
for (place= log_descriptor.unfinished_files.elements - 1;
|
2007-08-19 21:27:43 +02:00
|
|
|
place >= 0;
|
|
|
|
place--)
|
|
|
|
{
|
|
|
|
fc_ptr= dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
place, struct st_file_counter *);
|
|
|
|
if (fc_ptr->file <= file)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (place >= 0 && fc_ptr->file == file)
|
|
|
|
{
|
|
|
|
fc_ptr->counter++;
|
|
|
|
DBUG_PRINT("info", ("counter increased"));
|
|
|
|
goto end;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (place == (int)log_descriptor.unfinished_files.elements)
|
|
|
|
{
|
|
|
|
insert_dynamic(&log_descriptor.unfinished_files, (uchar*) &fc);
|
|
|
|
DBUG_PRINT("info", ("The last element inserted"));
|
|
|
|
goto end;
|
|
|
|
}
|
|
|
|
/* shift and assign new element */
|
|
|
|
insert_dynamic(&log_descriptor.unfinished_files,
|
|
|
|
(uchar*)
|
|
|
|
dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
log_descriptor.unfinished_files.elements- 1,
|
|
|
|
struct st_file_counter *));
|
|
|
|
for(i= log_descriptor.unfinished_files.elements - 1; i > place; i--)
|
|
|
|
{
|
|
|
|
/* we do not use set_dynamic() to avoid unneeded checks */
|
|
|
|
memcpy(dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
i, struct st_file_counter *),
|
|
|
|
dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
i + 1, struct st_file_counter *),
|
|
|
|
sizeof(struct st_file_counter));
|
|
|
|
}
|
|
|
|
memcpy(dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
place + 1, struct st_file_counter *),
|
|
|
|
&fc, sizeof(struct st_file_counter));
|
|
|
|
end:
|
|
|
|
pthread_mutex_unlock(&log_descriptor.unfinished_files_lock);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
@brief remove file mark "in progress" (for multi-group records)
|
|
|
|
|
|
|
|
@param file log file number
|
|
|
|
*/
|
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
static void translog_mark_file_finished(uint32 file)
|
2007-08-19 21:27:43 +02:00
|
|
|
{
|
|
|
|
int i;
|
|
|
|
struct st_file_counter *fc_ptr;
|
|
|
|
DBUG_ENTER("translog_mark_file_finished");
|
2007-08-27 20:55:39 +02:00
|
|
|
DBUG_PRINT("enter", ("file: %lu", (ulong) file));
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-09-27 16:41:21 +02:00
|
|
|
LINT_INIT(fc_ptr);
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
pthread_mutex_lock(&log_descriptor.unfinished_files_lock);
|
|
|
|
|
|
|
|
DBUG_ASSERT(log_descriptor.unfinished_files.elements > 0);
|
|
|
|
for (i= 0;
|
|
|
|
i < (int) log_descriptor.unfinished_files.elements;
|
|
|
|
i++)
|
|
|
|
{
|
|
|
|
fc_ptr= dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
i, struct st_file_counter *);
|
|
|
|
if (fc_ptr->file == file)
|
|
|
|
{
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
DBUG_ASSERT(i < (int) log_descriptor.unfinished_files.elements);
|
|
|
|
|
|
|
|
if (! --fc_ptr->counter)
|
|
|
|
delete_dynamic_element(&log_descriptor.unfinished_files, i);
|
|
|
|
pthread_mutex_unlock(&log_descriptor.unfinished_files_lock);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
@brief get max LSN of the record which parts stored in this file
|
|
|
|
|
|
|
|
@param file file number
|
|
|
|
|
|
|
|
@return requested LSN or LSN_IMPOSSIBLE/LSN_ERROR
|
|
|
|
@retval LSN_IMPOSSIBLE File is still not finished
|
|
|
|
@retval LSN_ERROR Error opening file
|
|
|
|
@retval # LSN of the record which parts stored in this file
|
|
|
|
*/
|
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
LSN translog_get_file_max_lsn_stored(uint32 file)
|
2007-08-19 21:27:43 +02:00
|
|
|
{
|
2007-08-27 20:55:39 +02:00
|
|
|
uint32 limit= FILENO_IMPOSSIBLE;
|
2007-08-19 21:27:43 +02:00
|
|
|
DBUG_ENTER("translog_get_file_max_lsn_stored");
|
2007-08-27 20:55:39 +02:00
|
|
|
DBUG_PRINT("enter", ("file: %lu", (ulong)file));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-08-19 21:27:43 +02:00
|
|
|
|
|
|
|
pthread_mutex_lock(&log_descriptor.unfinished_files_lock);
|
|
|
|
|
|
|
|
/* find file with minimum file number "in progress" */
|
|
|
|
if (log_descriptor.unfinished_files.elements > 0)
|
|
|
|
{
|
|
|
|
struct st_file_counter *fc_ptr;
|
|
|
|
fc_ptr= dynamic_element(&log_descriptor.unfinished_files,
|
|
|
|
0, struct st_file_counter *);
|
|
|
|
limit= fc_ptr->file; /* minimal file number "in progress" */
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&log_descriptor.unfinished_files_lock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
if there is no "in progress file" then unfinished file is in progress
|
|
|
|
for sure
|
|
|
|
*/
|
|
|
|
if (limit == FILENO_IMPOSSIBLE)
|
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS horizon= translog_get_horizon();
|
|
|
|
limit= LSN_FILE_NO(horizon);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (file >= limit)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("The file in in progress"));
|
|
|
|
DBUG_RETURN(LSN_IMPOSSIBLE);
|
|
|
|
}
|
|
|
|
|
|
|
|
{
|
|
|
|
LOGHANDLER_FILE_INFO info;
|
|
|
|
File fd= open_logfile_by_number_no_cache(file);
|
|
|
|
if (fd < 0 ||
|
|
|
|
translog_read_file_header(&info, fd))
|
|
|
|
{
|
|
|
|
DBUG_PRINT("error", ("Can't read file header"));
|
|
|
|
DBUG_RETURN(LSN_ERROR);
|
|
|
|
}
|
|
|
|
DBUG_PRINT("error", ("Max lsn: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(info.max_lsn)));
|
2007-08-19 21:27:43 +02:00
|
|
|
DBUG_RETURN(info.max_lsn);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
Initialize transaction log file buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_init()
|
|
|
|
buffer The buffer to initialize
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static my_bool translog_buffer_init(struct st_translog_buffer *buffer)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_buffer_init");
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
buffer->last_lsn= LSN_IMPOSSIBLE;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* This Buffer File */
|
2007-02-19 22:01:27 +01:00
|
|
|
buffer->file= -1;
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer->overlay= 0;
|
|
|
|
/* IO cache for current log */
|
|
|
|
bzero(buffer->buffer, TRANSLOG_WRITE_BUFFER);
|
|
|
|
/* Buffer size */
|
|
|
|
buffer->size= 0;
|
|
|
|
/* cond of thread which is waiting for buffer filling */
|
|
|
|
buffer->waiting_filling_buffer.last_thread= 0;
|
|
|
|
/* Number of record which are in copy progress */
|
|
|
|
buffer->copy_to_buffer_in_progress= 0;
|
|
|
|
/* list of waiting buffer ready threads */
|
|
|
|
buffer->waiting_flush= 0;
|
|
|
|
/* lock for the buffer. Current buffer also lock the handler */
|
|
|
|
if (pthread_mutex_init(&buffer->mutex, MY_MUTEX_INIT_FAST))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Close transaction log file by descriptor
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_close_log_file()
|
|
|
|
file file descriptor
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_close_log_file(File file)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
int rc;
|
|
|
|
PAGECACHE_FILE fl;
|
|
|
|
fl.file= file;
|
2007-02-02 08:41:32 +01:00
|
|
|
flush_pagecache_blocks(log_descriptor.pagecache, &fl, FLUSH_RELEASE);
|
2007-02-19 22:01:27 +01:00
|
|
|
/*
|
|
|
|
Sync file when we close it
|
|
|
|
TODO: sync only we have changed the log
|
|
|
|
*/
|
|
|
|
rc= my_sync(file, MYF(MY_WME));
|
|
|
|
rc|= my_close(file, MYF(MY_WME));
|
|
|
|
return test(rc);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Create and fill header of new file
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_create_new_file()
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
0 OK
|
|
|
|
1 Error
|
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static my_bool translog_create_new_file()
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
int i;
|
2007-02-12 13:23:43 +01:00
|
|
|
uint32 file_no= LSN_FILE_NO(log_descriptor.horizon);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_create_new_file");
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
/*
|
|
|
|
Writes max_lsn to the file header before finishing it (it is no need to
|
|
|
|
lock file header buffer because it is still unfinished file)
|
|
|
|
*/
|
|
|
|
translog_max_lsn_to_header(log_descriptor.log_file_num[0],
|
|
|
|
log_descriptor.max_lsn);
|
|
|
|
log_descriptor.max_lsn= LSN_IMPOSSIBLE;
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (log_descriptor.log_file_num[OPENED_FILES_NUM - 1] != -1 &&
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_close_log_file(log_descriptor.log_file_num[OPENED_FILES_NUM -
|
|
|
|
1]))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
for (i= OPENED_FILES_NUM - 1; i > 0; i--)
|
|
|
|
log_descriptor.log_file_num[i]= log_descriptor.log_file_num[i - 1];
|
|
|
|
|
|
|
|
if ((log_descriptor.log_file_num[0]=
|
2007-02-19 22:01:27 +01:00
|
|
|
open_logfile_by_number_no_cache(file_no)) == -1 ||
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_write_file_header())
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
if (ma_control_file_write_and_force(LSN_IMPOSSIBLE, file_no,
|
2007-02-02 08:41:32 +01:00
|
|
|
CONTROL_FILE_UPDATE_ONLY_LOGNO))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Lock the loghandler buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_lock()
|
|
|
|
buffer This buffer which should be locked
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef DBUG_OFF
|
|
|
|
static my_bool translog_buffer_lock(struct st_translog_buffer *buffer)
|
|
|
|
{
|
|
|
|
int res;
|
|
|
|
DBUG_ENTER("translog_buffer_lock");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter",
|
|
|
|
("Lock buffer #%u: (0x%lx) mutex: 0x%lx",
|
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
(ulong) &buffer->mutex));
|
2007-02-02 08:41:32 +01:00
|
|
|
res= (pthread_mutex_lock(&buffer->mutex) != 0);
|
|
|
|
DBUG_RETURN(res);
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
#define translog_buffer_lock(B) \
|
2007-06-09 13:52:17 +02:00
|
|
|
pthread_mutex_lock(&B->mutex)
|
2007-02-02 08:41:32 +01:00
|
|
|
#endif
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Unlock the loghandler buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_unlock()
|
|
|
|
buffer This buffer which should be unlocked
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef DBUG_OFF
|
|
|
|
static my_bool translog_buffer_unlock(struct st_translog_buffer *buffer)
|
|
|
|
{
|
|
|
|
int res;
|
|
|
|
DBUG_ENTER("translog_buffer_unlock");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Unlock buffer... #%u (0x%lx) "
|
|
|
|
"mutex: 0x%lx",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
(ulong) &buffer->mutex));
|
|
|
|
|
|
|
|
res= (pthread_mutex_unlock(&buffer->mutex) != 0);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Unlocked buffer... #%u: 0x%lx mutex: 0x%lx",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
(ulong) &buffer->mutex));
|
|
|
|
DBUG_RETURN(res);
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
#define translog_buffer_unlock(B) \
|
2007-06-09 13:52:17 +02:00
|
|
|
pthread_mutex_unlock(&B->mutex)
|
2007-02-02 08:41:32 +01:00
|
|
|
#endif
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Write a header on the page
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_new_page_header()
|
|
|
|
horizon Where to write the page
|
|
|
|
cursor Where to write the page
|
|
|
|
|
|
|
|
NOTE
|
|
|
|
- space for page header should be checked before
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_new_page_header(TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor)
|
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *ptr;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_ENTER("translog_new_page_header");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(cursor->ptr);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
cursor->protected= 0;
|
|
|
|
|
|
|
|
ptr= cursor->ptr;
|
|
|
|
/* Page number */
|
2007-02-12 13:23:43 +01:00
|
|
|
int3store(ptr, LSN_OFFSET(*horizon) / TRANSLOG_PAGE_SIZE);
|
|
|
|
ptr+= 3;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* File number */
|
2007-02-12 13:23:43 +01:00
|
|
|
int3store(ptr, LSN_FILE_NO(*horizon));
|
|
|
|
ptr+= 3;
|
2007-07-02 19:45:15 +02:00
|
|
|
*(ptr++)= (uchar) log_descriptor.flags;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (log_descriptor.flags & TRANSLOG_PAGE_CRC)
|
|
|
|
{
|
|
|
|
#ifndef DBUG_OFF
|
|
|
|
DBUG_PRINT("info", ("write 0x11223344 CRC to (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(*horizon)));
|
2007-02-19 22:01:27 +01:00
|
|
|
/* This will be overwritten by real CRC; This is just for debugging */
|
2007-02-02 08:41:32 +01:00
|
|
|
int4store(ptr, 0x11223344);
|
|
|
|
#endif
|
2007-02-19 22:01:27 +01:00
|
|
|
/* CRC will be put when page is finished */
|
|
|
|
ptr+= CRC_LENGTH;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
if (log_descriptor.flags & TRANSLOG_SECTOR_PROTECTION)
|
|
|
|
{
|
|
|
|
time_t tm;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 tmp_time= time(&tm);
|
|
|
|
int2store(ptr, tmp_time);
|
|
|
|
ptr+= (TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE) * 2;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint len= (ptr - cursor->ptr);
|
|
|
|
(*horizon)+= len; /* it is increasing of offset part of the address */
|
|
|
|
cursor->current_page_fill= len;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (!cursor->chaser)
|
|
|
|
cursor->buffer->size+= len;
|
|
|
|
}
|
|
|
|
cursor->ptr= ptr;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("NewP buffer #%u: 0x%lx chaser: %d Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->buffer->buffer_no, (ulong) cursor->buffer,
|
|
|
|
cursor->chaser, (ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(cursor););
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Put sector protection on the page image
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_put_sector_protection()
|
|
|
|
page reference on the page content
|
|
|
|
cursor cursor of the buffer
|
2007-02-19 22:01:27 +01:00
|
|
|
|
|
|
|
NOTES
|
|
|
|
We put a sector protection on all following sectors on the page,
|
|
|
|
except the first sector that is protected by page header.
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static void translog_put_sector_protection(uchar *page,
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_buffer_cursor *cursor)
|
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *table= page + log_descriptor.page_overhead -
|
2007-02-19 22:01:27 +01:00
|
|
|
(TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE) * 2;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 value= uint2korr(table) + cursor->write_counter;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 last_protected_sector= ((cursor->previous_offset - 1) /
|
|
|
|
DISK_DRIVE_SECTOR_SIZE);
|
|
|
|
uint16 start_sector= cursor->previous_offset / DISK_DRIVE_SECTOR_SIZE;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint i, offset;
|
|
|
|
DBUG_ENTER("translog_put_sector_protection");
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
if (start_sector == 0)
|
2007-02-19 22:01:27 +01:00
|
|
|
start_sector= 1; /* First sector is protected */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Write counter:%u value:%u offset:%u, "
|
|
|
|
"last protected:%u start sector:%u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->write_counter,
|
|
|
|
(uint) value,
|
|
|
|
(uint) cursor->previous_offset,
|
|
|
|
(uint) last_protected_sector, (uint) start_sector));
|
|
|
|
if (last_protected_sector == start_sector)
|
|
|
|
{
|
|
|
|
i= last_protected_sector * 2;
|
2007-02-19 22:01:27 +01:00
|
|
|
offset= last_protected_sector * DISK_DRIVE_SECTOR_SIZE;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* restore data, because we modified sector which was protected */
|
|
|
|
if (offset < cursor->previous_offset)
|
|
|
|
page[offset]= table[i];
|
|
|
|
offset++;
|
|
|
|
if (offset < cursor->previous_offset)
|
|
|
|
page[offset]= table[i + 1];
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
for (i= start_sector * 2, offset= start_sector * DISK_DRIVE_SECTOR_SIZE;
|
|
|
|
i < (TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE) * 2;
|
|
|
|
(i+= 2), (offset+= DISK_DRIVE_SECTOR_SIZE))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("sector:%u offset:%u data 0x%x%x",
|
2007-02-02 08:41:32 +01:00
|
|
|
i / 2, offset, (uint) page[offset],
|
|
|
|
(uint) page[offset + 1]));
|
|
|
|
table[i]= page[offset];
|
|
|
|
table[i + 1]= page[offset + 1];
|
|
|
|
int2store(page + offset, value);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("sector:%u offset:%u data 0x%x%x",
|
2007-02-02 08:41:32 +01:00
|
|
|
i / 2, offset, (uint) page[offset],
|
|
|
|
(uint) page[offset + 1]));
|
|
|
|
}
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Calculate CRC32 of given area
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_crc()
|
2007-02-02 08:41:32 +01:00
|
|
|
area Pointer of the area beginning
|
|
|
|
length The Area length
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
CRC32
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static uint32 translog_crc(uchar *area, uint length)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-08-13 14:17:49 +02:00
|
|
|
DBUG_ENTER("translog_crc");
|
|
|
|
DBUG_RETURN(crc32(0L, (unsigned char*) area, length));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Finish current page with zeros
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_finish_page()
|
|
|
|
horizon \ horizon & buffer pointers
|
|
|
|
cursor /
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_finish_page(TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 left= TRANSLOG_PAGE_SIZE - cursor->current_page_fill;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *page= cursor->ptr -cursor->current_page_fill;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_finish_page");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Buffer: #%u 0x%lx "
|
|
|
|
"Buffer addr: (%lu,0x%lx) "
|
|
|
|
"Page addr: (%lu,0x%lx) "
|
|
|
|
"size:%lu (%lu) Pg:%u left:%u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->buffer_no, (ulong) cursor->buffer,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(cursor->buffer->offset),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_FILE_NO(*horizon),
|
|
|
|
(ulong) (LSN_OFFSET(*horizon) -
|
2007-02-19 22:01:27 +01:00
|
|
|
cursor->current_page_fill),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) cursor->buffer->size,
|
|
|
|
(ulong) (cursor->ptr -cursor->buffer->buffer),
|
2007-02-19 22:01:27 +01:00
|
|
|
(uint) cursor->current_page_fill, (uint) left));
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_FILE_NO(*horizon) == LSN_FILE_NO(cursor->buffer->offset));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(cursor););
|
2007-02-02 08:41:32 +01:00
|
|
|
if (cursor->protected)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("Already protected and finished"));
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
cursor->protected= 1;
|
|
|
|
|
|
|
|
DBUG_ASSERT(left < TRANSLOG_PAGE_SIZE);
|
|
|
|
if (left != 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("left: %u", (uint) left));
|
2007-02-02 08:41:32 +01:00
|
|
|
bzero(cursor->ptr, left);
|
|
|
|
cursor->ptr +=left;
|
2007-02-19 22:01:27 +01:00
|
|
|
(*horizon)+= left; /* offset increasing */
|
2007-02-02 08:41:32 +01:00
|
|
|
if (!cursor->chaser)
|
|
|
|
cursor->buffer->size+= left;
|
2007-02-19 22:01:27 +01:00
|
|
|
cursor->current_page_fill= 0;
|
|
|
|
DBUG_PRINT("info", ("Finish Page buffer #%u: 0x%lx "
|
|
|
|
"chaser: %d Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->buffer->buffer_no,
|
|
|
|
(ulong) cursor->buffer, cursor->chaser,
|
|
|
|
(ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(cursor););
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
if (page[TRANSLOG_PAGE_FLAGS] & TRANSLOG_SECTOR_PROTECTION)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
translog_put_sector_protection(page, cursor);
|
|
|
|
DBUG_PRINT("info", ("drop write_counter"));
|
|
|
|
cursor->write_counter= 0;
|
|
|
|
cursor->previous_offset= 0;
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
if (page[TRANSLOG_PAGE_FLAGS] & TRANSLOG_PAGE_CRC)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint32 crc= translog_crc(page + log_descriptor.page_overhead,
|
|
|
|
TRANSLOG_PAGE_SIZE -
|
|
|
|
log_descriptor.page_overhead);
|
|
|
|
DBUG_PRINT("info", ("CRC: %lx", (ulong) crc));
|
|
|
|
/* We have page number, file number and flag before crc */
|
2007-02-02 08:41:32 +01:00
|
|
|
int4store(page + 3 + 3 + 1, crc);
|
|
|
|
}
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Wait until all thread finish filling this buffer
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_wait_for_writers()
|
|
|
|
buffer This buffer should be check
|
|
|
|
|
|
|
|
NOTE
|
|
|
|
This buffer should be locked
|
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
static void translog_wait_for_writers(struct st_translog_buffer *buffer)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
struct st_my_thread_var *thread= my_thread_var;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_wait_for_writers");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Buffer #%u 0x%lx copies in progress: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
(int) buffer->copy_to_buffer_in_progress));
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
while (buffer->copy_to_buffer_in_progress)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("wait for writers... "
|
|
|
|
"buffer: #%u 0x%lx "
|
2007-02-02 08:41:32 +01:00
|
|
|
"mutex: 0x%lx",
|
2007-02-12 13:23:43 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) &buffer->mutex));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(buffer->file != -1);
|
2007-02-02 08:41:32 +01:00
|
|
|
wqueue_add_and_wait(&buffer->waiting_filling_buffer, thread,
|
|
|
|
&buffer->mutex);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("wait for writers done "
|
|
|
|
"buffer: #%u 0x%lx "
|
2007-02-02 08:41:32 +01:00
|
|
|
"mutex: 0x%lx",
|
2007-02-12 13:23:43 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) &buffer->mutex));
|
2007-02-19 22:01:27 +01:00
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
Wait for buffer to become free
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_wait_for_buffer_free()
|
2007-02-19 22:01:27 +01:00
|
|
|
buffer The buffer we are waiting for
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
NOTE
|
|
|
|
- this buffer should be locked
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_wait_for_buffer_free(struct st_translog_buffer *buffer)
|
|
|
|
{
|
|
|
|
struct st_my_thread_var *thread= my_thread_var;
|
|
|
|
DBUG_ENTER("translog_wait_for_buffer_free");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Buffer: #%u 0x%lx copies in progress: %u "
|
|
|
|
"File: %d size: 0x%lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
(int) buffer->copy_to_buffer_in_progress,
|
2007-02-19 22:01:27 +01:00
|
|
|
buffer->file, (ulong) buffer->size));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
translog_wait_for_writers(buffer);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
while (buffer->file != -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("wait for writers... "
|
|
|
|
"buffer: #%u 0x%lx "
|
2007-02-02 08:41:32 +01:00
|
|
|
"mutex: 0x%lx",
|
2007-02-12 13:23:43 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) &buffer->mutex));
|
|
|
|
wqueue_add_and_wait(&buffer->waiting_filling_buffer, thread,
|
|
|
|
&buffer->mutex);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("wait for writers done. "
|
|
|
|
"buffer: #%u 0x%lx "
|
2007-02-02 08:41:32 +01:00
|
|
|
"mutex: 0x%lx",
|
2007-02-12 13:23:43 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) &buffer->mutex));
|
2007-02-19 22:01:27 +01:00
|
|
|
}
|
|
|
|
DBUG_ASSERT(buffer->copy_to_buffer_in_progress == 0);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Initialize the cursor for a buffer
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_cursor_init()
|
|
|
|
buffer The buffer
|
|
|
|
cursor It's cursor
|
|
|
|
buffer_no Number of buffer
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_cursor_init(struct st_buffer_cursor *cursor,
|
|
|
|
struct st_translog_buffer *buffer,
|
|
|
|
uint8 buffer_no)
|
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_cursor_init");
|
|
|
|
cursor->ptr= buffer->buffer;
|
|
|
|
cursor->buffer= buffer;
|
|
|
|
cursor->buffer_no= buffer_no;
|
2007-02-19 22:01:27 +01:00
|
|
|
cursor->current_page_fill= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
cursor->chaser= (cursor != &log_descriptor.bc);
|
|
|
|
cursor->write_counter= 0;
|
|
|
|
cursor->previous_offset= 0;
|
|
|
|
cursor->protected= 0;
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Initialize buffer for current file
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_start_buffer()
|
|
|
|
buffer The buffer
|
|
|
|
cursor It's cursor
|
|
|
|
buffer_no Number of buffer
|
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
static void translog_start_buffer(struct st_translog_buffer *buffer,
|
|
|
|
struct st_buffer_cursor *cursor,
|
2007-02-19 22:01:27 +01:00
|
|
|
uint buffer_no)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_start_buffer");
|
|
|
|
DBUG_PRINT("enter",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Assign buffer: #%u (0x%lx) to file: %d offset: 0x%lx(%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.log_file_num[0],
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(log_descriptor.horizon),
|
|
|
|
(ulong) LSN_OFFSET(log_descriptor.horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(buffer_no == buffer->buffer_no);
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
buffer->last_lsn= LSN_IMPOSSIBLE;
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer->offset= log_descriptor.horizon;
|
2007-08-13 14:17:49 +02:00
|
|
|
buffer->next_buffer_offset= LSN_IMPOSSIBLE;
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer->file= log_descriptor.log_file_num[0];
|
|
|
|
buffer->overlay= 0;
|
|
|
|
buffer->size= 0;
|
|
|
|
translog_cursor_init(cursor, buffer, buffer_no);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("init cursor #%u: 0x%lx chaser: %d Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->buffer->buffer_no, (ulong) cursor->buffer,
|
|
|
|
cursor->chaser, (ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(cursor););
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Switch to the next buffer in a chain
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_next()
|
|
|
|
horizon \ Pointers on current position in file and buffer
|
|
|
|
cursor /
|
|
|
|
next_file Also start new file
|
|
|
|
|
|
|
|
NOTE:
|
|
|
|
- loghandler should be locked
|
|
|
|
- after return new and old buffer still are locked
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_buffer_next(TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor,
|
|
|
|
my_bool new_file)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint old_buffer_no= cursor->buffer_no;
|
|
|
|
uint new_buffer_no= (old_buffer_no + 1) % TRANSLOG_BUFFERS_NO;
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_translog_buffer *new_buffer= log_descriptor.buffers + new_buffer_no;
|
|
|
|
my_bool chasing= cursor->chaser;
|
|
|
|
DBUG_ENTER("translog_buffer_next");
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("horizon: (%lu,0x%lx) chasing: %d",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon), chasing));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_ASSERT(cmp_translog_addr(log_descriptor.horizon, *horizon) >= 0);
|
|
|
|
|
|
|
|
translog_finish_page(horizon, cursor);
|
|
|
|
|
|
|
|
if (!chasing)
|
|
|
|
{
|
|
|
|
translog_buffer_lock(new_buffer);
|
|
|
|
translog_wait_for_buffer_free(new_buffer);
|
|
|
|
}
|
|
|
|
#ifndef DBUG_OFF
|
|
|
|
else
|
|
|
|
DBUG_ASSERT(new_buffer->file != 0);
|
|
|
|
#endif
|
|
|
|
if (new_file)
|
|
|
|
{
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
/* move the horizon to the next file and its header page */
|
2007-02-19 22:01:27 +01:00
|
|
|
(*horizon)+= LSN_ONE_FILE;
|
|
|
|
(*horizon)= LSN_REPLACE_OFFSET(*horizon, TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (!chasing && translog_create_new_file())
|
|
|
|
{
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* prepare next page */
|
|
|
|
if (chasing)
|
|
|
|
translog_cursor_init(cursor, new_buffer, new_buffer_no);
|
|
|
|
else
|
|
|
|
translog_start_buffer(new_buffer, cursor, new_buffer_no);
|
2007-08-13 14:17:49 +02:00
|
|
|
log_descriptor.buffers[old_buffer_no].next_buffer_offset= new_buffer->offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_new_page_header(horizon, cursor);
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-08-13 14:17:49 +02:00
|
|
|
Sets max LSN sent to file, and address from which data is only in the buffer
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_set_sent_to_file()
|
|
|
|
lsn LSN to assign
|
2007-08-13 14:17:49 +02:00
|
|
|
in_buffers to assign to in_buffers_only
|
|
|
|
|
|
|
|
TODO: use atomic operations if possible (64bit architectures?)
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
static void translog_set_sent_to_file(LSN lsn, TRANSLOG_ADDRESS in_buffers)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_set_sent_to_file");
|
|
|
|
pthread_mutex_lock(&log_descriptor.sent_to_file_lock);
|
2007-08-13 14:17:49 +02:00
|
|
|
DBUG_PRINT("enter", ("lsn: (%lu,0x%lx) in_buffers: (%lu,0x%lx) "
|
|
|
|
"in_buffers_only: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(lsn),
|
|
|
|
LSN_IN_PARTS(in_buffers),
|
|
|
|
LSN_IN_PARTS(log_descriptor.in_buffers_only)));
|
2007-08-13 14:17:49 +02:00
|
|
|
DBUG_ASSERT(cmp_translog_addr(lsn, log_descriptor.sent_to_file) >= 0);
|
|
|
|
log_descriptor.sent_to_file= lsn;
|
|
|
|
/* LSN_IMPOSSIBLE == 0 => it will work for very first time */
|
|
|
|
if (cmp_translog_addr(in_buffers, log_descriptor.in_buffers_only) > 0)
|
|
|
|
{
|
|
|
|
log_descriptor.in_buffers_only= in_buffers;
|
|
|
|
DBUG_PRINT("info", ("set new in_buffers_only"));
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
pthread_mutex_unlock(&log_descriptor.sent_to_file_lock);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-08-13 14:17:49 +02:00
|
|
|
Sets address from which data is only in the buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_set_only_in_buffers()
|
|
|
|
lsn LSN to assign
|
|
|
|
in_buffers to assign to in_buffers_only
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_set_only_in_buffers(TRANSLOG_ADDRESS in_buffers)
|
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_set_only_in_buffers");
|
|
|
|
pthread_mutex_lock(&log_descriptor.sent_to_file_lock);
|
|
|
|
DBUG_PRINT("enter", ("in_buffers: (%lu,0x%lx) "
|
|
|
|
"in_buffers_only: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(in_buffers),
|
|
|
|
LSN_IN_PARTS(log_descriptor.in_buffers_only)));
|
2007-08-13 14:17:49 +02:00
|
|
|
/* LSN_IMPOSSIBLE == 0 => it will work for very first time */
|
|
|
|
if (cmp_translog_addr(in_buffers, log_descriptor.in_buffers_only) > 0)
|
|
|
|
{
|
|
|
|
log_descriptor.in_buffers_only= in_buffers;
|
|
|
|
DBUG_PRINT("info", ("set new in_buffers_only"));
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&log_descriptor.sent_to_file_lock);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Gets address from which data is only in the buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_only_in_buffers()
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
address from which data is only in the buffer
|
|
|
|
*/
|
|
|
|
|
|
|
|
static TRANSLOG_ADDRESS translog_only_in_buffers()
|
|
|
|
{
|
|
|
|
register TRANSLOG_ADDRESS addr;
|
|
|
|
DBUG_ENTER("translog_only_in_buffers");
|
|
|
|
pthread_mutex_lock(&log_descriptor.sent_to_file_lock);
|
|
|
|
addr= log_descriptor.in_buffers_only;
|
|
|
|
pthread_mutex_unlock(&log_descriptor.sent_to_file_lock);
|
|
|
|
DBUG_RETURN(addr);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get max LSN sent to file
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_sent_to_file()
|
2007-08-13 14:17:49 +02:00
|
|
|
|
|
|
|
RETURN
|
|
|
|
max LSN send to file
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
static LSN translog_get_sent_to_file()
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-08-13 14:17:49 +02:00
|
|
|
register LSN lsn;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_get_sent_to_file");
|
|
|
|
pthread_mutex_lock(&log_descriptor.sent_to_file_lock);
|
2007-08-13 14:17:49 +02:00
|
|
|
lsn= log_descriptor.sent_to_file;
|
2007-02-02 08:41:32 +01:00
|
|
|
pthread_mutex_unlock(&log_descriptor.sent_to_file_lock);
|
2007-08-13 14:17:49 +02:00
|
|
|
DBUG_RETURN(lsn);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get first chunk address on the given page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_first_chunk_offset()
|
|
|
|
page The page where to find first chunk
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
first chunk offset
|
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static my_bool translog_get_first_chunk_offset(uchar *page)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint16 page_header= 7;
|
|
|
|
DBUG_ENTER("translog_get_first_chunk_offset");
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (page[TRANSLOG_PAGE_FLAGS] & TRANSLOG_PAGE_CRC)
|
2007-02-02 08:41:32 +01:00
|
|
|
page_header+= 4;
|
2007-02-19 22:01:27 +01:00
|
|
|
if (page[TRANSLOG_PAGE_FLAGS] & TRANSLOG_SECTOR_PROTECTION)
|
|
|
|
page_header+= (TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE) * 2;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(page_header);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Write coded length of record
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_variable_record_1group_code_len
|
|
|
|
dst Destination buffer pointer
|
|
|
|
length Length which should be coded
|
|
|
|
header_len Calculated total header length
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void
|
2007-07-02 19:45:15 +02:00
|
|
|
translog_write_variable_record_1group_code_len(uchar *dst,
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_size_t length,
|
|
|
|
uint16 header_len)
|
|
|
|
{
|
|
|
|
switch (header_len) {
|
|
|
|
case 6: /* (5 + 1) */
|
|
|
|
DBUG_ASSERT(length <= 250);
|
|
|
|
*dst= (uint8) length;
|
|
|
|
return;
|
|
|
|
case 8: /* (5 + 3) */
|
|
|
|
DBUG_ASSERT(length <= 0xFFFF);
|
|
|
|
*dst= 251;
|
|
|
|
int2store(dst + 1, length);
|
|
|
|
return;
|
|
|
|
case 9: /* (5 + 4) */
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(length <= (ulong) 0xFFFFFF);
|
2007-02-02 08:41:32 +01:00
|
|
|
*dst= 252;
|
|
|
|
int3store(dst + 1, length);
|
|
|
|
return;
|
|
|
|
case 10: /* (5 + 5) */
|
|
|
|
*dst= 253;
|
|
|
|
int4store(dst + 1, length);
|
|
|
|
return;
|
|
|
|
default:
|
|
|
|
DBUG_ASSERT(0);
|
|
|
|
}
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Decode record data length and advance given pointer to the next field
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_variable_record_1group_decode_len()
|
|
|
|
src The pointer to the pointer to the length beginning
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
decoded length
|
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static translog_size_t translog_variable_record_1group_decode_len(uchar **src)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint8 first= (uint8) (**src);
|
|
|
|
switch (first) {
|
|
|
|
case 251:
|
2007-02-19 22:01:27 +01:00
|
|
|
(*src)+= 3;
|
2007-02-02 08:41:32 +01:00
|
|
|
return (uint2korr((*src) - 2));
|
|
|
|
case 252:
|
2007-02-19 22:01:27 +01:00
|
|
|
(*src)+= 4;
|
2007-02-02 08:41:32 +01:00
|
|
|
return (uint3korr((*src) - 3));
|
|
|
|
case 253:
|
2007-02-19 22:01:27 +01:00
|
|
|
(*src)+= 5;
|
2007-02-02 08:41:32 +01:00
|
|
|
return (uint4korr((*src) - 4));
|
|
|
|
case 254:
|
|
|
|
case 255:
|
|
|
|
DBUG_ASSERT(0); /* reserved for future use */
|
|
|
|
return (0);
|
|
|
|
default:
|
|
|
|
(*src)++;
|
|
|
|
return (first);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get total length of this chunk (not only body)
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_total_chunk_length()
|
|
|
|
page The page where chunk placed
|
|
|
|
offset Offset of the chunk on this place
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
total length of the chunk
|
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static uint16 translog_get_total_chunk_length(uchar *page, uint16 offset)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_get_total_chunk_length");
|
|
|
|
switch (page[offset] & TRANSLOG_CHUNK_TYPE) {
|
2007-02-19 22:01:27 +01:00
|
|
|
case TRANSLOG_CHUNK_LSN:
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/* 0 chunk referred as LSN (head or tail) */
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_size_t rec_len;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *start= page + offset;
|
|
|
|
uchar *ptr= start + 1 + 2;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 chunk_len, header_len, page_rest;
|
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_LSN"));
|
|
|
|
rec_len= translog_variable_record_1group_decode_len(&ptr);
|
|
|
|
chunk_len= uint2korr(ptr);
|
2007-02-19 22:01:27 +01:00
|
|
|
header_len= (ptr -start) + 2;
|
|
|
|
DBUG_PRINT("info", ("rec len: %lu chunk len: %u header len: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) rec_len, (uint) chunk_len, (uint) header_len));
|
|
|
|
if (chunk_len)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("chunk len: %u + %u = %u",
|
|
|
|
(uint) header_len, (uint) chunk_len,
|
|
|
|
(uint) (chunk_len + header_len)));
|
|
|
|
DBUG_RETURN(chunk_len + header_len);
|
|
|
|
}
|
|
|
|
page_rest= TRANSLOG_PAGE_SIZE - offset;
|
|
|
|
DBUG_PRINT("info", ("page_rest %u", (uint) page_rest));
|
|
|
|
if (rec_len + header_len < page_rest)
|
|
|
|
DBUG_RETURN(rec_len + header_len);
|
|
|
|
DBUG_RETURN(page_rest);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
case TRANSLOG_CHUNK_FIXED:
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *ptr;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint type= page[offset] & TRANSLOG_REC_TYPE;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint length;
|
|
|
|
int i;
|
|
|
|
/* 1 (pseudo)fixed record (also LSN) */
|
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_FIXED"));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(log_record_type_descriptor[type].class ==
|
|
|
|
LOGRECTYPE_FIXEDLENGTH ||
|
|
|
|
log_record_type_descriptor[type].class ==
|
|
|
|
LOGRECTYPE_PSEUDOFIXEDLENGTH);
|
|
|
|
if (log_record_type_descriptor[type].class == LOGRECTYPE_FIXEDLENGTH)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info",
|
|
|
|
("Fixed length: %u",
|
|
|
|
(uint) (log_record_type_descriptor[type].fixed_length + 3)));
|
|
|
|
DBUG_RETURN(log_record_type_descriptor[type].fixed_length + 3);
|
|
|
|
}
|
2007-06-09 13:52:17 +02:00
|
|
|
|
|
|
|
ptr= page + offset + 3; /* first compressed LSN */
|
|
|
|
length= log_record_type_descriptor[type].fixed_length + 3;
|
|
|
|
for (i= 0; i < log_record_type_descriptor[type].compressed_LSN; i++)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-06-09 13:52:17 +02:00
|
|
|
/* first 2 bits is length - 2 */
|
|
|
|
uint len= ((((uint8) (*ptr)) & TRANSLOG_CLSN_LEN_BITS) >> 6) + 2;
|
2007-06-15 00:45:55 +02:00
|
|
|
if (ptr[0] == 0 && ((uint8) ptr[1]) == 1)
|
|
|
|
len+= LSN_STORE_SIZE; /* case of full LSN storing */
|
2007-06-09 13:52:17 +02:00
|
|
|
ptr+= len;
|
|
|
|
/* subtract economized bytes */
|
2007-06-15 00:45:55 +02:00
|
|
|
length-= (LSN_STORE_SIZE - len);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-06-09 13:52:17 +02:00
|
|
|
DBUG_PRINT("info", ("Pseudo-fixed length: %u", length));
|
|
|
|
DBUG_RETURN(length);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
case TRANSLOG_CHUNK_NOHDR:
|
|
|
|
/* 2 no header chunk (till page end) */
|
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_NOHDR length: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) (TRANSLOG_PAGE_SIZE - offset)));
|
|
|
|
DBUG_RETURN(TRANSLOG_PAGE_SIZE - offset);
|
|
|
|
case TRANSLOG_CHUNK_LNGTH: /* 3 chunk with chunk length */
|
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_LNGTH"));
|
|
|
|
DBUG_ASSERT(TRANSLOG_PAGE_SIZE - offset >= 3);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("length: %u", uint2korr(page + offset + 1) + 3));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(uint2korr(page + offset + 1) + 3);
|
|
|
|
default:
|
|
|
|
DBUG_ASSERT(0);
|
2007-06-09 13:52:17 +02:00
|
|
|
DBUG_RETURN(0);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Flush given buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_flush()
|
|
|
|
buffer This buffer should be flushed
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_buffer_flush(struct st_translog_buffer *buffer)
|
|
|
|
{
|
|
|
|
uint32 i;
|
2007-06-11 11:52:40 +02:00
|
|
|
PAGECACHE_FILE file;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_buffer_flush");
|
|
|
|
DBUG_PRINT("enter",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Buffer: #%u 0x%lx: "
|
|
|
|
"file: %d offset: (%lu,0x%lx) size: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-19 22:01:27 +01:00
|
|
|
buffer->file,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(buffer->offset),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buffer->size));
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(buffer->file != -1);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
translog_wait_for_writers(buffer);
|
2007-02-19 22:01:27 +01:00
|
|
|
if (buffer->overlay && buffer->overlay->file != -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
struct st_translog_buffer *overlay= buffer->overlay;
|
|
|
|
translog_buffer_unlock(buffer);
|
|
|
|
translog_buffer_lock(overlay);
|
|
|
|
translog_wait_for_buffer_free(overlay);
|
|
|
|
translog_buffer_unlock(overlay);
|
|
|
|
translog_buffer_lock(buffer);
|
|
|
|
}
|
|
|
|
|
2007-06-11 11:52:40 +02:00
|
|
|
file.file= buffer->file;
|
2007-02-02 08:41:32 +01:00
|
|
|
for (i= 0; i < buffer->size; i+= TRANSLOG_PAGE_SIZE)
|
|
|
|
{
|
2007-08-13 14:17:49 +02:00
|
|
|
TRANSLOG_ADDRESS addr= (buffer->offset + i);
|
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
|
|
|
data.addr= &addr;
|
2007-04-04 22:37:09 +02:00
|
|
|
DBUG_ASSERT(log_descriptor.pagecache->block_size == TRANSLOG_PAGE_SIZE);
|
2007-06-11 11:52:40 +02:00
|
|
|
DBUG_ASSERT(i + TRANSLOG_PAGE_SIZE <= buffer->size);
|
2007-08-13 14:17:49 +02:00
|
|
|
if (pagecache_inject(log_descriptor.pagecache,
|
2007-02-02 08:41:32 +01:00
|
|
|
&file,
|
2007-02-12 13:23:43 +01:00
|
|
|
(LSN_OFFSET(buffer->offset) + i) / TRANSLOG_PAGE_SIZE,
|
2007-02-02 08:41:32 +01:00
|
|
|
3,
|
|
|
|
buffer->buffer + i,
|
|
|
|
PAGECACHE_PLAIN_PAGE,
|
|
|
|
PAGECACHE_LOCK_LEFT_UNLOCKED,
|
2007-08-13 14:17:49 +02:00
|
|
|
PAGECACHE_PIN_LEFT_UNPINNED, 0,
|
|
|
|
&translog_page_validator, (uchar*) &data))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
UNRECOVERABLE_ERROR(("Can't write page (%lu,0x%lx) to pagecache",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buffer->file,
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) (LSN_OFFSET(buffer->offset)+ i)));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
if (my_pwrite(buffer->file, (char*) buffer->buffer,
|
2007-02-12 13:23:43 +01:00
|
|
|
buffer->size, LSN_OFFSET(buffer->offset),
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
log_write_flags))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
UNRECOVERABLE_ERROR(("Can't write buffer (%lu,0x%lx) size %lu "
|
|
|
|
"to the disk (%d)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buffer->file,
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(buffer->offset),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buffer->size, errno));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
if (LSN_OFFSET(buffer->last_lsn) != 0) /* if buffer->last_lsn is set */
|
|
|
|
translog_set_sent_to_file(buffer->last_lsn,
|
|
|
|
buffer->next_buffer_offset);
|
|
|
|
else
|
|
|
|
translog_set_only_in_buffers(buffer->next_buffer_offset);
|
2007-02-02 08:41:32 +01:00
|
|
|
/* Free buffer */
|
2007-02-19 22:01:27 +01:00
|
|
|
buffer->file= -1;
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer->overlay= 0;
|
2007-02-19 22:01:27 +01:00
|
|
|
if (buffer->waiting_filling_buffer.last_thread)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
wqueue_release_queue(&buffer->waiting_filling_buffer);
|
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Recover page with sector protection (wipe out failed chunks)
|
|
|
|
|
|
|
|
SYNOPSYS
|
|
|
|
translog_recover_page_up_to_sector()
|
|
|
|
page reference on the page
|
|
|
|
offset offset of failed sector
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static my_bool translog_recover_page_up_to_sector(uchar *page, uint16 offset)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint16 chunk_offset= translog_get_first_chunk_offset(page), valid_chunk_end;
|
|
|
|
DBUG_ENTER("translog_recover_page_up_to_sector");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("offset: %u first chunk: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) offset, (uint) chunk_offset));
|
|
|
|
|
|
|
|
while (page[chunk_offset] != '\0' && chunk_offset < offset)
|
|
|
|
{
|
|
|
|
uint16 chunk_length;
|
|
|
|
if ((chunk_length=
|
|
|
|
translog_get_total_chunk_length(page, chunk_offset)) == 0)
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("cant get chunk length (offset %u)",
|
|
|
|
(uint) chunk_offset));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("chunk: offset: %u length %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) chunk_offset, (uint) chunk_length));
|
|
|
|
if (((ulong) chunk_offset) + ((ulong) chunk_length) > TRANSLOG_PAGE_SIZE)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
UNRECOVERABLE_ERROR(("damaged chunk (offset %u) in trusted area",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) chunk_offset));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
chunk_offset+= chunk_length;
|
|
|
|
}
|
|
|
|
|
|
|
|
valid_chunk_end= chunk_offset;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* end of trusted area - sector parsing */
|
2007-02-02 08:41:32 +01:00
|
|
|
while (page[chunk_offset] != '\0')
|
|
|
|
{
|
|
|
|
uint16 chunk_length;
|
|
|
|
if ((chunk_length=
|
|
|
|
translog_get_total_chunk_length(page, chunk_offset)) == 0)
|
|
|
|
break;
|
2007-02-19 22:01:27 +01:00
|
|
|
|
|
|
|
DBUG_PRINT("info", ("chunk: offset: %u length %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) chunk_offset, (uint) chunk_length));
|
2007-02-19 22:01:27 +01:00
|
|
|
if (((ulong) chunk_offset) + ((ulong) chunk_length) >
|
|
|
|
(uint) (offset + DISK_DRIVE_SECTOR_SIZE))
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
chunk_offset+= chunk_length;
|
|
|
|
valid_chunk_end= chunk_offset;
|
|
|
|
}
|
|
|
|
DBUG_PRINT("info", ("valid chunk end offset: %u", (uint) valid_chunk_end));
|
|
|
|
|
|
|
|
bzero(page + valid_chunk_end, TRANSLOG_PAGE_SIZE - valid_chunk_end);
|
|
|
|
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Log page validator
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_page_validator()
|
|
|
|
page_addr The page to check
|
|
|
|
data data, need for validation (address in this case)
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-07-02 19:45:15 +02:00
|
|
|
static my_bool translog_page_validator(uchar *page_addr, uchar* data_ptr)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint this_page_page_overhead;
|
|
|
|
uint flags;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *page= (uchar*) page_addr, *page_pos;
|
2007-02-19 22:01:27 +01:00
|
|
|
TRANSLOG_VALIDATOR_DATA *data= (TRANSLOG_VALIDATOR_DATA *) data_ptr;
|
|
|
|
TRANSLOG_ADDRESS addr= *(data->addr);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_page_validator");
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
data->was_recovered= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
if (uint3korr(page) != LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE ||
|
|
|
|
uint3korr(page + 3) != LSN_FILE_NO(addr))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("Page (%lu,0x%lx): "
|
2007-02-19 22:01:27 +01:00
|
|
|
"page address written in the page is incorrect: "
|
2007-02-02 08:41:32 +01:00
|
|
|
"File %lu instead of %lu or page %lu instead of %lu",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(addr),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) uint3korr(page + 3), (ulong) LSN_FILE_NO(addr),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) uint3korr(page),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
flags= (uint)(page[TRANSLOG_PAGE_FLAGS]);
|
|
|
|
this_page_page_overhead= page_overhead[flags];
|
2007-02-02 08:41:32 +01:00
|
|
|
if (flags & ~(TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION |
|
|
|
|
TRANSLOG_RECORD_CRC))
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("Page (%lu,0x%lx): "
|
|
|
|
"Garbage in the page flags field detected : %x",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(addr), (uint) flags));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
page_pos= page + (3 + 3 + 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (flags & TRANSLOG_PAGE_CRC)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint32 crc= translog_crc(page + this_page_page_overhead,
|
|
|
|
TRANSLOG_PAGE_SIZE -
|
|
|
|
this_page_page_overhead);
|
|
|
|
if (crc != uint4korr(page_pos))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("Page (%lu,0x%lx): "
|
|
|
|
"CRC mismatch: calculated: %lx on the page %lx",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(addr),
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) crc, (ulong) uint4korr(page_pos)));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
page_pos+= CRC_LENGTH; /* Skip crc */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
if (flags & TRANSLOG_SECTOR_PROTECTION)
|
|
|
|
{
|
|
|
|
uint i, offset;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *table= page_pos;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 current= uint2korr(table);
|
2007-02-19 22:01:27 +01:00
|
|
|
for (i= 2, offset= DISK_DRIVE_SECTOR_SIZE;
|
|
|
|
i < (TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE) * 2;
|
|
|
|
i+= 2, offset+= DISK_DRIVE_SECTOR_SIZE)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
TODO: add chunk counting for "suspecting" sectors (difference is
|
|
|
|
more than 1-2)
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
uint16 test= uint2korr(page + offset);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("sector: #%u offset: %u current: %lx "
|
|
|
|
"read: 0x%x stored: 0x%x%x",
|
2007-02-12 13:23:43 +01:00
|
|
|
i / 2, offset, (ulong) current,
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) uint2korr(page + offset), (uint) table[i],
|
|
|
|
(uint) table[i + 1]));
|
2007-02-19 22:01:27 +01:00
|
|
|
if (((test < current) &&
|
|
|
|
(LL(0xFFFF) - current + test > DISK_DRIVE_SECTOR_SIZE / 3)) ||
|
|
|
|
((test >= current) &&
|
|
|
|
(test - current > DISK_DRIVE_SECTOR_SIZE / 3)))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
if (translog_recover_page_up_to_sector(page, offset))
|
|
|
|
DBUG_RETURN(1);
|
2007-02-19 22:01:27 +01:00
|
|
|
data->was_recovered= 1;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Return value on the page */
|
|
|
|
page[offset]= table[i];
|
|
|
|
page[offset + 1]= table[i + 1];
|
|
|
|
current= test;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("sector: #%u offset: %u current: %lx "
|
|
|
|
"read: 0x%x stored: 0x%x%x",
|
2007-02-12 13:23:43 +01:00
|
|
|
i / 2, offset, (ulong) current,
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) uint2korr(page + offset), (uint) table[i],
|
|
|
|
(uint) table[i + 1]));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
/*
|
|
|
|
Lock the loghandler
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_lock()
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
0 OK
|
|
|
|
1 Error
|
|
|
|
*/
|
|
|
|
|
|
|
|
my_bool translog_lock()
|
|
|
|
{
|
|
|
|
struct st_translog_buffer *current_buffer;
|
|
|
|
DBUG_ENTER("translog_lock");
|
|
|
|
|
|
|
|
/*
|
|
|
|
Locking the loghandler mean locking current buffer, but it can change
|
|
|
|
during locking, so we should check it
|
|
|
|
*/
|
|
|
|
for (;;)
|
|
|
|
{
|
|
|
|
current_buffer= log_descriptor.bc.buffer;
|
|
|
|
if (translog_buffer_lock(current_buffer))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
if (log_descriptor.bc.buffer == current_buffer)
|
|
|
|
break;
|
|
|
|
translog_buffer_unlock(current_buffer);
|
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Unlock the loghandler
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_unlock()
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
0 OK
|
|
|
|
1 Error
|
|
|
|
*/
|
|
|
|
|
|
|
|
my_bool translog_unlock()
|
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_unlock");
|
|
|
|
translog_buffer_unlock(log_descriptor.bc.buffer);
|
|
|
|
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
/**
|
|
|
|
@brief Get log page by file number and offset of the beginning of the page
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
@param data validator data, which contains the page address
|
|
|
|
@param buffer buffer for page placing
|
2007-02-02 08:41:32 +01:00
|
|
|
(might not be used in some cache implementations)
|
2007-09-27 13:18:28 +02:00
|
|
|
@param direct_link if it is not NULL then caller can accept direct
|
|
|
|
link to the page cache
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
@retval NULL Error
|
|
|
|
@retval # pointer to the page cache which should be used to read this page
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
static uchar *translog_get_page(TRANSLOG_VALIDATOR_DATA *data, uchar *buffer,
|
2007-09-27 16:41:21 +02:00
|
|
|
PAGECACHE_BLOCK_LINK **direct_link)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-08-13 14:17:49 +02:00
|
|
|
TRANSLOG_ADDRESS addr= *(data->addr), in_buffers;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint cache_index;
|
2007-02-12 13:23:43 +01:00
|
|
|
uint32 file_no= LSN_FILE_NO(addr);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_get_page");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("File: %lu Offset: %lu(0x%lx)",
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) file_no,
|
|
|
|
(ulong) LSN_OFFSET(addr),
|
|
|
|
(ulong) LSN_OFFSET(addr)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* it is really page address */
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_OFFSET(addr) % TRANSLOG_PAGE_SIZE == 0);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
if (direct_link)
|
|
|
|
*direct_link= NULL;
|
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
in_buffers= translog_only_in_buffers();
|
|
|
|
DBUG_PRINT("info", ("in_buffers: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(in_buffers)));
|
2007-08-13 14:17:49 +02:00
|
|
|
if (in_buffers != LSN_IMPOSSIBLE &&
|
|
|
|
cmp_translog_addr(addr, in_buffers) >= 0)
|
|
|
|
{
|
|
|
|
translog_lock();
|
|
|
|
/* recheck with locked loghandler */
|
|
|
|
in_buffers= translog_only_in_buffers();
|
|
|
|
if (cmp_translog_addr(addr, in_buffers) >= 0)
|
|
|
|
{
|
|
|
|
uint16 buffer_no= log_descriptor.bc.buffer_no;
|
2007-09-27 13:18:28 +02:00
|
|
|
#ifndef DBUG_OFF
|
2007-08-13 14:17:49 +02:00
|
|
|
uint16 buffer_start= buffer_no;
|
2007-09-27 13:18:28 +02:00
|
|
|
#endif
|
2007-08-13 14:17:49 +02:00
|
|
|
struct st_translog_buffer *buffer_unlock= log_descriptor.bc.buffer;
|
|
|
|
struct st_translog_buffer *curr_buffer= log_descriptor.bc.buffer;
|
|
|
|
for (;;)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
if the page is in the buffer and it is the last version of the
|
|
|
|
page (in case of devision the page bu buffer flush
|
|
|
|
*/
|
|
|
|
if (curr_buffer->file != -1 &&
|
|
|
|
cmp_translog_addr(addr, curr_buffer->offset) >= 0 &&
|
|
|
|
cmp_translog_addr(addr,
|
|
|
|
(curr_buffer->next_buffer_offset ?
|
|
|
|
curr_buffer->next_buffer_offset:
|
|
|
|
curr_buffer->offset + curr_buffer->size)) < 0)
|
|
|
|
{
|
|
|
|
int is_last_unfinished_page;
|
|
|
|
uint last_protected_sector= 0;
|
2007-09-04 21:52:32 +02:00
|
|
|
uchar *from, *table= NULL;
|
2007-08-13 14:17:49 +02:00
|
|
|
translog_wait_for_writers(curr_buffer);
|
|
|
|
DBUG_ASSERT(LSN_FILE_NO(addr) == LSN_FILE_NO(curr_buffer->offset));
|
|
|
|
from= curr_buffer->buffer + (addr - curr_buffer->offset);
|
|
|
|
memcpy(buffer, from, TRANSLOG_PAGE_SIZE);
|
|
|
|
is_last_unfinished_page= ((log_descriptor.bc.buffer ==
|
|
|
|
curr_buffer) &&
|
|
|
|
(log_descriptor.bc.ptr >= from) &&
|
|
|
|
(log_descriptor.bc.ptr <
|
|
|
|
from + TRANSLOG_PAGE_SIZE));
|
|
|
|
if (is_last_unfinished_page &&
|
|
|
|
(buffer[TRANSLOG_PAGE_FLAGS] & TRANSLOG_SECTOR_PROTECTION))
|
|
|
|
{
|
|
|
|
last_protected_sector= ((log_descriptor.bc.previous_offset - 1) /
|
|
|
|
DISK_DRIVE_SECTOR_SIZE);
|
|
|
|
table= buffer + log_descriptor.page_overhead -
|
|
|
|
(TRANSLOG_PAGE_SIZE / DISK_DRIVE_SECTOR_SIZE) * 2;
|
|
|
|
}
|
|
|
|
|
|
|
|
DBUG_ASSERT(buffer_unlock == curr_buffer);
|
|
|
|
translog_buffer_unlock(buffer_unlock);
|
|
|
|
if (is_last_unfinished_page)
|
|
|
|
{
|
|
|
|
uint i;
|
|
|
|
/*
|
|
|
|
This is last unfinished page => we should not check CRC and
|
|
|
|
remove only that protection which already installed (no need
|
|
|
|
to check it)
|
|
|
|
|
|
|
|
We do not check the flag of sector protection, because if
|
|
|
|
(buffer[TRANSLOG_PAGE_FLAGS] & TRANSLOG_SECTOR_PROTECTION) is
|
|
|
|
not set then last_protected_sector will be 0 so following loop
|
|
|
|
will be never executed
|
|
|
|
*/
|
|
|
|
DBUG_PRINT("info", ("This is last unfinished page, "
|
|
|
|
"last protected sector %u",
|
|
|
|
last_protected_sector));
|
|
|
|
for (i= 1; i <= last_protected_sector; i++)
|
|
|
|
{
|
|
|
|
uint index= i * 2;
|
|
|
|
uint offset= i * DISK_DRIVE_SECTOR_SIZE;
|
|
|
|
DBUG_PRINT("info", ("Sector %u: 0x%02x%02x <- 0x%02x%02x",
|
|
|
|
i, buffer[offset], buffer[offset + 1],
|
|
|
|
table[index], table[index + 1]));
|
|
|
|
buffer[offset]= table[index];
|
|
|
|
buffer[offset + 1]= table[index + 1];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
This IF should be true because we use in-memory data which
|
|
|
|
supposed to be correct.
|
|
|
|
*/
|
|
|
|
if (translog_page_validator((uchar*) buffer, (uchar*) data))
|
|
|
|
buffer= NULL;
|
|
|
|
}
|
|
|
|
DBUG_RETURN(buffer);
|
|
|
|
}
|
|
|
|
buffer_no= (buffer_no + 1) % TRANSLOG_BUFFERS_NO;
|
|
|
|
curr_buffer= log_descriptor.buffers + buffer_no;
|
|
|
|
translog_buffer_lock(curr_buffer);
|
|
|
|
translog_buffer_unlock(buffer_unlock);
|
|
|
|
buffer_unlock= curr_buffer;
|
|
|
|
/* we can't make full circle */
|
|
|
|
DBUG_ASSERT(buffer_start != buffer_no);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
translog_unlock();
|
|
|
|
}
|
2007-02-12 13:23:43 +01:00
|
|
|
if ((cache_index= LSN_FILE_NO(log_descriptor.horizon) - file_no) <
|
2007-02-02 08:41:32 +01:00
|
|
|
OPENED_FILES_NUM)
|
|
|
|
{
|
|
|
|
PAGECACHE_FILE file;
|
|
|
|
/* file in the cache */
|
2007-02-19 22:01:27 +01:00
|
|
|
if (log_descriptor.log_file_num[cache_index] == -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
if ((log_descriptor.log_file_num[cache_index]=
|
2007-02-19 22:01:27 +01:00
|
|
|
open_logfile_by_number_no_cache(file_no)) == -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(NULL);
|
|
|
|
}
|
|
|
|
file.file= log_descriptor.log_file_num[cache_index];
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
buffer=
|
|
|
|
(uchar*) (direct_link ?
|
|
|
|
pagecache_valid_read(log_descriptor.pagecache, &file,
|
|
|
|
LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE,
|
|
|
|
3, NULL,
|
|
|
|
PAGECACHE_PLAIN_PAGE,
|
|
|
|
PAGECACHE_LOCK_READ, direct_link,
|
|
|
|
&translog_page_validator, (uchar*) data) :
|
|
|
|
pagecache_valid_read(log_descriptor.pagecache, &file,
|
|
|
|
LSN_OFFSET(addr) / TRANSLOG_PAGE_SIZE,
|
|
|
|
3, (char*) buffer,
|
|
|
|
PAGECACHE_PLAIN_PAGE,
|
|
|
|
PAGECACHE_LOCK_LEFT_UNLOCKED, direct_link,
|
|
|
|
&translog_page_validator, (uchar*) data));
|
|
|
|
DBUG_PRINT("info", ("Direct link is assigned to : 0x%lx * 0x%lx",
|
|
|
|
(ulong) direct_link,
|
|
|
|
(ulong)(direct_link ? *direct_link : NULL)));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/*
|
|
|
|
TODO: WE KEEP THE LAST OPENED_FILES_NUM FILES IN THE LOG CACHE, NOT
|
|
|
|
THE LAST USED FILES. THIS WILL BE A NOTABLE PROBLEM IF WE ARE
|
|
|
|
FOLLOWING AN UNDO CHAIN THAT GOES OVER MANY OLD LOG FILES. WE WILL
|
|
|
|
PROBABLY NEED SPECIAL HANDLING OF THIS OR HAVE A FILO FOR THE LOG
|
|
|
|
FILES.
|
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
File file= open_logfile_by_number_no_cache(file_no);
|
2007-02-19 22:01:27 +01:00
|
|
|
if (file == -1)
|
|
|
|
DBUG_RETURN(NULL);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (my_pread(file, (char*) buffer, TRANSLOG_PAGE_SIZE,
|
2007-02-12 13:23:43 +01:00
|
|
|
LSN_OFFSET(addr), MYF(MY_FNABP | MY_WME)))
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer= NULL;
|
2007-07-02 19:45:15 +02:00
|
|
|
else if (translog_page_validator((uchar*) buffer, (uchar*) data))
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer= NULL;
|
|
|
|
my_close(file, MYF(MY_WME));
|
|
|
|
}
|
|
|
|
DBUG_RETURN(buffer);
|
|
|
|
}
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
/**
|
|
|
|
@brief free direct log page link
|
|
|
|
|
|
|
|
@param direct_link the direct log page link to be freed
|
|
|
|
|
|
|
|
*/
|
|
|
|
|
2007-09-27 16:41:21 +02:00
|
|
|
static void translog_free_link(PAGECACHE_BLOCK_LINK *direct_link)
|
2007-09-27 13:18:28 +02:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_free_link");
|
|
|
|
DBUG_PRINT("info", ("Direct link: 0x%lx",
|
|
|
|
(ulong) direct_link));
|
|
|
|
if (direct_link)
|
|
|
|
pagecache_unlock_by_link(log_descriptor.pagecache, direct_link,
|
|
|
|
PAGECACHE_LOCK_READ_UNLOCK, PAGECACHE_UNPIN,
|
|
|
|
LSN_IMPOSSIBLE, LSN_IMPOSSIBLE);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
Finds last page of the given log file
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_last_page_addr()
|
|
|
|
addr address structure to fill with data, which contain
|
|
|
|
file number of the log file
|
|
|
|
last_page_ok assigned 1 if last page was OK
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_get_last_page_addr(TRANSLOG_ADDRESS *addr,
|
|
|
|
my_bool *last_page_ok)
|
|
|
|
{
|
|
|
|
MY_STAT stat_buff, *stat;
|
|
|
|
char path[FN_REFLEN];
|
2007-02-12 13:23:43 +01:00
|
|
|
uint32 rec_offset;
|
|
|
|
uint32 file_no= LSN_FILE_NO(*addr);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_get_last_page_addr");
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (!(stat= my_stat(translog_filename_by_fileno(file_no, path),
|
|
|
|
&stat_buff, MYF(MY_WME))))
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("File size: %lu", (ulong) stat->st_size));
|
2007-02-02 08:41:32 +01:00
|
|
|
if (stat->st_size > TRANSLOG_PAGE_SIZE)
|
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
rec_offset= (((stat->st_size / TRANSLOG_PAGE_SIZE) - 1) *
|
2007-02-02 08:41:32 +01:00
|
|
|
TRANSLOG_PAGE_SIZE);
|
2007-02-12 13:23:43 +01:00
|
|
|
*last_page_ok= (stat->st_size == rec_offset + TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
*last_page_ok= 0;
|
2007-02-12 13:23:43 +01:00
|
|
|
rec_offset= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-12 13:23:43 +01:00
|
|
|
*addr= MAKE_LSN(file_no, rec_offset);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Last page: 0x%lx ok: %d", (ulong) rec_offset,
|
2007-02-02 08:41:32 +01:00
|
|
|
*last_page_ok));
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get number bytes for record length storing
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_variable_record_length_bytes()
|
|
|
|
length Record length wich will be codded
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
1,3,4,5 - number of bytes to store given length
|
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
static uint translog_variable_record_length_bytes(translog_size_t length)
|
|
|
|
{
|
|
|
|
if (length < 250)
|
|
|
|
return 1;
|
2007-02-19 22:01:27 +01:00
|
|
|
if (length < 0xFFFF)
|
2007-02-02 08:41:32 +01:00
|
|
|
return 3;
|
2007-02-19 22:01:27 +01:00
|
|
|
if (length < (ulong) 0xFFFFFF)
|
2007-02-02 08:41:32 +01:00
|
|
|
return 4;
|
|
|
|
return 5;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get header of this chunk
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_chunk_header_length()
|
|
|
|
page The page where chunk placed
|
|
|
|
offset Offset of the chunk on this place
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
# total length of the chunk
|
|
|
|
0 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static uint16 translog_get_chunk_header_length(uchar *page, uint16 offset)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_get_chunk_header_length");
|
2007-02-19 22:01:27 +01:00
|
|
|
page+= offset;
|
|
|
|
switch (*page & TRANSLOG_CHUNK_TYPE) {
|
|
|
|
case TRANSLOG_CHUNK_LSN:
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/* 0 chunk referred as LSN (head or tail) */
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_size_t rec_len;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *start= page;
|
|
|
|
uchar *ptr= start + 1 + 2;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 chunk_len, header_len;
|
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_LSN"));
|
|
|
|
rec_len= translog_variable_record_1group_decode_len(&ptr);
|
|
|
|
chunk_len= uint2korr(ptr);
|
2007-02-19 22:01:27 +01:00
|
|
|
header_len= (ptr - start) +2;
|
|
|
|
DBUG_PRINT("info", ("rec len: %lu chunk len: %u header len: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) rec_len, (uint) chunk_len, (uint) header_len));
|
|
|
|
if (chunk_len)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/* TODO: fine header end */
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(0);
|
2007-06-09 13:52:17 +02:00
|
|
|
DBUG_RETURN(0); /* Keep compiler happy */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
DBUG_RETURN(header_len);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
case TRANSLOG_CHUNK_FIXED:
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/* 1 (pseudo)fixed record (also LSN) */
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_FIXED = 3"));
|
|
|
|
DBUG_RETURN(3);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
case TRANSLOG_CHUNK_NOHDR:
|
|
|
|
/* 2 no header chunk (till page end) */
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_NOHDR = 1"));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
break;
|
2007-02-19 22:01:27 +01:00
|
|
|
case TRANSLOG_CHUNK_LNGTH:
|
|
|
|
/* 3 chunk with chunk length */
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info", ("TRANSLOG_CHUNK_LNGTH = 3"));
|
|
|
|
DBUG_RETURN(3);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
DBUG_ASSERT(0);
|
2007-06-09 13:52:17 +02:00
|
|
|
DBUG_RETURN(0); /* Keep compiler happy */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Initialize transaction log
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_init()
|
|
|
|
directory Directory where log files are put
|
|
|
|
log_file_max_size max size of one log size (for new logs creation)
|
|
|
|
server_version version of MySQL server (MYSQL_VERSION_ID)
|
|
|
|
server_id server ID (replication & Co)
|
|
|
|
pagecache Page cache for the log reads
|
|
|
|
flags flags (TRANSLOG_PAGE_CRC, TRANSLOG_SECTOR_PROTECTION
|
|
|
|
TRANSLOG_RECORD_CRC)
|
|
|
|
|
2007-06-04 13:07:18 +02:00
|
|
|
TODO
|
|
|
|
Free used resources in case of error.
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
my_bool translog_init(const char *directory,
|
|
|
|
uint32 log_file_max_size,
|
|
|
|
uint32 server_version,
|
|
|
|
uint32 server_id, PAGECACHE *pagecache, uint flags)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
int old_log_was_recovered= 0, logs_found= 0;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint old_flags= flags;
|
2007-02-02 08:41:32 +01:00
|
|
|
TRANSLOG_ADDRESS sure_page, last_page, last_valid_page;
|
2007-06-25 09:07:46 +02:00
|
|
|
my_bool version_changed= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_init");
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 0);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-06-04 13:07:18 +02:00
|
|
|
loghandler_init(); /* Safe to do many times */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (pthread_mutex_init(&log_descriptor.sent_to_file_lock,
|
2007-08-19 21:27:43 +02:00
|
|
|
MY_MUTEX_INIT_FAST) ||
|
|
|
|
pthread_mutex_init(&log_descriptor.file_header_lock,
|
|
|
|
MY_MUTEX_INIT_FAST) ||
|
|
|
|
pthread_mutex_init(&log_descriptor.unfinished_files_lock,
|
|
|
|
MY_MUTEX_INIT_FAST) ||
|
2007-08-27 20:55:39 +02:00
|
|
|
pthread_mutex_init(&log_descriptor.purger_lock,
|
|
|
|
MY_MUTEX_INIT_FAST) ||
|
2007-09-21 12:48:57 +02:00
|
|
|
pthread_mutex_init(&log_descriptor.log_flush_lock,
|
|
|
|
MY_MUTEX_INIT_FAST) ||
|
2007-08-19 21:27:43 +02:00
|
|
|
init_dynamic_array(&log_descriptor.unfinished_files,
|
|
|
|
sizeof(struct st_file_counter),
|
|
|
|
10, 10 CALLER_INFO))
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
2007-08-27 20:55:39 +02:00
|
|
|
log_descriptor.min_file_number= 0;
|
|
|
|
log_descriptor.last_lsn_checked= LSN_IMPOSSIBLE;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* Directory to store files */
|
|
|
|
unpack_dirname(log_descriptor.directory, directory);
|
|
|
|
|
|
|
|
if ((log_descriptor.directory_fd= my_open(log_descriptor.directory,
|
|
|
|
O_RDONLY, MYF(MY_WME))) < 0)
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("Error %d during opening directory '%s'",
|
|
|
|
errno, log_descriptor.directory));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
|
2007-08-13 14:17:49 +02:00
|
|
|
log_descriptor.in_buffers_only= LSN_IMPOSSIBLE;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* max size of one log size (for new logs creation) */
|
|
|
|
log_descriptor.log_file_max_size=
|
|
|
|
log_file_max_size - (log_file_max_size % TRANSLOG_PAGE_SIZE);
|
|
|
|
/* server version */
|
|
|
|
log_descriptor.server_version= server_version;
|
|
|
|
/* server ID */
|
|
|
|
log_descriptor.server_id= server_id;
|
|
|
|
/* Page cache for the log reads */
|
|
|
|
log_descriptor.pagecache= pagecache;
|
|
|
|
/* Flags */
|
|
|
|
DBUG_ASSERT((flags &
|
|
|
|
~(TRANSLOG_PAGE_CRC | TRANSLOG_SECTOR_PROTECTION |
|
|
|
|
TRANSLOG_RECORD_CRC)) == 0);
|
|
|
|
log_descriptor.flags= flags;
|
2007-02-19 22:01:27 +01:00
|
|
|
for (i= 0; i < TRANSLOG_FLAGS_NUM; i++)
|
|
|
|
{
|
|
|
|
page_overhead[i]= 7;
|
|
|
|
if (i & TRANSLOG_PAGE_CRC)
|
|
|
|
page_overhead[i]+= CRC_LENGTH;
|
|
|
|
if (i & TRANSLOG_SECTOR_PROTECTION)
|
|
|
|
page_overhead[i]+= (TRANSLOG_PAGE_SIZE /
|
|
|
|
DISK_DRIVE_SECTOR_SIZE) * 2;
|
|
|
|
}
|
|
|
|
log_descriptor.page_overhead= page_overhead[flags];
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.page_capacity_chunk_2=
|
|
|
|
TRANSLOG_PAGE_SIZE - log_descriptor.page_overhead - 1;
|
|
|
|
DBUG_ASSERT(TRANSLOG_WRITE_BUFFER % TRANSLOG_PAGE_SIZE == 0);
|
|
|
|
log_descriptor.buffer_capacity_chunk_2=
|
|
|
|
(TRANSLOG_WRITE_BUFFER / TRANSLOG_PAGE_SIZE) *
|
|
|
|
log_descriptor.page_capacity_chunk_2;
|
|
|
|
log_descriptor.half_buffer_capacity_chunk_2=
|
|
|
|
log_descriptor.buffer_capacity_chunk_2 / 2;
|
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Overhead: %u pc2: %u bc2: %u, bc2/2: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.page_overhead,
|
|
|
|
log_descriptor.page_capacity_chunk_2,
|
|
|
|
log_descriptor.buffer_capacity_chunk_2,
|
|
|
|
log_descriptor.half_buffer_capacity_chunk_2));
|
|
|
|
|
|
|
|
/* *** Current state of the log handler *** */
|
|
|
|
|
|
|
|
/* Init log handler file handlers cache */
|
|
|
|
for (i= 0; i < OPENED_FILES_NUM; i++)
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.log_file_num[i]= -1;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* just to init it somehow */
|
|
|
|
translog_start_buffer(log_descriptor.buffers, &log_descriptor.bc, 0);
|
|
|
|
|
|
|
|
/* Buffers for log writing */
|
|
|
|
for (i= 0; i < TRANSLOG_BUFFERS_NO; i++)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
if (translog_buffer_init(log_descriptor.buffers + i))
|
|
|
|
DBUG_RETURN(1);
|
2007-02-02 08:41:32 +01:00
|
|
|
#ifndef DBUG_OFF
|
|
|
|
log_descriptor.buffers[i].buffer_no= (uint8) i;
|
|
|
|
#endif
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("translog_buffer buffer #%u: 0x%lx",
|
|
|
|
i, (ulong) log_descriptor.buffers + i));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
logs_found= (last_logno != FILENO_IMPOSSIBLE);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
if (logs_found)
|
|
|
|
{
|
|
|
|
my_bool pageok;
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
TODO: scan directory for maria_log.XXXXXXXX files and find
|
2007-02-02 08:41:32 +01:00
|
|
|
highest XXXXXXXX & set logs_found
|
2007-02-19 22:01:27 +01:00
|
|
|
TODO: check that last checkpoint within present log addresses space
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-21 22:30:58 +01:00
|
|
|
find the log end
|
2007-02-19 22:01:27 +01:00
|
|
|
*/
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
if (LSN_FILE_NO(last_checkpoint_lsn) == FILENO_IMPOSSIBLE)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_OFFSET(last_checkpoint_lsn) == 0);
|
2007-02-02 08:41:32 +01:00
|
|
|
/* there was no checkpoints we will read from the beginning */
|
2007-02-12 13:23:43 +01:00
|
|
|
sure_page= (LSN_ONE_FILE | TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
sure_page= last_checkpoint_lsn;
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_OFFSET(sure_page) % TRANSLOG_PAGE_SIZE != 0);
|
|
|
|
sure_page-= LSN_OFFSET(sure_page) % TRANSLOG_PAGE_SIZE;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.horizon= last_page= MAKE_LSN(last_logno,0);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_get_last_page_addr(&last_page, &pageok))
|
|
|
|
DBUG_RETURN(1);
|
2007-02-12 13:23:43 +01:00
|
|
|
if (LSN_OFFSET(last_page) == 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
if (LSN_FILE_NO(last_page) == 1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
logs_found= 0; /* file #1 has no pages */
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
last_page-= LSN_ONE_FILE;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_get_last_page_addr(&last_page, &pageok))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (logs_found)
|
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS current_page= sure_page;
|
|
|
|
my_bool pageok;
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(sure_page <= last_page);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* TODO: check page size */
|
|
|
|
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
last_valid_page= LSN_IMPOSSIBLE;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* scan and validate pages */
|
|
|
|
do
|
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS current_file_last_page;
|
2007-02-12 13:23:43 +01:00
|
|
|
current_file_last_page= current_page;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_get_last_page_addr(¤t_file_last_page, &pageok))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
if (!pageok)
|
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_PRINT("error", ("File %lu have no complete last page",
|
|
|
|
(ulong) LSN_FILE_NO(current_file_last_page)));
|
2007-02-02 08:41:32 +01:00
|
|
|
old_log_was_recovered= 1;
|
|
|
|
/* This file is not written till the end so it should be last */
|
|
|
|
last_page= current_file_last_page;
|
|
|
|
/* TODO: issue warning */
|
|
|
|
}
|
|
|
|
do
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar buffer[TRANSLOG_PAGE_SIZE], *page;
|
2007-02-19 22:01:27 +01:00
|
|
|
data.addr= ¤t_page;
|
2007-09-27 13:18:28 +02:00
|
|
|
if ((page= translog_get_page(&data, buffer, NULL)) == NULL)
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
if (data.was_recovered)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("error", ("file no: %lu (%d) "
|
|
|
|
"rec_offset: 0x%lx (%lu) (%d)",
|
|
|
|
(ulong) LSN_FILE_NO(current_page),
|
2007-02-12 13:23:43 +01:00
|
|
|
(uint3korr(page + 3) !=
|
|
|
|
LSN_FILE_NO(current_page)),
|
|
|
|
(ulong) LSN_OFFSET(current_page),
|
|
|
|
(ulong) (LSN_OFFSET(current_page) /
|
2007-02-02 08:41:32 +01:00
|
|
|
TRANSLOG_PAGE_SIZE),
|
|
|
|
(uint3korr(page) !=
|
2007-02-12 13:23:43 +01:00
|
|
|
LSN_OFFSET(current_page) /
|
|
|
|
TRANSLOG_PAGE_SIZE)));
|
2007-02-02 08:41:32 +01:00
|
|
|
old_log_was_recovered= 1;
|
|
|
|
break;
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
old_flags= page[TRANSLOG_PAGE_FLAGS];
|
2007-02-02 08:41:32 +01:00
|
|
|
last_valid_page= current_page;
|
2007-02-12 13:23:43 +01:00
|
|
|
current_page+= TRANSLOG_PAGE_SIZE; /* increase offset */
|
|
|
|
} while (current_page <= current_file_last_page);
|
|
|
|
current_page+= LSN_ONE_FILE;
|
|
|
|
current_page= LSN_REPLACE_OFFSET(current_page, TRANSLOG_PAGE_SIZE);
|
|
|
|
} while (LSN_FILE_NO(current_page) <= LSN_FILE_NO(last_page) &&
|
2007-02-02 08:41:32 +01:00
|
|
|
!old_log_was_recovered);
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
if (last_valid_page == LSN_IMPOSSIBLE)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/* Panic!!! Even page which should be valid is invalid */
|
|
|
|
/* TODO: issue error */
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Last valid page is in file: %lu "
|
|
|
|
"offset: %lu (0x%lx) "
|
|
|
|
"Logs found: %d was recovered: %d "
|
|
|
|
"flags match: %d",
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_FILE_NO(last_valid_page),
|
|
|
|
(ulong) LSN_OFFSET(last_valid_page),
|
|
|
|
(ulong) LSN_OFFSET(last_valid_page),
|
2007-02-19 22:01:27 +01:00
|
|
|
logs_found, old_log_was_recovered,
|
|
|
|
(old_flags == flags)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* TODO: check server ID */
|
2007-02-19 22:01:27 +01:00
|
|
|
if (logs_found && !old_log_was_recovered && old_flags == flags)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar buffer[TRANSLOG_PAGE_SIZE], *page;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 chunk_offset;
|
2007-02-19 22:01:27 +01:00
|
|
|
data.addr= &last_valid_page;
|
2007-02-02 08:41:32 +01:00
|
|
|
/* continue old log */
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_FILE_NO(last_valid_page)==
|
|
|
|
LSN_FILE_NO(log_descriptor.horizon));
|
2007-09-27 13:18:28 +02:00
|
|
|
if ((page= translog_get_page(&data, buffer, NULL)) == NULL ||
|
2007-02-02 08:41:32 +01:00
|
|
|
(chunk_offset= translog_get_first_chunk_offset(page)) == 0)
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
|
|
|
/* Puts filled part of old page in the buffer */
|
|
|
|
log_descriptor.horizon= last_valid_page;
|
|
|
|
translog_start_buffer(log_descriptor.buffers, &log_descriptor.bc, 0);
|
|
|
|
/*
|
2007-07-02 19:45:15 +02:00
|
|
|
Free space if filled with 0 and first uchar of
|
2007-02-02 08:41:32 +01:00
|
|
|
real chunk can't be 0
|
|
|
|
*/
|
|
|
|
while (chunk_offset < TRANSLOG_PAGE_SIZE && page[chunk_offset] != '\0')
|
|
|
|
{
|
|
|
|
uint16 chunk_length;
|
|
|
|
if ((chunk_length=
|
|
|
|
translog_get_total_chunk_length(page, chunk_offset)) == 0)
|
|
|
|
DBUG_RETURN(1);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("chunk: offset: %u length: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) chunk_offset, (uint) chunk_length));
|
|
|
|
chunk_offset+= chunk_length;
|
|
|
|
|
|
|
|
/* chunk can't cross the page border */
|
|
|
|
DBUG_ASSERT(chunk_offset <= TRANSLOG_PAGE_SIZE);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(log_descriptor.buffers->buffer, page, chunk_offset);
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.buffer->size+= chunk_offset;
|
|
|
|
log_descriptor.bc.ptr+= chunk_offset;
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.current_page_fill= chunk_offset;
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.horizon= LSN_REPLACE_OFFSET(log_descriptor.horizon,
|
|
|
|
(chunk_offset +
|
|
|
|
LSN_OFFSET(last_valid_page)));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Move Page #%u: 0x%lx chaser: %d Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) log_descriptor.bc.buffer_no,
|
|
|
|
(ulong) log_descriptor.bc.buffer,
|
|
|
|
log_descriptor.bc.chaser,
|
|
|
|
(ulong) log_descriptor.bc.buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (log_descriptor.bc.ptr - log_descriptor.bc.
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer->buffer)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(&log_descriptor.bc););
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-06-25 09:07:46 +02:00
|
|
|
if (!old_log_was_recovered && old_flags == flags)
|
|
|
|
{
|
|
|
|
LOGHANDLER_FILE_INFO info;
|
2007-08-19 21:27:43 +02:00
|
|
|
if (translog_read_file_header(&info, log_descriptor.log_file_num[0]))
|
2007-06-25 09:07:46 +02:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
version_changed= (info.maria_version != TRANSLOG_VERSION_ID);
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Logs found: %d was recovered: %d",
|
2007-02-02 08:41:32 +01:00
|
|
|
logs_found, old_log_was_recovered));
|
|
|
|
if (!logs_found)
|
|
|
|
{
|
|
|
|
/* Start new log system from scratch */
|
|
|
|
/* Used space */
|
2007-02-21 22:54:07 +01:00
|
|
|
log_descriptor.horizon= MAKE_LSN(1, TRANSLOG_PAGE_SIZE); /* header page */
|
2007-02-02 08:41:32 +01:00
|
|
|
/* Current logs file number in page cache */
|
2007-02-19 22:01:27 +01:00
|
|
|
if ((log_descriptor.log_file_num[0]=
|
|
|
|
open_logfile_by_number_no_cache(1)) == -1 ||
|
|
|
|
translog_write_file_header())
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
if (ma_control_file_write_and_force(LSN_IMPOSSIBLE, 1,
|
2007-02-02 08:41:32 +01:00
|
|
|
CONTROL_FILE_UPDATE_ONLY_LOGNO))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
/* assign buffer 0 */
|
|
|
|
translog_start_buffer(log_descriptor.buffers, &log_descriptor.bc, 0);
|
|
|
|
translog_new_page_header(&log_descriptor.horizon, &log_descriptor.bc);
|
|
|
|
}
|
2007-06-25 09:07:46 +02:00
|
|
|
else if (old_log_was_recovered || old_flags != flags || version_changed)
|
2007-02-19 22:01:27 +01:00
|
|
|
{
|
|
|
|
/* leave the damaged file untouched */
|
|
|
|
log_descriptor.horizon+= LSN_ONE_FILE;
|
|
|
|
/* header page */
|
|
|
|
log_descriptor.horizon= LSN_REPLACE_OFFSET(log_descriptor.horizon,
|
|
|
|
TRANSLOG_PAGE_SIZE);
|
|
|
|
if (translog_create_new_file())
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
/*
|
|
|
|
Buffer system left untouched after recovery => we should init it
|
|
|
|
(starting from buffer 0)
|
|
|
|
*/
|
|
|
|
translog_start_buffer(log_descriptor.buffers, &log_descriptor.bc, 0);
|
|
|
|
translog_new_page_header(&log_descriptor.horizon, &log_descriptor.bc);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* all LSNs that are on disk are flushed */
|
2007-08-13 14:17:49 +02:00
|
|
|
log_descriptor.sent_to_file=
|
|
|
|
log_descriptor.flushed= log_descriptor.horizon;
|
|
|
|
log_descriptor.in_buffers_only= log_descriptor.bc.buffer->offset;
|
2007-08-19 21:27:43 +02:00
|
|
|
log_descriptor.max_lsn= LSN_IMPOSSIBLE; /* set to 0 */
|
2007-02-19 22:01:27 +01:00
|
|
|
/*
|
|
|
|
horizon is (potentially) address of the next LSN we need decrease
|
|
|
|
it to signal that all LSNs before it are flushed
|
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.flushed--; /* offset decreased */
|
|
|
|
log_descriptor.sent_to_file--; /* offset decreased */
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/*
|
|
|
|
Log records will refer to a MARIA_SHARE by a unique 2-byte id; set up
|
|
|
|
structures for generating 2-byte ids:
|
|
|
|
*/
|
|
|
|
my_atomic_rwlock_init(&LOCK_id_to_share);
|
2007-06-26 22:30:09 +02:00
|
|
|
id_to_share= (MARIA_SHARE **) my_malloc(SHARE_ID_MAX * sizeof(MARIA_SHARE*),
|
|
|
|
MYF(MY_WME | MY_ZEROFILL));
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
if (unlikely(!id_to_share))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
id_to_share--; /* min id is 1 */
|
2007-06-04 13:07:18 +02:00
|
|
|
translog_inited= 1;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Free transaction log file buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_destroy()
|
|
|
|
buffer_no The buffer to free
|
|
|
|
|
|
|
|
NOTE
|
2007-02-19 22:01:27 +01:00
|
|
|
This buffer should be locked
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_buffer_destroy(struct st_translog_buffer *buffer)
|
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_buffer_destroy");
|
|
|
|
DBUG_PRINT("enter",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Buffer #%u: 0x%lx file: %d offset: (%lu,0x%lx) size: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
2007-02-19 22:01:27 +01:00
|
|
|
buffer->file,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(buffer->offset),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buffer->size));
|
|
|
|
DBUG_ASSERT(buffer->waiting_filling_buffer.last_thread == 0);
|
2007-02-19 22:01:27 +01:00
|
|
|
if (buffer->file != -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
We ignore errors here, because we can't do something about it
|
2007-02-02 08:41:32 +01:00
|
|
|
(it is shutting down)
|
|
|
|
*/
|
|
|
|
translog_buffer_flush(buffer);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Destroy mutex: 0x%lx", (ulong) &buffer->mutex));
|
2007-02-02 08:41:32 +01:00
|
|
|
pthread_mutex_destroy(&buffer->mutex);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Free log handler resources
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_destroy()
|
|
|
|
*/
|
|
|
|
|
|
|
|
void translog_destroy()
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint i;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_destroy");
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-06-04 13:07:18 +02:00
|
|
|
if (translog_inited)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-06-04 13:07:18 +02:00
|
|
|
if (log_descriptor.bc.buffer->file != -1)
|
|
|
|
translog_finish_page(&log_descriptor.horizon, &log_descriptor.bc);
|
|
|
|
|
|
|
|
for (i= 0; i < TRANSLOG_BUFFERS_NO; i++)
|
|
|
|
{
|
|
|
|
struct st_translog_buffer *buffer= log_descriptor.buffers + i;
|
|
|
|
translog_buffer_destroy(buffer);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* close files */
|
|
|
|
for (i= 0; i < OPENED_FILES_NUM; i++)
|
|
|
|
{
|
|
|
|
if (log_descriptor.log_file_num[i] != -1)
|
|
|
|
translog_close_log_file(log_descriptor.log_file_num[i]);
|
|
|
|
}
|
|
|
|
pthread_mutex_destroy(&log_descriptor.sent_to_file_lock);
|
2007-08-19 21:27:43 +02:00
|
|
|
pthread_mutex_destroy(&log_descriptor.file_header_lock);
|
|
|
|
pthread_mutex_destroy(&log_descriptor.unfinished_files_lock);
|
2007-08-27 20:55:39 +02:00
|
|
|
pthread_mutex_destroy(&log_descriptor.purger_lock);
|
2007-09-21 12:48:57 +02:00
|
|
|
pthread_mutex_destroy(&log_descriptor.log_flush_lock);
|
2007-08-19 21:27:43 +02:00
|
|
|
delete_dynamic(&log_descriptor.unfinished_files);
|
|
|
|
|
2007-06-04 13:07:18 +02:00
|
|
|
my_close(log_descriptor.directory_fd, MYF(MY_WME));
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
my_atomic_rwlock_destroy(&LOCK_id_to_share);
|
2007-07-02 19:45:15 +02:00
|
|
|
my_free((uchar*)(id_to_share + 1), MYF(MY_ALLOW_ZERO_PTR));
|
2007-06-04 13:07:18 +02:00
|
|
|
translog_inited= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
#define translog_buffer_lock_assert_owner(B) \
|
|
|
|
safe_mutex_assert_owner(&B->mutex);
|
|
|
|
void translog_lock_assert_owner()
|
|
|
|
{
|
|
|
|
translog_buffer_lock_assert_owner(log_descriptor.bc.buffer);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
Start new page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_page_next()
|
|
|
|
horizon \ Position in file and buffer where we are
|
|
|
|
cursor /
|
|
|
|
prev_buffer Buffer which should be flushed will be assigned
|
2007-02-19 22:01:27 +01:00
|
|
|
here if it is need. This is always set.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
NOTE
|
|
|
|
handler should be locked
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_page_next(TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor,
|
|
|
|
struct st_translog_buffer **prev_buffer)
|
|
|
|
{
|
|
|
|
struct st_translog_buffer *buffer= cursor->buffer;
|
|
|
|
DBUG_ENTER("translog_page_next");
|
|
|
|
|
|
|
|
if ((cursor->ptr +TRANSLOG_PAGE_SIZE >
|
|
|
|
cursor->buffer->buffer + TRANSLOG_WRITE_BUFFER) ||
|
2007-02-12 13:23:43 +01:00
|
|
|
(LSN_OFFSET(*horizon) >
|
|
|
|
log_descriptor.log_file_max_size - TRANSLOG_PAGE_SIZE))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Switch to next buffer Buffer Size: %lu (%lu) => %d "
|
|
|
|
"File size: %lu max: %lu => %d",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer),
|
2007-02-12 13:23:43 +01:00
|
|
|
(cursor->ptr + TRANSLOG_PAGE_SIZE >
|
2007-02-02 08:41:32 +01:00
|
|
|
cursor->buffer->buffer + TRANSLOG_WRITE_BUFFER),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(*horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) log_descriptor.log_file_max_size,
|
2007-02-12 13:23:43 +01:00
|
|
|
(LSN_OFFSET(*horizon) >
|
|
|
|
(log_descriptor.log_file_max_size -
|
|
|
|
TRANSLOG_PAGE_SIZE))));
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_buffer_next(horizon, cursor,
|
2007-02-12 13:23:43 +01:00
|
|
|
LSN_OFFSET(*horizon) >
|
|
|
|
(log_descriptor.log_file_max_size -
|
|
|
|
TRANSLOG_PAGE_SIZE)))
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
*prev_buffer= buffer;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Buffer #%u (0x%lu): have to be flushed",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Use the same buffer #%u (0x%lu): "
|
|
|
|
"Buffer Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no,
|
|
|
|
(ulong) buffer,
|
|
|
|
(ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer)));
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_finish_page(horizon, cursor);
|
|
|
|
translog_new_page_header(horizon, cursor);
|
|
|
|
*prev_buffer= NULL;
|
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Write data of given length to the current page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_data_on_page()
|
|
|
|
horizon \ Pointers on file and buffer
|
|
|
|
cursor /
|
|
|
|
length IN length of the chunk
|
|
|
|
buffer buffer with data
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static my_bool translog_write_data_on_page(TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor,
|
|
|
|
translog_size_t length,
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *buffer)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_write_data_on_page");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Chunk length: %lu Page size %u",
|
|
|
|
(ulong) length, (uint) cursor->current_page_fill));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(length > 0);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(length + cursor->current_page_fill <= TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(length + cursor->ptr <=cursor->buffer->buffer +
|
|
|
|
TRANSLOG_WRITE_BUFFER);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(cursor->ptr, buffer, length);
|
2007-02-02 08:41:32 +01:00
|
|
|
cursor->ptr+= length;
|
2007-02-19 22:01:27 +01:00
|
|
|
(*horizon)+= length; /* adds offset */
|
|
|
|
cursor->current_page_fill+= length;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (!cursor->chaser)
|
|
|
|
cursor->buffer->size+= length;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Write data buffer #%u: 0x%lx "
|
|
|
|
"chaser: %d Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->buffer->buffer_no, (ulong) cursor->buffer,
|
|
|
|
cursor->chaser, (ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(cursor););
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Write data from parts of given length to the current page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_parts_on_page()
|
|
|
|
horizon \ Pointers on file and buffer
|
|
|
|
cursor /
|
|
|
|
length IN length of the chunk
|
|
|
|
parts IN/OUT chunk source
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static my_bool translog_write_parts_on_page(TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor,
|
|
|
|
translog_size_t length,
|
|
|
|
struct st_translog_parts *parts)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
translog_size_t left= length;
|
|
|
|
uint cur= (uint) parts->current;
|
|
|
|
DBUG_ENTER("translog_write_parts_on_page");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Chunk length: %lu parts: %u of %u. Page size: %u "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Buffer size: %lu (%lu)",
|
|
|
|
(ulong) length,
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
(uint) (cur + 1), (uint) parts->elements,
|
2007-02-19 22:01:27 +01:00
|
|
|
(uint) cursor->current_page_fill,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer)));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(length > 0);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(length + cursor->current_page_fill <= TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(length + cursor->ptr <=cursor->buffer->buffer +
|
|
|
|
TRANSLOG_WRITE_BUFFER);
|
|
|
|
|
|
|
|
do
|
|
|
|
{
|
|
|
|
translog_size_t len;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LEX_STRING *part;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *buff;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
DBUG_ASSERT(cur < parts->elements);
|
|
|
|
part= parts->parts + cur;
|
2007-07-02 19:45:15 +02:00
|
|
|
buff= (uchar*) part->str;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
DBUG_PRINT("info", ("Part: %u Length: %lu left: %lu buff: 0x%lx",
|
|
|
|
(uint) (cur + 1), (ulong) part->length, (ulong) left,
|
|
|
|
(ulong) buff));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
if (part->length > left)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/* we should write less then the current part */
|
|
|
|
len= left;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
part->length-= len;
|
|
|
|
part->str+= len;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Set new part: %u Length: %lu",
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
(uint) (cur + 1), (ulong) part->length));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
len= part->length;
|
2007-02-02 08:41:32 +01:00
|
|
|
cur++;
|
|
|
|
DBUG_PRINT("info", ("moved to next part (len: %lu)", (ulong) len));
|
|
|
|
}
|
|
|
|
DBUG_PRINT("info", ("copy: 0x%lx <- 0x%lx %u",
|
|
|
|
(ulong) cursor->ptr, (ulong)buff, (uint)len));
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
if (likely(len))
|
|
|
|
{
|
|
|
|
memcpy(cursor->ptr, buff, len);
|
|
|
|
left-= len;
|
|
|
|
cursor->ptr+= len;
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
} while (left);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Horizon: (%lu,0x%lx) Length %lu(0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(*horizon),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) length, (ulong) length));
|
2007-02-02 08:41:32 +01:00
|
|
|
parts->current= cur;
|
2007-02-19 22:01:27 +01:00
|
|
|
(*horizon)+= length; /* offset increasing */
|
|
|
|
cursor->current_page_fill+= length;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (!cursor->chaser)
|
|
|
|
cursor->buffer->size+= length;
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_PRINT("info", ("Write parts buffer #%u: 0x%lx "
|
|
|
|
"chaser: %d Size: %lu (%lu) "
|
2007-02-19 22:01:27 +01:00
|
|
|
"Horizon: (%lu,0x%lx) buff offset: 0x%lx",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) cursor->buffer->buffer_no, (ulong) cursor->buffer,
|
|
|
|
cursor->chaser, (ulong) cursor->buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (cursor->ptr - cursor->buffer->buffer),
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(*horizon),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) (LSN_OFFSET(cursor->buffer->offset) +
|
|
|
|
cursor->buffer->size)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(cursor););
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Put 1 group chunk type 0 header into parts array
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_variable_record_1group_header()
|
|
|
|
parts Descriptor of record source parts
|
2007-02-19 22:01:27 +01:00
|
|
|
type The log record type
|
2007-06-11 16:29:53 +02:00
|
|
|
short_trid Short transaction ID or 0 if it has no sense
|
2007-02-02 08:41:32 +01:00
|
|
|
header_length Calculated header length of chunk type 0
|
|
|
|
chunk0_header Buffer for the chunk header writing
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void
|
|
|
|
translog_write_variable_record_1group_header(struct st_translog_parts *parts,
|
|
|
|
enum translog_record_type type,
|
|
|
|
SHORT_TRANSACTION_ID short_trid,
|
|
|
|
uint16 header_length,
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *chunk0_header)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LEX_STRING *part;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(parts->current != 0); /* first part is left for header */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
part= parts->parts + (--parts->current);
|
|
|
|
parts->total_record_length+= (part->length= header_length);
|
|
|
|
part->str= (char*)chunk0_header;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* puts chunk type */
|
2007-07-02 19:45:15 +02:00
|
|
|
*chunk0_header= (uchar) (type | TRANSLOG_CHUNK_LSN);
|
2007-02-02 08:41:32 +01:00
|
|
|
int2store(chunk0_header + 1, short_trid);
|
2007-02-19 22:01:27 +01:00
|
|
|
/* puts record length */
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_write_variable_record_1group_code_len(chunk0_header + 3,
|
|
|
|
parts->record_length,
|
|
|
|
header_length);
|
2007-02-19 22:01:27 +01:00
|
|
|
/* puts 0 as chunk length which indicate 1 group record */
|
2007-02-02 08:41:32 +01:00
|
|
|
int2store(chunk0_header + header_length - 2, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Increase number of writers for this buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_increase_writers()
|
|
|
|
buffer target buffer
|
|
|
|
*/
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
static inline void
|
|
|
|
translog_buffer_increase_writers(struct st_translog_buffer *buffer)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_buffer_increase_writers");
|
|
|
|
buffer->copy_to_buffer_in_progress++;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("copy_to_buffer_in_progress. Buffer #%u 0x%lx: %d",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
buffer->copy_to_buffer_in_progress));
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Decrease number of writers for this buffer
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_buffer_decrease_writers()
|
|
|
|
buffer target buffer
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
|
|
static void translog_buffer_decrease_writers(struct st_translog_buffer *buffer)
|
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_buffer_decrease_writers");
|
|
|
|
buffer->copy_to_buffer_in_progress--;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("copy_to_buffer_in_progress. Buffer #%u 0x%lx: %d",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buffer->buffer_no, (ulong) buffer,
|
|
|
|
buffer->copy_to_buffer_in_progress));
|
|
|
|
if (buffer->copy_to_buffer_in_progress == 0 &&
|
|
|
|
buffer->waiting_filling_buffer.last_thread != NULL)
|
|
|
|
wqueue_release_queue(&buffer->waiting_filling_buffer);
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Put chunk 2 from new page beginning
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_variable_record_chunk2_page()
|
|
|
|
parts Descriptor of record source parts
|
|
|
|
horizon \ Pointers on file position and buffer
|
|
|
|
cursor /
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool
|
|
|
|
translog_write_variable_record_chunk2_page(struct st_translog_parts *parts,
|
|
|
|
TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
struct st_translog_buffer *buffer_to_flush;
|
2007-02-02 08:41:32 +01:00
|
|
|
int rc;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk2_header[1];
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_variable_record_chunk2_page");
|
2007-02-19 22:01:27 +01:00
|
|
|
chunk2_header[0]= TRANSLOG_CHUNK_NOHDR;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-06-09 13:52:17 +02:00
|
|
|
LINT_INIT(buffer_to_flush);
|
2007-02-02 08:41:32 +01:00
|
|
|
rc= translog_page_next(horizon, cursor, &buffer_to_flush);
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
rc|= translog_buffer_lock(buffer_to_flush);
|
|
|
|
translog_buffer_decrease_writers(buffer_to_flush);
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
}
|
|
|
|
if (rc)
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Puts chunk type */
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_write_data_on_page(horizon, cursor, 1, chunk2_header);
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Puts chunk body */
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_write_parts_on_page(horizon, cursor,
|
|
|
|
log_descriptor.page_capacity_chunk_2, parts);
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Put chunk 3 of requested length in the buffer from new page beginning
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_write_variable_record_chunk3_page()
|
|
|
|
parts Descriptor of record source parts
|
|
|
|
length Length of this chunk
|
|
|
|
horizon \ Pointers on file position and buffer
|
|
|
|
cursor /
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool
|
|
|
|
translog_write_variable_record_chunk3_page(struct st_translog_parts *parts,
|
|
|
|
uint16 length,
|
|
|
|
TRANSLOG_ADDRESS *horizon,
|
|
|
|
struct st_buffer_cursor *cursor)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
struct st_translog_buffer *buffer_to_flush;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LEX_STRING *part;
|
2007-02-02 08:41:32 +01:00
|
|
|
int rc;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk3_header[1 + 2];
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_variable_record_chunk3_page");
|
|
|
|
|
2007-06-09 13:52:17 +02:00
|
|
|
LINT_INIT(buffer_to_flush);
|
2007-02-02 08:41:32 +01:00
|
|
|
rc= translog_page_next(horizon, cursor, &buffer_to_flush);
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
rc|= translog_buffer_lock(buffer_to_flush);
|
|
|
|
translog_buffer_decrease_writers(buffer_to_flush);
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
}
|
|
|
|
if (rc)
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
if (length == 0)
|
|
|
|
{
|
|
|
|
/* It was call to write page header only (no data for chunk 3) */
|
|
|
|
DBUG_PRINT("info", ("It is a call to make page header only"));
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(parts->current != 0); /* first part is left for header */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
part= parts->parts + (--parts->current);
|
|
|
|
parts->total_record_length+= (part->length= 1 + 2);
|
|
|
|
part->str= (char*)chunk3_header;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Puts chunk type */
|
2007-07-02 19:45:15 +02:00
|
|
|
*chunk3_header= (uchar) (TRANSLOG_CHUNK_LNGTH);
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Puts chunk length */
|
2007-02-02 08:41:32 +01:00
|
|
|
int2store(chunk3_header + 1, length);
|
|
|
|
|
|
|
|
translog_write_parts_on_page(horizon, cursor, length + 1 + 2, parts);
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
Move log pointer (horizon) on given number pages starting from next page,
|
|
|
|
and given offset on the last page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_advance_pointer()
|
|
|
|
pages Number of full pages starting from the next one
|
|
|
|
last_page_data Plus this data on the last page
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_advance_pointer(uint pages, uint16 last_page_data)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_size_t last_page_offset= (log_descriptor.page_overhead +
|
|
|
|
last_page_data);
|
2007-02-12 13:23:43 +01:00
|
|
|
translog_size_t offset= (TRANSLOG_PAGE_SIZE -
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.current_page_fill +
|
2007-02-12 13:23:43 +01:00
|
|
|
pages * TRANSLOG_PAGE_SIZE + last_page_offset);
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_size_t buffer_end_offset, file_end_offset, min_offset;
|
|
|
|
DBUG_ENTER("translog_advance_pointer");
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_PRINT("enter", ("Pointer: (%lu, 0x%lx) + %u + %u pages + %u + %u",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) (TRANSLOG_PAGE_SIZE -
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.current_page_fill),
|
2007-02-02 08:41:32 +01:00
|
|
|
pages, (uint) log_descriptor.page_overhead,
|
|
|
|
(uint) last_page_data));
|
|
|
|
|
|
|
|
for (;;)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint8 new_buffer_no;
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_translog_buffer *new_buffer;
|
|
|
|
struct st_translog_buffer *old_buffer;
|
|
|
|
buffer_end_offset= TRANSLOG_WRITE_BUFFER - log_descriptor.bc.buffer->size;
|
2007-02-19 22:01:27 +01:00
|
|
|
file_end_offset= (log_descriptor.log_file_max_size -
|
|
|
|
LSN_OFFSET(log_descriptor.horizon));
|
|
|
|
DBUG_PRINT("info", ("offset: %lu buffer_end_offs: %lu, "
|
2007-02-02 08:41:32 +01:00
|
|
|
"file_end_offs: %lu",
|
|
|
|
(ulong) offset, (ulong) buffer_end_offset,
|
|
|
|
(ulong) file_end_offset));
|
|
|
|
DBUG_PRINT("info", ("Buff #%u %u (0x%lx) offset 0x%lx + size 0x%lx = "
|
|
|
|
"0x%lx (0x%lx)",
|
|
|
|
(uint) log_descriptor.bc.buffer->buffer_no,
|
|
|
|
(uint) log_descriptor.bc.buffer_no,
|
|
|
|
(ulong) log_descriptor.bc.buffer,
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(log_descriptor.bc.buffer->offset),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) log_descriptor.bc.buffer->size,
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) (LSN_OFFSET(log_descriptor.bc.buffer->offset) +
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.buffer->size),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(log_descriptor.horizon)));
|
|
|
|
DBUG_ASSERT(LSN_OFFSET(log_descriptor.bc.buffer->offset) +
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.buffer->size ==
|
2007-02-12 13:23:43 +01:00
|
|
|
LSN_OFFSET(log_descriptor.horizon));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
if (offset <= buffer_end_offset && offset <= file_end_offset)
|
|
|
|
break;
|
|
|
|
old_buffer= log_descriptor.bc.buffer;
|
|
|
|
new_buffer_no= (log_descriptor.bc.buffer_no + 1) % TRANSLOG_BUFFERS_NO;
|
|
|
|
new_buffer= log_descriptor.buffers + new_buffer_no;
|
|
|
|
|
|
|
|
translog_buffer_lock(new_buffer);
|
|
|
|
translog_wait_for_buffer_free(new_buffer);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
min_offset= min(buffer_end_offset, file_end_offset);
|
2007-02-21 22:30:58 +01:00
|
|
|
/* TODO: check is it ptr or size enough */
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.buffer->size+= min_offset;
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.bc.ptr+= min_offset;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("NewP buffer #%u: 0x%lx chaser: %d Size: %lu (%lu)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) log_descriptor.bc.buffer->buffer_no,
|
|
|
|
(ulong) log_descriptor.bc.buffer,
|
|
|
|
log_descriptor.bc.chaser,
|
|
|
|
(ulong) log_descriptor.bc.buffer->size,
|
|
|
|
(ulong) (log_descriptor.bc.ptr -log_descriptor.bc.
|
|
|
|
buffer->buffer)));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT((ulong) (log_descriptor.bc.ptr -
|
|
|
|
log_descriptor.bc.buffer->buffer) ==
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.buffer->size);
|
|
|
|
DBUG_ASSERT(log_descriptor.bc.buffer->buffer_no ==
|
|
|
|
log_descriptor.bc.buffer_no);
|
|
|
|
translog_buffer_increase_writers(log_descriptor.bc.buffer);
|
|
|
|
|
|
|
|
if (file_end_offset <= buffer_end_offset)
|
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.horizon+= LSN_ONE_FILE;
|
|
|
|
log_descriptor.horizon= LSN_REPLACE_OFFSET(log_descriptor.horizon,
|
|
|
|
TRANSLOG_PAGE_SIZE);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("New file: %lu",
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_FILE_NO(log_descriptor.horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_create_new_file())
|
|
|
|
{
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("The same file"));
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.horizon+= min_offset; /* offset increasing */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
translog_start_buffer(new_buffer, &log_descriptor.bc, new_buffer_no);
|
2007-08-13 14:17:49 +02:00
|
|
|
old_buffer->next_buffer_offset= new_buffer->offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_buffer_unlock(old_buffer))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
offset-= min_offset;
|
|
|
|
}
|
|
|
|
log_descriptor.bc.ptr+= offset;
|
|
|
|
log_descriptor.bc.buffer->size+= offset;
|
|
|
|
translog_buffer_increase_writers(log_descriptor.bc.buffer);
|
2007-02-12 13:23:43 +01:00
|
|
|
log_descriptor.horizon+= offset; /* offset increasing */
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.current_page_fill= last_page_offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info", ("drop write_counter"));
|
|
|
|
log_descriptor.bc.write_counter= 0;
|
|
|
|
log_descriptor.bc.previous_offset= 0;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("NewP buffer #%u: 0x%lx chaser: %d Size: %lu (%lu) "
|
|
|
|
"offset: %u last page: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) log_descriptor.bc.buffer->buffer_no,
|
|
|
|
(ulong) log_descriptor.bc.buffer,
|
|
|
|
log_descriptor.bc.chaser,
|
|
|
|
(ulong) log_descriptor.bc.buffer->size,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (log_descriptor.bc.ptr -
|
|
|
|
log_descriptor.bc.buffer->
|
2007-02-02 08:41:32 +01:00
|
|
|
buffer), (uint) offset,
|
|
|
|
(uint) last_page_offset));
|
|
|
|
DBUG_PRINT("info",
|
2007-02-12 13:23:43 +01:00
|
|
|
("pointer moved to: (%lu, 0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon)));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(&log_descriptor.bc););
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.protected= 0;
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get page rest
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_current_page_rest()
|
|
|
|
|
|
|
|
NOTE loghandler should be locked
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
number of bytes left on the current page
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define translog_get_current_page_rest() \
|
2007-02-19 22:01:27 +01:00
|
|
|
(TRANSLOG_PAGE_SIZE - log_descriptor.bc.current_page_fill)
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
Get buffer rest in full pages
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_current_buffer_rest()
|
|
|
|
|
|
|
|
NOTE loghandler should be locked
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
number of full pages left on the current buffer
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define translog_get_current_buffer_rest() \
|
|
|
|
((log_descriptor.bc.buffer->buffer + TRANSLOG_WRITE_BUFFER - \
|
|
|
|
log_descriptor.bc.ptr) / \
|
|
|
|
TRANSLOG_PAGE_SIZE)
|
|
|
|
|
|
|
|
/*
|
|
|
|
Calculate possible group size without first (current) page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_current_group_size()
|
|
|
|
|
|
|
|
NOTE loghandler should be locked
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
group size without first (current) page
|
|
|
|
*/
|
|
|
|
|
|
|
|
static translog_size_t translog_get_current_group_size()
|
|
|
|
{
|
|
|
|
/* buffer rest in full pages */
|
|
|
|
translog_size_t buffer_rest= translog_get_current_buffer_rest();
|
|
|
|
DBUG_ENTER("translog_get_current_group_size");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("buffer_rest in pages: %u", buffer_rest));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
buffer_rest*= log_descriptor.page_capacity_chunk_2;
|
|
|
|
/* in case of only half of buffer free we can write this and next buffer */
|
|
|
|
if (buffer_rest < log_descriptor.half_buffer_capacity_chunk_2)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("buffer_rest: %lu -> add %lu",
|
|
|
|
(ulong) buffer_rest,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) log_descriptor.buffer_capacity_chunk_2));
|
|
|
|
buffer_rest+= log_descriptor.buffer_capacity_chunk_2;
|
|
|
|
}
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("buffer_rest: %lu", (ulong) buffer_rest));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_RETURN(buffer_rest);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
/**
|
|
|
|
@brief Write variable record in 1 group.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@param lsn LSN of the record will be written here
|
|
|
|
@param type the log record type
|
|
|
|
@param short_trid Short transaction ID or 0 if it has no sense
|
|
|
|
@param parts Descriptor of record source parts
|
|
|
|
@param buffer_to_flush Buffer which have to be flushed if it is not 0
|
|
|
|
@param header_length Calculated header length of chunk type 0
|
|
|
|
@param trn Transaction structure pointer for hooks by
|
|
|
|
record log type, for short_id
|
|
|
|
@param hook_arg Argument which will be passed to pre-write and
|
|
|
|
in-write hooks of this record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool
|
|
|
|
translog_write_variable_record_1group(LSN *lsn,
|
|
|
|
enum translog_record_type type,
|
2007-08-01 15:52:57 +02:00
|
|
|
MARIA_HA *tbl_info,
|
2007-02-02 08:41:32 +01:00
|
|
|
SHORT_TRANSACTION_ID short_trid,
|
|
|
|
struct st_translog_parts *parts,
|
|
|
|
struct st_translog_buffer
|
|
|
|
*buffer_to_flush, uint16 header_length,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
TRN *trn, void *hook_arg)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS horizon;
|
|
|
|
struct st_buffer_cursor cursor;
|
|
|
|
int rc= 0;
|
|
|
|
uint i;
|
|
|
|
translog_size_t record_rest, full_pages, first_page;
|
|
|
|
uint additional_chunk3_page= 0;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk0_header[1 + 2 + 5 + 2];
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_variable_record_1group");
|
|
|
|
|
|
|
|
*lsn= horizon= log_descriptor.horizon;
|
2007-08-19 21:27:43 +02:00
|
|
|
if (translog_set_lsn_for_files(LSN_FILE_NO(*lsn), LSN_FILE_NO(*lsn),
|
|
|
|
*lsn, TRUE) ||
|
|
|
|
(log_record_type_descriptor[type].inwrite_hook &&
|
|
|
|
(*log_record_type_descriptor[type].inwrite_hook)(type, trn, tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
lsn, hook_arg)))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_unlock();
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
cursor= log_descriptor.bc;
|
|
|
|
cursor.chaser= 1;
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Advance pointer To be able unlock the loghandler */
|
2007-02-02 08:41:32 +01:00
|
|
|
first_page= translog_get_current_page_rest();
|
|
|
|
record_rest= parts->record_length - (first_page - header_length);
|
|
|
|
full_pages= record_rest / log_descriptor.page_capacity_chunk_2;
|
|
|
|
record_rest= (record_rest % log_descriptor.page_capacity_chunk_2);
|
|
|
|
|
|
|
|
if (record_rest + 1 == log_descriptor.page_capacity_chunk_2)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("2 chunks type 3 is needed"));
|
|
|
|
/* We will write 2 chunks type 3 at the end of this group */
|
|
|
|
additional_chunk3_page= 1;
|
|
|
|
record_rest= 1;
|
|
|
|
}
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("first_page: %u (%u) full_pages: %u (%lu) "
|
|
|
|
"additional: %u (%u) rest %u = %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
first_page, first_page - header_length,
|
|
|
|
full_pages,
|
|
|
|
(ulong) full_pages *
|
|
|
|
log_descriptor.page_capacity_chunk_2,
|
|
|
|
additional_chunk3_page,
|
|
|
|
additional_chunk3_page *
|
|
|
|
(log_descriptor.page_capacity_chunk_2 - 1),
|
|
|
|
record_rest, parts->record_length));
|
|
|
|
/* record_rest + 3 is chunk type 3 overhead + record_rest */
|
2007-02-19 22:01:27 +01:00
|
|
|
rc|= translog_advance_pointer(full_pages + additional_chunk3_page,
|
|
|
|
(record_rest ? record_rest + 3 : 0));
|
2007-02-02 08:41:32 +01:00
|
|
|
log_descriptor.bc.buffer->last_lsn= *lsn;
|
|
|
|
|
|
|
|
rc|= translog_unlock();
|
|
|
|
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Check if we switched buffer and need process it (current buffer is
|
2007-02-02 08:41:32 +01:00
|
|
|
unlocked already => we will not delay other threads
|
|
|
|
*/
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
}
|
|
|
|
if (rc)
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
|
|
|
translog_write_variable_record_1group_header(parts, type, short_trid,
|
|
|
|
header_length, chunk0_header);
|
|
|
|
|
|
|
|
/* fill the pages */
|
|
|
|
translog_write_parts_on_page(&horizon, &cursor, first_page, parts);
|
|
|
|
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
for (i= 0; i < full_pages; i++)
|
|
|
|
{
|
|
|
|
if (translog_write_variable_record_chunk2_page(parts, &horizon, &cursor))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
if (additional_chunk3_page)
|
|
|
|
{
|
|
|
|
if (translog_write_variable_record_chunk3_page(parts,
|
|
|
|
log_descriptor.
|
|
|
|
page_capacity_chunk_2 - 2,
|
|
|
|
&horizon, &cursor))
|
|
|
|
DBUG_RETURN(1);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon)));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(cursor.current_page_fill == TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
if (translog_write_variable_record_chunk3_page(parts,
|
|
|
|
record_rest,
|
|
|
|
&horizon, &cursor))
|
|
|
|
DBUG_RETURN(1);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx)",
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_FILE_NO(log_descriptor.horizon),
|
|
|
|
(ulong) LSN_OFFSET(log_descriptor.horizon),
|
|
|
|
(ulong) LSN_FILE_NO(horizon),
|
|
|
|
(ulong) LSN_OFFSET(horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (!(rc= translog_buffer_lock(cursor.buffer)))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Check if we wrote something on 1:st not full page and need to reconstruct
|
2007-02-02 08:41:32 +01:00
|
|
|
CRC and sector protection
|
|
|
|
*/
|
|
|
|
translog_buffer_decrease_writers(cursor.buffer);
|
|
|
|
}
|
|
|
|
rc|= translog_buffer_unlock(cursor.buffer);
|
|
|
|
DBUG_RETURN(rc);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
/**
|
|
|
|
@brief Write variable record in 1 chunk.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@param lsn LSN of the record will be written here
|
|
|
|
@param type the log record type
|
|
|
|
@param short_trid Short transaction ID or 0 if it has no sense
|
|
|
|
@param parts Descriptor of record source parts
|
|
|
|
@param buffer_to_flush Buffer which have to be flushed if it is not 0
|
|
|
|
@param header_length Calculated header length of chunk type 0
|
|
|
|
@param trn Transaction structure pointer for hooks by
|
|
|
|
record log type, for short_id
|
|
|
|
@param hook_arg Argument which will be passed to pre-write and
|
|
|
|
in-write hooks of this record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool
|
|
|
|
translog_write_variable_record_1chunk(LSN *lsn,
|
|
|
|
enum translog_record_type type,
|
2007-08-01 15:52:57 +02:00
|
|
|
MARIA_HA *tbl_info,
|
2007-02-02 08:41:32 +01:00
|
|
|
SHORT_TRANSACTION_ID short_trid,
|
|
|
|
struct st_translog_parts *parts,
|
|
|
|
struct st_translog_buffer
|
|
|
|
*buffer_to_flush, uint16 header_length,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
TRN *trn, void *hook_arg)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
int rc;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk0_header[1 + 2 + 5 + 2];
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_variable_record_1chunk");
|
|
|
|
|
|
|
|
translog_write_variable_record_1group_header(parts, type, short_trid,
|
|
|
|
header_length, chunk0_header);
|
|
|
|
|
|
|
|
*lsn= log_descriptor.horizon;
|
2007-08-19 21:27:43 +02:00
|
|
|
if (translog_set_lsn_for_files(LSN_FILE_NO(*lsn), LSN_FILE_NO(*lsn),
|
|
|
|
*lsn, TRUE) ||
|
|
|
|
(log_record_type_descriptor[type].inwrite_hook &&
|
|
|
|
(*log_record_type_descriptor[type].inwrite_hook)(type, trn, tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
lsn, hook_arg)))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_unlock();
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
rc= translog_write_parts_on_page(&log_descriptor.horizon,
|
|
|
|
&log_descriptor.bc,
|
|
|
|
parts->total_record_length, parts);
|
|
|
|
log_descriptor.bc.buffer->last_lsn= *lsn;
|
|
|
|
rc|= translog_unlock();
|
|
|
|
|
|
|
|
/*
|
|
|
|
check if we switched buffer and need process it (current buffer is
|
|
|
|
unlocked already => we will not delay other threads
|
|
|
|
*/
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
}
|
|
|
|
|
|
|
|
DBUG_RETURN(rc);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Calculate and write LSN difference (compressed LSN)
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_put_LSN_diff()
|
|
|
|
base_lsn LSN from which we calculate difference
|
|
|
|
lsn LSN for codding
|
2007-02-19 22:01:27 +01:00
|
|
|
dst Result will be written to dst[-pack_length] .. dst[-1]
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
NOTE:
|
2007-02-19 22:01:27 +01:00
|
|
|
To store an LSN in a compact way we will use the following compression:
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
If a log record has LSN1, and it contains the lSN2 as a back reference,
|
|
|
|
Instead of LSN2 we write LSN1-LSN2, encoded as:
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
two bits the number N (see below)
|
|
|
|
14 bits
|
|
|
|
N bytes
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
That is, LSN is encoded in 2..5 bytes, and the number of bytes minus 2
|
2007-02-02 08:41:32 +01:00
|
|
|
is stored in the first two bits.
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
# pointer on coded LSN
|
|
|
|
NULL Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static uchar *translog_put_LSN_diff(LSN base_lsn, LSN lsn, uchar *dst)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_put_LSN_diff");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Base: (0x%lu,0x%lx) val: (0x%lu,0x%lx) dst: 0x%lx",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(base_lsn), LSN_IN_PARTS(lsn),
|
|
|
|
(ulong) dst));
|
2007-02-12 13:23:43 +01:00
|
|
|
if (LSN_FILE_NO(base_lsn) == LSN_FILE_NO(lsn))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint32 diff;
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(base_lsn > lsn);
|
|
|
|
diff= base_lsn - lsn;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("File is the same. Diff: 0x%lx", (ulong) diff));
|
2007-02-02 08:41:32 +01:00
|
|
|
if (diff <= 0x3FFF)
|
|
|
|
{
|
|
|
|
dst-= 2;
|
2007-02-19 22:01:27 +01:00
|
|
|
/*
|
2007-07-02 19:45:15 +02:00
|
|
|
Note we store this high uchar first to ensure that first uchar has
|
2007-02-19 22:01:27 +01:00
|
|
|
0 in the 3 upper bits.
|
|
|
|
*/
|
2007-02-02 08:41:32 +01:00
|
|
|
dst[0]= diff >> 8;
|
|
|
|
dst[1]= (diff & 0xFF);
|
|
|
|
}
|
|
|
|
else if (diff <= 0x3FFFFF)
|
|
|
|
{
|
|
|
|
dst-= 3;
|
|
|
|
dst[0]= 0x40 | (diff >> 16);
|
|
|
|
int2store(dst + 1, diff & 0xFFFF);
|
|
|
|
}
|
|
|
|
else if (diff <= 0x3FFFFFFF)
|
|
|
|
{
|
|
|
|
dst-= 4;
|
|
|
|
dst[0]= 0x80 | (diff >> 24);
|
|
|
|
int3store(dst + 1, diff & 0xFFFFFF);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
dst-= 5;
|
|
|
|
dst[0]= 0xC0;
|
|
|
|
int4store(dst + 1, diff);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
uint32 diff;
|
|
|
|
uint32 offset_diff;
|
2007-02-12 13:23:43 +01:00
|
|
|
ulonglong base_offset= LSN_OFFSET(base_lsn);
|
|
|
|
DBUG_ASSERT(base_lsn > lsn);
|
|
|
|
diff= LSN_FILE_NO(base_lsn) - LSN_FILE_NO(lsn);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("File is different. Diff: 0x%lx", (ulong) diff));
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
if (base_offset < LSN_OFFSET(lsn))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/* take 1 from file offset */
|
|
|
|
diff--;
|
2007-02-19 22:01:27 +01:00
|
|
|
base_offset+= LL(0x100000000);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-12 13:23:43 +01:00
|
|
|
offset_diff= base_offset - LSN_OFFSET(lsn);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (diff > 0x3f)
|
|
|
|
{
|
2007-06-15 00:45:55 +02:00
|
|
|
/*
|
|
|
|
It is full LSN after special 1 diff (which is impossible
|
|
|
|
in real life)
|
|
|
|
*/
|
|
|
|
dst-= 2 + LSN_STORE_SIZE;
|
|
|
|
dst[0]= 0;
|
|
|
|
dst[1]= 1;
|
|
|
|
lsn_store(dst + 2, lsn);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
dst-= 5;
|
|
|
|
*dst= (0xC0 | diff);
|
|
|
|
int4store(dst + 1, offset_diff);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
DBUG_PRINT("info", ("new dst: 0x%lx", (ulong) dst));
|
|
|
|
DBUG_RETURN(dst);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Get LSN from LSN-difference (compressed LSN)
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_LSN_from_diff()
|
|
|
|
base_lsn LSN from which we calculate difference
|
|
|
|
src pointer to coded lsn
|
|
|
|
dst pointer to buffer where to write 7byte LSN
|
|
|
|
|
|
|
|
NOTE:
|
2007-02-19 22:01:27 +01:00
|
|
|
To store an LSN in a compact way we will use the following compression:
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
If a log record has LSN1, and it contains the lSN2 as a back reference,
|
2007-02-19 22:01:27 +01:00
|
|
|
Instead of LSN2 we write LSN1-LSN2, encoded as:
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
two bits the number N (see below)
|
|
|
|
14 bits
|
|
|
|
N bytes
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
That is, LSN is encoded in 2..5 bytes, and the number of bytes minus 2
|
|
|
|
is stored in the first two bits.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
RETURN
|
|
|
|
pointer to buffer after decoded LSN
|
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static uchar *translog_get_LSN_from_diff(LSN base_lsn, uchar *src, uchar *dst)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
LSN lsn;
|
|
|
|
uint32 diff;
|
|
|
|
uint32 first_byte;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint32 file_no, rec_offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint8 code;
|
|
|
|
DBUG_ENTER("translog_get_LSN_from_diff");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Base: (0x%lx,0x%lx) src: 0x%lx dst 0x%lx",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(base_lsn), (ulong) src, (ulong) dst));
|
2007-02-02 08:41:32 +01:00
|
|
|
first_byte= *((uint8*) src);
|
2007-04-25 23:19:56 +02:00
|
|
|
code= first_byte >> 6; /* Length is in 2 most significant bits */
|
2007-02-19 22:01:27 +01:00
|
|
|
first_byte&= 0x3F;
|
|
|
|
src++; /* Skip length + encode */
|
|
|
|
file_no= LSN_FILE_NO(base_lsn); /* Assume relative */
|
|
|
|
DBUG_PRINT("info", ("code: %u first byte: %lu",
|
|
|
|
(uint) code, (ulong) first_byte));
|
2007-02-02 08:41:32 +01:00
|
|
|
switch (code) {
|
2007-02-19 22:01:27 +01:00
|
|
|
case 0:
|
2007-06-15 00:45:55 +02:00
|
|
|
if (first_byte == 0 && *((uint8*)src) == 1)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
It is full LSN after special 1 diff (which is impossible
|
|
|
|
in real life)
|
|
|
|
*/
|
|
|
|
memcpy(dst, src + 1, LSN_STORE_SIZE);
|
|
|
|
DBUG_PRINT("info", ("Special case of full LSN, new src: 0x%lx",
|
|
|
|
(ulong) (src + 1 + LSN_STORE_SIZE)));
|
|
|
|
DBUG_RETURN(src + 1 + LSN_STORE_SIZE);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
rec_offset= LSN_OFFSET(base_lsn) - ((first_byte << 8) + *((uint8*)src));
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
2007-02-19 22:01:27 +01:00
|
|
|
case 1:
|
|
|
|
diff= uint2korr(src);
|
|
|
|
rec_offset= LSN_OFFSET(base_lsn) - ((first_byte << 16) + diff);
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
2007-02-19 22:01:27 +01:00
|
|
|
case 2:
|
|
|
|
diff= uint3korr(src);
|
|
|
|
rec_offset= LSN_OFFSET(base_lsn) - ((first_byte << 24) + diff);
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
2007-02-19 22:01:27 +01:00
|
|
|
case 3:
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
ulonglong base_offset= LSN_OFFSET(base_lsn);
|
2007-02-19 22:01:27 +01:00
|
|
|
diff= uint4korr(src);
|
2007-02-12 13:23:43 +01:00
|
|
|
if (diff > LSN_OFFSET(base_lsn))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/* take 1 from file offset */
|
|
|
|
first_byte++;
|
2007-02-19 22:01:27 +01:00
|
|
|
base_offset+= LL(0x100000000);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
file_no= LSN_FILE_NO(base_lsn) - first_byte;
|
|
|
|
rec_offset= base_offset - diff;
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
DBUG_ASSERT(0);
|
|
|
|
DBUG_RETURN(NULL);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
lsn= MAKE_LSN(file_no, rec_offset);
|
|
|
|
src+= code + 1;
|
|
|
|
lsn_store(dst, lsn);
|
2007-06-15 00:45:55 +02:00
|
|
|
DBUG_PRINT("info", ("new src: 0x%lx", (ulong) src));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(src);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Encode relative LSNs listed in the parameters
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_relative_LSN_encode()
|
|
|
|
parts Parts list with encoded LSN(s)
|
|
|
|
base_lsn LSN which is base for encoding
|
|
|
|
lsns number of LSN(s) to encode
|
|
|
|
compressed_LSNs buffer which can be used for storing compressed LSN(s)
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_relative_LSN_encode(struct st_translog_parts *parts,
|
2007-02-12 13:23:43 +01:00
|
|
|
LSN base_lsn,
|
2007-07-02 19:45:15 +02:00
|
|
|
uint lsns, uchar *compressed_LSNs)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LEX_STRING *part;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint lsns_len= lsns * LSN_STORE_SIZE;
|
2007-06-15 00:45:55 +02:00
|
|
|
char buffer_src[MAX_NUMBER_OF_LSNS_PER_RECORD * LSN_STORE_SIZE];
|
|
|
|
char *buffer= buffer_src;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_ENTER("translog_relative_LSN_encode");
|
|
|
|
|
2007-06-15 00:45:55 +02:00
|
|
|
DBUG_ASSERT(parts->current != 0);
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
part= parts->parts + parts->current;
|
2007-06-15 00:45:55 +02:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/* collect all LSN(s) in one chunk if it (they) is (are) divided */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
if (part->length < lsns_len)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
uint copied= part->length;
|
|
|
|
LEX_STRING *next_part;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Using buffer: 0x%lx", (ulong) compressed_LSNs));
|
2007-07-02 19:45:15 +02:00
|
|
|
memcpy(buffer, (uchar*)part->str, part->length);
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
next_part= parts->parts + parts->current + 1;
|
2007-02-02 08:41:32 +01:00
|
|
|
do
|
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
DBUG_ASSERT(next_part < parts->parts + parts->elements);
|
|
|
|
if ((next_part->length + copied) < lsns_len)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
memcpy(buffer + copied, (uchar*)next_part->str,
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
next_part->length);
|
|
|
|
copied+= next_part->length;
|
|
|
|
next_part->length= 0; next_part->str= 0;
|
|
|
|
/* delete_dynamic_element(&parts->parts, parts->current + 1); */
|
|
|
|
next_part++;
|
2007-06-15 00:45:55 +02:00
|
|
|
parts->current++;
|
|
|
|
part= parts->parts + parts->current;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
uint len= lsns_len - copied;
|
2007-07-02 19:45:15 +02:00
|
|
|
memcpy(buffer + copied, (uchar*)next_part->str, len);
|
2007-02-02 08:41:32 +01:00
|
|
|
copied= lsns_len;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
next_part->str+= len;
|
|
|
|
next_part->length-= len;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
} while (copied < lsns_len);
|
|
|
|
}
|
2007-06-15 00:45:55 +02:00
|
|
|
else
|
|
|
|
{
|
|
|
|
buffer= part->str;
|
|
|
|
part->str+= lsns_len;
|
|
|
|
part->length-= lsns_len;
|
|
|
|
parts->current--;
|
|
|
|
part= parts->parts + parts->current;
|
|
|
|
}
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/* Compress */
|
|
|
|
LSN ref;
|
2007-06-15 00:45:55 +02:00
|
|
|
int economy;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *src_ptr;
|
|
|
|
uchar *dst_ptr= compressed_LSNs + (MAX_NUMBER_OF_LSNS_PER_RECORD *
|
2007-06-15 00:45:55 +02:00
|
|
|
COMPRESSED_LSN_MAX_STORE_SIZE);
|
|
|
|
for (src_ptr= buffer + lsns_len - LSN_STORE_SIZE;
|
2007-08-13 14:17:49 +02:00
|
|
|
src_ptr >= (uchar*) buffer;
|
2007-06-15 00:45:55 +02:00
|
|
|
src_ptr-= LSN_STORE_SIZE)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-06-15 00:45:55 +02:00
|
|
|
ref= lsn_korr(src_ptr);
|
2007-02-12 13:23:43 +01:00
|
|
|
if ((dst_ptr= translog_put_LSN_diff(base_lsn, ref, dst_ptr)) == NULL)
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2007-06-15 00:45:55 +02:00
|
|
|
part->length= (uint)((compressed_LSNs +
|
|
|
|
(MAX_NUMBER_OF_LSNS_PER_RECORD *
|
|
|
|
COMPRESSED_LSN_MAX_STORE_SIZE)) -
|
|
|
|
dst_ptr);
|
|
|
|
parts->record_length-= (economy= lsns_len - part->length);
|
2007-07-27 12:06:39 +02:00
|
|
|
DBUG_PRINT("info", ("new length of LSNs: %lu economy: %d",
|
|
|
|
(ulong)part->length, economy));
|
2007-02-02 08:41:32 +01:00
|
|
|
parts->total_record_length-= economy;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
part->str= (char*)dst_ptr;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
/**
|
|
|
|
@brief Write multi-group variable-size record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@param lsn LSN of the record will be written here
|
|
|
|
@param type the log record type
|
|
|
|
@param short_trid Short transaction ID or 0 if it has no sense
|
|
|
|
@param parts Descriptor of record source parts
|
|
|
|
@param buffer_to_flush Buffer which have to be flushed if it is not 0
|
|
|
|
@param header_length Header length calculated for 1 group
|
|
|
|
@param buffer_rest Beginning from which we plan to write in full pages
|
|
|
|
@param trn Transaction structure pointer for hooks by
|
|
|
|
record log type, for short_id
|
|
|
|
@param hook_arg Argument which will be passed to pre-write and
|
|
|
|
in-write hooks of this record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool
|
|
|
|
translog_write_variable_record_mgroup(LSN *lsn,
|
|
|
|
enum translog_record_type type,
|
2007-08-01 15:52:57 +02:00
|
|
|
MARIA_HA *tbl_info,
|
2007-02-02 08:41:32 +01:00
|
|
|
SHORT_TRANSACTION_ID short_trid,
|
|
|
|
struct st_translog_parts *parts,
|
|
|
|
struct st_translog_buffer
|
|
|
|
*buffer_to_flush,
|
|
|
|
uint16 header_length,
|
|
|
|
translog_size_t buffer_rest,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
TRN *trn, void *hook_arg)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS horizon;
|
|
|
|
struct st_buffer_cursor cursor;
|
|
|
|
int rc= 0;
|
|
|
|
uint i, chunk2_page, full_pages;
|
|
|
|
uint curr_group= 0;
|
|
|
|
translog_size_t record_rest, first_page, chunk3_pages, chunk0_pages= 1;
|
|
|
|
translog_size_t done= 0;
|
|
|
|
struct st_translog_group_descriptor group;
|
|
|
|
DYNAMIC_ARRAY groups;
|
|
|
|
uint16 chunk3_size;
|
|
|
|
uint16 page_capacity= log_descriptor.page_capacity_chunk_2 + 1;
|
|
|
|
uint16 last_page_capacity;
|
|
|
|
my_bool new_page_before_chunk0= 1, first_chunk0= 1;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk0_header[1 + 2 + 5 + 2 + 2], group_desc[7 + 1];
|
|
|
|
uchar chunk2_header[1];
|
2007-02-02 08:41:32 +01:00
|
|
|
uint header_fixed_part= header_length + 2;
|
|
|
|
uint groups_per_page= (page_capacity - header_fixed_part) / (7 + 1);
|
2007-08-19 21:27:43 +02:00
|
|
|
uint file_of_the_first_group;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_variable_record_mgroup");
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
chunk2_header[0]= TRANSLOG_CHUNK_NOHDR;
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
if (init_dynamic_array(&groups, sizeof(struct st_translog_group_descriptor),
|
|
|
|
10, 10 CALLER_INFO))
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_unlock();
|
2007-02-02 08:41:32 +01:00
|
|
|
UNRECOVERABLE_ERROR(("init array failed"));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
first_page= translog_get_current_page_rest();
|
|
|
|
record_rest= parts->record_length - (first_page - 1);
|
|
|
|
DBUG_PRINT("info", ("Record Rest: %lu", (ulong) record_rest));
|
|
|
|
|
|
|
|
if (record_rest < buffer_rest)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("too many free space because changing header"));
|
|
|
|
buffer_rest-= log_descriptor.page_capacity_chunk_2;
|
|
|
|
DBUG_ASSERT(record_rest >= buffer_rest);
|
|
|
|
}
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
file_of_the_first_group= LSN_FILE_NO(log_descriptor.horizon);
|
|
|
|
translog_mark_file_unfinished(file_of_the_first_group);
|
2007-02-02 08:41:32 +01:00
|
|
|
do
|
|
|
|
{
|
|
|
|
group.addr= horizon= log_descriptor.horizon;
|
|
|
|
cursor= log_descriptor.bc;
|
|
|
|
cursor.chaser= 1;
|
|
|
|
if ((full_pages= buffer_rest / log_descriptor.page_capacity_chunk_2) > 255)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/* sizeof(uint8) == 256 is max number of chunk in multi-chunks group */
|
2007-02-02 08:41:32 +01:00
|
|
|
full_pages= 255;
|
|
|
|
buffer_rest= full_pages * log_descriptor.page_capacity_chunk_2;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
group chunks =
|
2007-02-19 22:01:27 +01:00
|
|
|
full pages + first page (which actually can be full, too).
|
2007-02-02 08:41:32 +01:00
|
|
|
But here we assign number of chunks - 1
|
|
|
|
*/
|
|
|
|
group.num= full_pages;
|
2007-07-02 19:45:15 +02:00
|
|
|
if (insert_dynamic(&groups, (uchar*) &group))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("insert into array failed"));
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err_unlock;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("chunk: #%u first_page: %u (%u) "
|
|
|
|
"full_pages: %lu (%lu) "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Left %lu",
|
|
|
|
groups.elements,
|
|
|
|
first_page, first_page - 1,
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) full_pages,
|
2007-02-19 22:01:27 +01:00
|
|
|
(ulong) (full_pages *
|
|
|
|
log_descriptor.page_capacity_chunk_2),
|
|
|
|
(ulong)(parts->record_length - (first_page - 1 +
|
|
|
|
buffer_rest) -
|
|
|
|
done)));
|
|
|
|
rc|= translog_advance_pointer(full_pages, 0);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
rc|= translog_unlock();
|
|
|
|
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
rc|= translog_buffer_lock(buffer_to_flush);
|
|
|
|
translog_buffer_decrease_writers(buffer_to_flush);
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
buffer_to_flush= NULL;
|
|
|
|
}
|
|
|
|
if (rc)
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("flush of unlock buffer failed"));
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
translog_write_data_on_page(&horizon, &cursor, 1, chunk2_header);
|
|
|
|
translog_write_parts_on_page(&horizon, &cursor, first_page - 1, parts);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx) "
|
|
|
|
"Left %lu",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) (parts->record_length - (first_page - 1) -
|
|
|
|
done)));
|
|
|
|
|
|
|
|
for (i= 0; i < full_pages; i++)
|
|
|
|
{
|
|
|
|
if (translog_write_variable_record_chunk2_page(parts, &horizon, &cursor))
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) "
|
|
|
|
"local: (%lu,0x%lx) "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Left: %lu",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) (parts->record_length - (first_page - 1) -
|
|
|
|
i * log_descriptor.page_capacity_chunk_2 -
|
|
|
|
done)));
|
|
|
|
}
|
|
|
|
|
|
|
|
done+= (first_page - 1 + buffer_rest);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
/* TODO: make separate function for following */
|
2007-02-02 08:41:32 +01:00
|
|
|
rc= translog_page_next(&horizon, &cursor, &buffer_to_flush);
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
rc|= translog_buffer_lock(buffer_to_flush);
|
|
|
|
translog_buffer_decrease_writers(buffer_to_flush);
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
buffer_to_flush= NULL;
|
|
|
|
}
|
|
|
|
if (rc)
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("flush of unlock buffer failed"));
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
rc= translog_buffer_lock(cursor.buffer);
|
|
|
|
if (!rc)
|
|
|
|
translog_buffer_decrease_writers(cursor.buffer);
|
|
|
|
rc|= translog_buffer_unlock(cursor.buffer);
|
|
|
|
if (rc)
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
translog_lock();
|
|
|
|
|
|
|
|
first_page= translog_get_current_page_rest();
|
|
|
|
buffer_rest= translog_get_current_group_size();
|
|
|
|
} while (first_page + buffer_rest < (uint) (parts->record_length - done));
|
|
|
|
|
|
|
|
group.addr= horizon= log_descriptor.horizon;
|
|
|
|
cursor= log_descriptor.bc;
|
|
|
|
cursor.chaser= 1;
|
2007-02-12 13:23:43 +01:00
|
|
|
group.num= 0; /* 0 because it does not matter */
|
2007-07-02 19:45:15 +02:00
|
|
|
if (insert_dynamic(&groups, (uchar*) &group))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("insert into array failed"));
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err_unlock;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
record_rest= parts->record_length - done;
|
|
|
|
DBUG_PRINT("info", ("Record rest: %lu", (ulong) record_rest));
|
|
|
|
if (first_page <= record_rest + 1)
|
|
|
|
{
|
|
|
|
chunk2_page= 1;
|
|
|
|
record_rest-= (first_page - 1);
|
|
|
|
full_pages= record_rest / log_descriptor.page_capacity_chunk_2;
|
|
|
|
record_rest= (record_rest % log_descriptor.page_capacity_chunk_2);
|
|
|
|
last_page_capacity= page_capacity;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
chunk2_page= full_pages= 0;
|
|
|
|
last_page_capacity= first_page;
|
|
|
|
}
|
|
|
|
chunk3_size= 0;
|
|
|
|
chunk3_pages= 0;
|
|
|
|
if (last_page_capacity > record_rest + 1 && record_rest != 0)
|
|
|
|
{
|
|
|
|
if (last_page_capacity >
|
|
|
|
record_rest + header_fixed_part + groups.elements * (7 + 1))
|
|
|
|
{
|
|
|
|
/* 1 record of type 0 */
|
|
|
|
chunk3_pages= 0;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
chunk3_pages= 1;
|
|
|
|
if (record_rest + 2 == last_page_capacity)
|
|
|
|
{
|
|
|
|
chunk3_size= record_rest - 1;
|
|
|
|
record_rest= 1;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
chunk3_size= record_rest;
|
|
|
|
record_rest= 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
A first non-full page will hold type 0 chunk only if it fit in it with
|
|
|
|
all its headers
|
|
|
|
*/
|
|
|
|
while (page_capacity <
|
|
|
|
record_rest + header_fixed_part +
|
|
|
|
(groups.elements - groups_per_page * (chunk0_pages - 1)) * (7 + 1))
|
|
|
|
chunk0_pages++;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("chunk0_pages: %u groups %u groups per full page: %u "
|
|
|
|
"Group on last page: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
chunk0_pages, groups.elements,
|
|
|
|
groups_per_page,
|
|
|
|
(groups.elements -
|
|
|
|
((page_capacity - header_fixed_part) / (7 + 1)) *
|
|
|
|
(chunk0_pages - 1))));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("first_page: %u chunk2: %u full_pages: %u (%lu) "
|
|
|
|
"chunk3: %u (%u) rest: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
first_page,
|
|
|
|
chunk2_page, full_pages,
|
|
|
|
(ulong) full_pages *
|
|
|
|
log_descriptor.page_capacity_chunk_2,
|
|
|
|
chunk3_pages, (uint) chunk3_size, (uint) record_rest));
|
2007-02-19 22:01:27 +01:00
|
|
|
rc= translog_advance_pointer(full_pages + chunk3_pages +
|
|
|
|
(chunk0_pages - 1),
|
|
|
|
record_rest + header_fixed_part +
|
|
|
|
(groups.elements -
|
|
|
|
((page_capacity -
|
|
|
|
header_fixed_part) / (7 + 1)) *
|
|
|
|
(chunk0_pages - 1)) * (7 + 1));
|
|
|
|
rc|= translog_unlock();
|
|
|
|
if (rc)
|
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
if (chunk2_page)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("chunk 2 to finish first page"));
|
|
|
|
translog_write_data_on_page(&horizon, &cursor, 1, chunk2_header);
|
|
|
|
translog_write_parts_on_page(&horizon, &cursor, first_page - 1, parts);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx) "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Left: %lu",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) (parts->record_length - (first_page - 1) -
|
|
|
|
done)));
|
|
|
|
}
|
|
|
|
else if (chunk3_pages)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("chunk 3"));
|
|
|
|
DBUG_ASSERT(full_pages == 0);
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk3_header[3];
|
2007-02-19 22:01:27 +01:00
|
|
|
chunk3_pages= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
chunk3_header[0]= TRANSLOG_CHUNK_LNGTH;
|
|
|
|
int2store(chunk3_header + 1, chunk3_size);
|
|
|
|
translog_write_data_on_page(&horizon, &cursor, 3, chunk3_header);
|
|
|
|
translog_write_parts_on_page(&horizon, &cursor, chunk3_size, parts);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx) "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Left: %lu",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) (parts->record_length - chunk3_size - done)));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("no new_page_before_chunk0"));
|
|
|
|
new_page_before_chunk0= 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i= 0; i < full_pages; i++)
|
|
|
|
{
|
|
|
|
DBUG_ASSERT(chunk2_page != 0);
|
|
|
|
if (translog_write_variable_record_chunk2_page(parts, &horizon, &cursor))
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx) "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Left: %lu",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) (parts->record_length - (first_page - 1) -
|
|
|
|
i * log_descriptor.page_capacity_chunk_2 -
|
|
|
|
done)));
|
|
|
|
}
|
|
|
|
|
|
|
|
if (chunk3_pages &&
|
|
|
|
translog_write_variable_record_chunk3_page(parts,
|
|
|
|
chunk3_size,
|
|
|
|
&horizon, &cursor))
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
|
|
|
DBUG_PRINT("info", ("absolute horizon: (%lu,0x%lx) local: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon),
|
|
|
|
LSN_IN_PARTS(horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
*chunk0_header= (uchar) (type |TRANSLOG_CHUNK_LSN);
|
2007-02-02 08:41:32 +01:00
|
|
|
int2store(chunk0_header + 1, short_trid);
|
|
|
|
translog_write_variable_record_1group_code_len(chunk0_header + 3,
|
|
|
|
parts->record_length,
|
|
|
|
header_length);
|
|
|
|
do
|
|
|
|
{
|
|
|
|
int limit;
|
|
|
|
if (new_page_before_chunk0)
|
|
|
|
{
|
|
|
|
rc= translog_page_next(&horizon, &cursor, &buffer_to_flush);
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
rc|= translog_buffer_lock(buffer_to_flush);
|
|
|
|
translog_buffer_decrease_writers(buffer_to_flush);
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
buffer_to_flush= NULL;
|
|
|
|
}
|
|
|
|
if (rc)
|
|
|
|
{
|
|
|
|
UNRECOVERABLE_ERROR(("flush of unlock buffer failed"));
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
new_page_before_chunk0= 1;
|
|
|
|
|
|
|
|
if (first_chunk0)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
first_chunk0= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
*lsn= horizon;
|
|
|
|
if (log_record_type_descriptor[type].inwrite_hook &&
|
2007-08-01 15:52:57 +02:00
|
|
|
(*log_record_type_descriptor[type].inwrite_hook) (type, trn,
|
|
|
|
tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
lsn, hook_arg))
|
2007-02-19 22:01:27 +01:00
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
A first non-full page will hold type 0 chunk only if it fit in it with
|
|
|
|
all its headers => the fist page is full or number of groups less then
|
|
|
|
possible number of full page.
|
|
|
|
*/
|
|
|
|
limit= (groups_per_page < groups.elements - curr_group ?
|
|
|
|
groups_per_page : groups.elements - curr_group);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Groups: %u curr: %u limit: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) groups.elements, (uint) curr_group,
|
|
|
|
(uint) limit));
|
|
|
|
|
|
|
|
if (chunk0_pages == 1)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("chunk_len: 2 + %u * (7+1) + %u = %u",
|
|
|
|
(uint) limit, (uint) record_rest,
|
|
|
|
(uint) (2 + limit * (7 + 1) + record_rest)));
|
|
|
|
int2store(chunk0_header + header_length - 2,
|
|
|
|
2 + limit * (7 + 1) + record_rest);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("chunk_len: 2 + %u * (7+1) = %u",
|
|
|
|
(uint) limit, (uint) (2 + limit * (7 + 1))));
|
|
|
|
int2store(chunk0_header + header_length - 2, 2 + limit * (7 + 1));
|
|
|
|
}
|
|
|
|
int2store(chunk0_header + header_length, groups.elements - curr_group);
|
|
|
|
translog_write_data_on_page(&horizon, &cursor, header_fixed_part,
|
|
|
|
chunk0_header);
|
|
|
|
for (i= curr_group; i < limit + curr_group; i++)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
struct st_translog_group_descriptor *grp_ptr;
|
|
|
|
grp_ptr= dynamic_element(&groups, i,
|
|
|
|
struct st_translog_group_descriptor *);
|
|
|
|
lsn_store(group_desc, grp_ptr->addr);
|
|
|
|
group_desc[7]= grp_ptr->num;
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_write_data_on_page(&horizon, &cursor, (7 + 1), group_desc);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (chunk0_pages == 1 && record_rest != 0)
|
|
|
|
translog_write_parts_on_page(&horizon, &cursor, record_rest, parts);
|
|
|
|
|
|
|
|
chunk0_pages--;
|
|
|
|
curr_group+= limit;
|
|
|
|
|
|
|
|
} while (chunk0_pages != 0);
|
|
|
|
rc= translog_buffer_lock(cursor.buffer);
|
|
|
|
if (cmp_translog_addr(cursor.buffer->last_lsn, *lsn) < 0)
|
|
|
|
cursor.buffer->last_lsn= *lsn;
|
|
|
|
translog_buffer_decrease_writers(cursor.buffer);
|
|
|
|
rc|= translog_buffer_unlock(cursor.buffer);
|
|
|
|
|
2007-08-19 21:27:43 +02:00
|
|
|
if (translog_set_lsn_for_files(file_of_the_first_group, LSN_FILE_NO(*lsn),
|
|
|
|
*lsn, FALSE))
|
|
|
|
goto err;
|
|
|
|
translog_mark_file_finished(file_of_the_first_group);
|
|
|
|
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
delete_dynamic(&groups);
|
|
|
|
DBUG_RETURN(rc);
|
2007-02-19 22:01:27 +01:00
|
|
|
|
|
|
|
err_unlock:
|
|
|
|
translog_unlock();
|
|
|
|
err:
|
|
|
|
delete_dynamic(&groups);
|
|
|
|
DBUG_RETURN(1);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
/**
|
|
|
|
@brief Write the variable length log record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@param lsn LSN of the record will be written here
|
|
|
|
@param type the log record type
|
|
|
|
@param short_trid Short transaction ID or 0 if it has no sense
|
|
|
|
@param parts Descriptor of record source parts
|
|
|
|
@param trn Transaction structure pointer for hooks by
|
|
|
|
record log type, for short_id
|
|
|
|
@param hook_arg Argument which will be passed to pre-write and
|
|
|
|
in-write hooks of this record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_write_variable_record(LSN *lsn,
|
|
|
|
enum translog_record_type type,
|
2007-08-01 15:52:57 +02:00
|
|
|
MARIA_HA *tbl_info,
|
2007-02-02 08:41:32 +01:00
|
|
|
SHORT_TRANSACTION_ID short_trid,
|
|
|
|
struct st_translog_parts *parts,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
TRN *trn, void *hook_arg)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
struct st_translog_buffer *buffer_to_flush= NULL;
|
|
|
|
uint header_length1= 1 + 2 + 2 +
|
|
|
|
translog_variable_record_length_bytes(parts->record_length);
|
|
|
|
ulong buffer_rest;
|
|
|
|
uint page_rest;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Max number of such LSNs per record is 2 */
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar compressed_LSNs[MAX_NUMBER_OF_LSNS_PER_RECORD *
|
2007-06-15 00:45:55 +02:00
|
|
|
COMPRESSED_LSN_MAX_STORE_SIZE];
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
my_bool res;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_variable_record");
|
|
|
|
|
|
|
|
translog_lock();
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("horizon: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon)));
|
2007-02-19 22:01:27 +01:00
|
|
|
page_rest= TRANSLOG_PAGE_SIZE - log_descriptor.bc.current_page_fill;
|
|
|
|
DBUG_PRINT("info", ("header length: %u page_rest: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
header_length1, page_rest));
|
|
|
|
|
|
|
|
/*
|
2007-06-07 00:01:43 +02:00
|
|
|
header and part which we should read have to fit in one chunk
|
|
|
|
TODO: allow to divide readable header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
if (page_rest <
|
|
|
|
(header_length1 + log_record_type_descriptor[type].read_header_len))
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Next page, size: %u header: %u + %u",
|
|
|
|
log_descriptor.bc.current_page_fill,
|
2007-02-02 08:41:32 +01:00
|
|
|
header_length1,
|
|
|
|
log_record_type_descriptor[type].read_header_len));
|
|
|
|
translog_page_next(&log_descriptor.horizon, &log_descriptor.bc,
|
|
|
|
&buffer_to_flush);
|
2007-07-02 19:45:15 +02:00
|
|
|
/* Chunk 2 header is 1 byte, so full page capacity will be one uchar more */
|
2007-02-02 08:41:32 +01:00
|
|
|
page_rest= log_descriptor.page_capacity_chunk_2 + 1;
|
|
|
|
DBUG_PRINT("info", ("page_rest: %u", page_rest));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
To minimize compressed size we will compress always relative to
|
|
|
|
very first chunk address (log_descriptor.horizon for now)
|
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
if (log_record_type_descriptor[type].compressed_LSN > 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
if (translog_relative_LSN_encode(parts, log_descriptor.horizon,
|
2007-02-02 08:41:32 +01:00
|
|
|
log_record_type_descriptor[type].
|
2007-02-19 22:01:27 +01:00
|
|
|
compressed_LSN, compressed_LSNs))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
translog_unlock();
|
2007-02-02 08:41:32 +01:00
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
/*
|
|
|
|
It is just try to finish log in nice way in case of error, so we
|
|
|
|
do not check result of the following functions, because we are
|
|
|
|
going return error state in any case
|
|
|
|
*/
|
|
|
|
translog_buffer_flush(buffer_to_flush);
|
|
|
|
translog_buffer_unlock(buffer_to_flush);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
/* recalculate header length after compression */
|
|
|
|
header_length1= 1 + 2 + 2 +
|
|
|
|
translog_variable_record_length_bytes(parts->record_length);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("after compressing LSN(s) header length: %u "
|
|
|
|
"record length: %lu",
|
2007-02-12 13:23:43 +01:00
|
|
|
header_length1, (ulong)parts->record_length));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* TODO: check space on current page for header + few bytes */
|
|
|
|
if (page_rest >= parts->record_length + header_length1)
|
|
|
|
{
|
|
|
|
/* following function makes translog_unlock(); */
|
2007-08-01 15:52:57 +02:00
|
|
|
res= translog_write_variable_record_1chunk(lsn, type, tbl_info,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
short_trid,
|
|
|
|
parts, buffer_to_flush,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
header_length1, trn, hook_arg);
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
buffer_rest= translog_get_current_group_size();
|
|
|
|
|
|
|
|
if (buffer_rest >= parts->record_length + header_length1 - page_rest)
|
|
|
|
{
|
|
|
|
/* following function makes translog_unlock(); */
|
2007-08-01 15:52:57 +02:00
|
|
|
res= translog_write_variable_record_1group(lsn, type, tbl_info,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
short_trid,
|
|
|
|
parts, buffer_to_flush,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
header_length1, trn, hook_arg);
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
/* following function makes translog_unlock(); */
|
2007-08-01 15:52:57 +02:00
|
|
|
res= translog_write_variable_record_mgroup(lsn, type, tbl_info,
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
short_trid,
|
|
|
|
parts, buffer_to_flush,
|
|
|
|
header_length1,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
buffer_rest, trn, hook_arg);
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
/**
|
|
|
|
@brief Write the fixed and pseudo-fixed log record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@param lsn LSN of the record will be written here
|
|
|
|
@param type the log record type
|
|
|
|
@param short_trid Short transaction ID or 0 if it has no sense
|
|
|
|
@param parts Descriptor of record source parts
|
|
|
|
@param trn Transaction structure pointer for hooks by
|
|
|
|
record log type, for short_id
|
|
|
|
@param hook_arg Argument which will be passed to pre-write and
|
|
|
|
in-write hooks of this record.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool translog_write_fixed_record(LSN *lsn,
|
|
|
|
enum translog_record_type type,
|
2007-08-01 15:52:57 +02:00
|
|
|
MARIA_HA *tbl_info,
|
2007-02-02 08:41:32 +01:00
|
|
|
SHORT_TRANSACTION_ID short_trid,
|
|
|
|
struct st_translog_parts *parts,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
TRN *trn, void *hook_arg)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
struct st_translog_buffer *buffer_to_flush= NULL;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar chunk1_header[1 + 2];
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Max number of such LSNs per record is 2 */
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar compressed_LSNs[MAX_NUMBER_OF_LSNS_PER_RECORD *
|
2007-06-15 00:45:55 +02:00
|
|
|
COMPRESSED_LSN_MAX_STORE_SIZE];
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LEX_STRING *part;
|
2007-02-02 08:41:32 +01:00
|
|
|
int rc;
|
|
|
|
DBUG_ENTER("translog_write_fixed_record");
|
|
|
|
DBUG_ASSERT((log_record_type_descriptor[type].class ==
|
|
|
|
LOGRECTYPE_FIXEDLENGTH &&
|
|
|
|
parts->record_length ==
|
|
|
|
log_record_type_descriptor[type].fixed_length) ||
|
|
|
|
(log_record_type_descriptor[type].class ==
|
|
|
|
LOGRECTYPE_PSEUDOFIXEDLENGTH &&
|
2007-06-15 00:45:55 +02:00
|
|
|
parts->record_length ==
|
2007-02-02 08:41:32 +01:00
|
|
|
log_record_type_descriptor[type].fixed_length));
|
|
|
|
|
|
|
|
translog_lock();
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("horizon: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(log_descriptor.bc.current_page_fill <= TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Page size: %u record: %u next cond: %d",
|
|
|
|
log_descriptor.bc.current_page_fill,
|
2007-06-15 00:45:55 +02:00
|
|
|
(parts->record_length +
|
2007-02-19 22:01:27 +01:00
|
|
|
log_record_type_descriptor[type].compressed_LSN * 2 + 3),
|
|
|
|
((((uint) log_descriptor.bc.current_page_fill) +
|
2007-06-15 00:45:55 +02:00
|
|
|
(parts->record_length +
|
2007-02-19 22:01:27 +01:00
|
|
|
log_record_type_descriptor[type].compressed_LSN * 2 + 3)) >
|
2007-02-02 08:41:32 +01:00
|
|
|
TRANSLOG_PAGE_SIZE)));
|
|
|
|
/*
|
2007-06-15 00:45:55 +02:00
|
|
|
check that there is enough place on current page.
|
|
|
|
NOTE: compressing may increase page LSN size on two bytes for every LSN
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
if ((((uint) log_descriptor.bc.current_page_fill) +
|
2007-06-15 00:45:55 +02:00
|
|
|
(parts->record_length +
|
2007-02-19 22:01:27 +01:00
|
|
|
log_record_type_descriptor[type].compressed_LSN * 2 + 3)) >
|
2007-02-02 08:41:32 +01:00
|
|
|
TRANSLOG_PAGE_SIZE)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("Next page"));
|
|
|
|
translog_page_next(&log_descriptor.horizon, &log_descriptor.bc,
|
|
|
|
&buffer_to_flush);
|
|
|
|
}
|
|
|
|
|
|
|
|
*lsn= log_descriptor.horizon;
|
2007-08-19 21:27:43 +02:00
|
|
|
if (translog_set_lsn_for_files(LSN_FILE_NO(*lsn), LSN_FILE_NO(*lsn),
|
|
|
|
*lsn, TRUE) ||
|
|
|
|
(log_record_type_descriptor[type].inwrite_hook &&
|
|
|
|
(*log_record_type_descriptor[type].inwrite_hook) (type, trn, tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
lsn, hook_arg)))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
rc= 1;
|
|
|
|
goto err;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* compress LSNs */
|
|
|
|
if (log_record_type_descriptor[type].class == LOGRECTYPE_PSEUDOFIXEDLENGTH)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(log_record_type_descriptor[type].compressed_LSN > 0);
|
2007-02-12 13:23:43 +01:00
|
|
|
if (translog_relative_LSN_encode(parts, *lsn,
|
2007-02-02 08:41:32 +01:00
|
|
|
log_record_type_descriptor[type].
|
2007-02-19 22:01:27 +01:00
|
|
|
compressed_LSN, compressed_LSNs))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
rc= 1;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2007-02-19 22:01:27 +01:00
|
|
|
Write the whole record at once (we know that there is enough place on
|
|
|
|
the destination page)
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(parts->current != 0); /* first part is left for header */
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
part= parts->parts + (--parts->current);
|
|
|
|
parts->total_record_length+= (part->length= 1 + 2);
|
|
|
|
part->str= (char*)chunk1_header;
|
2007-07-02 19:45:15 +02:00
|
|
|
*chunk1_header= (uchar) (type | TRANSLOG_CHUNK_FIXED);
|
2007-02-02 08:41:32 +01:00
|
|
|
int2store(chunk1_header + 1, short_trid);
|
|
|
|
|
|
|
|
rc= translog_write_parts_on_page(&log_descriptor.horizon,
|
|
|
|
&log_descriptor.bc,
|
|
|
|
parts->total_record_length, parts);
|
|
|
|
|
|
|
|
log_descriptor.bc.buffer->last_lsn= *lsn;
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
err:
|
|
|
|
rc|= translog_unlock();
|
|
|
|
|
|
|
|
/*
|
|
|
|
check if we switched buffer and need process it (current buffer is
|
|
|
|
unlocked already => we will not delay other threads
|
|
|
|
*/
|
|
|
|
if (buffer_to_flush != NULL)
|
|
|
|
{
|
|
|
|
if (!rc)
|
|
|
|
rc= translog_buffer_flush(buffer_to_flush);
|
|
|
|
rc|= translog_buffer_unlock(buffer_to_flush);
|
|
|
|
}
|
|
|
|
|
|
|
|
DBUG_RETURN(rc);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/**
|
|
|
|
@brief Writes the log record
|
|
|
|
|
|
|
|
If share has no 2-byte-id yet, gives an id to the share and logs
|
|
|
|
LOGREC_FILE_ID. If transaction has not logged LOGREC_LONG_TRANSACTION_ID
|
|
|
|
yet, logs it.
|
|
|
|
|
|
|
|
@param lsn LSN of the record will be written here
|
|
|
|
@param type the log record type
|
|
|
|
@param trn Transaction structure pointer for hooks by
|
|
|
|
record log type, for short_id
|
2007-08-01 15:52:57 +02:00
|
|
|
@param tbl_info MARIA_HA of table or NULL
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
@param rec_len record length or 0 (count it)
|
|
|
|
@param part_no number of parts or 0 (count it)
|
|
|
|
@param parts_data zero ended (in case of number of parts is 0)
|
|
|
|
array of LEX_STRINGs (parts), first
|
|
|
|
TRANSLOG_INTERNAL_PARTS positions in the log
|
|
|
|
should be unused (need for loghandler)
|
2007-08-01 15:52:57 +02:00
|
|
|
@param store_share_id if tbl_info!=NULL then share's id will
|
|
|
|
automatically be stored in the two first bytes
|
|
|
|
pointed (so pointer is assumed to be !=NULL)
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
@param hook_arg argument which will be passed to pre-write and
|
|
|
|
in-write hooks of this record.
|
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
my_bool translog_write_record(LSN *lsn,
|
|
|
|
enum translog_record_type type,
|
2007-08-01 15:52:57 +02:00
|
|
|
TRN *trn, MARIA_HA *tbl_info,
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
translog_size_t rec_len,
|
|
|
|
uint part_no,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
LEX_STRING *parts_data,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
uchar *store_share_id,
|
|
|
|
void *hook_arg)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
struct st_translog_parts parts;
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
LEX_STRING *part;
|
2007-02-02 08:41:32 +01:00
|
|
|
int rc;
|
2007-06-11 16:29:53 +02:00
|
|
|
uint short_trid= trn->short_id;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_write_record");
|
2007-09-09 18:15:10 +02:00
|
|
|
DBUG_PRINT("enter", ("type: %u ShortTrID: %u rec_len: %lu",
|
|
|
|
(uint) type, (uint) short_trid, (ulong) rec_len));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-08-01 15:52:57 +02:00
|
|
|
if (tbl_info)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-08-01 15:52:57 +02:00
|
|
|
MARIA_SHARE *share= tbl_info->s;
|
Maria:
* Don't modify share->base.born_transactional; now it is a value carved
in stone at creation time. share->now_transactional is what can be
modified: it starts at born_transactional, can become false during
ALTER TABLE (when we want no logging), and restored later.
* Not resetting create_rename_lsn to 0 during delete_all or repair.
* when we temporarily disable transactionality, we also change
the page type to PAGECACHE_PLAIN_PAGE: it bypasses some work in the
page cache (optimization), and avoids assertions related to LSNs.
* Disable INSERT DELAYED for transactional tables, because
durability could not be guaranteed (insertion may even not happen)
mysys/mf_keycache.c:
comment
storage/maria/ha_maria.cc:
* a transactional table cannot do INSERT DELAYED
* ha_maria::save_transactional not needed anymore, as now instead
we don't modify MARIA_SHARE::MARIA_BASE_INFO::born_transactional
(born_transactional plays the role of save_transactional), and modify
MARIA_SHARE::now_transactional.
* REPAIR_TABLE log record is now logged by maria_repair()
* comment why we rely on born_transactional to know if we should
skipping a transaction.
* putting together two if()s which test for F_UNLCK
storage/maria/ha_maria.h:
ha_maria::save_transactional not needed anymore (moved to the C layer)
storage/maria/ma_blockrec.c:
* For the block record's code (writing/updating/deleting records),
all that counts is now_transactional, not born_transactional.
* As we now set the page type to PAGECACHE_PLAIN_PAGE for tables
which have now_transactional==FALSE, pagecache will not expect
a meaningful LSN for them in pagecache_unlock_by_link(), so
we can pass it LSN_IMPOSSIBLE.
storage/maria/ma_check.c:
* writing LOGREC_REPAIR_TABLE moves from ha_maria::repair()
to maria_repair(), sounds cleaner (less functions to export).
* when opening a table during REPAIR, don't use the realpath-ed name,
as this may fail if the table has symlinked files (maria_open()
would try to find the data and index file in the directory
of unique_file_name, it would fail if data and index files are in
different dirs); use the unresolved name, open_file_name, which is
the argument which was passed to the maria_open() which created 'info'.
storage/maria/ma_close.c:
assert that when a statement is done with a table, it cleans up
storage/maria/ma_create.c:
new name
storage/maria/ma_delete_all.c:
* using now_transactional
* no reason to reset create_rename_lsn during delete_all (a bug);
also no reason to do it during repair: it was put there because
a positive create_rename_lsn caused a call to check_and_set_lsn()
which asserted in DBUG_ASSERT(block->type == PAGECACHE_LSN_PAGE);
first solution was to use LSN_IMPOSSIBLE in _ma_unpin_all_pages() if
not transactional; but then in the case of ALTER TABLE, with
transactionality temporarily disabled, it asserted in
DBUG_ASSERT(LSN_VALID(lsn)) in pagecache_fwrite() (PAGECACHE_LSN_PAGE
page with zero LSN - bad). The additional solution is to use
PAGECACHE_PLAIN_PAGE when we disable transactionality temporarily: this
avoids checks on the LSN, and also bypasses (optimization) the "flush
log up to LSN" call when the pagecache flushes our page (in other
words, no WAL needed).
storage/maria/ma_delete_table.c:
use now_transactional
storage/maria/ma_locking.c:
assert that when a statement is done with a table, it cleans up.
storage/maria/ma_loghandler.c:
* now_transactional should be used to test if we want a log record.
* Assertions to make sure dummy_transaction_object is not spoilt
by its many users.
storage/maria/ma_open.c:
base.transactional -> base.born_transactional
storage/maria/ma_pagecache.c:
missing name for page's type. Comment for future.
storage/maria/ma_rename.c:
use now_transactional
storage/maria/maria_chk.c:
use born_transactional
storage/maria/maria_def.h:
MARIA_BASE_INFO::transactional renamed to born_transactional.
MARIA_SHARE::now_transactional introduced.
_ma_repair_write_log_record() is made local to ma_check.c.
Macros to temporarily disable, and re-enable, transactionality for a
table.
storage/maria/maria_read_log.c:
assertions and using the new macros. Adding a forgotten resetting
when we finally close all tables.
2007-07-03 15:20:41 +02:00
|
|
|
if (!share->now_transactional)
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("It is not transactional table"));
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
if (unlikely(share->id == 0))
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
First log write for this MARIA_SHARE; give it a short id.
|
|
|
|
When the lock manager is enabled and needs a short id, it should be
|
|
|
|
assigned in the lock manager (because row locks will be taken before
|
|
|
|
log records are written; for example SELECT FOR UPDATE takes locks but
|
|
|
|
writes no log record.
|
|
|
|
*/
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
if (unlikely(translog_assign_id_to_share(tbl_info, trn)))
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
fileid_store(store_share_id, share->id);
|
|
|
|
}
|
|
|
|
if (unlikely(!(trn->first_undo_lsn & TRANSACTION_LOGGED_LONG_ID)))
|
|
|
|
{
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
LSN dummy_lsn;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
|
|
|
|
uchar log_data[6];
|
|
|
|
int6store(log_data, trn->trid);
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
|
|
|
|
trn->first_undo_lsn|= TRANSACTION_LOGGED_LONG_ID; /* no recursion */
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
if (unlikely(translog_write_record(&dummy_lsn, LOGREC_LONG_TRANSACTION_ID,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
trn, NULL, sizeof(log_data),
|
|
|
|
sizeof(log_array)/sizeof(log_array[0]),
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
log_array, NULL, NULL)))
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
DBUG_RETURN(1);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
parts.parts= parts_data;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
/* count parts if they are not counted by upper level */
|
|
|
|
if (part_no == 0)
|
2007-02-19 22:01:27 +01:00
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
for (part_no= TRANSLOG_INTERNAL_PARTS;
|
|
|
|
parts_data[part_no].length != 0;
|
|
|
|
part_no++);
|
2007-02-19 22:01:27 +01:00
|
|
|
}
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
parts.elements= part_no;
|
|
|
|
parts.current= TRANSLOG_INTERNAL_PARTS;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
/* clear TRANSLOG_INTERNAL_PARTS */
|
2007-06-15 00:45:55 +02:00
|
|
|
DBUG_ASSERT(TRANSLOG_INTERNAL_PARTS != 0);
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
parts_data[0].str= 0;
|
|
|
|
parts_data[0].length= 0;
|
|
|
|
|
|
|
|
/* count length of the record */
|
|
|
|
if (rec_len == 0)
|
2007-02-19 22:01:27 +01:00
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
for(part= parts_data + TRANSLOG_INTERNAL_PARTS;\
|
|
|
|
part < parts_data + part_no;
|
|
|
|
part++)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
rec_len+= part->length;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
parts.record_length= rec_len;
|
2007-02-19 22:01:27 +01:00
|
|
|
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
#ifndef DBUG_OFF
|
|
|
|
{
|
|
|
|
uint i;
|
|
|
|
uint len= 0;
|
2007-07-01 15:20:57 +02:00
|
|
|
#ifdef HAVE_purify
|
2007-06-09 13:52:17 +02:00
|
|
|
ha_checksum checksum= 0;
|
|
|
|
#endif
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
for (i= TRANSLOG_INTERNAL_PARTS; i < part_no; i++)
|
2007-06-09 13:52:17 +02:00
|
|
|
{
|
2007-07-01 15:20:57 +02:00
|
|
|
#ifdef HAVE_purify
|
2007-06-09 13:52:17 +02:00
|
|
|
/* Find unitialized bytes early */
|
|
|
|
checksum+= my_checksum(checksum, parts_data[i].str,
|
|
|
|
parts_data[i].length);
|
|
|
|
#endif
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
len+= parts_data[i].length;
|
2007-06-09 13:52:17 +02:00
|
|
|
}
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache intergration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all deferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only loggging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficent undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficent undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''
2007-05-29 19:13:56 +02:00
|
|
|
DBUG_ASSERT(len == rec_len);
|
|
|
|
}
|
|
|
|
#endif
|
2007-02-19 22:01:27 +01:00
|
|
|
/*
|
|
|
|
Start total_record_length from record_length then overhead will
|
|
|
|
be add
|
|
|
|
*/
|
|
|
|
parts.total_record_length= parts.record_length;
|
2007-09-09 18:15:10 +02:00
|
|
|
DBUG_PRINT("info", ("record length: %lu", (ulong) parts.record_length));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* process this parts */
|
|
|
|
if (!(rc= (log_record_type_descriptor[type].prewrite_hook &&
|
2007-06-11 16:29:53 +02:00
|
|
|
(*log_record_type_descriptor[type].prewrite_hook) (type, trn,
|
2007-08-01 15:52:57 +02:00
|
|
|
tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
hook_arg))))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
switch (log_record_type_descriptor[type].class) {
|
2007-02-02 08:41:32 +01:00
|
|
|
case LOGRECTYPE_VARIABLE_LENGTH:
|
2007-08-01 15:52:57 +02:00
|
|
|
rc= translog_write_variable_record(lsn, type, tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
short_trid, &parts, trn, hook_arg);
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
|
|
|
case LOGRECTYPE_PSEUDOFIXEDLENGTH:
|
|
|
|
case LOGRECTYPE_FIXEDLENGTH:
|
2007-08-01 15:52:57 +02:00
|
|
|
rc= translog_write_fixed_record(lsn, type, tbl_info,
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
short_trid, &parts, trn, hook_arg);
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
|
|
|
case LOGRECTYPE_NOT_ALLOWED:
|
|
|
|
default:
|
|
|
|
DBUG_ASSERT(0);
|
|
|
|
rc= 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_PRINT("info", ("LSN: (%lu,0x%lx)", LSN_IN_PARTS(*lsn)));
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(rc);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Decode compressed (relative) LSN(s)
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_relative_lsn_decode()
|
|
|
|
base_lsn LSN for encoding
|
|
|
|
src Decode LSN(s) from here
|
|
|
|
dst Put decoded LSNs here
|
|
|
|
lsns number of LSN(s)
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
position in sources after decoded LSN(s)
|
|
|
|
*/
|
|
|
|
|
2007-07-02 19:45:15 +02:00
|
|
|
static uchar *translog_relative_LSN_decode(LSN base_lsn,
|
|
|
|
uchar *src, uchar *dst, uint lsns)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint i;
|
2007-02-19 22:01:27 +01:00
|
|
|
for (i= 0; i < lsns; i++, dst+= LSN_STORE_SIZE)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
src= translog_get_LSN_from_diff(base_lsn, src, dst);
|
|
|
|
}
|
|
|
|
return src;
|
|
|
|
}
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
/**
|
|
|
|
@brief Get header of fixed/pseudo length record and call hook for
|
|
|
|
it processing
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@param page Pointer to the buffer with page where LSN chunk is
|
|
|
|
placed
|
|
|
|
@param page_offset Offset of the first chunk in the page
|
|
|
|
@param buff Buffer to be filled with header data
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@return Length of header or operation status
|
|
|
|
@retval # number of bytes in TRANSLOG_HEADER_BUFFER::header where
|
|
|
|
stored decoded part of the header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
static int translog_fixed_length_header(uchar *page,
|
|
|
|
translog_size_t page_offset,
|
|
|
|
TRANSLOG_HEADER_BUFFER *buff)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
struct st_log_record_type_descriptor *desc=
|
|
|
|
log_record_type_descriptor + buff->type;
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *src= page + page_offset + 3;
|
|
|
|
uchar *dst= buff->header;
|
|
|
|
uchar *start= src;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint lsns= desc->compressed_LSN;
|
2007-06-15 00:45:55 +02:00
|
|
|
uint length= desc->fixed_length;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_ENTER("translog_fixed_length_header");
|
|
|
|
|
|
|
|
buff->record_length= length;
|
|
|
|
|
|
|
|
if (desc->class == LOGRECTYPE_PSEUDOFIXEDLENGTH)
|
|
|
|
{
|
|
|
|
DBUG_ASSERT(lsns > 0);
|
2007-02-12 13:23:43 +01:00
|
|
|
src= translog_relative_LSN_decode(buff->lsn, src, dst, lsns);
|
2007-02-19 22:01:27 +01:00
|
|
|
lsns*= LSN_STORE_SIZE;
|
2007-02-02 08:41:32 +01:00
|
|
|
dst+= lsns;
|
|
|
|
length-= lsns;
|
2007-06-15 00:45:55 +02:00
|
|
|
buff->compressed_LSN_economy= (lsns - (src - start));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
buff->compressed_LSN_economy= 0;
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(dst, src, length);
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->non_header_data_start_offset= page_offset +
|
|
|
|
((src + length) - (page + page_offset));
|
|
|
|
buff->non_header_data_len= 0;
|
|
|
|
DBUG_RETURN(buff->record_length);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Free resources used by TRANSLOG_HEADER_BUFFER
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_free_record_header();
|
|
|
|
*/
|
|
|
|
|
|
|
|
void translog_free_record_header(TRANSLOG_HEADER_BUFFER *buff)
|
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_free_record_header");
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (buff->groups_no != 0)
|
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
my_free((uchar*) buff->groups, MYF(0));
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->groups_no= 0;
|
|
|
|
}
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/**
|
|
|
|
@brief Returns the current horizon at the end of the current log
|
2007-02-02 08:41:32 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
@return Horizon
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
TRANSLOG_ADDRESS translog_get_horizon()
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
TRANSLOG_ADDRESS res;
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_lock();
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
res= log_descriptor.horizon;
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_unlock();
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
return res;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
/**
|
|
|
|
@brief Returns the current horizon at the end of the current log, caller is
|
|
|
|
assumed to already hold the lock
|
|
|
|
|
|
|
|
@return Horizon
|
|
|
|
*/
|
|
|
|
|
|
|
|
TRANSLOG_ADDRESS translog_get_horizon_no_lock()
|
|
|
|
{
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
translog_lock_assert_owner();
|
|
|
|
return log_descriptor.horizon;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
Set last page in the scanner data structure
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_scanner_set_last_page()
|
|
|
|
scanner Information about current chunk during scanning
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
static my_bool translog_scanner_set_last_page(TRANSLOG_SCANNER_DATA
|
2007-02-02 08:41:32 +01:00
|
|
|
*scanner)
|
|
|
|
{
|
|
|
|
my_bool page_ok;
|
2007-10-01 17:48:20 +02:00
|
|
|
if (LSN_FILE_NO(scanner->page_addr) == LSN_FILE_NO(scanner->horizon))
|
|
|
|
{
|
|
|
|
/* It is last file => we can easy find last page address by horizon */
|
|
|
|
uint pagegrest= LSN_OFFSET(scanner->horizon) % TRANSLOG_PAGE_SIZE;
|
|
|
|
scanner->last_file_page= (scanner->horizon -
|
|
|
|
(pagegrest ? pagegrest : TRANSLOG_PAGE_SIZE));
|
|
|
|
return (0);
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
scanner->last_file_page= scanner->page_addr;
|
2007-02-19 22:01:27 +01:00
|
|
|
return (translog_get_last_page_addr(&scanner->last_file_page, &page_ok));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
/**
|
|
|
|
@brief Get page from page cache according to requested method
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
@param scanner The scanner data
|
|
|
|
|
|
|
|
@return operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
*/
|
|
|
|
|
|
|
|
static my_bool
|
|
|
|
translog_scanner_get_page(TRANSLOG_SCANNER_DATA *scanner)
|
|
|
|
{
|
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
|
|
|
DBUG_ENTER("translog_scanner_get_page");
|
|
|
|
data.addr= &scanner->page_addr;
|
|
|
|
data.was_recovered= 0;
|
|
|
|
DBUG_RETURN((scanner->page=
|
|
|
|
translog_get_page(&data, scanner->buffer,
|
|
|
|
(scanner->use_direct_link ?
|
|
|
|
&scanner->direct_link :
|
|
|
|
NULL))) ==
|
|
|
|
NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
@brief Initialize reader scanner.
|
|
|
|
|
|
|
|
@param lsn LSN with which it have to be inited
|
|
|
|
@param fixed_horizon true if it is OK do not read records which was written
|
2007-02-02 08:41:32 +01:00
|
|
|
after scanning beginning
|
2007-09-27 13:18:28 +02:00
|
|
|
@param scanner scanner which have to be inited
|
|
|
|
@param use_direct prefer using direct lings from page handler
|
|
|
|
where it is possible.
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
@note If direct link was used translog_destroy_scanner should be
|
|
|
|
called after it using
|
|
|
|
|
|
|
|
@return status of the operation
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
my_bool translog_init_scanner(LSN lsn,
|
|
|
|
my_bool fixed_horizon,
|
2007-09-27 13:18:28 +02:00
|
|
|
TRANSLOG_SCANNER_DATA *scanner,
|
|
|
|
my_bool use_direct)
|
2007-02-19 22:01:27 +01:00
|
|
|
{
|
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_init_scanner");
|
2007-10-01 22:51:57 +02:00
|
|
|
DBUG_PRINT("enter", ("Scanner: 0x%lx LSN: (0x%lu,0x%lx)",
|
|
|
|
(ulong) scanner, LSN_IN_PARTS(lsn)));
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_OFFSET(lsn) % TRANSLOG_PAGE_SIZE != 0);
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-19 22:01:27 +01:00
|
|
|
|
|
|
|
data.addr= &scanner->page_addr;
|
|
|
|
data.was_recovered= 0;
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
scanner->page_offset= LSN_OFFSET(lsn) % TRANSLOG_PAGE_SIZE;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
scanner->fixed_horizon= fixed_horizon;
|
2007-09-27 13:18:28 +02:00
|
|
|
scanner->use_direct_link= use_direct;
|
|
|
|
scanner->direct_link= NULL;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
scanner->horizon= translog_get_horizon();
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("horizon: (0x%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(scanner->horizon)));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
/* lsn < horizon */
|
2007-09-20 18:02:36 +02:00
|
|
|
DBUG_ASSERT(lsn < scanner->horizon);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
scanner->page_addr= lsn;
|
|
|
|
scanner->page_addr-= scanner->page_offset; /*decrease offset */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
if (translog_scanner_set_last_page(scanner))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
if (translog_scanner_get_page(scanner))
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
/**
|
|
|
|
@brief Destroy scanner object;
|
|
|
|
|
|
|
|
@param scanner The scanner object to destroy
|
|
|
|
*/
|
|
|
|
|
|
|
|
void translog_destroy_scanner(TRANSLOG_SCANNER_DATA *scanner)
|
|
|
|
{
|
2007-10-01 22:51:57 +02:00
|
|
|
DBUG_ENTER("translog_destroy_scanner");
|
|
|
|
DBUG_PRINT("enter", ("Scanner: 0x%lx", (ulong)scanner));
|
2007-09-27 13:18:28 +02:00
|
|
|
translog_free_link(scanner->direct_link);
|
2007-10-01 22:51:57 +02:00
|
|
|
DBUG_VOID_RETURN;
|
2007-09-27 13:18:28 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
Checks End of the Log
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_scanner_eol()
|
|
|
|
scanner Information about current chunk during scanning
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
1 End of the Log
|
|
|
|
0 OK
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-07-30 15:05:43 +02:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
static my_bool translog_scanner_eol(TRANSLOG_SCANNER_DATA *scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_scanner_eol");
|
|
|
|
DBUG_PRINT("enter",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Horizon: (%lu, 0x%lx) Current: (%lu, 0x%lx+0x%x=0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(scanner->horizon),
|
|
|
|
LSN_IN_PARTS(scanner->page_addr),
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) scanner->page_offset,
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) (LSN_OFFSET(scanner->page_addr) + scanner->page_offset)));
|
|
|
|
if (scanner->horizon > (scanner->page_addr +
|
|
|
|
scanner->page_offset))
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("Horizon is not reached"));
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
if (scanner->fixed_horizon)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("Horizon is fixed and reached"));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
scanner->horizon= translog_get_horizon();
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info",
|
|
|
|
("Horizon is re-read, EOL: %d",
|
2007-02-12 13:23:43 +01:00
|
|
|
scanner->horizon <= (scanner->page_addr +
|
|
|
|
scanner->page_offset)));
|
|
|
|
DBUG_RETURN(scanner->horizon <= (scanner->page_addr +
|
|
|
|
scanner->page_offset));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Cheks End of the Page
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_scanner_eop()
|
|
|
|
scanner Information about current chunk during scanning
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
1 End of the Page
|
|
|
|
0 OK
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-07-30 15:05:43 +02:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
static my_bool translog_scanner_eop(TRANSLOG_SCANNER_DATA *scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_scanner_eop");
|
|
|
|
DBUG_RETURN(scanner->page_offset >= TRANSLOG_PAGE_SIZE ||
|
|
|
|
scanner->page[scanner->page_offset] == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Checks End of the File (I.e. we are scanning last page, which do not
|
|
|
|
mean end of this page)
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_scanner_eof()
|
|
|
|
scanner Information about current chunk during scanning
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
1 End of the File
|
|
|
|
0 OK
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-07-30 15:05:43 +02:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
static my_bool translog_scanner_eof(TRANSLOG_SCANNER_DATA *scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
DBUG_ENTER("translog_scanner_eof");
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_FILE_NO(scanner->page_addr) ==
|
|
|
|
LSN_FILE_NO(scanner->last_file_page));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("curr Page: 0x%lx last page: 0x%lx "
|
|
|
|
"normal EOF: %d",
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_OFFSET(scanner->page_addr),
|
|
|
|
(ulong) LSN_OFFSET(scanner->last_file_page),
|
|
|
|
LSN_OFFSET(scanner->page_addr) ==
|
|
|
|
LSN_OFFSET(scanner->last_file_page)));
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
TODO: detect damaged file EOF,
|
|
|
|
TODO: issue warning if damaged file EOF detected
|
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_RETURN(scanner->page_addr ==
|
|
|
|
scanner->last_file_page);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
Move scanner to the next chunk
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_get_next_chunk()
|
|
|
|
scanner Information about current chunk during scanning
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
static my_bool
|
|
|
|
translog_get_next_chunk(TRANSLOG_SCANNER_DATA *scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 len;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_get_next_chunk");
|
2007-02-19 22:01:27 +01:00
|
|
|
|
|
|
|
if ((len= translog_get_total_chunk_length(scanner->page,
|
|
|
|
scanner->page_offset)) == 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
scanner->page_offset+= len;
|
|
|
|
|
|
|
|
if (translog_scanner_eol(scanner))
|
|
|
|
{
|
|
|
|
scanner->page= &end_of_log;
|
|
|
|
scanner->page_offset= 0;
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
if (translog_scanner_eop(scanner))
|
|
|
|
{
|
2007-09-27 13:18:28 +02:00
|
|
|
/* before reading next page we should unpin current one if it was pinned */
|
|
|
|
translog_free_link(scanner->direct_link);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_scanner_eof(scanner))
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("horizon: (%lu,0x%lx) pageaddr: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(scanner->horizon),
|
|
|
|
LSN_IN_PARTS(scanner->page_addr)));
|
2007-02-02 08:41:32 +01:00
|
|
|
/* if it is log end it have to be caught before */
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_FILE_NO(scanner->horizon) >
|
|
|
|
LSN_FILE_NO(scanner->page_addr));
|
|
|
|
scanner->page_addr+= LSN_ONE_FILE;
|
|
|
|
scanner->page_addr= LSN_REPLACE_OFFSET(scanner->page_addr,
|
|
|
|
TRANSLOG_PAGE_SIZE);
|
2007-02-02 08:41:32 +01:00
|
|
|
if (translog_scanner_set_last_page(scanner))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2007-02-12 13:23:43 +01:00
|
|
|
scanner->page_addr+= TRANSLOG_PAGE_SIZE; /* offset increased */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
if (translog_scanner_get_page(scanner))
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
scanner->page_offset= translog_get_first_chunk_offset(scanner->page);
|
|
|
|
if (translog_scanner_eol(scanner))
|
|
|
|
{
|
|
|
|
scanner->page= &end_of_log;
|
|
|
|
scanner->page_offset= 0;
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(scanner->page[scanner->page_offset]);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
/**
|
|
|
|
@brief Get header of variable length record and call hook for it processing
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@param page Pointer to the buffer with page where LSN chunk is
|
|
|
|
placed
|
|
|
|
@param page_offset Offset of the first chunk in the page
|
|
|
|
@param buff Buffer to be filled with header data
|
|
|
|
@param scanner If present should be moved to the header page if
|
2007-08-29 08:03:10 +02:00
|
|
|
it differ from LSN page
|
|
|
|
|
|
|
|
@return Length of header or operation status
|
2007-07-30 15:05:43 +02:00
|
|
|
@retval RECHEADER_READ_ERROR error
|
|
|
|
@retval # number of bytes in
|
|
|
|
TRANSLOG_HEADER_BUFFER::header where
|
|
|
|
stored decoded part of the header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-09-14 08:17:36 +02:00
|
|
|
static int
|
|
|
|
translog_variable_length_header(uchar *page, translog_size_t page_offset,
|
|
|
|
TRANSLOG_HEADER_BUFFER *buff,
|
|
|
|
TRANSLOG_SCANNER_DATA *scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
struct st_log_record_type_descriptor *desc= (log_record_type_descriptor +
|
|
|
|
buff->type);
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *src= page + page_offset + 1 + 2;
|
|
|
|
uchar *dst= buff->header;
|
2007-02-02 08:41:32 +01:00
|
|
|
LSN base_lsn;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint lsns= desc->compressed_LSN;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 chunk_len;
|
2007-06-15 00:45:55 +02:00
|
|
|
uint16 length= desc->read_header_len;
|
2007-02-02 08:41:32 +01:00
|
|
|
uint16 buffer_length= length;
|
|
|
|
uint16 body_len;
|
2007-02-19 22:01:27 +01:00
|
|
|
TRANSLOG_SCANNER_DATA internal_scanner;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_variable_length_header");
|
|
|
|
|
|
|
|
buff->record_length= translog_variable_record_1group_decode_len(&src);
|
|
|
|
chunk_len= uint2korr(src);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("rec len: %lu chunk len: %u length: %u bufflen: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buff->record_length, (uint) chunk_len,
|
|
|
|
(uint) length, (uint) buffer_length));
|
|
|
|
if (chunk_len == 0)
|
|
|
|
{
|
|
|
|
uint16 page_rest;
|
|
|
|
DBUG_PRINT("info", ("1 group"));
|
|
|
|
src+= 2;
|
|
|
|
page_rest= TRANSLOG_PAGE_SIZE - (src - page);
|
|
|
|
|
|
|
|
base_lsn= buff->lsn;
|
2007-02-19 22:01:27 +01:00
|
|
|
body_len= min(page_rest, buff->record_length);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
uint grp_no, curr;
|
|
|
|
uint header_to_skip;
|
|
|
|
uint16 page_rest;
|
|
|
|
|
|
|
|
DBUG_PRINT("info", ("multi-group"));
|
|
|
|
grp_no= buff->groups_no= uint2korr(src + 2);
|
2007-02-19 22:01:27 +01:00
|
|
|
if (!(buff->groups=
|
|
|
|
(TRANSLOG_GROUP*) my_malloc(sizeof(TRANSLOG_GROUP) * grp_no,
|
|
|
|
MYF(0))))
|
2007-07-30 15:05:43 +02:00
|
|
|
DBUG_RETURN(RECHEADER_READ_ERROR);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info", ("Groups: %u", (uint) grp_no));
|
|
|
|
src+= (2 + 2);
|
|
|
|
page_rest= TRANSLOG_PAGE_SIZE - (src - page);
|
|
|
|
curr= 0;
|
|
|
|
header_to_skip= src - (page + page_offset);
|
|
|
|
buff->chunk0_pages= 0;
|
|
|
|
|
|
|
|
for (;;)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint i, read= grp_no;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
buff->chunk0_pages++;
|
|
|
|
if (page_rest < grp_no * (7 + 1))
|
|
|
|
read= page_rest / (7 + 1);
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Read chunk0 page#%u read: %u left: %u "
|
|
|
|
"start from: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->chunk0_pages, read, grp_no, curr));
|
|
|
|
for (i= 0; i < read; i++, curr++)
|
|
|
|
{
|
|
|
|
DBUG_ASSERT(curr < buff->groups_no);
|
2007-02-19 22:01:27 +01:00
|
|
|
buff->groups[curr].addr= lsn_korr(src + i * (7 + 1));
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->groups[curr].num= src[i * (7 + 1) + 7];
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("group #%u (%lu,0x%lx) chunks: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
curr,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(buff->groups[curr].addr),
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buff->groups[curr].num));
|
|
|
|
}
|
|
|
|
grp_no-= read;
|
|
|
|
if (grp_no == 0)
|
|
|
|
{
|
|
|
|
if (scanner)
|
|
|
|
{
|
|
|
|
buff->chunk0_data_addr= scanner->page_addr;
|
2007-02-12 13:23:43 +01:00
|
|
|
buff->chunk0_data_addr+= (page_offset + header_to_skip +
|
2007-02-19 22:01:27 +01:00
|
|
|
read * (7 + 1)); /* offset increased */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
buff->chunk0_data_addr= buff->lsn;
|
2007-02-12 13:23:43 +01:00
|
|
|
/* offset increased */
|
2007-02-19 22:01:27 +01:00
|
|
|
buff->chunk0_data_addr+= (header_to_skip + read * (7 + 1));
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
buff->chunk0_data_len= chunk_len - 2 - read * (7 + 1);
|
|
|
|
DBUG_PRINT("info", ("Data address: (%lu,0x%lx) len: %u",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(buff->chunk0_data_addr),
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->chunk0_data_len));
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (scanner == NULL)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("use internal scanner for header reading"));
|
2007-02-02 08:41:32 +01:00
|
|
|
scanner= &internal_scanner;
|
2007-09-27 13:18:28 +02:00
|
|
|
if (translog_init_scanner(buff->lsn, 1, scanner, 0))
|
2007-07-30 15:05:43 +02:00
|
|
|
DBUG_RETURN(RECHEADER_READ_ERROR);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
2007-07-30 15:05:43 +02:00
|
|
|
if (translog_get_next_chunk(scanner))
|
|
|
|
DBUG_RETURN(RECHEADER_READ_ERROR);
|
2007-02-02 08:41:32 +01:00
|
|
|
page= scanner->page;
|
|
|
|
page_offset= scanner->page_offset;
|
|
|
|
src= page + page_offset + header_to_skip;
|
|
|
|
chunk_len= uint2korr(src - 2 - 2);
|
|
|
|
DBUG_PRINT("info", ("Chunk len: %u", (uint) chunk_len));
|
|
|
|
page_rest= TRANSLOG_PAGE_SIZE - (src - page);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (scanner == NULL)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("use internal scanner"));
|
|
|
|
scanner= &internal_scanner;
|
|
|
|
}
|
2007-10-01 22:51:57 +02:00
|
|
|
else
|
|
|
|
{
|
|
|
|
translog_destroy_scanner(scanner);
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
base_lsn= buff->groups[0].addr;
|
2007-09-27 13:18:28 +02:00
|
|
|
translog_init_scanner(base_lsn, 1, scanner, scanner == &internal_scanner);
|
2007-02-02 08:41:32 +01:00
|
|
|
/* first group chunk is always chunk type 2 */
|
|
|
|
page= scanner->page;
|
|
|
|
page_offset= scanner->page_offset;
|
|
|
|
src= page + page_offset + 1;
|
|
|
|
page_rest= TRANSLOG_PAGE_SIZE - (src - page);
|
|
|
|
body_len= page_rest;
|
2007-09-27 13:18:28 +02:00
|
|
|
if (scanner == &internal_scanner)
|
|
|
|
translog_destroy_scanner(scanner);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
if (lsns)
|
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *start= src;
|
2007-02-12 13:23:43 +01:00
|
|
|
src= translog_relative_LSN_decode(base_lsn, src, dst, lsns);
|
2007-02-19 22:01:27 +01:00
|
|
|
lsns*= LSN_STORE_SIZE;
|
2007-02-02 08:41:32 +01:00
|
|
|
dst+= lsns;
|
|
|
|
length-= lsns;
|
|
|
|
buff->record_length+= (buff->compressed_LSN_economy=
|
2007-06-15 00:45:55 +02:00
|
|
|
(lsns - (src - start)));
|
|
|
|
DBUG_PRINT("info", ("lsns: %u length: %u economy: %d new length: %lu",
|
2007-02-19 22:01:27 +01:00
|
|
|
lsns / LSN_STORE_SIZE, (uint) length,
|
2007-06-15 00:45:55 +02:00
|
|
|
(int) buff->compressed_LSN_economy,
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) buff->record_length));
|
|
|
|
body_len-= (src - start);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
buff->compressed_LSN_economy= 0;
|
|
|
|
|
|
|
|
DBUG_ASSERT(body_len >= length);
|
|
|
|
body_len-= length;
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(dst, src, length);
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->non_header_data_start_offset= src + length - page;
|
|
|
|
buff->non_header_data_len= body_len;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("non_header_data_start_offset: %u len: %u buffer: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->non_header_data_start_offset,
|
|
|
|
buff->non_header_data_len, buffer_length));
|
|
|
|
DBUG_RETURN(buffer_length);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
/**
|
|
|
|
@brief Read record header from the given buffer
|
|
|
|
|
|
|
|
@param page page content buffer
|
|
|
|
@param page_offset offset of the chunk in the page
|
|
|
|
@param buff destination buffer
|
|
|
|
@param scanner If this is set the scanner will be moved to the
|
|
|
|
record header page (differ from LSN page in case of
|
|
|
|
multi-group records)
|
|
|
|
|
|
|
|
@return Length of header or operation status
|
|
|
|
@retval RECHEADER_READ_ERROR error
|
|
|
|
@retval # number of bytes in
|
|
|
|
TRANSLOG_HEADER_BUFFER::header where
|
|
|
|
stored decoded part of the header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
int translog_read_record_header_from_buffer(uchar *page,
|
|
|
|
uint16 page_offset,
|
|
|
|
TRANSLOG_HEADER_BUFFER *buff,
|
|
|
|
TRANSLOG_SCANNER_DATA *scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
translog_size_t res;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_read_record_header_from_buffer");
|
|
|
|
DBUG_ASSERT((page[page_offset] & TRANSLOG_CHUNK_TYPE) ==
|
|
|
|
TRANSLOG_CHUNK_LSN ||
|
|
|
|
(page[page_offset] & TRANSLOG_CHUNK_TYPE) ==
|
|
|
|
TRANSLOG_CHUNK_FIXED);
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->type= (page[page_offset] & TRANSLOG_REC_TYPE);
|
|
|
|
buff->short_trid= uint2korr(page + page_offset + 1);
|
2007-06-11 16:29:53 +02:00
|
|
|
DBUG_PRINT("info", ("Type %u, Short TrID %u, LSN (%lu,0x%lx)",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) buff->type, (uint)buff->short_trid,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(buff->lsn)));
|
2007-02-02 08:41:32 +01:00
|
|
|
/* Read required bytes from the header and call hook */
|
2007-02-19 22:01:27 +01:00
|
|
|
switch (log_record_type_descriptor[buff->type].class) {
|
2007-02-02 08:41:32 +01:00
|
|
|
case LOGRECTYPE_VARIABLE_LENGTH:
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
res= translog_variable_length_header(page, page_offset, buff,
|
|
|
|
scanner);
|
|
|
|
break;
|
2007-02-02 08:41:32 +01:00
|
|
|
case LOGRECTYPE_PSEUDOFIXEDLENGTH:
|
|
|
|
case LOGRECTYPE_FIXEDLENGTH:
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
res= translog_fixed_length_header(page, page_offset, buff);
|
|
|
|
break;
|
2007-02-02 08:41:32 +01:00
|
|
|
default:
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_ASSERT(0); /* we read some junk (got no LSN) */
|
2007-07-30 15:05:43 +02:00
|
|
|
res= RECHEADER_READ_ERROR;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
/**
|
|
|
|
@brief Read record header and some fixed part of a record (the part depend
|
|
|
|
on record type).
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@param lsn log record serial number (address of the record)
|
|
|
|
@param buff log record header buffer
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@note Some type of record can be read completely by this call
|
|
|
|
@note "Decoded" header stored in TRANSLOG_HEADER_BUFFER::header (relative
|
|
|
|
LSN can be translated to absolute one), some fields can be added (like
|
|
|
|
actual header length in the record if the header has variable length)
|
|
|
|
|
|
|
|
@return Length of header or operation status
|
|
|
|
@retval RECHEADER_READ_ERROR error
|
|
|
|
@retval # number of bytes in
|
|
|
|
TRANSLOG_HEADER_BUFFER::header where
|
|
|
|
stored decoded part of the header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
int translog_read_record_header(LSN lsn, TRANSLOG_HEADER_BUFFER *buff)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar buffer[TRANSLOG_PAGE_SIZE], *page;
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
translog_size_t res, page_offset= LSN_OFFSET(lsn) % TRANSLOG_PAGE_SIZE;
|
2007-09-27 16:41:21 +02:00
|
|
|
PAGECACHE_BLOCK_LINK *direct_link;
|
2007-02-19 22:01:27 +01:00
|
|
|
TRANSLOG_ADDRESS addr;
|
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_read_record_header");
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_PRINT("enter", ("LSN: (0x%lu,0x%lx)", LSN_IN_PARTS(lsn)));
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_OFFSET(lsn) % TRANSLOG_PAGE_SIZE != 0);
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
buff->lsn= lsn;
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->groups_no= 0;
|
2007-02-19 22:01:27 +01:00
|
|
|
data.addr= &addr;
|
|
|
|
data.was_recovered= 0;
|
|
|
|
addr= lsn;
|
|
|
|
addr-= page_offset; /* offset decreasing */
|
2007-09-27 13:18:28 +02:00
|
|
|
res= (!(page= translog_get_page(&data, buffer, &direct_link))) ?
|
|
|
|
RECHEADER_READ_ERROR :
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
translog_read_record_header_from_buffer(page, page_offset, buff, 0);
|
2007-09-27 13:18:28 +02:00
|
|
|
translog_free_link(direct_link);
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
/**
|
|
|
|
@brief Read record header and some fixed part of a record (the part depend
|
|
|
|
on record type).
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@param scan scanner position to read
|
|
|
|
@param buff log record header buffer
|
|
|
|
@param move_scanner request to move scanner to the header position
|
2007-08-19 21:27:43 +02:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@note Some type of record can be read completely by this call
|
|
|
|
@note "Decoded" header stored in TRANSLOG_HEADER_BUFFER::header (relative
|
|
|
|
LSN can be translated to absolute one), some fields can be added (like
|
|
|
|
actual header length in the record if the header has variable length)
|
|
|
|
|
|
|
|
@return Length of header or operation status
|
|
|
|
@retval RECHEADER_READ_ERROR error
|
|
|
|
@retval # number of bytes in
|
|
|
|
TRANSLOG_HEADER_BUFFER::header where stored
|
|
|
|
decoded part of the header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
int translog_read_record_header_scan(TRANSLOG_SCANNER_DATA *scanner,
|
|
|
|
TRANSLOG_HEADER_BUFFER *buff,
|
|
|
|
my_bool move_scanner)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
translog_size_t res;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_read_record_header_scan");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Scanner: Cur: (%lu,0x%lx) Hrz: (%lu,0x%lx) "
|
|
|
|
"Lst: (%lu,0x%lx) Offset: %u(%x) fixed %d",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(scanner->page_addr),
|
|
|
|
LSN_IN_PARTS(scanner->horizon),
|
|
|
|
LSN_IN_PARTS(scanner->last_file_page),
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) scanner->page_offset,
|
|
|
|
(uint) scanner->page_offset, scanner->fixed_horizon));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
buff->groups_no= 0;
|
|
|
|
buff->lsn= scanner->page_addr;
|
2007-02-12 13:23:43 +01:00
|
|
|
buff->lsn+= scanner->page_offset; /* offset increasing */
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
res= translog_read_record_header_from_buffer(scanner->page,
|
|
|
|
scanner->page_offset,
|
|
|
|
buff,
|
|
|
|
(move_scanner ?
|
|
|
|
scanner : 0));
|
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
/**
|
|
|
|
@brief Read record header and some fixed part of the next record (the part
|
|
|
|
depend on record type).
|
2007-09-14 14:01:44 +02:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
@param scanner data for scanning if lsn is NULL scanner data
|
|
|
|
will be used for continue scanning.
|
|
|
|
The scanner can be NULL.
|
|
|
|
|
|
|
|
@param buff log record header buffer
|
|
|
|
|
|
|
|
@return Length of header or operation status
|
|
|
|
@retval RECHEADER_READ_ERROR error
|
|
|
|
@retval RECHEADER_READ_EOF EOF
|
|
|
|
@retval # number of bytes in
|
|
|
|
TRANSLOG_HEADER_BUFFER::header where
|
|
|
|
stored decoded part of the header
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
|
2007-07-30 15:05:43 +02:00
|
|
|
int translog_read_next_record_header(TRANSLOG_SCANNER_DATA *scanner,
|
|
|
|
TRANSLOG_HEADER_BUFFER *buff)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint8 chunk_type;
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
translog_size_t res;
|
2007-02-12 13:23:43 +01:00
|
|
|
buff->groups_no= 0; /* to be sure that we will free it right */
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_ENTER("translog_read_next_record_header");
|
|
|
|
DBUG_PRINT("enter", ("scanner: 0x%lx", (ulong) scanner));
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Scanner: Cur: (%lu,0x%lx) Hrz: (%lu,0x%lx) "
|
|
|
|
"Lst: (%lu,0x%lx) Offset: %u(%x) fixed: %d",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(scanner->page_addr),
|
|
|
|
LSN_IN_PARTS(scanner->horizon),
|
|
|
|
LSN_IN_PARTS(scanner->last_file_page),
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) scanner->page_offset,
|
|
|
|
(uint) scanner->page_offset, scanner->fixed_horizon));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
do
|
|
|
|
{
|
|
|
|
if (translog_get_next_chunk(scanner))
|
2007-07-30 15:05:43 +02:00
|
|
|
DBUG_RETURN(RECHEADER_READ_ERROR);
|
2007-02-02 08:41:32 +01:00
|
|
|
chunk_type= scanner->page[scanner->page_offset] & TRANSLOG_CHUNK_TYPE;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("type: %x byte: %x", (uint) chunk_type,
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) scanner->page[scanner->page_offset]));
|
|
|
|
} while (chunk_type != TRANSLOG_CHUNK_LSN && chunk_type !=
|
|
|
|
TRANSLOG_CHUNK_FIXED && scanner->page[scanner->page_offset] != 0);
|
|
|
|
|
|
|
|
if (scanner->page[scanner->page_offset] == 0)
|
|
|
|
{
|
|
|
|
/* Last record was read */
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
buff->lsn= LSN_IMPOSSIBLE;
|
2007-02-19 22:01:27 +01:00
|
|
|
/* Return 'end of log' marker */
|
2007-07-30 15:05:43 +02:00
|
|
|
res= RECHEADER_READ_EOF;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
else
|
|
|
|
res= translog_read_record_header_scan(scanner, buff, 0);
|
|
|
|
DBUG_RETURN(res);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Moves record data reader to the next chunk and fill the data reader
|
|
|
|
information about that chunk.
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_record_read_next_chunk()
|
|
|
|
data data cursor
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
2007-02-12 13:23:43 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
static my_bool translog_record_read_next_chunk(struct st_translog_reader_data
|
|
|
|
*data)
|
|
|
|
{
|
|
|
|
translog_size_t new_current_offset= data->current_offset + data->chunk_size;
|
|
|
|
uint16 chunk_header_len, chunk_len;
|
|
|
|
uint8 type;
|
|
|
|
DBUG_ENTER("translog_record_read_next_chunk");
|
|
|
|
|
|
|
|
if (data->eor)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("end of the record flag set"));
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (data->header.groups_no &&
|
|
|
|
data->header.groups_no - 1 != data->current_group &&
|
|
|
|
data->header.groups[data->current_group].num == data->current_chunk)
|
|
|
|
{
|
|
|
|
/* Goto next group */
|
|
|
|
data->current_group++;
|
|
|
|
data->current_chunk= 0;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("skip to group: #%u", data->current_group));
|
2007-09-27 13:18:28 +02:00
|
|
|
translog_destroy_scanner(&data->scanner);
|
2007-02-12 13:23:43 +01:00
|
|
|
translog_init_scanner(data->header.groups[data->current_group].addr,
|
2007-09-27 13:18:28 +02:00
|
|
|
1, &data->scanner, 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
data->current_chunk++;
|
|
|
|
if (translog_get_next_chunk(&data->scanner))
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
type= data->scanner.page[data->scanner.page_offset] & TRANSLOG_CHUNK_TYPE;
|
|
|
|
|
|
|
|
if (type == TRANSLOG_CHUNK_LSN && data->header.groups_no)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("Last chunk: data len: %u offset: %u group: %u of %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
data->header.chunk0_data_len, data->scanner.page_offset,
|
|
|
|
data->current_group, data->header.groups_no - 1));
|
|
|
|
DBUG_ASSERT(data->header.groups_no - 1 == data->current_group);
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(data->header.lsn ==
|
|
|
|
data->scanner.page_addr + data->scanner.page_offset);
|
2007-09-27 13:18:28 +02:00
|
|
|
translog_destroy_scanner(&data->scanner);
|
|
|
|
translog_init_scanner(data->header.chunk0_data_addr, 1, &data->scanner, 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
data->chunk_size= data->header.chunk0_data_len;
|
|
|
|
data->body_offset= data->scanner.page_offset;
|
|
|
|
data->current_offset= new_current_offset;
|
|
|
|
data->eor= 1;
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (type == TRANSLOG_CHUNK_LSN || type == TRANSLOG_CHUNK_FIXED)
|
|
|
|
{
|
|
|
|
data->eor= 1;
|
|
|
|
DBUG_RETURN(1); /* End of record */
|
|
|
|
}
|
|
|
|
|
|
|
|
chunk_header_len=
|
|
|
|
translog_get_chunk_header_length(data->scanner.page,
|
|
|
|
data->scanner.page_offset);
|
|
|
|
chunk_len= translog_get_total_chunk_length(data->scanner.page,
|
|
|
|
data->scanner.page_offset);
|
|
|
|
data->chunk_size= chunk_len - chunk_header_len;
|
|
|
|
data->body_offset= data->scanner.page_offset + chunk_header_len;
|
|
|
|
data->current_offset= new_current_offset;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("grp: %u chunk: %u body_offset: %u chunk_size: %u "
|
|
|
|
"current_offset: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) data->current_group,
|
|
|
|
(uint) data->current_chunk,
|
|
|
|
(uint) data->body_offset,
|
|
|
|
(uint) data->chunk_size, (ulong) data->current_offset));
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Initialize record reader data from LSN
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_init_reader_data()
|
|
|
|
lsn reference to LSN we should start from
|
|
|
|
data reader data to initialize
|
|
|
|
|
|
|
|
RETURN
|
2007-02-19 22:01:27 +01:00
|
|
|
0 OK
|
|
|
|
1 Error
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
static my_bool translog_init_reader_data(LSN lsn,
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_translog_reader_data *data)
|
|
|
|
{
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
int read_header;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_init_reader_data");
|
2007-09-27 13:18:28 +02:00
|
|
|
if (translog_init_scanner(lsn, 1, &data->scanner, 1) ||
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
((read_header=
|
|
|
|
translog_read_record_header_scan(&data->scanner, &data->header, 1))
|
|
|
|
== RECHEADER_READ_ERROR))
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(1);
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
data->read_header= read_header;
|
2007-02-02 08:41:32 +01:00
|
|
|
data->body_offset= data->header.non_header_data_start_offset;
|
|
|
|
data->chunk_size= data->header.non_header_data_len;
|
|
|
|
data->current_offset= data->read_header;
|
|
|
|
data->current_group= 0;
|
|
|
|
data->current_chunk= 0;
|
|
|
|
data->eor= 0;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("read_header: %u "
|
|
|
|
"body_offset: %u chunk_size: %u current_offset: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) data->read_header,
|
|
|
|
(uint) data->body_offset,
|
|
|
|
(uint) data->chunk_size, (ulong) data->current_offset));
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
/**
|
|
|
|
@brief Destroy reader data object
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_destroy_reader_data(struct st_translog_reader_data *data)
|
|
|
|
{
|
|
|
|
translog_destroy_scanner(&data->scanner);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
/*
|
|
|
|
Read a part of the record.
|
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
translog_read_record_header()
|
|
|
|
lsn log record serial number (address of the record)
|
2007-09-27 16:41:21 +02:00
|
|
|
offset From the beginning of the record beginning (read
|
2007-02-02 08:41:32 +01:00
|
|
|
by translog_read_record_header).
|
2007-02-19 22:01:27 +01:00
|
|
|
length Length of record part which have to be read.
|
|
|
|
buffer Buffer where to read the record part (have to be at
|
2007-02-02 08:41:32 +01:00
|
|
|
least 'length' bytes length)
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
length of data actually read
|
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
translog_size_t translog_read_record(LSN lsn,
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_size_t offset,
|
|
|
|
translog_size_t length,
|
2007-07-02 19:45:15 +02:00
|
|
|
uchar *buffer,
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_translog_reader_data *data)
|
|
|
|
{
|
|
|
|
translog_size_t requested_length= length;
|
|
|
|
translog_size_t end= offset + length;
|
|
|
|
struct st_translog_reader_data internal_data;
|
|
|
|
DBUG_ENTER("translog_read_record");
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
if (data == NULL)
|
|
|
|
{
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
DBUG_ASSERT(lsn != LSN_IMPOSSIBLE);
|
2007-02-02 08:41:32 +01:00
|
|
|
data= &internal_data;
|
|
|
|
}
|
|
|
|
if (lsn ||
|
|
|
|
(offset < data->current_offset &&
|
|
|
|
!(offset < data->read_header && offset + length < data->read_header)))
|
|
|
|
{
|
|
|
|
if (translog_init_reader_data(lsn, data))
|
|
|
|
DBUG_RETURN(0);
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Offset: %lu length: %lu "
|
|
|
|
"Scanner: Cur: (%lu,0x%lx) Hrz: (%lu,0x%lx) "
|
|
|
|
"Lst: (%lu,0x%lx) Offset: %u(%x) fixed: %d",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) offset, (ulong) length,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(data->scanner.page_addr),
|
|
|
|
LSN_IN_PARTS(data->scanner.horizon),
|
|
|
|
LSN_IN_PARTS(data->scanner.last_file_page),
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) data->scanner.page_offset,
|
|
|
|
(uint) data->scanner.page_offset,
|
|
|
|
data->scanner.fixed_horizon));
|
|
|
|
if (offset < data->read_header)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 len= min(data->read_header, end) - offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("enter header offset: %lu length: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) offset, (ulong) length));
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(buffer, data->header.header + offset, len);
|
2007-02-02 08:41:32 +01:00
|
|
|
length-= len;
|
|
|
|
if (length == 0)
|
2007-09-27 13:18:28 +02:00
|
|
|
{
|
|
|
|
translog_destroy_reader_data(data);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(requested_length);
|
2007-09-27 13:18:28 +02:00
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
offset+= len;
|
|
|
|
buffer+= len;
|
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("len: %u offset: %lu curr: %lu length: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
len, (ulong) offset, (ulong) data->current_offset,
|
|
|
|
(ulong) length));
|
|
|
|
}
|
|
|
|
/* TODO: find first page which we should read by offset */
|
|
|
|
|
|
|
|
/* read the record chunk by chunk */
|
2007-02-19 22:01:27 +01:00
|
|
|
for(;;)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint page_end= data->current_offset + data->chunk_size;
|
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("enter body offset: %lu curr: %lu "
|
|
|
|
"length: %lu page_end: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) offset, (ulong) data->current_offset, (ulong) length,
|
|
|
|
(ulong) page_end));
|
|
|
|
if (offset < page_end)
|
|
|
|
{
|
|
|
|
uint len= page_end - offset;
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
set_if_smaller(len, length); /* in case we read beyond record's end */
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(offset >= data->current_offset);
|
|
|
|
memcpy(buffer,
|
2007-02-02 08:41:32 +01:00
|
|
|
data->scanner.page + data->body_offset +
|
|
|
|
(offset - data->current_offset), len);
|
|
|
|
length-= len;
|
|
|
|
if (length == 0)
|
2007-09-27 13:18:28 +02:00
|
|
|
{
|
|
|
|
translog_destroy_reader_data(data);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(requested_length);
|
2007-09-27 13:18:28 +02:00
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
offset+= len;
|
|
|
|
buffer+= len;
|
|
|
|
DBUG_PRINT("info",
|
2007-02-19 22:01:27 +01:00
|
|
|
("len: %u offset: %lu curr: %lu length: %lu",
|
2007-02-02 08:41:32 +01:00
|
|
|
len, (ulong) offset, (ulong) data->current_offset,
|
|
|
|
(ulong) length));
|
|
|
|
}
|
|
|
|
if (translog_record_read_next_chunk(data))
|
2007-09-27 13:18:28 +02:00
|
|
|
{
|
|
|
|
translog_destroy_reader_data(data);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(requested_length - length);
|
2007-09-27 13:18:28 +02:00
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-10-07 22:27:03 +02:00
|
|
|
@brief Force skipping to the next buffer
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-10-07 22:27:03 +02:00
|
|
|
@todo Do not copy old page content if all page protections are switched off
|
|
|
|
(because we do not need calculate something or change old parts of the page)
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
static void translog_force_current_buffer_to_finish()
|
|
|
|
{
|
2007-04-06 03:58:05 +02:00
|
|
|
TRANSLOG_ADDRESS new_buff_beginning;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 old_buffer_no= log_descriptor.bc.buffer_no;
|
|
|
|
uint16 new_buffer_no= (old_buffer_no + 1) % TRANSLOG_BUFFERS_NO;
|
|
|
|
struct st_translog_buffer *new_buffer= (log_descriptor.buffers +
|
|
|
|
new_buffer_no);
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_translog_buffer *old_buffer= log_descriptor.bc.buffer;
|
2007-08-13 14:17:49 +02:00
|
|
|
uchar *data= log_descriptor.bc.ptr - log_descriptor.bc.current_page_fill;
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 left= TRANSLOG_PAGE_SIZE - log_descriptor.bc.current_page_fill;
|
|
|
|
uint16 current_page_fill, write_counter, previous_offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ENTER("translog_force_current_buffer_to_finish");
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("enter", ("Buffer #%u 0x%lx "
|
|
|
|
"Buffer addr: (%lu,0x%lx) "
|
|
|
|
"Page addr: (%lu,0x%lx) "
|
|
|
|
"size: %lu (%lu) Pg: %u left: %u",
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) log_descriptor.bc.buffer_no,
|
|
|
|
(ulong) log_descriptor.bc.buffer,
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.bc.buffer->offset),
|
2007-02-12 13:23:43 +01:00
|
|
|
(ulong) LSN_FILE_NO(log_descriptor.horizon),
|
|
|
|
(ulong) (LSN_OFFSET(log_descriptor.horizon) -
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.current_page_fill),
|
2007-02-02 08:41:32 +01:00
|
|
|
(ulong) log_descriptor.bc.buffer->size,
|
|
|
|
(ulong) (log_descriptor.bc.ptr -log_descriptor.bc.
|
|
|
|
buffer->buffer),
|
2007-02-19 22:01:27 +01:00
|
|
|
(uint) log_descriptor.bc.current_page_fill,
|
2007-02-02 08:41:32 +01:00
|
|
|
(uint) left));
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-04-06 03:58:05 +02:00
|
|
|
LINT_INIT(current_page_fill);
|
|
|
|
new_buff_beginning= log_descriptor.bc.buffer->offset;
|
|
|
|
new_buff_beginning+= log_descriptor.bc.buffer->size; /* increase offset */
|
2007-02-19 22:01:27 +01:00
|
|
|
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_ASSERT(log_descriptor.bc.ptr !=NULL);
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(LSN_FILE_NO(log_descriptor.horizon) ==
|
|
|
|
LSN_FILE_NO(log_descriptor.bc.buffer->offset));
|
2007-02-21 22:30:58 +01:00
|
|
|
DBUG_EXECUTE("info", translog_check_cursor(&log_descriptor.bc););
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_ASSERT(left < TRANSLOG_PAGE_SIZE);
|
|
|
|
if (left != 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
TODO: if 'left' is so small that can't hold any other record
|
|
|
|
then do not move the page
|
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("left: %u", (uint) left));
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
/* decrease offset */
|
2007-04-06 03:58:05 +02:00
|
|
|
new_buff_beginning-= log_descriptor.bc.current_page_fill;
|
2007-02-19 22:01:27 +01:00
|
|
|
current_page_fill= log_descriptor.bc.current_page_fill;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
bzero(log_descriptor.bc.ptr, left);
|
|
|
|
log_descriptor.bc.buffer->size+= left;
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("Finish Page buffer #%u: 0x%lx "
|
2007-02-02 08:41:32 +01:00
|
|
|
"Size: %lu",
|
|
|
|
(uint) log_descriptor.bc.buffer->buffer_no,
|
|
|
|
(ulong) log_descriptor.bc.buffer,
|
|
|
|
(ulong) log_descriptor.bc.buffer->size));
|
|
|
|
DBUG_ASSERT(log_descriptor.bc.buffer->buffer_no ==
|
|
|
|
log_descriptor.bc.buffer_no);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.current_page_fill= 0;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
translog_buffer_lock(new_buffer);
|
|
|
|
translog_wait_for_buffer_free(new_buffer);
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
write_counter= log_descriptor.bc.write_counter;
|
|
|
|
previous_offset= log_descriptor.bc.previous_offset;
|
|
|
|
translog_start_buffer(new_buffer, &log_descriptor.bc, new_buffer_no);
|
2007-04-06 03:58:05 +02:00
|
|
|
log_descriptor.bc.buffer->offset= new_buff_beginning;
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.write_counter= write_counter;
|
|
|
|
log_descriptor.bc.previous_offset= previous_offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (data[TRANSLOG_PAGE_FLAGS] & TRANSLOG_SECTOR_PROTECTION)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
translog_put_sector_protection(data, &log_descriptor.bc);
|
|
|
|
if (left)
|
|
|
|
{
|
|
|
|
log_descriptor.bc.write_counter++;
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.previous_offset= current_page_fill;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("drop write_counter"));
|
|
|
|
log_descriptor.bc.write_counter= 0;
|
|
|
|
log_descriptor.bc.previous_offset= 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
if (data[TRANSLOG_PAGE_FLAGS] & TRANSLOG_PAGE_CRC)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint32 crc= translog_crc(data + log_descriptor.page_overhead,
|
|
|
|
TRANSLOG_PAGE_SIZE -
|
|
|
|
log_descriptor.page_overhead);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_PRINT("info", ("CRC: 0x%lx", (ulong) crc));
|
|
|
|
int4store(data + 3 + 3 + 1, crc);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (left)
|
|
|
|
{
|
2007-09-21 12:48:57 +02:00
|
|
|
/*
|
|
|
|
TODO: do not copy begining of the page if we have no CRC or sector
|
|
|
|
checks on
|
|
|
|
*/
|
2007-02-19 22:01:27 +01:00
|
|
|
memcpy(new_buffer->buffer, data, current_page_fill);
|
2007-08-13 14:17:49 +02:00
|
|
|
log_descriptor.bc.ptr+= current_page_fill;
|
2007-02-19 22:01:27 +01:00
|
|
|
log_descriptor.bc.buffer->size= log_descriptor.bc.current_page_fill=
|
|
|
|
current_page_fill;
|
2007-02-02 08:41:32 +01:00
|
|
|
new_buffer->overlay= old_buffer;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
translog_new_page_header(&log_descriptor.horizon, &log_descriptor.bc);
|
2007-08-13 14:17:49 +02:00
|
|
|
old_buffer->next_buffer_offset= new_buffer->offset;
|
2007-02-02 08:41:32 +01:00
|
|
|
|
|
|
|
DBUG_VOID_RETURN;
|
|
|
|
}
|
|
|
|
|
2007-02-19 22:01:27 +01:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/**
|
|
|
|
@brief Flush the log up to given LSN (included)
|
|
|
|
|
|
|
|
@param lsn log record serial number up to which (inclusive)
|
|
|
|
the log has to be flushed
|
|
|
|
|
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
|
|
|
|
@todo LOG: when a log write fails, we should not write to this log anymore
|
|
|
|
(if we add more log records to this log they will be unreadable: we will hit
|
|
|
|
the broken log record): all translog_flush() should be made to fail (because
|
|
|
|
translog_flush() is when a a transaction wants something durable and we
|
|
|
|
cannot make anything durable as log is corrupted). For that, a "my_bool
|
|
|
|
st_translog_descriptor::write_error" could be set to 1 when a
|
|
|
|
translog_write_record() or translog_flush() fails, and translog_flush()
|
|
|
|
would test this var (and translog_write_record() could also test this var if
|
|
|
|
it wants, though it's not absolutely needed).
|
|
|
|
Then, either shut Maria down immediately, or switch to a new log (but if we
|
|
|
|
get write error after write error, that would create too many logs).
|
|
|
|
A popular open-source transactional engine intentionally crashes as soon as
|
|
|
|
a log flush fails (we however don't want to crash the entire mysqld, but
|
|
|
|
stopping all engine's operations immediately would make sense).
|
|
|
|
Same applies to translog_write_record().
|
2007-09-21 12:48:57 +02:00
|
|
|
|
|
|
|
@todo: remove serialization and make group commit.
|
2007-02-02 08:41:32 +01:00
|
|
|
*/
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
my_bool translog_flush(LSN lsn)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
LSN old_flushed, sent_to_file;
|
|
|
|
int rc= 0;
|
|
|
|
uint i;
|
|
|
|
my_bool full_circle= 0;
|
|
|
|
DBUG_ENTER("translog_flush");
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_PRINT("enter", ("Flush up to LSN: (%lu,0x%lx)", LSN_IN_PARTS(lsn)));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-02-02 08:41:32 +01:00
|
|
|
|
2007-09-21 12:48:57 +02:00
|
|
|
pthread_mutex_lock(&log_descriptor.log_flush_lock);
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_lock();
|
|
|
|
old_flushed= log_descriptor.flushed;
|
|
|
|
for (;;)
|
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
uint16 buffer_no= log_descriptor.bc.buffer_no;
|
|
|
|
uint16 buffer_start= buffer_no;
|
2007-02-02 08:41:32 +01:00
|
|
|
struct st_translog_buffer *buffer_unlock= log_descriptor.bc.buffer;
|
|
|
|
struct st_translog_buffer *buffer= log_descriptor.bc.buffer;
|
|
|
|
/* we can't flush in future */
|
2007-02-12 13:23:43 +01:00
|
|
|
DBUG_ASSERT(cmp_translog_addr(log_descriptor.horizon, lsn) >= 0);
|
|
|
|
if (cmp_translog_addr(log_descriptor.flushed, lsn) >= 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-02-19 22:01:27 +01:00
|
|
|
DBUG_PRINT("info", ("already flushed: (%lu,0x%lx)",
|
2007-09-13 09:37:51 +02:00
|
|
|
LSN_IN_PARTS(log_descriptor.flushed)));
|
2007-09-21 12:48:57 +02:00
|
|
|
goto out;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
/* send to the file if it is not sent */
|
2007-08-13 14:17:49 +02:00
|
|
|
sent_to_file= translog_get_sent_to_file();
|
2007-02-12 13:23:43 +01:00
|
|
|
if (cmp_translog_addr(sent_to_file, lsn) >= 0)
|
2007-02-02 08:41:32 +01:00
|
|
|
break;
|
|
|
|
|
|
|
|
do
|
|
|
|
{
|
|
|
|
buffer_no= (buffer_no + 1) % TRANSLOG_BUFFERS_NO;
|
|
|
|
buffer= log_descriptor.buffers + buffer_no;
|
|
|
|
translog_buffer_lock(buffer);
|
|
|
|
translog_buffer_unlock(buffer_unlock);
|
|
|
|
buffer_unlock= buffer;
|
2007-02-19 22:01:27 +01:00
|
|
|
if (buffer->file != -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
buffer_unlock= NULL;
|
|
|
|
if (buffer_start == buffer_no)
|
|
|
|
{
|
|
|
|
/* we made a circle */
|
|
|
|
full_circle= 1;
|
|
|
|
translog_force_current_buffer_to_finish();
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
} while ((buffer_start != buffer_no) &&
|
2007-02-12 13:23:43 +01:00
|
|
|
cmp_translog_addr(log_descriptor.flushed, lsn) < 0);
|
2007-09-21 12:48:57 +02:00
|
|
|
if (buffer_unlock != NULL && buffer_unlock != buffer)
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_buffer_unlock(buffer_unlock);
|
2007-02-19 22:01:27 +01:00
|
|
|
rc= translog_buffer_flush(buffer);
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_buffer_unlock(buffer);
|
2007-02-19 22:01:27 +01:00
|
|
|
if (rc)
|
2007-09-21 12:48:57 +02:00
|
|
|
{
|
|
|
|
rc= 1;
|
|
|
|
goto out;
|
|
|
|
}
|
2007-02-02 08:41:32 +01:00
|
|
|
if (!full_circle)
|
|
|
|
translog_lock();
|
|
|
|
}
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
for (i= LSN_FILE_NO(old_flushed); i <= LSN_FILE_NO(lsn); i++)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
uint cache_index;
|
|
|
|
File file;
|
|
|
|
|
2007-02-12 13:23:43 +01:00
|
|
|
if ((cache_index= LSN_FILE_NO(log_descriptor.horizon) - i) <
|
|
|
|
OPENED_FILES_NUM)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
/* file in the cache */
|
2007-02-19 22:01:27 +01:00
|
|
|
if (log_descriptor.log_file_num[cache_index] == -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
|
|
|
if ((log_descriptor.log_file_num[cache_index]=
|
2007-02-19 22:01:27 +01:00
|
|
|
open_logfile_by_number_no_cache(i)) == -1)
|
2007-02-02 08:41:32 +01:00
|
|
|
{
|
2007-09-21 12:48:57 +02:00
|
|
|
rc= 1;
|
|
|
|
goto out;
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
file= log_descriptor.log_file_num[cache_index];
|
|
|
|
rc|= my_sync(file, MYF(MY_WME));
|
|
|
|
}
|
2007-02-19 22:01:27 +01:00
|
|
|
/* We sync file when we are closing it => do nothing if file closed */
|
2007-02-02 08:41:32 +01:00
|
|
|
}
|
|
|
|
log_descriptor.flushed= sent_to_file;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/** @todo LOG decide if syncing of directory is needed */
|
2007-09-07 17:03:36 +02:00
|
|
|
rc|= my_sync(log_descriptor.directory_fd, MYF(MY_WME | MY_IGNORE_BADFD));
|
2007-09-21 12:48:57 +02:00
|
|
|
out:
|
2007-02-02 08:41:32 +01:00
|
|
|
translog_unlock();
|
2007-09-21 12:48:57 +02:00
|
|
|
pthread_mutex_unlock(&log_descriptor.log_flush_lock);
|
2007-02-02 08:41:32 +01:00
|
|
|
DBUG_RETURN(rc);
|
|
|
|
}
|
2007-06-11 16:29:53 +02:00
|
|
|
|
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/**
|
|
|
|
@brief Gives a 2-byte-id to MARIA_SHARE and logs this fact
|
|
|
|
|
|
|
|
If a MARIA_SHARE does not yet have a 2-byte-id (unique over all currently
|
|
|
|
open MARIA_SHAREs), give it one and record this assignment in the log
|
|
|
|
(LOGREC_FILE_ID log record).
|
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
@param tbl_info table
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
@param trn calling transaction
|
|
|
|
|
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
|
|
|
|
@note Can be called even if share already has an id (then will do nothing)
|
|
|
|
*/
|
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
int translog_assign_id_to_share(MARIA_HA *tbl_info, TRN *trn)
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
{
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
MARIA_SHARE *share= tbl_info->s;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/*
|
|
|
|
If you give an id to a non-BLOCK_RECORD table, you also need to release
|
|
|
|
this id somewhere. Then you can change the assertion.
|
|
|
|
*/
|
|
|
|
DBUG_ASSERT(share->data_file_type == BLOCK_RECORD);
|
|
|
|
/* re-check under mutex to avoid having 2 ids for the same share */
|
|
|
|
pthread_mutex_lock(&share->intern_lock);
|
|
|
|
if (likely(share->id == 0))
|
|
|
|
{
|
|
|
|
/* Inspired by set_short_trid() of trnman.c */
|
2007-06-26 22:30:09 +02:00
|
|
|
uint i= share->kfile.file % SHARE_ID_MAX + 1;
|
|
|
|
do
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
{
|
2007-06-26 22:30:09 +02:00
|
|
|
my_atomic_rwlock_wrlock(&LOCK_id_to_share);
|
|
|
|
for ( ; i <= SHARE_ID_MAX ; i++) /* the range is [1..SHARE_ID_MAX] */
|
|
|
|
{
|
|
|
|
void *tmp= NULL;
|
|
|
|
if (id_to_share[i] == NULL &&
|
|
|
|
my_atomic_casptr((void **)&id_to_share[i], &tmp, share))
|
|
|
|
{
|
|
|
|
share->id= (uint16)i;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
my_atomic_rwlock_wrunlock(&LOCK_id_to_share);
|
|
|
|
i= 1; /* scan the whole array */
|
|
|
|
} while (share->id == 0);
|
2007-08-01 15:52:57 +02:00
|
|
|
DBUG_PRINT("info", ("id_to_share: 0x%lx -> %u", (ulong)share, share->id));
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
LSN lsn;
|
|
|
|
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 2];
|
|
|
|
uchar log_data[FILEID_STORE_SIZE];
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
|
|
|
|
/*
|
|
|
|
open_file_name is an unresolved name (symlinks are not resolved, datadir
|
|
|
|
is not realpath-ed, etc) which is good: the log can be moved to another
|
|
|
|
directory and continue working.
|
|
|
|
*/
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 1].str= share->open_file_name;
|
|
|
|
/**
|
|
|
|
@todo if we had the name's length in MARIA_SHARE we could avoid this
|
|
|
|
strlen()
|
|
|
|
*/
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
strlen(share->open_file_name) + 1;
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
if (unlikely(translog_write_record(&lsn, LOGREC_FILE_ID, trn, tbl_info,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
sizeof(log_data) +
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS +
|
|
|
|
1].length,
|
|
|
|
sizeof(log_array)/sizeof(log_array[0]),
|
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1)
is achieved in this patch. The table's live checksum
(info->s->state.state.checksum) is updated in inwrite_rec_hook's
under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE
and REDO_DELETE_ALL. The checksum variation caused by the operation
is stored in these UNDOs, so that the REDO phase, when it sees such
UNDOs, can update the live checksum if it is older (state.is_of_lsn is
lower) than the record. It is also used, as a nice add-on with no
cost, to do less row checksum computation during the UNDO phase
(as we have it in the record already).
Doing this work, it became pressing to move in-write hooks
(write_hook_for_redo() et al) to ma_blockrec.c.
The 'parts' argument of inwrite_rec_hook is unpredictable (it comes
mangled at this stage, for example by LSN compression) so it is
replaced by a 'void* hook_arg', which is used to pass down information,
currently only to write_hook_for_clr_end() (previous undo_lsn and
type of undone record).
* If from ha_maria, we print to stderr how many seconds (with one
fractional digit) the REDO phase took, same for UNDO phase and for
final table close. Just to give an indication for debugging and maybe
also for Support.
storage/maria/ha_maria.cc:
question for Monty
storage/maria/ma_blockrec.c:
* log in-write hooks (write_hook_for_redo() etc) move from
ma_loghandler.c to here; this is natural: the hooks are coupled
to their callers (functions in ma_blockrec.c).
* translog_write_record() now has a new argument "hook_arg";
using it to pass down to write_hook_for_clr_end() the transaction's
previous_undo_lsn and the type of the being undone record, and also
to pass down to all UNDOs the live checksum variation caused by the
operation.
* If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE
and in CLR_END the checksum variation ("delta") caused by the
operation. For example if a DELETE caused the table's live checksum
to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes,
the value 333 (456-123).
* Instead of hard-coded "1" as length of the place where we store
the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE;
use macros clr_type_store and clr_type_korr.
* write_block_record() has a new parameter 'old_record_checksum'
which is the pre-computed checksum of old_record; that value is used
to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END.
* In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE
the row's checksum is already computed.
* _ma_update_block_record2() now expect the new row's checksum into
cur_row.checksum (was already true) and the old row's checksum into
new_row.checksum (that's new). Its two callers, maria_update() and
_ma_apply_undo_row_update(), honour this.
* When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick
up the checksum delta from the log record. It is then used to update
the table's live checksum when writing CLR_END, and saves us a
computation of record.
storage/maria/ma_blockrec.h:
in-write hooks move from ma_loghandler.c
storage/maria/ma_check.c:
more straightforward size of buffer
storage/maria/ma_checkpoint.c:
<= is enough
storage/maria/ma_commit.c:
new prototype of translog_write_record()
storage/maria/ma_create.c:
new prototype of translog_write_record()
storage/maria/ma_delete.c:
The row's checksum must be computed before calling(*delete_record)(),
not after, because it must be known inside _ma_delete_block_record()
(to update the table's live checksum when writing UNDO_ROW_DELETE).
If deleting from a transactional table, live checksum was already updated
when writing UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
@todo is now done (in ma_loghandler.c)
storage/maria/ma_delete_table.c:
new prototype of translog_write_record()
storage/maria/ma_loghandler.c:
* in-write hooks move to ma_blockrec.c.
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
* fix for compiler warning (unused buffer_start when compiling without
debug support)
* Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE
and CLR_END, but only if the table has live checksum, these records
are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their
length is X if no live checksum and X+4 otherwise).
* add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the
table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|
DELETE and REDO_DELETE_ALL and CLR_END.
* Bugfix: when reading a record in translog_read_record(), it happened
that "length" became negative, because the function assumed that
the record extended beyond the page's end, whereas it may be shorter.
storage/maria/ma_loghandler.h:
* Instead of hard-coded "1" and "4", use symbols and macros
to store/retrieve the type of record which the CLR_END corresponds
to, and the checksum variation caused by the operation which logs the
record
* translog_write_record() gets a new argument 'hook_arg' which is
passed down to pre|inwrite_rec_hook. It is more useful that 'parts'
for those hooks, because when those hooks are called, 'parts' has
possibly been mangled (like with LSN compression) and is so
unpredictable.
storage/maria/ma_open.c:
fix for "empty body in if() statement" (when compiling without safemutex)
storage/maria/ma_pagecache.c:
<= is enough
storage/maria/ma_recovery.c:
* print the time that each recovery phase (REDO/UNDO/flush) took;
this is enabled only when recovering from ha_maria. Is it printed
n seconds with a fractional part of one digit (like 123.4 seconds).
* In the REDO phase, update the table's live checksum by using
the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END.
Update it too when seeing REDO_DELETE_ALL.
* In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does
not have live checksum then reading the record's header (as done by
the master loop of run_undo_phase()) is enough; otherwise we
do a translog_read_record() to have the checksum delta ready
for _ma_apply_undo_row_insert().
* When at the end of the REDO phase we notice that there is an unfinished
group of REDOs, don't assert in debug binaries, as I verified that it
can happen in real life (with kill -9)
* removing ' in #error as it confuses gcc3
storage/maria/ma_rename.c:
new prototype of translog_write_record()
storage/maria/ma_test_recovery.expected:
Change in output of ma_test_recovery: now all live checksums of
original tables equal those of tables recreated by the REDO phase
and those of tables fixed by the UNDO phase. I.e. recovery of
the live checksum looks like working (which was after all the only
goal of this changeset).
I checked by hand that it's not just all live checksums which are
now 0 and that's why they match. They are the old values like
3757530372. maria.test has hard-coded checksum values in its result
file so checks this too.
storage/maria/ma_update.c:
* It's useless to put up HA_STATE_CHANGED in 'key_changed',
as we put up HA_STATE_CHANGED in info->update anyway.
* We need to compute the old and new rows' checksum before calling
(*update_record)(), as checksum delta must be known when logging
UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that
some functions change the 'newrec' record (at least _ma_check_unique()
does) so we cannot move the checksum computation too early in the
function.
storage/maria/ma_write.c:
If inserting into a transactional table, live's checksum was
already updated when writing UNDO_ROW_INSERT. The multiplication
is a trick to save an if().
storage/maria/unittest/ma_test_loghandler-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_first_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_max_lsn-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_noflush-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
new prototype of translog_write_record()
storage/maria/unittest/ma_test_loghandler_purge-t.c:
new prototype of translog_write_record()
storage/myisam/sort.c:
fix for compiler warnings in pushbuild (write_merge_key* functions
didn't have their declaration match MARIA_HA::write_key).
2007-10-02 18:02:09 +02:00
|
|
|
log_array, log_data, NULL)))
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
@brief Recycles a MARIA_SHARE's short id.
|
|
|
|
|
|
|
|
@param share table
|
|
|
|
|
|
|
|
@note Must be called only if share has an id (i.e. id != 0)
|
|
|
|
*/
|
|
|
|
|
|
|
|
void translog_deassign_id_from_share(MARIA_SHARE *share)
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("id_to_share: 0x%lx id %u -> 0",
|
|
|
|
(ulong)share, share->id));
|
|
|
|
/*
|
|
|
|
We don't need any mutex as we are called only when closing the last
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
instance of the table or at the end of REPAIR: no writes can be
|
|
|
|
happening. But a Checkpoint may be reading share->id, so we require this
|
|
|
|
mutex:
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
*/
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
safe_mutex_assert_owner(&share->intern_lock);
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
my_atomic_rwlock_rdlock(&LOCK_id_to_share);
|
|
|
|
my_atomic_storeptr((void **)&id_to_share[share->id], 0);
|
|
|
|
my_atomic_rwlock_rdunlock(&LOCK_id_to_share);
|
WL#3072 Maria Recovery
misc fixes of execution of UNDOs in the UNDO phase:
- into the CLR_END, store the LSN of the _previous_ UNDO (we debated
what was best, so far we're going with "previous"; later we can change
to "current" if needed), and store the type of record which is being
undone (needed to know how to update state.records when we see the
CLR_END during the REDO phase).
- declaring all UNDOs and CLR_END as "compressed"
- when executing an UNDO in the UNDO phase, state.records is updated
as a hook when writing CLR_END (needed for "recovery of the state"),
and so is trn->undo_lsn (needed for when we have checkpoints).
- bugfix (execution of UNDO_ROW_DELETE didn't store the correct checksum
into the re-inserted row, maria_chk -r thus threw the row away).
- modifications of ma_test1: where to stop is now driven by --testflag;
--test-undo just tells how to stop (flush data, flush log, nothing).
- ma_test_recovery: testing of the UNDO phase, more testing of the
REDO phase, identification of a bug.
storage/maria/ma_blockrec.c:
- bugfix: execution of UNDO_ROW_DELETE didn't store the correct
checksum into the row (leading to "maria_chk -r" eliminating the
re-inserted row, net effect was that rollback appeared to have
rolled back no deletion). Reason was that write_block_record() used
info->cur_row.checksum, while "row" can be != &info->cur_row
(case of UNDO_ROW_DELETE). After fixing this, problems with
_ma_update_block_record() appeared; indeed checksum was computed
by allocate_and_write_block_record() while _ma_update_block_record()
directly calls write_block_record(). Solution is to compute checksum
in write_block_record() instead.
- when executing an UNDO, we now pass the LSN of the _previous_ UNDO
to block_format functions. This LSN can be 0 (if the being-executed UNDO
was the transaction's first UNDO), so "undo_lsn==0" cannot work
anymore to indicate "this is not UNDO work". Using undo_lsn==LSN_ERROR
instead (this is an impossible LSN).
- store into CLR_END the type of log record which was undone
(INSERT/UPDATE/DELETE); needed for Recovery to know if/how it has
to update state.records if it sees this CLR_END in the REDO phase.
- when writing the CLR_END in _ma_apply_undo_row_insert(),
the place to store file's id is log_data+LSN_STORE_SIZE.
- in _ma_apply_undo_row_insert(), the records-- is moved
to a hook when writing the CLR_END (this way it is under log's mutex
which is needed for "recovery of the state")
storage/maria/ma_loghandler.c:
- all UNDOs, and CLR_END, start with the LSN of another UNDO; so
we can declare them "compressed".
- write_hook_for_clr_end() to set trn->undo_lsn (to the previous
UNDO's LSN) under log's lock (like UNDOs set trn->undo_lsn under log's
lock), and also update, if appropriate, state.records.
- reset share->id to 0 when deassigning; not useful for now but
sounds logical.
storage/maria/ma_recovery.c:
- if no table is found for a REDO, it's not an error; for an UNDO, it is
- in the REDO phase, when we see a CLR_END we must update trn->undo_lsn
and sometimes state.records.
- in the UNDO phase, when we execute an UNDO_ROW_INSERT:
* update trn->undo_lsn only after executing the record
* store the _previous_ undo_lsn into the CLR_END
- at the end of the REDO phase, when we recreate TRN objects, they
have already their long id in the log (either via a
LOGREC_LONG_TRANSACTION_ID, or in a checkpoint record), don't write
a new, useless LOGREC_LONG_TRANSACTION_ID for them.
storage/maria/ma_test1.c:
* where to stop execution is now driven by --testflag and not --test-undo
(ma_test2 already has --testflag for the same purpose). This allows
us to do a clean stop (with commit) at any point.
* --test-undo=# tells how to abort (flush all pages (which implies
flushing log) or only log or nothing); all such "ways of crashing"
are tested in ma_test_recovery
storage/maria/ma_test_recovery:
* Testing execution of UNDOs, with and without BLOBs.
* Testing idempotency of REDOs.
* See @todo for a probable bug with BLOBs.
* maria_chk -rq instead of -r, as with -q it nicely stops on any
problem in the data file (like the checksum bug see comment of
ma_blockrec.c).
* Testing if log was written by UNDO phase (often expected),
not written by REDO phase (always expected).
* Less output on the screen, compares with expected output in the end.
* some shell thingies like "set --" and $# are courtesy of
Danny and Pekka.
storage/maria/maria_read_log.c:
when only displaying the records, don't do an UNDO phase
storage/maria/ma_test_recovery.expected:
This is the expected output of a great part of ma_test_recovery.
ma_test_recovery compares its output to the expected output
and tells if different.
If we look at this file it mentions differences in checksum
(normal, it's not recovered yet) and in records count
(getting a correct records' count when recovery starts on an
already existing table, like when testing rollback,
is coded but not yet pushed).
2007-09-06 16:04:36 +02:00
|
|
|
share->id= 0;
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
/* useless but safety: */
|
|
|
|
share->lsn_of_file_id= LSN_IMPOSSIBLE;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
}
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
|
|
|
|
|
WL#3072 Maria recovery
* create page cache before initializing engine and not after, because
Maria's recovery needs a page cache
* make the creation of a bitmap page more crash-resistent
* bugfix (see ma_blockrec.c)
* back to old way: create an 8k bitmap page when creating table
* preparations for the UNDO phase: recreate TRNs
* preparations for Checkpoint: list of dirty pages, testing
of rec_lsn to know if page should be skipped during Recovery
(unused in this patch as no Checkpoint module pushed yet)
* maria_chk tags repaired table with a special LSN
* reworking all around in ma_recovery.c (less duplication)
mysys/my_realloc.c:
noted an issue in my_realloc()
sql/mysqld.cc:
page cache needs to be created before engines are initialized,
because Maria's initialization may do a recovery which needs
the page cache.
storage/maria/ha_maria.cc:
update to new prototype
storage/maria/ma_bitmap.c:
when creating the first bitmap page we used chsize to 8192 bytes then
pwrite (overwrite) the last 2 bytes (8191-8192). If crash between
the two operations, this leaves a bitmap page full without its end
marker. A later recovery may try to read this page and find it
exists and misses a marker and conclude it's corrupted and fail.
Changing the chsize to only 8190 bytes: recovery will then find
the page is too short and recreate it entirely.
storage/maria/ma_blockrec.c:
Fix for a bug: when executing a REDO, if the data page is created,
data_file_length was increased before _ma_bitmap_set():
_ma_bitmap_set() called _ma_read_bitmap_page() which, due to the
increased data_file_length, expected to find a bitmap page on disk
with a correct end marker; if the bitmap page didn't exist already
in fact, this failed. Fixed by increasing data_file_length only after
_ma_read_bitmap_page() has created the new bitmap page correctly.
This bug could happen every time a REDO is about creating a new
bitmap page.
storage/maria/ma_check.c:
empty data file has a bitmap page
storage/maria/ma_control_file.c:
useless parameter to ma_control_file_create_or_open(), just
test if this is recovery.
storage/maria/ma_control_file.h:
new prototype
storage/maria/ma_create.c:
Back to how it was before: maria_create() creates an 8k bitmap page.
Thus (bugfix) data_file_length needs to reflect this instead of being 0.
storage/maria/ma_loghandler.c:
as ma_test1 and ma_test2 now use real transactions and not
dummy_transaction_object, REDO for INSERT/UPDATE/DELETE are always
about real transactions, can assert this.
A function for Recovery to assign a short id to a table.
storage/maria/ma_loghandler.h:
new function
storage/maria/ma_loghandler_lsn.h:
maria_chk tags repaired tables with this LSN
storage/maria/ma_open.c:
* enforce that DMLs on transactional tables use real transactions
and not dummy_transaction_object.
* test if table was repaired with maria_chk (which has to been
seen as an import of an external table into the server), test
validity of create_rename_lsn (header corruption detection)
* comments.
storage/maria/ma_recovery.c:
* preparations for the UNDO phase: recreate TRNs
* preparations for Checkpoint: list of dirty pages, testing
of rec_lsn to know if page should be skipped during Recovery
(unused in this patch as no Checkpoint module pushed yet)
* reworking all around (less duplication)
storage/maria/ma_recovery.h:
a parameter to say if the UNDO phase should be skipped
storage/maria/maria_chk.c:
tag repaired tables with a special LSN
storage/maria/maria_read_log.c:
* update to new prototype
* no UNDO phase in maria_read_log for now
storage/maria/trnman.c:
* a function for Recovery to create a transaction (TRN), needed
in the UNDO phase
* a function for Recovery to grab an existing transaction, needed
in the UNDO phase (rollback all existing transactions)
storage/maria/trnman_public.h:
new functions
2007-08-29 16:43:01 +02:00
|
|
|
void translog_assign_id_to_share_from_recovery(MARIA_SHARE *share,
|
|
|
|
uint16 id)
|
|
|
|
{
|
|
|
|
DBUG_ASSERT(maria_in_recovery && !maria_multi_threaded);
|
|
|
|
DBUG_ASSERT(share->data_file_type == BLOCK_RECORD);
|
|
|
|
DBUG_ASSERT(share->id == 0);
|
|
|
|
DBUG_ASSERT(id_to_share[id] == NULL);
|
|
|
|
id_to_share[share->id= id]= share;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-08-13 21:54:29 +02:00
|
|
|
/**
|
|
|
|
@brief check if such log file exists
|
|
|
|
|
|
|
|
@param file_no number of the file to test
|
|
|
|
|
|
|
|
@retval 0 no such file
|
|
|
|
@retval 1 there is file with such number
|
|
|
|
*/
|
|
|
|
|
|
|
|
my_bool translog_is_file(uint file_no)
|
|
|
|
{
|
|
|
|
MY_STAT stat_buff;
|
|
|
|
char path[FN_REFLEN];
|
|
|
|
return (test(my_stat(translog_filename_by_fileno(file_no, path),
|
2007-08-27 20:55:39 +02:00
|
|
|
&stat_buff, MYF(0))));
|
2007-08-13 21:54:29 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
/**
|
2007-08-27 20:55:39 +02:00
|
|
|
@brief returns minimum log file number
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
@param horizon the end of the log
|
|
|
|
@param is_protected true if it is under purge_log protection
|
|
|
|
|
|
|
|
@retval minimum file number
|
|
|
|
@retval 0 no files found
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
*/
|
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
static uint32 translog_first_file(TRANSLOG_ADDRESS horizon, int is_protected)
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
{
|
2007-08-27 20:55:39 +02:00
|
|
|
uint min_file= 1, max_file;
|
|
|
|
DBUG_ENTER("translog_first_file");
|
|
|
|
if (!is_protected)
|
|
|
|
pthread_mutex_lock(&log_descriptor.purger_lock);
|
|
|
|
if (log_descriptor.min_file_number &&
|
|
|
|
translog_is_file(log_descriptor.min_file_number))
|
|
|
|
{
|
|
|
|
DBUG_PRINT("info", ("cached %lu",
|
|
|
|
(ulong) log_descriptor.min_file_number));
|
|
|
|
if (!is_protected)
|
|
|
|
pthread_mutex_unlock(&log_descriptor.purger_lock);
|
|
|
|
DBUG_RETURN(log_descriptor.min_file_number);
|
|
|
|
}
|
2007-08-13 21:54:29 +02:00
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
max_file= LSN_FILE_NO(horizon);
|
|
|
|
|
|
|
|
if (MAKE_LSN(1, TRANSLOG_PAGE_SIZE) >= horizon)
|
2007-08-13 21:54:29 +02:00
|
|
|
{
|
|
|
|
/* there is no first page yet */
|
2007-08-27 20:55:39 +02:00
|
|
|
DBUG_RETURN(0);
|
2007-08-13 21:54:29 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/* binary search for last file */
|
|
|
|
while (min_file != max_file && min_file != (max_file - 1))
|
|
|
|
{
|
|
|
|
uint test= (min_file + max_file) / 2;
|
|
|
|
DBUG_PRINT("info", ("min_file: %u test: %u max_file: %u",
|
|
|
|
min_file, test, max_file));
|
|
|
|
if (test == max_file)
|
|
|
|
test--;
|
|
|
|
if (translog_is_file(test))
|
|
|
|
max_file= test;
|
|
|
|
else
|
|
|
|
min_file= test;
|
|
|
|
}
|
2007-08-27 20:55:39 +02:00
|
|
|
log_descriptor.min_file_number= max_file;
|
|
|
|
if (!is_protected)
|
|
|
|
pthread_mutex_unlock(&log_descriptor.purger_lock);
|
|
|
|
DBUG_RETURN(max_file);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-09-14 14:01:44 +02:00
|
|
|
/**
|
|
|
|
@brief returns the most close LSN higher the given chunk address
|
|
|
|
|
|
|
|
@param addr the chunk address to start from
|
|
|
|
@param horizon the horizon if it is known or LSN_IMPOSSIBLE
|
|
|
|
|
|
|
|
@retval LSN_ERROR Error
|
|
|
|
@retval LSN_IMPOSSIBLE no LSNs after the address
|
|
|
|
@retval # LSN of the most close LSN higher the given chunk address
|
|
|
|
*/
|
|
|
|
|
|
|
|
LSN translog_next_LSN(TRANSLOG_ADDRESS addr, TRANSLOG_ADDRESS horizon)
|
|
|
|
{
|
|
|
|
uint chunk_type;
|
|
|
|
TRANSLOG_SCANNER_DATA scanner;
|
2007-09-27 13:18:28 +02:00
|
|
|
LSN result;
|
2007-09-14 14:01:44 +02:00
|
|
|
DBUG_ENTER("translog_next_LSN");
|
|
|
|
|
|
|
|
if (horizon == LSN_IMPOSSIBLE)
|
|
|
|
horizon= translog_get_horizon();
|
|
|
|
|
|
|
|
if (addr == horizon)
|
|
|
|
DBUG_RETURN(LSN_IMPOSSIBLE);
|
|
|
|
|
2007-09-27 13:18:28 +02:00
|
|
|
translog_init_scanner(addr, 0, &scanner, 1);
|
2007-09-14 14:01:44 +02:00
|
|
|
|
|
|
|
chunk_type= scanner.page[scanner.page_offset] & TRANSLOG_CHUNK_TYPE;
|
|
|
|
DBUG_PRINT("info", ("type: %x byte: %x", (uint) chunk_type,
|
|
|
|
(uint) scanner.page[scanner.page_offset]));
|
|
|
|
while (chunk_type != TRANSLOG_CHUNK_LSN &&
|
|
|
|
chunk_type != TRANSLOG_CHUNK_FIXED &&
|
|
|
|
scanner.page[scanner.page_offset] != 0)
|
|
|
|
{
|
|
|
|
if (translog_get_next_chunk(&scanner))
|
|
|
|
DBUG_RETURN(LSN_ERROR);
|
|
|
|
chunk_type= scanner.page[scanner.page_offset] & TRANSLOG_CHUNK_TYPE;
|
|
|
|
DBUG_PRINT("info", ("type: %x byte: %x", (uint) chunk_type,
|
|
|
|
(uint) scanner.page[scanner.page_offset]));
|
|
|
|
}
|
2007-09-27 13:18:28 +02:00
|
|
|
|
2007-09-14 14:01:44 +02:00
|
|
|
if (scanner.page[scanner.page_offset] == 0)
|
2007-09-27 13:18:28 +02:00
|
|
|
result= LSN_IMPOSSIBLE; /* reached page filler */
|
|
|
|
else
|
|
|
|
result= scanner.page_addr + scanner.page_offset;
|
|
|
|
translog_destroy_scanner(&scanner);
|
|
|
|
DBUG_RETURN(result);
|
2007-09-14 14:01:44 +02:00
|
|
|
}
|
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
/**
|
|
|
|
@brief returns the LSN of the first record starting in this log
|
|
|
|
|
|
|
|
@retval LSN_ERROR Error
|
2007-09-13 09:37:51 +02:00
|
|
|
@retval LSN_IMPOSSIBLE no log or the log is empty
|
2007-08-27 20:55:39 +02:00
|
|
|
@retval # LSN of the first record
|
|
|
|
*/
|
|
|
|
|
|
|
|
LSN translog_first_lsn_in_log()
|
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS addr, horizon= translog_get_horizon();
|
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
|
|
|
uint file;
|
|
|
|
uint16 chunk_offset;
|
|
|
|
uchar *page;
|
|
|
|
DBUG_ENTER("translog_first_lsn_in_log");
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_PRINT("info", ("Horizon: (%lu,0x%lx)", LSN_IN_PARTS(addr)));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-08-27 20:55:39 +02:00
|
|
|
|
|
|
|
if (!(file= translog_first_file(horizon, 0)))
|
|
|
|
{
|
|
|
|
/* log has no records yet */
|
|
|
|
DBUG_RETURN(LSN_IMPOSSIBLE);
|
|
|
|
}
|
2007-08-13 21:54:29 +02:00
|
|
|
|
2007-08-27 20:55:39 +02:00
|
|
|
addr= MAKE_LSN(file, TRANSLOG_PAGE_SIZE); /* the first page of the file */
|
2007-08-13 21:54:29 +02:00
|
|
|
data.addr= &addr;
|
|
|
|
{
|
2007-09-14 14:01:44 +02:00
|
|
|
uchar buffer[TRANSLOG_PAGE_SIZE];
|
2007-09-27 13:18:28 +02:00
|
|
|
if ((page= translog_get_page(&data, buffer, NULL)) == NULL ||
|
2007-09-14 14:01:44 +02:00
|
|
|
(chunk_offset= translog_get_first_chunk_offset(page)) == 0)
|
2007-08-13 21:54:29 +02:00
|
|
|
DBUG_RETURN(LSN_ERROR);
|
|
|
|
}
|
2007-09-14 14:01:44 +02:00
|
|
|
addr+= chunk_offset;
|
|
|
|
|
|
|
|
DBUG_RETURN(translog_next_LSN(addr, horizon));
|
2007-08-13 21:54:29 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
@brief returns theoretical first LSN if first log is present
|
|
|
|
|
|
|
|
@retval LSN_ERROR Error
|
|
|
|
@retval LSN_IMPOSSIBLE no log
|
|
|
|
@retval # LSN of the first record
|
|
|
|
*/
|
|
|
|
|
|
|
|
LSN translog_first_theoretical_lsn()
|
|
|
|
{
|
|
|
|
TRANSLOG_ADDRESS addr= translog_get_horizon();
|
|
|
|
uchar buffer[TRANSLOG_PAGE_SIZE], *page;
|
|
|
|
TRANSLOG_VALIDATOR_DATA data;
|
|
|
|
DBUG_ENTER("translog_first_theoretical_lsn");
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_PRINT("info", ("Horizon: (%lu,0x%lx)", LSN_IN_PARTS(addr)));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-08-13 21:54:29 +02:00
|
|
|
|
|
|
|
if (!translog_is_file(1))
|
|
|
|
DBUG_RETURN(LSN_IMPOSSIBLE);
|
|
|
|
if (addr == MAKE_LSN(1, TRANSLOG_PAGE_SIZE))
|
|
|
|
{
|
2007-08-27 20:55:39 +02:00
|
|
|
/* log has no records yet */
|
2007-08-13 21:54:29 +02:00
|
|
|
DBUG_RETURN(MAKE_LSN(1, TRANSLOG_PAGE_SIZE +
|
|
|
|
log_descriptor.page_overhead));
|
|
|
|
}
|
|
|
|
|
|
|
|
addr= MAKE_LSN(1, TRANSLOG_PAGE_SIZE); /* the first page of the file */
|
|
|
|
data.addr= &addr;
|
2007-09-27 13:18:28 +02:00
|
|
|
if ((page= translog_get_page(&data, buffer, NULL)) == NULL)
|
2007-08-13 21:54:29 +02:00
|
|
|
DBUG_RETURN(LSN_ERROR);
|
|
|
|
|
|
|
|
DBUG_RETURN(MAKE_LSN(1, TRANSLOG_PAGE_SIZE +
|
|
|
|
page_overhead[page[TRANSLOG_PAGE_FLAGS]]));
|
WL#3072 Maria Recovery
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled by now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
2007-06-26 16:49:23 +02:00
|
|
|
}
|
2007-08-27 20:55:39 +02:00
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
@brief Check given low water mark and purge files if it is need
|
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
@param low the last (minimum) address which is need
|
2007-08-27 20:55:39 +02:00
|
|
|
|
|
|
|
@retval 0 OK
|
|
|
|
@retval 1 Error
|
|
|
|
*/
|
|
|
|
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
my_bool translog_purge(TRANSLOG_ADDRESS low)
|
2007-08-27 20:55:39 +02:00
|
|
|
{
|
|
|
|
uint32 last_need_file= LSN_FILE_NO(low);
|
|
|
|
TRANSLOG_ADDRESS horizon= translog_get_horizon();
|
|
|
|
int rc= 0;
|
|
|
|
DBUG_ENTER("translog_purge");
|
2007-09-13 09:37:51 +02:00
|
|
|
DBUG_PRINT("enter", ("low: (%lu,0x%lx)", LSN_IN_PARTS(low)));
|
2007-09-14 08:17:36 +02:00
|
|
|
DBUG_ASSERT(translog_inited == 1);
|
2007-08-27 20:55:39 +02:00
|
|
|
|
|
|
|
pthread_mutex_lock(&log_descriptor.purger_lock);
|
|
|
|
if (LSN_FILE_NO(log_descriptor.last_lsn_checked) < last_need_file)
|
|
|
|
{
|
|
|
|
uint32 i;
|
|
|
|
uint32 min_file= translog_first_file(horizon, 1);
|
|
|
|
DBUG_ASSERT(min_file != 0); /* log is already started */
|
|
|
|
|
|
|
|
for(i= min_file; i < last_need_file && rc == 0; i++)
|
|
|
|
{
|
|
|
|
LSN lsn= translog_get_file_max_lsn_stored(i);
|
|
|
|
if (lsn == LSN_IMPOSSIBLE)
|
|
|
|
break; /* files are still in writing */
|
|
|
|
if (lsn == LSN_ERROR)
|
|
|
|
{
|
|
|
|
rc= 1;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (cmp_translog_addr(lsn, low) >= 0)
|
|
|
|
break;
|
|
|
|
DBUG_PRINT("info", ("purge file %lu", (ulong) i));
|
|
|
|
{
|
|
|
|
char path[FN_REFLEN], *file_name;
|
|
|
|
file_name= translog_filename_by_fileno(i, path);
|
|
|
|
rc= test(my_delete(file_name, MYF(MY_WME)));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
pthread_mutex_unlock(&log_descriptor.purger_lock);
|
|
|
|
DBUG_RETURN(rc);
|
|
|
|
}
|