2006-04-11 16:45:10 +03:00
|
|
|
/* Copyright (C) 2006 MySQL AB & MySQL Finland AB & TCX DataKonsult AB
|
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
2007-03-02 11:20:23 +01:00
|
|
|
the Free Software Foundation; version 2 of the License.
|
2006-04-11 16:45:10 +03:00
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program; if not, write to the Free Software
|
|
|
|
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
|
|
|
|
|
|
|
|
/*
|
|
|
|
Rename a table
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include "ma_fulltext.h"
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
#include "trnman_public.h"
|
|
|
|
|
|
|
|
/**
|
|
|
|
@brief renames a table
|
|
|
|
|
|
|
|
@param old_name current name of table
|
|
|
|
@param new_name table should be renamed to this name
|
|
|
|
|
|
|
|
@return Operation status
|
|
|
|
@retval 0 OK
|
|
|
|
@retval !=0 Error
|
|
|
|
*/
|
2006-04-11 16:45:10 +03:00
|
|
|
|
|
|
|
int maria_rename(const char *old_name, const char *new_name)
|
|
|
|
{
|
|
|
|
char from[FN_REFLEN],to[FN_REFLEN];
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00
|
|
|
int data_file_rename_error;
|
2006-04-11 16:45:10 +03:00
|
|
|
#ifdef USE_RAID
|
|
|
|
uint raid_type=0,raid_chunks=0;
|
|
|
|
#endif
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
MARIA_HA *info;
|
|
|
|
MARIA_SHARE *share;
|
|
|
|
myf sync_dir;
|
2006-04-11 16:45:10 +03:00
|
|
|
DBUG_ENTER("maria_rename");
|
|
|
|
|
|
|
|
#ifdef EXTRA_DEBUG
|
|
|
|
_ma_check_table_is_closed(old_name,"rename old_table");
|
|
|
|
_ma_check_table_is_closed(new_name,"rename new table2");
|
|
|
|
#endif
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/** @todo LOCK take X-lock on table */
|
|
|
|
if (!(info= maria_open(old_name, O_RDWR, HA_OPEN_FOR_REPAIR)))
|
|
|
|
DBUG_RETURN(my_errno);
|
|
|
|
share= info->s;
|
2006-04-11 16:45:10 +03:00
|
|
|
#ifdef USE_RAID
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
raid_type = share->base.raid_type;
|
|
|
|
raid_chunks = share->base.raid_chunks;
|
|
|
|
#endif
|
|
|
|
|
Maria:
* Don't modify share->base.born_transactional; now it is a value carved
in stone at creation time. share->now_transactional is what can be
modified: it starts at born_transactional, can become false during
ALTER TABLE (when we want no logging), and restored later.
* Not resetting create_rename_lsn to 0 during delete_all or repair.
* when we temporarily disable transactionality, we also change
the page type to PAGECACHE_PLAIN_PAGE: it bypasses some work in the
page cache (optimization), and avoids assertions related to LSNs.
* Disable INSERT DELAYED for transactional tables, because
durability could not be guaranteed (insertion may even not happen)
mysys/mf_keycache.c:
comment
storage/maria/ha_maria.cc:
* a transactional table cannot do INSERT DELAYED
* ha_maria::save_transactional not needed anymore, as now instead
we don't modify MARIA_SHARE::MARIA_BASE_INFO::born_transactional
(born_transactional plays the role of save_transactional), and modify
MARIA_SHARE::now_transactional.
* REPAIR_TABLE log record is now logged by maria_repair()
* comment why we rely on born_transactional to know if we should
skipping a transaction.
* putting together two if()s which test for F_UNLCK
storage/maria/ha_maria.h:
ha_maria::save_transactional not needed anymore (moved to the C layer)
storage/maria/ma_blockrec.c:
* For the block record's code (writing/updating/deleting records),
all that counts is now_transactional, not born_transactional.
* As we now set the page type to PAGECACHE_PLAIN_PAGE for tables
which have now_transactional==FALSE, pagecache will not expect
a meaningful LSN for them in pagecache_unlock_by_link(), so
we can pass it LSN_IMPOSSIBLE.
storage/maria/ma_check.c:
* writing LOGREC_REPAIR_TABLE moves from ha_maria::repair()
to maria_repair(), sounds cleaner (less functions to export).
* when opening a table during REPAIR, don't use the realpath-ed name,
as this may fail if the table has symlinked files (maria_open()
would try to find the data and index file in the directory
of unique_file_name, it would fail if data and index files are in
different dirs); use the unresolved name, open_file_name, which is
the argument which was passed to the maria_open() which created 'info'.
storage/maria/ma_close.c:
assert that when a statement is done with a table, it cleans up
storage/maria/ma_create.c:
new name
storage/maria/ma_delete_all.c:
* using now_transactional
* no reason to reset create_rename_lsn during delete_all (a bug);
also no reason to do it during repair: it was put there because
a positive create_rename_lsn caused a call to check_and_set_lsn()
which asserted in DBUG_ASSERT(block->type == PAGECACHE_LSN_PAGE);
first solution was to use LSN_IMPOSSIBLE in _ma_unpin_all_pages() if
not transactional; but then in the case of ALTER TABLE, with
transactionality temporarily disabled, it asserted in
DBUG_ASSERT(LSN_VALID(lsn)) in pagecache_fwrite() (PAGECACHE_LSN_PAGE
page with zero LSN - bad). The additional solution is to use
PAGECACHE_PLAIN_PAGE when we disable transactionality temporarily: this
avoids checks on the LSN, and also bypasses (optimization) the "flush
log up to LSN" call when the pagecache flushes our page (in other
words, no WAL needed).
storage/maria/ma_delete_table.c:
use now_transactional
storage/maria/ma_locking.c:
assert that when a statement is done with a table, it cleans up.
storage/maria/ma_loghandler.c:
* now_transactional should be used to test if we want a log record.
* Assertions to make sure dummy_transaction_object is not spoilt
by its many users.
storage/maria/ma_open.c:
base.transactional -> base.born_transactional
storage/maria/ma_pagecache.c:
missing name for page's type. Comment for future.
storage/maria/ma_rename.c:
use now_transactional
storage/maria/maria_chk.c:
use born_transactional
storage/maria/maria_def.h:
MARIA_BASE_INFO::transactional renamed to born_transactional.
MARIA_SHARE::now_transactional introduced.
_ma_repair_write_log_record() is made local to ma_check.c.
Macros to temporarily disable, and re-enable, transactionality for a
table.
storage/maria/maria_read_log.c:
assertions and using the new macros. Adding a forgotten resetting
when we finally close all tables.
2007-07-03 15:20:41 +02:00
|
|
|
/*
|
|
|
|
the renaming of an internal table to the final table (like in ALTER TABLE)
|
|
|
|
is the moment when this table receives its correct create_rename_lsn and
|
|
|
|
this is important; make sure transactionality has been re-enabled.
|
|
|
|
*/
|
|
|
|
DBUG_ASSERT(share->now_transactional == share->base.born_transactional);
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
sync_dir= (share->now_transactional && !share->temporary &&
|
|
|
|
!maria_in_recovery) ? MY_SYNC_DIR : 0;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
if (sync_dir)
|
2006-04-11 16:45:10 +03:00
|
|
|
{
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
LSN lsn;
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 2];
|
|
|
|
uint old_name_len= strlen(old_name)+1, new_name_len= strlen(new_name)+1;
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char *)old_name;
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= old_name_len;
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 1].str= (char *)new_name;
|
|
|
|
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= new_name_len;
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
/*
|
|
|
|
For this record to be of any use for Recovery, we need the upper
|
|
|
|
MySQL layer to be crash-safe, which it is not now (that would require
|
|
|
|
work using the ddl_log of sql/sql_table.cc); when it is, we should
|
|
|
|
reconsider the moment of writing this log record (before or after op,
|
2007-06-28 14:01:57 +02:00
|
|
|
under THR_LOCK_maria or not...), how to use it in Recovery.
|
|
|
|
For now it can serve to apply logs to a backup so we sync it.
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
*/
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
if (unlikely(translog_write_record(&lsn, LOGREC_REDO_RENAME_TABLE,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
&dummy_transaction_object, NULL,
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
old_name_len + new_name_len,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
sizeof(log_array)/sizeof(log_array[0]),
|
2007-06-28 14:01:57 +02:00
|
|
|
log_array, NULL) ||
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
translog_flush(lsn)))
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
{
|
|
|
|
maria_close(info);
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
store LSN into file, needed for Recovery to not be confused if a
|
|
|
|
RENAME happened (applying REDOs to the wrong table).
|
|
|
|
*/
|
WL#3071 Maria checkpoint
Finally this is the real checkpoint code.
It however exhibits unstabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed into the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return a LSN, that's what caller expects
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state.
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
2007-09-12 11:27:34 +02:00
|
|
|
if (_ma_update_create_rename_lsn(share, lsn, TRUE))
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
{
|
|
|
|
maria_close(info);
|
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
2006-04-11 16:45:10 +03:00
|
|
|
}
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
|
|
|
|
maria_close(info);
|
|
|
|
#ifdef USE_RAID
|
2006-04-11 16:45:10 +03:00
|
|
|
#ifdef EXTRA_DEBUG
|
|
|
|
_ma_check_table_is_closed(old_name,"rename raidcheck");
|
|
|
|
#endif
|
|
|
|
#endif /* USE_RAID */
|
|
|
|
|
|
|
|
fn_format(from,old_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
|
|
|
|
fn_format(to,new_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
if (my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir)))
|
2006-04-11 16:45:10 +03:00
|
|
|
DBUG_RETURN(my_errno);
|
|
|
|
fn_format(from,old_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
|
|
|
|
fn_format(to,new_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
|
|
|
|
#ifdef USE_RAID
|
|
|
|
if (raid_type)
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00
|
|
|
data_file_rename_error= my_raid_rename(from, to, raid_chunks,
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
MYF(MY_WME | sync_dir));
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00
|
|
|
else
|
2006-04-11 16:45:10 +03:00
|
|
|
#endif
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00
|
|
|
data_file_rename_error=
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir));
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00
|
|
|
if (data_file_rename_error)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
now we have a renamed index file and a non-renamed data file, try to
|
|
|
|
undo the rename of the index file.
|
|
|
|
*/
|
|
|
|
data_file_rename_error= my_errno;
|
|
|
|
fn_format(from, old_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
|
|
|
|
fn_format(to, new_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
my_rename_with_symlink(to, from, MYF(MY_WME | sync_dir));
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00
|
|
|
}
|
|
|
|
DBUG_RETURN(data_file_rename_error);
|
|
|
|
|
2006-04-11 16:45:10 +03:00
|
|
|
}
|