This commit is based on the work of Michal Schorm, rebased on the
earliest MariaDB version.
The command line used to generate this diff was:
find ./ -type f \
-exec sed -i -e 's/Foundation, Inc., 59 Temple Place, Suite 330, Boston, /Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, /g' {} \; \
-exec sed -i -e 's/Foundation, Inc. 59 Temple Place.* Suite 330, Boston, /Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, /g' {} \; \
-exec sed -i -e 's/MA.*.....-1307.*USA/MA 02110-1335 USA/g' {} \; \
-exec sed -i -e 's/Foundation, Inc., 59 Temple/Foundation, Inc., 51 Franklin/g' {} \; \
-exec sed -i -e 's/Place, Suite 330, Boston, MA.*02111-1307.*USA/Street, Fifth Floor, Boston, MA 02110-1335 USA/g' {} \; \
-exec sed -i -e 's/MA.*.....-1307/MA 02110-1335/g' {} \;
Instead of calling fprintf(stderr) when a task (with no user connected)
gets an error, use my_printf_error(). Flags ME_JUST_WARNING and ME_JUST_INFO
added to my_error()/my_printf_error(), which pass them to
my_message_sql(), which is modified to call the appropriate
sql_print_*(). This way recovery can signal its start and end with
[Note] and not [ERROR] (but failure with [ERROR]).
Recovery's detailed progress (percentages etc.) still uses stderr, as it
has to stay on one single line.
sql_print_error() changed to use my_progname_short (nicer display).
mysql-test-run.pl --gdb/--ddd no longer runs mysqld, because
a breakpoint in mysql_parse is too late to debug startup problems;
instead, the developer should set the breakpoints they want and then "run" ("r").
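A minimal sketch of the flag-to-severity dispatch described above. The
SKETCH_* flag values and both function names are hypothetical (the real
flags are defined in include/my_sys.h and the real dispatch is in
my_message_sql()); only the [Note]/[ERROR] tags and the
"mysqld: Maria engine: ..." message shape come from this commit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define SKETCH_ME_JUST_INFO     (1u << 0)
#define SKETCH_ME_JUST_WARNING  (1u << 1)

/* Pick the log-line tag from the flags passed along with the message. */
static const char *sketch_severity_tag(unsigned flags)
{
  if (flags & SKETCH_ME_JUST_INFO)
    return "[Note]";
  if (flags & SKETCH_ME_JUST_WARNING)
    return "[Warning]";
  return "[ERROR]";                 /* no flag: a real error */
}

/* Format "<tag> <progname>: <message>" into buf (snprintf semantics). */
static int sketch_format_message(char *buf, size_t size,
                                 const char *progname, const char *msg,
                                 unsigned flags)
{
  return snprintf(buf, size, "%s %s: %s",
                  sketch_severity_tag(flags), progname, msg);
}
```

With SKETCH_ME_JUST_INFO the start-of-recovery message gets the [Note] tag,
while a call without flags keeps [ERROR].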
include/my_sys.h:
new flags to tell error_handler_hook that this is not an error
but an information or warning
mysql-test/mysql-test-run.pl:
when running with --gdb/--ddd to debug mysqld, breaking at mysql_parse
is too late to debug startup problems; now it does not run mysqld and
does not set breakpoints; the developer can set breakpoints as early
as needed and is responsible for typing "run" (or "r")
mysys/my_init.c:
set my_progname_short
mysys/my_static.c:
my_progname_short added
sql/mysqld.cc:
* my_message_sql() can now receive info or warning, not only error;
this allows mysys to tell the user (or the error log if no user)
about an info or warning. Used from Maria.
* plugins (or engines like Maria) may want to call my_error(), so
set up the error handler hook (my_message_sql) before initializing
plugins; otherwise they get my_message_no_curses which is less
integrated into mysqld (is just fputs())
* using my_progname_short instead of my_progname, in my_message_sql()
(less space on screen)
storage/maria/ma_checkpoint.c:
fprintf(stderr) -> ma_message_no_user()
storage/maria/ma_checkpoint.h:
function for any Maria task, not connected to a user (example:
checkpoint, recovery; soon could be deleted records purger)
to report a message (calls my_printf_error() which, when inside ha_maria,
leads to sql_print_*(), and when outside, leads to
my_message_no_curses i.e. stderr).
storage/maria/ma_recovery.c:
To tell that recovery starts and ends we use ma_message_no_user()
(sql_print_*() in practice). Detailed progress info still uses
stderr as sql_print() cannot put several messages on one line.
071116 18:42:16 [Note] mysqld: Maria engine: starting recovery
recovered pages: 0% 67% 100% (0.0 seconds); transactions to roll back: 1 0 (0.0
seconds); tables to flush: 1 0 (0.0 seconds);
071116 18:42:16 [Note] mysqld: Maria engine: recovery done
storage/maria/maria_chk.c:
my_progname_short moved to mysys
storage/maria/maria_read_log.c:
my_progname_short moved to mysys
storage/myisam/myisamchk.c:
my_progname_short moved to mysys
- serializing calls to flush_pagecache_blocks_int() on the same file
to avoid known concurrency bugs
- having that, we can now enable the background thread, as the
flushes it does are now supposedly safe in concurrent situations.
- new type of flush FLUSH_KEEP_LAZY: when the background checkpoint
thread is flushing a packet of dirty pages between two checkpoints,
it uses this flush type, because if a file is already being flushed
by another thread it's smarter to move on to the next file than to wait.
- maria_checkpoint_frequency renamed to maria_checkpoint_interval.
include/my_sys.h:
new type of flushing for the page cache: FLUSH_KEEP_LAZY
mysql-test/r/maria.result:
result update
mysys/mf_keycache.c:
indentation. No FLUSH_KEEP_LAZY support in key cache.
storage/maria/ha_maria.cc:
maria_checkpoint_frequency was somehow a hidden part of the
Checkpoint API and that was not good. Now we have checkpoint_interval,
local to ha_maria.cc, which serves as container for the user-visible
maria_checkpoint_interval global variable; setting it calls
update_checkpoint_interval which passes the new value to
ma_checkpoint_init(). There is no hiding anymore.
By default, enable background thread which does checkpoints
every 30 seconds, and dirty page flush in between. That thread takes
a checkpoint when it ends, so no need for maria_hton_panic to take one.
The | is | and not ||, because maria_panic() must always be called.
frequency->interval.
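A small illustration of the | versus || remark above, with hypothetical
step functions standing in for the real shutdown calls: with |, both
operands are always evaluated; with ||, the second is skipped as soon as
the first returns non-zero.

```c
static int calls;   /* counts how many shutdown steps actually ran */

static int step_fails(void) { calls++; return 1; }  /* non-zero = failure */
static int step_ok(void)    { calls++; return 0; }

/* Bitwise |: both steps always run, whatever the first one returns. */
static int shutdown_bitwise(void)
{
  calls = 0;
  return step_fails() | step_ok();
}

/* Logical ||: the second step is skipped once the first reports failure. */
static int shutdown_logical(void)
{
  calls = 0;
  return step_fails() || step_ok();
}
```

Both variants report the failure, but only the | variant runs the second
step.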
storage/maria/ma_checkpoint.c:
Use FLUSH_KEEP_LAZY for background thread when it flushes packets of
dirty pages between two checkpoints: it is smarter to move on to
the next file than wait for it to have been completely flushed, which
may take long.
Comments about flush concurrency bugs moved from ma_pagecache.c.
Removing out-of-date comment.
frequency->interval.
create_background_thread -> (interval>0).
In ma_checkpoint_background(), some variables need to be preserved
between iterations.
storage/maria/ma_checkpoint.h:
new prototype
storage/maria/ma_pagecache.c:
- concurrent calls of flush_pagecache_blocks_int() on the same file
cause bugs (see @note in that function); we fix them by serializing
in this situation. For that we use a global hash of (file, wqueue).
When flush_pagecache_blocks_int() starts it looks into the hash,
using the file as key. If not found, it inserts (file,wqueue) into the
hash, flushes the file, and finally removes itself from the hash and
wakes up any waiter in the queue. If found, it adds itself to the
wqueue and waits.
- As a by-product, we can remove changed_blocks_is_incomplete
and replace it by scanning the hash, replace the sleep() by a queue wait.
- new type of flush FLUSH_KEEP_LAZY: when flushing a file, if it's
already being flushed by another thread (even partially), return
immediately.
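The serialization scheme above can be sketched as follows. This is not
the pagecache code: it assumes a small fixed array instead of the hash,
and a single pthread condition variable instead of per-file wqueues; all
sketch_* names are hypothetical.

```c
#include <pthread.h>

#define SKETCH_MAX_IN_FLUSH 16

enum sketch_flush_type { SKETCH_FLUSH_KEEP, SKETCH_FLUSH_KEEP_LAZY };

static pthread_mutex_t in_flush_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  in_flush_cond = PTHREAD_COND_INITIALIZER;
static int in_flush[SKETCH_MAX_IN_FLUSH];  /* files being flushed right now */
static int in_flush_count;

/* Caller must hold in_flush_lock. Returns the slot of `file`, or -1. */
static int find_in_flush(int file)
{
  for (int i = 0; i < in_flush_count; i++)
    if (in_flush[i] == file)
      return i;
  return -1;
}

/* Register `file` as being flushed. Returns 1 when the caller may flush,
   0 when a lazy caller found the file already being flushed. */
static int sketch_flush_begin(int file, enum sketch_flush_type type)
{
  pthread_mutex_lock(&in_flush_lock);
  for (;;)
  {
    if (find_in_flush(file) >= 0)
    {
      if (type == SKETCH_FLUSH_KEEP_LAZY)
      {
        pthread_mutex_unlock(&in_flush_lock);
        return 0;                        /* move on to the next file */
      }
    }
    else if (in_flush_count < SKETCH_MAX_IN_FLUSH)
      break;                             /* free to register ourselves */
    pthread_cond_wait(&in_flush_cond, &in_flush_lock);
  }
  in_flush[in_flush_count++] = file;
  pthread_mutex_unlock(&in_flush_lock);
  return 1;
}

/* Unregister `file` and wake up every waiter. */
static void sketch_flush_end(int file)
{
  pthread_mutex_lock(&in_flush_lock);
  int slot = find_in_flush(file);
  if (slot >= 0)
    in_flush[slot] = in_flush[--in_flush_count];
  pthread_cond_broadcast(&in_flush_cond);
  pthread_mutex_unlock(&in_flush_lock);
}
```

A flusher would call sketch_flush_begin(), flush the file if it returned
1, then call sketch_flush_end(); a non-lazy caller blocks in
sketch_flush_begin() until the other flush of the same file has ended.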
storage/maria/ma_pagecache.h:
In pagecache, a hash of files currently being flushed (i.e. there
is a call to flush_pagecache_blocks_int() for them).
storage/maria/ma_recovery.c:
new prototype
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
Note that non-debug build fails in log handler functions, mail sent.
storage/maria/ma_blockrec.c:
fix for compiler warning
storage/maria/ma_checkpoint.c:
Debug build does not catch this situation
static int f();
...
f(2);
...
static int f(int a, int b);
Maybe this is because it believes the declaration is K&R. Non-debug
build catches it. Adding (void) as a habit to avoid such errors.
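For illustration, the (void) form looks like this (function and variable
names are hypothetical). In C before C23, "static int f();" declares a
function with unspecified parameters, so a call with a wrong argument
count is not checked against it; "(void)" makes the declaration a real
prototype.

```c
/* Prototype with (void): the compiler knows the function takes no
   arguments, so a call such as checkpoint_pending(2) is rejected. */
static int checkpoint_pending(void);

static int pending = 1;   /* hypothetical state, for illustration only */

static int checkpoint_pending(void)
{
  return pending;
}
```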
storage/maria/ma_checkpoint.h:
adding (void)
storage/maria/ma_recovery.c:
adding (void)
storage/maria/ma_recovery.h:
adding (void)
Finally this is the real checkpoint code.
It however exhibits instabilities when a checkpoint runs concurrently
with data-modifying clients (table corruption, transaction log's
assertions) so for now a checkpoint is taken only at startup after
recovery and at shutdown, i.e. not in concurrent situations. Later
we will let it run periodically, as well as flush dirty pages
periodically (almost all needed code is there already, only pagecache
code is written but not committed).
WL#3072 Maria recovery
* replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via
ma_test2 which has INSERTs failing with duplicate keys.
* replaying of REDO_RENAME_TABLE
Now, off to test Recovery in ha_maria :)
BitKeeper/deleted/.del-ma_least_recently_dirtied.c:
Delete: storage/maria/ma_least_recently_dirtied.c
BitKeeper/deleted/.del-ma_least_recently_dirtied.h:
Delete: storage/maria/ma_least_recently_dirtied.h
storage/maria/Makefile.am:
compile Checkpoint module
storage/maria/ha_maria.cc:
When ha_maria starts, do a recovery from last checkpoint.
Take a checkpoint when that recovery has ended and when ha_maria
shuts down cleanly.
storage/maria/ma_blockrec.c:
* even if my_sync() fails we have to my_close() (otherwise we leak
a descriptor)
* UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT,
as promised in the old comment; it gives us skipping during the
UNDO phase.
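The sync-then-close rule from the first item above, sketched with plain
POSIX fsync()/close() rather than my_sync()/my_close(); the function name
is hypothetical.

```c
#include <unistd.h>

/* Sync then close; the descriptor is closed even when the sync fails,
   so it cannot leak. Returns 0 on success, -1 if either step failed. */
static int sketch_sync_and_close(int fd)
{
  int sync_failed  = (fsync(fd) != 0);
  int close_failed = (close(fd) != 0);   /* always attempted */
  return (sync_failed || close_failed) ? -1 : 0;
}
```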
storage/maria/ma_check.c:
All REDOs before create_rename_lsn are ignored by Recovery. So
create_rename_lsn must be set only after all data/index has been
flushed and forced to disk. We thus move write_log_record_for_repair()
to after _ma_flush_tables_files_after_repair().
storage/maria/ma_checkpoint.c:
Checkpoint module.
storage/maria/ma_checkpoint.h:
optional argument if caller wants a thread to periodically take
checkpoints and flush dirty pages.
storage/maria/ma_create.c:
* no need to init some vars as the initial bzero(share) takes care of this.
* update to new function's name
* even if we fail in my_sync() we have to my_close()
storage/maria/ma_extra.c:
Checkpoint reads share->last_version under intern_lock, so we make
maria_extra() update it under intern_lock. THR_LOCK_maria still needed
because of _ma_test_if_reopen().
storage/maria/ma_init.c:
destroy checkpoint module when Maria shuts down.
storage/maria/ma_loghandler.c:
* UNDO_ROW_PURGE gone (see ma_blockrec.c)
* we need to remember the LSN of the LOGREC_FILE_ID for a share,
because this LSN is needed in the checkpoint record (Recovery wants
to know the validity domain of an id->name mapping)
* translog_get_horizon_no_lock() needed for Checkpoint
* comment about failing assertion (Sanja knows)
* translog_init_reader_data() thought that translog_read_record_header_scan()
returns 0 in case of error, but 0 just means "0-length header".
* translog_assign_id_to_share() now needs the MARIA_HA because
LOGREC_FILE_ID uses a log-write hook.
* Verify that (de)assignment of share->id happens only under intern_lock,
as Checkpoint reads this id with intern_lock.
* translog_purge() can accept TRANSLOG_ADDRESS, not necessarily
a real LSN.
storage/maria/ma_loghandler.h:
prototype updates
storage/maria/ma_open.c:
no need to initialize "res"
storage/maria/ma_pagecache.c:
When taking a checkpoint, we don't need to know the maximum rec_lsn
of dirty pages; this LSN was intended to be used in the two-checkpoint
rule, but last_checkpoint_lsn is as good.
4 bytes for stored_list_size is enough as PAGECACHE::blocks (number
of blocks which the pagecache can contain) is int.
storage/maria/ma_pagecache.h:
new prototype
storage/maria/ma_recovery.c:
* added replaying of REDO_RENAME_TABLE
* UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END
* Recovery from the last checkpoint record now possible
* In new_table() we skip the table if the id->name mapping is older than
create_rename_lsn (mapping dates from lsn_of_file_id).
* in get_MARIA_HA_from_REDO_record() we skip the record
if the id->name mapping is newer than the record (can happen if processing
a record which is before the checkpoint record).
* parse_checkpoint_record() has to return an LSN; that's what the caller expects
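The two skip rules for id->name mappings can be sketched as plain LSN
comparisons. The type, the function names and the strictness of the
comparisons (< rather than <=) are assumptions made for illustration, not
taken from ma_recovery.c.

```c
#include <stdint.h>

typedef uint64_t sketch_lsn;   /* hypothetical stand-in for an LSN */

/* new_table() rule: a mapping logged before the table was created or
   renamed refers to an older incarnation of the table, so skip it. */
static int sketch_skip_table(sketch_lsn lsn_of_file_id,
                             sketch_lsn create_rename_lsn)
{
  return lsn_of_file_id < create_rename_lsn;
}

/* get_MARIA_HA_from_REDO_record() rule: a record logged before the
   mapping existed must not be applied through that mapping. */
static int sketch_skip_record(sketch_lsn record_lsn,
                              sketch_lsn lsn_of_file_id)
{
  return lsn_of_file_id > record_lsn;
}
```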
storage/maria/ma_rename.c:
new function's name; log end zeroes of tables' names (ease recovery)
storage/maria/ma_test2.c:
* equivalent of ma_test1's --test-undo added (named -u here).
* -t=1 now stops right after creating the table, so that
we can test undoing of INSERTs with duplicate keys (which tests the
CLR_END logged by _ma_write_abort_block_record()).
storage/maria/ma_test_recovery.expected:
Result of testing undoing of INSERTs with duplicate keys; there are
some differences in maria_chk -dvv but they are normal (removing
records does not shrink data/index file, does not put back the
"analyzed, optimized keys"(etc) index state).
storage/maria/ma_test_recovery:
Test undoing of INSERTs with duplicate keys, using ma_test2;
when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE,
CLR_END; we abort after that, and test that CLR_END causes recovery
to jump over UNDO_INSERT.
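A toy model of why CLR_END makes the UNDO phase jump over UNDO_INSERT. It
assumes each record carries a link to the next record to consider when
walking the transaction backwards, and that CLR_END's link points past
the UNDO it compensates; the record layout and names are hypothetical,
not the Maria log format.

```c
enum sketch_type { SK_UNDO_INSERT, SK_CLR_END };

struct sketch_rec
{
  enum sketch_type type;
  int prev;   /* index of the next record to visit going backwards, -1 = none */
};

/* Walk the undo chain backwards from `last`; return how many UNDOs ran. */
static int sketch_undo_phase(const struct sketch_rec *recs, int last)
{
  int executed = 0;
  for (int i = last; i >= 0; i = recs[i].prev)
  {
    if (recs[i].type == SK_UNDO_INSERT)
      executed++;   /* a real implementation would roll the insert back here */
    /* SK_CLR_END: nothing to execute; its prev link already skips the
       UNDO_INSERT that was compensated before the crash. */
  }
  return executed;
}
```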
storage/maria/ma_write.c:
comment
storage/maria/maria_chk.c:
comment
storage/maria/maria_def.h:
* a new bit in MARIA_SHARE::in_checkpoint, used to build a list
of unique shares during Checkpoint.
* MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID
for this share; needed to know to which LSN domain the mappings
found in the Checkpoint record apply (new mappings should not apply
to old REDOs).
storage/maria/trnman.c:
* small changes to how trnman_collect_transactions() fills its buffer;
it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h
- new program maria_read_log to display and apply log records
found in a Maria log (see file's revision comment)
- minor, misc fixes
storage/maria/Makefile.am:
new program maria_read_log
storage/maria/ha_maria.cc:
create control file if missing
storage/maria/ma_blockrec.c:
0 -> LSN_IMPOSSIBLE; comments
storage/maria/ma_checkpoint.h:
preparations for Checkpoint module
storage/maria/ma_close.c:
comment
storage/maria/ma_control_file.c:
renaming constants.
Possibility to say "open control file but don't create it if it's
missing" (used by maria_read_log which does not want to create
anything)
storage/maria/ma_control_file.h:
renaming constants
storage/maria/ma_create.c:
I had duplicated "linkname" and "linkname_ptr", now I see it's not
needed, reverting. Indeed those variables don't contain interesting
information; fixing log record accordingly (the links are in
ci->data/index_file_name). Storing keystart in log record is needed,
to know at which size we must extend the file if we replay
LOGREC_CREATE_TABLE.
storage/maria/ma_loghandler.c:
some structures need to be known to maria_read_log.c, taking
them to ma_loghandler.h
storage/maria/ma_loghandler.h:
we have page_store, adding page_korr.
translog_lock() made public, because Checkpoint will need it (to
write to control file).
Some structures moved from ma_loghandler.c because maria_read_log.c
needs them (needs to know the execute-in-REDO-phase hooks of each
record).
storage/maria/ma_loghandler_lsn.h:
constants defined in ma_control_file.h serve everywhere,
and they relate to LSNs, so putting them in ma_loghandler_lsn.h.
Stronger constraints in LSN_VALID().
storage/maria/ma_pagecache.c:
renaming constants
storage/maria/ma_recovery.h:
copyright
storage/maria/ma_test1.c:
new prototype
storage/maria/ma_test2.c:
new prototype
storage/maria/trnman_public.h:
double-inclusion safe
storage/maria/unittest/ma_control_file-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
constants renamed, new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
constants renamed, new prototype
storage/myisam/mi_close.c:
comment
storage/maria/maria_read_log.c:
program to read and print log records from a Maria transaction log,
and optionally apply them to tables. Very basic, early version.
Should serve as a base for Recovery's code. Designed to be idempotent.
Create a log by running maria.test, then cd to var/master-data
and run "maria_read_log --only-display" to see info about records;
run "maria_read_log --display-and-apply" to also apply the records
to tables (it's more interesting if you first wipe out the
tables in var/master-data/test, to see how they get re-created).
Only a few records are handled for now: LONG_TRANSACTION_ID,
COMMIT, FILE_ID, REDO_CREATE_TABLE; place is ready for
REDO_INSERT_ROW_HEAD where I could use Monty's help (search for
"Monty" in the file). Note: changes to the index pages, index's header
and bitmap pages are not properly logged yet, so don't expect
the program to work with that.
changing pseudocode to use the structures of the Maria pagecache
("pagecache->changed_blocks" etc) and other Maria structures
inherited from MyISAM (THR_LOCK_maria etc).
mysys/mf_pagecache.c:
comment
storage/maria/ma_checkpoint.c:
changing pseudocode to use the structures of the Maria pagecache
("pagecache->changed_blocks" etc) and other Maria structures
inherited from MyISAM (THR_LOCK_maria etc).
storage/maria/ma_checkpoint.h:
copyright
storage/maria/ma_control_file.c:
copyright
storage/maria/ma_control_file.h:
copyright
storage/maria/ma_least_recently_dirtied.c:
copyright
storage/maria/ma_least_recently_dirtied.h:
copyright
storage/maria/ma_recovery.c:
copyright
storage/maria/ma_recovery.h:
copyright
storage/maria/unittest/Makefile.am:
copyright
- fixes to the control file module
- unit test for it
- renames of all Maria files I created to start with ma_
storage/maria/ma_checkpoint.c:
Rename: storage/maria/checkpoint.c -> storage/maria/ma_checkpoint.c
storage/maria/ma_checkpoint.h:
Rename: storage/maria/checkpoint.h -> storage/maria/ma_checkpoint.h
storage/maria/ma_least_recently_dirtied.c:
Rename: storage/maria/least_recently_dirtied.c -> storage/maria/ma_least_recently_dirtied.c
storage/maria/ma_least_recently_dirtied.h:
Rename: storage/maria/least_recently_dirtied.h -> storage/maria/ma_least_recently_dirtied.h
storage/maria/ma_recovery.c:
Rename: storage/maria/recovery.c -> storage/maria/ma_recovery.c
storage/maria/ma_recovery.h:
Rename: storage/maria/recovery.h -> storage/maria/ma_recovery.h
storage/maria/Makefile.am:
control file module and its unit test program
storage/maria/ma_control_file.c:
DBUG_ tags. Fix for gcc warnings.
log_no -> logno (I felt "_no" sounded like a standalone "No" word).
ma_ prefix for some functions.
last_checkpoint_lsn_at_startup -> last_checkpoint_lsn (no need
to make special vars for the values at startup). Same for last_logno.
ma_control_file_write_and_force() now updates last_checkpoint_lsn
and last_logno, the idea being that they belong to the module,
others should not update them.
And thus when the module shuts down, it zeroes those vars.
storage/maria/ma_control_file.h:
importing structs from Sanja to get the control file module to compile;
we'll remove that when Sanja pushes the log handler.
CONTROL_FILE_IMPOSSIBLE_LOGNO is 0, not FFFFFFFF.
storage/maria/ma_control_file_test.c:
Unit test program for the Maria control file module.
Modelled after other ma_test* files in this directory (so, does
not follow the unit test framework recently introduced with libtap;
TODO as a task on all ma_test* programs).
We test that writing to the control file works, and re-reading from it
too, we check (by reading the file by ourselves) that its content
on disk is correct, and check that a corrupted control file is detected.
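One way such a corruption check could work, as a sketch only: this commit
does not say how the control file detects corruption, so the one-byte
additive checksum and all names below are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all payload bytes, truncated to one byte. */
static uint8_t sketch_checksum(const uint8_t *buf, size_t len)
{
  uint8_t sum = 0;
  for (size_t i = 0; i < len; i++)
    sum = (uint8_t) (sum + buf[i]);
  return sum;
}

/* Store the checksum of the first len-1 bytes in the last byte. */
static void sketch_seal(uint8_t *buf, size_t len)
{
  buf[len - 1] = sketch_checksum(buf, len - 1);
}

/* Returns 1 if the stored checksum matches the payload, 0 otherwise. */
static int sketch_is_valid(const uint8_t *buf, size_t len)
{
  return buf[len - 1] == sketch_checksum(buf, len - 1);
}
```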
2006-09-01 17:53:10 +02:00
Renamed from storage/maria/checkpoint.h