Mirror of https://github.com/MariaDB/server.git, synced 2025-01-18 04:53:01 +01:00
38 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
unknown
|
914f219c83 |
fix for some gcc -ansi warnings.
storage/maria/ma_checkpoint.c: gcc -ansi warnings storage/maria/ma_pagecache.c: comment storage/maria/ma_recovery.c: gcc -ansi warnings |
||
unknown
|
d72c22dee4 |
WL#3072 - Maria recovery.
* fix for bitmap vs checkpoint bug which could lead to corrupted tables in case of crashes at certain moments: a bitmap could be flushed to disk even though it was inconsistent with the log (it could be flushed before REDO-UNDO are written to the log). One bug remains, need code from others. Tests added. Fix is to pin unflushable bitmap pages, and let checkpoint wait for them to be flushable. * fix for long_trid!=0 assertion failure at Recovery. * less useless wakeups in the background flush|checkpoint thread. * store global_trid_generator in checkpoint record. mysql-test/r/maria-recovery.result: result update mysql-test/t/maria-recovery.test: make it easier to locate subtests storage/maria/ma_bitmap.c: When we send a bitmap to the pagecache, if this bitmap is not in a flushable state we keep it pinned and add it to a list, it will be unpinned when the bitmap is flushable again. A new function _ma_bitmap_flush_all() used by checkpoint. A new function _ma_bitmap_flushable() used by block format to signal when it starts modifying a bitmap and when it is done with it. storage/maria/ma_blockrec.c: When starting a row operation (insert/update/delete), mark that the bitmap is not flushable (because for example INSERT is going to over-allocate in the bitmap to prevent other threads from using our data pages). If a checkpoint comes at this moment it will wait for the bitmap to be flushable before flushing it. When the operation ends, bitmap becomes flushable again; that transition is done under the bitmap's mutex (needed for correct synchro with a concurrent checkpoint); but for INSERT/UPDATE this happens inside _ma_bitmap_release_unused() at a place where it already has the mutex, so the only penalty (mutex adding) is in DELETE and UNDO of INSERT. In case of errors after setting the bitmap unflushable, we must always set it back to flushable or checkpoint would block. Debug possibilities to force a sleep while the bitmap is over-allocated. 
In case of error in get_head_or_tail() in allocate_and_write_block_record(), we still need to unpin all pages. Bugfix: _ma_apply_redo_insert_row_blobs() produced wrong data_file_length. storage/maria/ma_blockrec.h: new bitmap calls. storage/maria/ma_checkpoint.c: filter_flush_indirect not needed anymore (flushing bitmap pages happens in _ma_bitmap_flush_all() now). So st_filter_param::is_data_file|pages_covered_by_bitmap not needed. Other filter_flush* don't need to flush bitmap anymore. Add debug possibility to flush all bitmap pages outside of a checkpoint, to simulate pagecache LRU eviction. When the background flush/checkpoint thread notices it has nothing to flush, it now sleeps directly until the next potential checkpoint moment instead of waking up every second. When in checkpoint we decide to not store a table in the checkpoint record (because it has logged no writes for example), we can also skip flushing this table. storage/maria/ma_commit.c: comment is out-of-date storage/maria/ma_key_recover.c: comment fix storage/maria/ma_loghandler.c: comment is out-of-date storage/maria/ma_open.c: comment is out-of-date storage/maria/ma_pagecache.c: comment for bug to fix. And we don't take checkpoints at end of REDO phase yet so can trust block->type. storage/maria/ma_recovery.c: Comments. Now-unneeded code for incomplete REDO-UNDO groups removed. When we forget about an old transaction we must really forget about it with bzero() (fixes the "long_trid!=0 assertion" recovery bug). When we delete a row with maria_delete() we turn on STATE_NOT_OPTIMIZED_ROWS so we do the same when we see a CLR_END for an UNDO_ROW_INSERT or when we execute an UNDO_ROW_INSERT (in both cases a row was deleted). Pick up max_long_trid from the checkpoint record. storage/maria/maria_chk.c: comment storage/maria/maria_def.h: MARIA_FILE_BITMAP gets new members: 'flushable', 'bitmap_cond' and 'pinned_pages'. 
storage/maria/trnman.c: I used to think that recovery only needs to know the maximum TrID of the lists of active and committed transactions. But no, sometimes both lists can even be empty and their TrID should not be reused. So Checkpoint now saves global_trid_generator in the checkpoint record. storage/maria/trnman_public.h: macros to read/store a TrID mysql-test/r/maria-recovery-bitmap.result: result is ok. Without the code fix, we would get a corruption message about the bitmap page in CHECK TABLE EXTENDED. mysql-test/t/maria-recovery-bitmap-master.opt: usual when we crash mysqld in tests mysql-test/t/maria-recovery-bitmap.test: test of recovery problems specific of the bitmap pages. |
||
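The wait-until-flushable handshake described in commit d72c22dee4 can be sketched roughly as follows, assuming POSIX threads. This is not the Maria code: `bitmap_gate`, the `gate_*` functions, `busy_ops` and `count_flush` are hypothetical names for illustration. Only the idea of a flushable state, a condition variable (`bitmap_cond` in the commit) and a transition made under the bitmap's mutex comes from the message above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bitmap state; not the real MARIA_FILE_BITMAP. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t  cond;      /* plays the role of 'bitmap_cond' */
  unsigned        busy_ops;  /* row operations that made the bitmap unflushable */
} bitmap_gate;

void gate_init(bitmap_gate *g)
{
  pthread_mutex_init(&g->lock, NULL);
  pthread_cond_init(&g->cond, NULL);
  g->busy_ops = 0;
}

/* Start of a row operation: the bitmap may now be over-allocated. */
void gate_begin_row_op(bitmap_gate *g)
{
  pthread_mutex_lock(&g->lock);
  g->busy_ops++;
  pthread_mutex_unlock(&g->lock);
}

/* End of a row operation; must also run on every error path,
   otherwise a waiting checkpoint would block forever. */
void gate_end_row_op(bitmap_gate *g)
{
  pthread_mutex_lock(&g->lock);
  assert(g->busy_ops > 0);
  if (--g->busy_ops == 0)
    pthread_cond_broadcast(&g->cond);
  pthread_mutex_unlock(&g->lock);
}

bool gate_is_flushable(bitmap_gate *g)
{
  bool ok;
  pthread_mutex_lock(&g->lock);
  ok = (g->busy_ops == 0);
  pthread_mutex_unlock(&g->lock);
  return ok;
}

/* Checkpoint side: wait until flushable, then flush while holding the mutex. */
void gate_flush_when_flushable(bitmap_gate *g, void (*flush)(void *), void *arg)
{
  pthread_mutex_lock(&g->lock);
  while (g->busy_ops != 0)
    pthread_cond_wait(&g->cond, &g->lock);
  flush(arg);
  pthread_mutex_unlock(&g->lock);
}

/* Trivial flush callback used to observe that a flush happened. */
void count_flush(void *arg)
{
  ++*(int *)arg;
}
```

A counter is used here instead of a single boolean so that several concurrent row operations can overlap; whether the real code counts or uses a flag is not stated in the commit message.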
unknown
|
b5b0d94dc0 |
Merge desktop.sanja.is.com.ua:/home/bell/mysql/bk/mysql-maria
into desktop.sanja.is.com.ua:/home/bell/mysql/bk/work-maria-logpurge storage/maria/ma_checkpoint.c: Auto merged storage/maria/ma_loghandler.c: Auto merged storage/maria/ma_loghandler.h: Auto merged storage/maria/ma_recovery.c: Auto merged |
||
unknown
|
13f45b160b |
WL#3072 Maria recovery:
fix for bug: if a crash happened right after writing a REDO like this: REDO - UNDO - REDO*, then recovery would ignore the last REDO* (ok), rollback: REDO - UNDO - REDO* - REDO - CLR, and a next recovery would thus execute REDO* instead of skipping it again. Recovery now logs LOGREC_INCOMPLETE_GROUP when it meets REDO* for the first time, to draw a boundary and ensure it is always skipped. Tested by hand. Note: ma_test_all fails "maria_chk: error: Key 1 - Found too many records" not due to this patch (failed before). BitKeeper/triggers/post-commit: no truncation of the commit mail, or how to review patches? mysql-test/include/maria_verify_recovery.inc: let caller choose the statement used to crash (sometimes we want the crash to happen at special places) mysql-test/t/maria-recovery.test: user of maria_verify_recovery.inc now specifies statement which the script should use for crashing. storage/maria/ma_bitmap.c: it's easier to search for all places using functions from the bitmap module (like in ma_blockrec.c) if those exported functions all start with "_ma_bitmap": renaming some of them. Assertion that when we read a bitmap page, overwriting bitmap->map, we are not losing information (i.e. bitmap->changed is false). storage/maria/ma_blockrec.c: update to new names. Adding code (disabled, protected by a #ifdef) that I use to test certain crash scenarios (more to come). storage/maria/ma_blockrec.h: update to new names storage/maria/ma_checkpoint.c: update to new names storage/maria/ma_extra.c: update to new names storage/maria/ma_loghandler.c: new LOGREC_INCOMPLETE_GROUP storage/maria/ma_loghandler.h: new LOGREC_INCOMPLETE_GROUP storage/maria/ma_recovery.c: When at the end of the REDO phase we have identified some transactions with incomplete REDO groups (REDOs without an UNDO or CLR_END), for each of them we log LOGREC_INCOMPLETE_GROUP. 
This way, the upcoming UNDO phase can write more records for such transaction, a future recovery won't pair the incomplete group with the CLR_END (as there is LOGREC_INCOMPLETE_GROUP to draw a boundary). |
||
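To see why the boundary record in commit 13f45b160b matters, here is a small model of how a recovery pass might pair REDOs with the record that closes their group. It is a sketch under assumptions: the enum, `scan_groups` and its counting are hypothetical, and real recovery works per transaction on actual log records. Only the rule "a REDO group is closed by an UNDO or CLR_END, and LOGREC_INCOMPLETE_GROUP draws a boundary" is taken from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds for one transaction's slice of the log. */
typedef enum {
  REC_REDO,
  REC_UNDO,
  REC_CLR_END,
  REC_INCOMPLETE_GROUP   /* models LOGREC_INCOMPLETE_GROUP */
} rec_type;

/* Returns how many REDOs belong to complete groups (and would be executed).
   *dangling receives the number of REDOs left in an unterminated group
   at the end of the log. */
size_t scan_groups(const rec_type *log, size_t n, size_t *dangling)
{
  size_t applied = 0, pending = 0;
  assert(log != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    switch (log[i]) {
    case REC_REDO:
      pending++;                 /* group still open */
      break;
    case REC_UNDO:
    case REC_CLR_END:
      applied += pending;        /* group closed: its REDOs count */
      pending = 0;
      break;
    case REC_INCOMPLETE_GROUP:
      pending = 0;               /* boundary: forget the orphaned REDOs */
      break;
    }
  }
  *dangling = pending;
  return applied;
}
```

With the log `REDO, UNDO, REDO*` the function reports one applied REDO and one dangling. After a rollback appends `REDO, CLR_END`, the orphaned `REDO*` is wrongly counted (three applied) unless the boundary record sits between them (two applied).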
unknown
|
1ec645bd40 |
Merge desktop.sanja.is.com.ua:/home/bell/mysql/bk/mysql-maria
into desktop.sanja.is.com.ua:/home/bell/mysql/bk/work-maria-logpurge mysql-test/r/maria.result: Auto merged storage/maria/ma_checkpoint.c: Auto merged storage/maria/ma_recovery.c: Auto merged storage/maria/ha_maria.cc: Merge storage/maria/ma_loghandler.c: Merge storage/maria/ma_loghandler.h: Merge |
||
unknown
|
771296eb06 |
Manageable transactional log purge and file size
support added to maria. mysql-test/r/maria.result: New variables added. storage/maria/ha_maria.cc: Variable for transactional log purge method added. Variable for transactional log size added. SHOW for engine logs added. Log flush purges logs in case of "ondemand" type of log processing. storage/maria/ma_checkpoint.c: log purge call enabled. storage/maria/ma_loghandler.c: Support for different methods of log purge added. Functions for getting information about logs state added. Functions for getting/setting log size. storage/maria/ma_loghandler.h: Fixed defines. Functions for transactional log management added. storage/maria/ma_recovery.c: Dependence on TRANSLOG_FILE_SIZE removed. mysql-test/r/maria-purge.result: New BitKeeper file ``mysql-test/r/maria-purge.result'' mysql-test/t/maria-purge.test: New BitKeeper file ``mysql-test/t/maria-purge.test'' |
||
unknown
|
9f1aaeffbb |
Merge bk-internal.mysql.com:/home/bk/mysql-maria
into mysql.com:/home/my/mysql-maria include/my_sys.h: Auto merged sql/mysqld.cc: Auto merged storage/maria/ma_checkpoint.c: Auto merged storage/maria/ma_pagecache.c: Auto merged storage/maria/ma_pagecache.h: Auto merged storage/maria/maria_chk.c: Auto merged storage/maria/ma_recovery.c: SCCS merged |
||
unknown
|
fc0a25ec49 |
WL#3071 Maria checkpoint, WL#3072 Maria recovery
instead of fprintf(stderr) when a task (with no user connected) gets an error, use my_printf_error(). Flags ME_JUST_WARNING and ME_JUST_INFO added to my_error()/my_printf_error(), which pass it to my_message_sql() which is modified to call the appropriate sql_print_*(). This way recovery can signal its start and end with [Note] and not [ERROR] (but failure with [ERROR]). Recovery's detailed progress (percents etc) still uses stderr as they have to stay on one single line. sql_print_error() changed to use my_progname_short (nicer display). mysql-test-run.pl --gdb/--ddd does not run mysqld, because a breakpoint in mysql_parse is too late to debug startup problems; instead, dev should set the breakpoints it wants and then "run" ("r"). include/my_sys.h: new flags to tell error_handler_hook that this is not an error but an information or warning mysql-test/mysql-test-run.pl: when running with --gdb/--ddd to debug mysqld, breaking at mysql_parse is too late to debug startup problems; now, it does not run mysqld, does not set breakpoints, developer can set as early breakpoints as it wants and is responsible for typing "run" (or "r") mysys/my_init.c: set my_progname_short mysys/my_static.c: my_progname_short added sql/mysqld.cc: * my_message_sql() can now receive info or warning, not only error; this allows mysys to tell the user (or the error log if no user) about an info or warning. Used from Maria. 
* plugins (or engines like Maria) may want to call my_error(), so set up the error handler hook (my_message_sql) before initializing plugins; otherwise they get my_message_no_curses which is less integrated into mysqld (is just fputs()) * using my_progname_short instead of my_progname, in my_message_sql() (less space on screen) storage/maria/ma_checkpoint.c: fprintf(stderr) -> ma_message_no_user() storage/maria/ma_checkpoint.h: function for any Maria task, not connected to a user (example: checkpoint, recovery; soon could be deleted records purger) to report a message (calls my_printf_error() which, when inside ha_maria, leads to sql_print_*(), and when outside, leads to my_message_no_curses i.e. stderr). storage/maria/ma_recovery.c: To tell that recovery starts and ends we use ma_message_no_user() (sql_print_*() in practice). Detailed progress info still uses stderr as sql_print() cannot put several messages on one line. 071116 18:42:16 [Note] mysqld: Maria engine: starting recovery recovered pages: 0% 67% 100% (0.0 seconds); transactions to roll back: 1 0 (0.0 seconds); tables to flush: 1 0 (0.0 seconds); 071116 18:42:16 [Note] mysqld: Maria engine: recovery done storage/maria/maria_chk.c: my_progname_short moved to mysys storage/maria/maria_read_log.c: my_progname_short moved to mysys storage/myisam/myisamchk.c: my_progname_short moved to mysys |
||
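The flag-based routing described in commit fc0a25ec49 amounts to choosing a log level from the flags passed alongside the message. The sketch below illustrates that dispatch only. The `SKETCH_*` bit values and both function names are hypothetical (the real values of ME_JUST_INFO and ME_JUST_WARNING are not shown in the log), and the output shape simply mimics the `[Note] mysqld: ...` sample quoted in the commit message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit values standing in for ME_JUST_INFO / ME_JUST_WARNING. */
#define SKETCH_ME_JUST_INFO    1u
#define SKETCH_ME_JUST_WARNING 2u

/* Pick the error-log label that matches the flags. */
const char *level_for_flags(unsigned flags)
{
  if (flags & SKETCH_ME_JUST_INFO)
    return "[Note]";
  if (flags & SKETCH_ME_JUST_WARNING)
    return "[Warning]";
  return "[ERROR]";              /* default: a real error */
}

/* Format "<level> <progname>: <message>" into buf; returns snprintf's result. */
int format_task_message(char *buf, size_t len, const char *progname,
                        unsigned flags, const char *msg)
{
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "%s %s: %s",
                  level_for_flags(flags), progname, msg);
}
```

Calling `format_task_message(buf, sizeof buf, "mysqld", SKETCH_ME_JUST_INFO, "Maria engine: starting recovery")` yields a line with the `[Note]` label, while passing no flag yields `[ERROR]`.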
unknown
|
ce2fbd9e9a |
WL#3071 Maria checkpoint
background page flushing was using dfile even when it wanted to flush the index file. storage/maria/ma_checkpoint.c: * filter_flush_data_file* functions are in fact for the index file too, renaming them. * flush of index file was using dfile (bad copy-paste) |
||
unknown
|
f134f91f90 |
Flush status differentiation between error and skipping pinned pages.
storage/maria/ma_checkpoint.c: React only on errors during the flush. |
||
unknown
|
422375fc1b |
Merge bk-internal.mysql.com:/home/bk/mysql-maria
into mysql.com:/home/my/mysql-maria storage/maria/ha_maria.cc: Auto merged storage/maria/ma_bitmap.c: Auto merged storage/maria/ma_checkpoint.c: Auto merged storage/maria/ma_close.c: Auto merged storage/maria/ma_loghandler.c: Auto merged storage/maria/ma_loghandler.h: Auto merged storage/maria/ma_open.c: Auto merged storage/maria/ma_pagecache.h: Auto merged storage/maria/ma_write.c: Auto merged storage/maria/maria_def.h: Auto merged storage/maria/unittest/ma_pagecache_single.c: Auto merged storage/maria/ma_blockrec.c: Manual merge storage/maria/ma_page.c: Manual merge storage/maria/ma_pagecache.c: Manual merge storage/maria/ma_preload.c: Manual merge storage/maria/ma_recovery.c: Manual merge Add _ma_unpin_all_pages() to all new UNDO redo_exec_hook's |
||
unknown
|
21fd2a5a36 |
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion For transactional tables, shift record number in keys up with 1 bit to have place to indicate if transid follows Checksum for MyISAM now ignores NULL and not used part of VARCHAR Renamed some variables that caused shadow compiler warnings Moved extra() call when waiting for tables to not be used to after tables are removed from cache. Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory to go around this bug. pagecache_unlock_by_ulink() now has extra argument to say if page was changed. Give error message if we fail to open control file Mark page cache variables as not flushable include/maria.h: Made min page cache larger (needed for pinning key page) Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion Added write_comp_flag to move some runtime code to maria_open() include/my_base.h: Added new error message to be used when handler initialization failed include/my_global.h: Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables include/my_handler.h: Added const to some parameters mysys/array.c: More DBUG mysys/my_error.c: Fixed indentation mysys/my_handler.c: Added const to some parameters Added missing error messages sql/field.h: Renamed variables to avoid variable shadowing sql/handler.h: Renamed parameter to avoid variable name conflict sql/item.h: Renamed variables to avoid variable shadowing sql/log_event_old.h: Renamed variables to avoid variable shadowing sql/set_var.h: Renamed variables to avoid variable shadowing sql/sql_delete.cc: Removed maria hack for temporary tables Fixed indentation sql/sql_table.cc: Moved extra() call when waiting for tables to not be used to after tables are removed from cache. This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use. 
sql/table.cc: Copy page_checksum from share Removed Maria hack storage/maria/Makefile.am: Added new files storage/maria/ha_maria.cc: Renamed records -> record_count and info -> create_info to avoid variable name conflicts Mark page cache variables as not flushable storage/maria/ma_blockrec.c: Moved _ma_unpin_all_pages() to ma_key_recover.c Moved init of info->pinned_pages to ma_open.c Moved _ma_finalize_row() to maria_key_recover.h Renamed some variables to avoid variable name conflicts Mark page_link.changed for blocks we change directly Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index) storage/maria/ma_blockrec.h: Removed extra empty line storage/maria/ma_checkpoint.c: Remove not needed trnman.h storage/maria/ma_close.c: Free pinned pages (which are now always allocated) storage/maria/ma_control_file.c: Give error message if we fail to open control file storage/maria/ma_delete.c: Changes for redo logging (first part, logging of underflow not yet done) - Log undo-key-delete - Log delete of key - Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert() - Added new arguments to some functions to be able to write redo information - Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway Changed 2 bmove_upp() to bmove() as this made code easer to understand More function comments Indentation fixes storage/maria/ma_ft_update.c: New arguments to _ma_write_keypage() storage/maria/ma_loghandler.c: Fixed some DBUG_PRINT messages Simplify code Added new log entrys for key page redo Renamed some variables to avoid variable name shadowing storage/maria/ma_loghandler.h: Moved some defines here Added define for storing key number on key pages Added new translog record types Added enum for type of operations in LOGREC_REDO_INDEX storage/maria/ma_open.c: Always 
allocate info.pinned_pages (we need now also for normal key page usage) Update keyinfo->key_nr Added virtual functions to convert record position o number to be stored on key pages Update keyinfo->write_comp_flag to value of search flag to be used when writing key storage/maria/ma_page.c: Added redo for key pages - Extended _ma_fetch_keypage() with type of lock to put on page and address to used MARIA_PINNED_PAGE - _ma_fetch_keypage() now pin's pages if needed - Extended _ma_write_keypage() with type of locks to be used - ma_dispose() now locks info->s->state.key_del from other threads - ma_dispose() writes redo log record - ma_new() locks info->s->state.key_del from other threads if it was used - ma_new() now pins read page Other things: - Removed some not needed arguments from _ma_new() and _ma_dispose) - Added some new variables to simplify code - If EXTRA_DEBUG is used, do crc on full page to catch not unitialized bytes storage/maria/ma_pagecache.h: Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed Added some defines for pagecache priority levels that one can use storage/maria/ma_range.c: Added new arguments for call to _ma_fetch_keypage() storage/maria/ma_recovery.c: - Added hooks for new translog types: REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and UNDO_KEY_DELETE_WITH_ROOT. 
- Moved variable declarations to start of function (portability fixes) - Removed some not needed initializations - Set only relevant state changes for each redo/undo entry storage/maria/lockman.c: Removed end space storage/maria/ma_check.c: Removed end space storage/maria/ma_create.c: Removed end space storage/maria/ma_locking.c: Removed end space storage/maria/ma_packrec.c: Removed end space storage/maria/ma_pagecache.c: Removed end space storage/maria/ma_panic.c: Removed end space storage/maria/ma_rt_index.c: Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new() Fixed indentation storage/maria/ma_rt_key.c: Added new arguments for call to _ma_fetch_keypage() storage/maria/ma_rt_split.c: Added new arguments for call to _ma_new() Use new keypage header Added new arguments for call to _ma_write_keypage() storage/maria/ma_search.c: Updated comments & indentation Added new arguments for call to _ma_fetch_keypage() Made some variables and arguments const Added virtual functions for converting row position to number to be stored in key use MARIA_RECORD_POS of record position instead of my_off_t Record in MARIA_KEY_PARAM how page was changed one key insert (needed for REDO) storage/maria/ma_sort.c: Removed end space storage/maria/ma_statrec.c: Updated arguments for call to _ma_rec_pos() storage/maria/ma_test1.c: Fixed too small buffer to init_pagecache() Fixed bug when using insert_count and test_flag storage/maria/ma_test2.c: Use more resonable pagecache size Remove not used code Reset blob_length to fix wrong output message storage/maria/ma_test_all.sh: Fixed wrong test storage/maria/ma_write.c: Lots of new code to handle REDO of key pages No logic changes because of REDO code, mostly adding new arguments and adding new code for logging Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions Move setting of comp_flag in ma_ck_wrte_btree() from runtime to maria_open() Zerofill new 
used pages for: - To remove possible sensitive data left in buffer - To get identical data on pages after running redo - Better compression of pages if archived storage/maria/maria_chk.c: Added information if table is crash safe storage/maria/maria_def.h: New virtual function to convert between record position on key and normal record position Added mutex and extra variables to handle locking of share->state.key_del Moved some structure variables to get things more aligned Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert Added argument to MARIA_PINNED_PAGE to indicate if page was changed Updated prototypes for functions Added some structures for signaling changes in REDO handling storage/maria/unittest/ma_pagecache_single.c: Updated arguments for changed function calls storage/myisam/mi_check.c: Made calc_check_checksum virtual storage/myisam/mi_checksum.c: Update checksums to ignore null columns storage/myisam/mi_create.c: Mark if table has null column (to know when we have to use mi_checksum()) storage/myisam/mi_open.c: Added virtual function for calculating checksum to be able to easily ignore NULL fields storage/myisam/mi_test2.c: Fixed bug storage/myisam/myisamdef.h: Added virtual function for calculating checksum during check table Removed ha_key_cmp() as this is in handler.h storage/maria/ma_key_recover.c: New BitKeeper file ``storage/maria/ma_key_recover.c'' storage/maria/ma_key_recover.h: New BitKeeper file ``storage/maria/ma_key_recover.h'' storage/maria/ma_key_redo.c: New BitKeeper file ``storage/maria/ma_key_redo.c'' |
||
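Commit 21fd2a5a36 mentions shifting the record number stored in keys up by one bit so there is room to indicate whether a transid follows. A minimal sketch of that kind of encoding is shown below. Both function names are hypothetical, and using the lowest bit as the flag is an assumption; the commit message does not say which bit is used or how wide the stored value is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store a record position shifted left by one bit; the freed low bit
   says whether a transaction id follows the key (assumed layout). */
uint64_t key_pack_recpos(uint64_t recpos, bool transid_follows)
{
  assert((recpos >> 63) == 0);   /* top bit must be free for the shift */
  return (recpos << 1) | (transid_follows ? 1u : 0u);
}

/* Reverse of key_pack_recpos(): recover the position and the flag. */
uint64_t key_unpack_recpos(uint64_t stored, bool *transid_follows)
{
  *transid_follows = (stored & 1u) != 0;
  return stored >> 1;
}
```

For example, position 5 with the flag set is stored as 11, and unpacking 11 gives back position 5 with the flag set.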
unknown
|
5fbd5dafe7 |
* WL#4137 Maria- Framework for testing recovery in mysql-test-run
See test maria-recovery.test for a model; all include scripts have an "API" section at start if they do take parameters from outside. * Fixing bug reported by Jani and Monty (when two REDOs about the same page in one group, see ma_blockrec.c). * Fixing small bugs in recovery mysql-test/include/wait_until_connected_again.inc: be sure to enter the loop (the previous query by the caller may not have failed: it could be query; mysqladmin shutdown; call this script). mysql-test/lib/mtr_process.pl: * Through the "expect" file a test can tell mtr that a server crash is expected. What the file contains is irrelevant. Now if its last line starts with "wait", mtr will wait before restarting (it will wait for the last line to not start with "wait"). This is for tests which need to mangle files under the feet of a dead mysqld. * Remove "expect" file before restarting; otherwise there could be a race condition: tests sees server restarted, does something, writes an "expect" file, and then mtr removes that file, then test kills mysqld, and then mtr will never restart it. storage/maria/ma_blockrec.c: - when applying a REDO in recovery, we don't anymore put UNDO's LSN on the page at once; indeed if in this REDO's group there comes another REDO for the same page it would be wrongly skipped. Instead, we keep pages pinned, don't change their LSN. When done with all REDOs of the group we unpin them and stamp them with UNDO's LSN. - fixing bug in applying of REDO_PURGE_BLOCKS in recovery: page_range sometimes has TAIL_BIT set, need to turn it down to know the real page range. - Both bugs are covered in maria-recovery.test storage/maria/ma_checkpoint.c: Capability to, in debug builds only, do some special operations (flush all bitmap and data pages, flush state, flush log) and crash mysqld, to later test recovery. Driven by some --debug=d, symbols. 
storage/maria/ma_open.c: debugging info storage/maria/ma_pagecache.c: Now that we can _ma_unpin_all_pages() during the REDO phase to set page's LSN, the assertion needs to be relaxed. storage/maria/ma_recovery.c: - open trace file in append mode (useful when a test triggers several recoveries, we see them all). - fixing wrong error detection, it's possible that during recovery we want to open an already open table. - when applying a REDO in recovery, we don't anymore put UNDO's LSN on the page at once; indeed if in this REDO's group there comes another REDO for the same page it would be wrongly skipped. Instead, we keep pages pinned, don't change their LSN. When done with all REDOs of the group we unpin them and stamp them with UNDO's LSN. - we verify that all log records of a group are about the same table, for debugging. mysql-test/r/maria-recovery.result: result mysql-test/t/maria-recovery-master.opt: crash is expected, core file would take room, stack trace would wake pushbuild up. mysql-test/t/maria-recovery.test: Test of recovery from mysql-test (it is already tested as unit tests in ma_test_recovery) (WL#4137) - test that, if recovery is made to start on an empty table it can replay the effects of committed and uncommitted statements (having only the committed ones in the end result). This should be the first test for someone writing code of new REDOs. - test that, if mysqld is crashed and recovery runs we have only committed statements in the end result. Crashes are done in different ways: flush nothing (so, uncommitted statement is often missing from the log => no rollback to do); flush pagecache (implicitely flushes log (WAL)) and flush log, both causes rollbacks; flush log can also flush state (state.records etc) to test recovery of the state (not tested well now as we repair the index anyway). - test of bug found by Jani and Monty in recovery (two REDO about the same page in one group). 
mysql-test/include/maria_empty_logs.inc: removes logs, to have a clean sheet for testing recovery. mysql-test/include/maria_make_snapshot.inc: copies a table to another directory, or back, or compares both (comparison is not implemented as physical comparison is impossible if an UNDO phase happened). mysql-test/include/maria_make_snapshot_for_comparison.inc: copies tables to another directory so that they can later serve as a comparison reference (they are the good tables, recovery should produce similar ones). mysql-test/include/maria_make_snapshot_for_feeding_recovery.inc: When we want to force recovery to start on old tables, we prepare old tables with this script: we put them in a spare directory. They are later copied back over mysqltest tables while mysqld is dead. We also need to copy back the control file, otherwise mysqld, in recovery, would start from the latest checkpoint: latest checkpoint plus old tables is not a recovery-possible scenario of course. mysql-test/include/maria_verify_recovery.inc: causes mysqld to crash, restores old tables if requested, lets recovery run, compares resulting tables with reference tables by using CHECKSUM TABLE. We don't do any sanity checks on page's LSN in resulting tables, yet. |
||
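The "two REDOs about the same page in one group" fix from commit 5fbd5dafe7 can be illustrated with a toy page model. This is a sketch, not the recovery code: `page`, `redo_apply` and `redo_group_end` are hypothetical, and the skip rule "a REDO is skipped when the page's LSN is already at or past the record's LSN" is the usual convention, assumed here rather than quoted from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long lsn_t;     /* hypothetical stand-in for a log address */

typedef struct {
  lsn_t lsn;       /* LSN stamped on the page */
  int   changes;   /* how many REDOs were applied (for illustration) */
  bool  pinned;    /* kept pinned until the group ends */
} page;

/* Apply one REDO unless the page already contains it.
   The page's LSN is deliberately left untouched here. */
bool redo_apply(page *p, lsn_t redo_lsn)
{
  if (p->lsn >= redo_lsn)
    return false;                /* already on the page: skip */
  p->changes++;
  p->pinned = true;
  return true;
}

/* End of the group: unpin touched pages and stamp them with the UNDO's LSN. */
void redo_group_end(page *pages, size_t n, lsn_t undo_lsn)
{
  for (size_t i = 0; i < n; i++) {
    if (!pages[i].pinned)
      continue;
    assert(pages[i].lsn <= undo_lsn);
    pages[i].lsn = undo_lsn;
    pages[i].pinned = false;
  }
}
```

Had the first REDO stamped the page with the UNDO's LSN immediately, the second REDO of the same group (whose LSN is lower than the UNDO's) would have been skipped. Deferring the stamp lets both apply, and a later recovery run still skips them because the page then carries the UNDO's LSN.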
unknown
|
5e2aaf69d8 | comments | ||
unknown
|
086b34c935 |
WL#3071 Maria checkpoint
Fixing bad comments (I remember my maths' teacher "one late night you'll obey to the simplifications made by your tired neurons"; exactly what happened here). In Checkpoint, when we flush a table's state we must flush all log records (WAL), not only those before checkpoint started. storage/maria/ma_bitmap.c: there was a flaw in reasoning, bug does exist. storage/maria/ma_blockrec.c: moving piece of comment to ma_checkpoint.c storage/maria/ma_checkpoint.c: Comments. When checkpoint flushes a state, WAL imposes that all records up to this state have been flushed, not only up to checkpoint_start_log_horizon. storage/maria/ma_recovery.c: finishing comment. |
||
unknown
|
c2084d2a13 |
WL#3071 - Maria checkpoint
Observe WAL for the table's state: all log records needed for undoing uncommitted state must be in the log before we flush state. storage/maria/ha_maria.cc: comments storage/maria/ma_bitmap.c: Comment for why there is no bug storage/maria/ma_blockrec.c: comment for why there is no bug storage/maria/ma_checkpoint.c: Observe WAL for the table's state: all log records needed for undoing uncommitted state must be in the log before we flush state. I tested by hand that the bug existed (create table, insert one row into it but let that insert pause after increasing data_file_length, let checkpoint start but kill it after it has flushed state). Log contains nothing, table is not recovered though it has a too big data_file_length. With this bugfix, the log contains REDO so table is opened so data_file_length is corrected. storage/maria/ma_close.c: If table is read-only we must never write to it. Should be a no-change in fact, as if read-only, share->changed is normally always false. storage/maria/ma_recovery.c: documenting bug found by Monty. Print when fixing data_file_length. |
||
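The write-ahead rule for the table state in commit c2084d2a13 boils down to an ordering constraint: make the log durable first, write the state second. The sketch below models only that ordering. The types `wal_log` and `table_state` and both functions are hypothetical, and flushing up to the current end of the log (rather than a more precise address) is a simplifying assumption.

```c
#include <assert.h>

typedef unsigned long lsn_t;     /* hypothetical stand-in for a log address */

typedef struct {
  lsn_t horizon;   /* end of the log as written in memory */
  lsn_t flushed;   /* everything below this address is durable */
} wal_log;

typedef struct {
  lsn_t last_change;  /* newest log record reflected in the state */
  int   on_disk;      /* set to 1 once the state has been written */
} table_state;

/* Make the log durable up to 'lsn' (modelled as moving a marker). */
void log_flush_upto(wal_log *log, lsn_t lsn)
{
  assert(lsn <= log->horizon);
  if (log->flushed < lsn)
    log->flushed = lsn;
}

/* Flush the table state while observing WAL. */
void state_flush(wal_log *log, table_state *st)
{
  /* Log first: every record the state depends on must be durable. */
  log_flush_upto(log, log->horizon);
  assert(st->last_change <= log->flushed);
  st->on_disk = 1;               /* only now is writing the state safe */
}
```

If the state were written before the log flush, a crash could leave a state on disk (for instance a larger data_file_length) with no log record from which recovery could correct it, which is the scenario tested by hand in the commit message.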
unknown
|
77017191de |
WL#3071 - Maria checkpoint
- serializing calls to flush_pagecache_blocks_int() on the same file to avoid known concurrency bugs - having that, we can now enable the background thread, as the flushes it does are now supposedly safe in concurrent situations. - new type of flush FLUSH_KEEP_LAZY: when the background checkpoint thread is flushing a packet of dirty pages between two checkpoints, it uses this flush type, indeed if a file is already being flushed by another thread it's smarter to move on to the next file than wait. - maria_checkpoint_frequency renamed to maria_checkpoint_interval. include/my_sys.h: new type of flushing for the page cache: FLUSH_KEEP_LAZY mysql-test/r/maria.result: result update mysys/mf_keycache.c: indentation. No FLUSH_KEEP_LAZY support in key cache. storage/maria/ha_maria.cc: maria_checkpoint_frequency was somehow a hidden part of the Checkpoint API and that was not good. Now we have checkpoint_interval, local to ha_maria.cc, which serves as container for the user-visible maria_checkpoint_interval global variable; setting it calls update_checkpoint_interval which passes the new value to ma_checkpoint_init(). There is no hiding anymore. By default, enable background thread which does checkpoints every 30 seconds, and dirty page flush in between. That thread takes a checkpoint when it ends, so no need for maria_hton_panic to take one. The | is | and not ||, because maria_panic() must always be called. frequency->interval. storage/maria/ma_checkpoint.c: Use FLUSH_KEEP_LAZY for background thread when it flushes packets of dirty pages between two checkpoints: it is smarter to move on to the next file than wait for it to have been completely flushed, which may take long. Comments about flush concurrency bugs moved from ma_pagecache.c. Removing out-of-date comment. frequency->interval. create_background_thread -> (interval>0). In ma_checkpoint_background(), some variables need to be preserved between iterations. 
storage/maria/ma_checkpoint.h: new prototype storage/maria/ma_pagecache.c: - concurrent calls of flush_pagecache_blocks_int() on the same file cause bugs (see @note in that function); we fix them by serializing in this situation. For that we use a global hash of (file, wqueue). When flush_pagecache_blocks_int() starts it looks into the hash, using the file as key. If not found, it inserts (file,wqueue) into the hash, flushes the file, and finally removes itself from the hash and wakes up any waiter in the queue. If found, it adds itself to the wqueue and waits. - As a by-product, we can remove changed_blocks_is_incomplete and replace it by scanning the hash, replace the sleep() by a queue wait. - new type of flush FLUSH_KEEP_LAZY: when flushing a file, if it's already being flushed by another thread (even partially), return immediately. storage/maria/ma_pagecache.h: In pagecache, a hash of files currently being flushed (i.e. there is a call to flush_pagecache_blocks_int() for them). storage/maria/ma_recovery.c: new prototype storage/maria/ma_test1.c: new prototype storage/maria/ma_test2.c: new prototype |
||
unknown
|
0f1feefa03 |
WL#3071 Maria checkpoint
Ability for flush_pagecache_blocks() to flush only certain pages of a file, as instructed by an option "filter" pointer-to-function argument; Checkpoint and background dirty page flushing use that to flush only pages which have been dirty for long enough and bitmap pages. Fix for a bug in flush_cached_blocks() (no idea if it could produce a bug in real life, but theoretically it is). Testing checkpoint in ma_test_recovery via ma_test1 and ma_test2. Background checkpoint & dirty pages flush thread is still disabled by default in ha_maria. mysql-test/r/maria.result: result update storage/maria/ha_maria.cc: blank after function comment storage/maria/ma_checkpoint.c: Using an enum instead of 0/1/2 (applying Sanja's review comments). The comment about "this is an horizon" can be removed as Sanja created translog_next_LSN() which parse_checkpoint_record() uses. Variables in ma_checkpoint_background() cannot be declared in the for() as their value must not be reset at each iteration! storage/maria/ma_pagecache.c: adding to flush_pagecache_blocks() optional arguments 'filter' (pointer to function) and 'filter_arg'; if filter!=NULL this function will be called for each block of the file and will reply if this block and following ones should be flushed or not (3 possible replies). Fixing a bug when flush_cached_blocks() skips a pinned page: it has to unset PCBLOCK_IN_FLUSH set by flush_pagecache_blocks_int(). storage/maria/ma_pagecache.h: flush_pagecache_blocks() is changed to take "filter" and "filter_arg" arguments. "filter", if it is not NULL, may return one value among enum pagecache_flush_filter_result. storage/maria/ma_recovery.c: open_count=0 when closing tables at the end of recovery. storage/maria/ma_test1.c: Optional checkpoints (-H#) at various stages (stages similar to --testflag), for testing of checkpoints. storage/maria/ma_test2.c: Optional checkpoints (-H#) at various stages (stages similar to -t), for testing of checkpoints. 
storage/maria/ma_test_recovery.expected: Result update: the results of the additional test run with -H# (checkpoints) are added here. They are exactly identical to without checkpoints except that the index's Root (printed by maria_chk) is more correct when using checkpoints. This is because checkpoint flushed the state, so it happens to be correct, while no-checkpoint does not flush the state, and recovery does not recover indexes so Root is never fixed. When we recover indices, this will go away. storage/maria/ma_test_recovery: We duplicate the loop of tests to add an additional run with checkpoints at various stages, to see if maria_read_log uses them fine. |
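The "filter" argument described above can be pictured roughly as follows. This is a sketch under assumptions: the commit only says there are three possible replies, so the reply names and their exact meaning here are illustrative, as are `struct block`, `flush_blocks()` and the example filter.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative replies; the real enum is pagecache_flush_filter_result. */
enum filter_reply {
  REPLY_FLUSH,      /* flush this block */
  REPLY_SKIP_THIS,  /* skip this block, keep examining the next ones */
  REPLY_SKIP_REST   /* skip this block and all following ones */
};

/* Hypothetical view of a dirty block. */
struct block { unsigned long dirty_since; int is_bitmap; };

typedef enum filter_reply (*flush_filter)(const struct block *b, void *arg);

/* Example filter: flush bitmap pages and pages dirty since before *arg. */
static enum filter_reply filter_old_or_bitmap(const struct block *b, void *arg)
{
  unsigned long limit = *(unsigned long *) arg;
  if (b->is_bitmap || b->dirty_since < limit)
    return REPLY_FLUSH;
  return REPLY_SKIP_THIS;
}

/* Returns how many blocks would be flushed; filter == NULL flushes all. */
static size_t flush_blocks(const struct block *blocks, size_t n,
                           flush_filter filter, void *filter_arg)
{
  size_t flushed = 0;
  for (size_t i = 0; i < n; i++)
  {
    enum filter_reply r = filter ? filter(&blocks[i], filter_arg)
                                 : REPLY_FLUSH;
    if (r == REPLY_SKIP_REST)
      break;
    if (r == REPLY_FLUSH)
      flushed++;    /* the real code would write the page here */
  }
  return flushed;
}
```

Passing the cut-off through `filter_arg` keeps the filter stateless, which matches the commit's separate 'filter' and 'filter_arg' parameters.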
||
unknown
|
791b0aa081 |
WL#3071 - Maria checkpoint
* Preparation for having a background checkpoint thread: frequency of checkpoint taken by that thread is now configurable by the user: global variable maria_checkpoint_frequency, in seconds, default 30 (checkpoint every 30th second); 0 means no checkpoints (and thus no background thread, thus no background flushing, that will probably only be used for testing). * Don't take checkpoints in Recovery if it didn't do anything significant; thus no checkpoint after a clean shutdown/restart. The only checkpoint which is never skipped is the one at shutdown. * fix for a test failure (after-merge fix) include/maria.h: new variable mysql-test/suite/rpl/r/rpl_row_flsh_tbls.result: result update mysql-test/suite/rpl/t/rpl_row_flsh_tbls.test: position update (=after merge fix, as this position was already changed into 5.1 and not merged here, causing test to fail) storage/maria/ha_maria.cc: Checkpoint's frequency is now configurable by the user: global variable maria_checkpoint_frequency. Changing it on the fly requires us to shutdown/restart the background checkpoint thread, as the loop done in that thread assumes a constant checkpoint interval. Default value is 30: a checkpoint every 30 seconds (yes, I know, physicists will remind that it should be named "period" then). ha_maria now asks for a background checkpoint thread when it starts, but this is still overruled (disabled) in ma_checkpoint_init(). storage/maria/ma_checkpoint.c: Checkpoint's frequency is now configurable by the user: background thread takes a checkpoint every maria_checkpoint_interval-th second. If that variable is 0, no checkpoints are taken. Note, I will enable the background thread only in a later changeset. storage/maria/ma_recovery.c: Don't take checkpoints at the end of the REDO phase and at the end of Recovery if Recovery didn't make anything significant (didn't open any tables, didn't rollback any transactions). 
With this, after a clean shutdown, Recovery shouldn't take any checkpoint, which makes starting faster (we save a few fsync()s of the log and control file). |
||
unknown
|
63ff9877a5 |
WL#3072 Maria recovery
Misc changes: - fix for benign Valgrind error, compiler warnings - fix for a segfault in execution of maria_delete_all_rows() and one when taking multiple checkpoints - fix for too paranoid assertion - adding ability to take checkpoints at the end of the REDO phase and at the end of recovery. - other minor changes storage/maria/ha_maria.cc: The checkpoint taken after Recovery finishes is moved to maria_recover(). storage/maria/ma_bitmap.c: fix for Valgrind error: the "shadow debug copy" of the bitmap page started uninitialized and so ma_print_bitmap() would use it uninitialized storage/maria/ma_checkpoint.c: * reset pointers to NULL after freeing them, or we segfault at next checkpoint in my_realloc(). * fix for compiler warnings. storage/maria/ma_delete_all.c: info->trn is NULL for non-transactional tables storage/maria/ma_locking.c: correct assertion (it fired wrongly in execution of REDO_DROP_TABLE due to the maria_extra(HA_PREPARE_FOR_DROP)->_ma_decrement_open_count() ->maria_lock_database(F_UNLCK); another solution would have been to not call _ma_decrement_open_count() (it's ok to have a wrong open count in a table which we are dropping), but the same problem would still exist for REDO_RENAME_TABLE). storage/maria/ma_loghandler.c: fail early if UNRECOVERABLE_ERROR storage/maria/ma_recovery.c: * new argument to maria_apply_log(): should it take checkpoints (at end of REDO phase and at the very end) or not. * moving the call to translog_next_LSN() into parse_checkpoint_record() ("hide the details"). * Refining an error detection for something which could happen if there is a checkpoint record in the log. 
* Using close_one_table() instead of maria_extra(HA_EXTRA_PREPARE_FOR_DROP|RENAME), as it looks safer, and also changing how close_one_table() works: it now limits itself to scanning all_tables[], thus having one loop instead of two, which should be faster (as a result, it does not close tables not registered in this array, which is ok as there should not be any). storage/maria/ma_recovery.h: new parameter storage/maria/maria_read_log.c: update to new prototype |
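The ma_checkpoint.c fix above ("reset pointers to NULL after freeing them") follows a common C pattern, sketched here with the standard allocator. This is not the checkpoint code: `struct record_buffer` and both helpers are hypothetical, and standard `realloc()` stands in for my_realloc(), whose flags and NULL handling differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer reused from one checkpoint to the next. */
struct record_buffer { char *data; size_t size; };

/* Grow or shrink the buffer. In standard C, realloc(NULL, n) behaves
   like malloc(n), so a released buffer can be reserved again. */
static int buffer_reserve(struct record_buffer *b, size_t size)
{
  char *p = realloc(b->data, size);
  if (p == NULL)
    return 1;
  b->data = p;
  b->size = size;
  return 0;
}

static void buffer_release(struct record_buffer *b)
{
  free(b->data);
  b->data = NULL;  /* otherwise the next realloc() gets a dangling pointer */
  b->size = 0;
}
```

Without the reset in `buffer_release()`, the second call to `buffer_reserve()` would hand freed memory to the allocator, which is the kind of crash the commit describes at the next checkpoint.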
||
unknown
|
d0b9387b88 |
WL#3072 - Maria recovery.
* Recovery of the table's live checksum (CREATE TABLE ... CHECKSUM=1) is achieved in this patch. The table's live checksum (info->s->state.state.checksum) is updated in inwrite_rec_hook's under the log mutex when writing UNDO_ROW_INSERT|UPDATE|DELETE and REDO_DELETE_ALL. The checksum variation caused by the operation is stored in these UNDOs, so that the REDO phase, when it sees such UNDOs, can update the live checksum if it is older (state.is_of_lsn is lower) than the record. It is also used, as a nice add-on with no cost, to do less row checksum computation during the UNDO phase (as we have it in the record already). Doing this work, it became pressing to move in-write hooks (write_hook_for_redo() et al) to ma_blockrec.c. The 'parts' argument of inwrite_rec_hook is unpredictable (it comes mangled at this stage, for example by LSN compression) so it is replaced by a 'void* hook_arg', which is used to pass down information, currently only to write_hook_for_clr_end() (previous undo_lsn and type of undone record). * If from ha_maria, we print to stderr how many seconds (with one fractional digit) the REDO phase took, same for UNDO phase and for final table close. Just to give an indication for debugging and maybe also for Support. storage/maria/ha_maria.cc: question for Monty storage/maria/ma_blockrec.c: * log in-write hooks (write_hook_for_redo() etc) move from ma_loghandler.c to here; this is natural: the hooks are coupled to their callers (functions in ma_blockrec.c). * translog_write_record() now has a new argument "hook_arg"; using it to pass down to write_hook_for_clr_end() the transaction's previous_undo_lsn and the type of the being undone record, and also to pass down to all UNDOs the live checksum variation caused by the operation. * If table has live checksum, store in UNDO_ROW_INSERT|UPDATE|DELETE and in CLR_END the checksum variation ("delta") caused by the operation. 
For example if a DELETE caused the table's live checksum to change from 123 to 456, we store in the UNDO_ROW_DELETE, in 4 bytes, the value 333 (456-123). * Instead of hard-coded "1" as length of the place where we store the undone record's type in CLR_END, use a symbol CLR_TYPE_STORE_SIZE; use macros clr_type_store and clr_type_korr. * write_block_record() has a new parameter 'old_record_checksum' which is the pre-computed checksum of old_record; that value is used to update the table's live checksum when writing UNDO_ROW_UPDATE|CLR_END. * In allocate_write_block_record(), if we are executing UNDO_ROW_DELETE the row's checksum is already computed. * _ma_update_block_record2() now expect the new row's checksum into cur_row.checksum (was already true) and the old row's checksum into new_row.checksum (that's new). Its two callers, maria_update() and _ma_apply_undo_row_update(), honour this. * When executing an UNDO_ROW_INSERT|UPDATE|DELETE in UNDO phase, pick up the checksum delta from the log record. It is then used to update the table's live checksum when writing CLR_END, and saves us a computation of record. storage/maria/ma_blockrec.h: in-write hooks move from ma_loghandler.c storage/maria/ma_check.c: more straightforward size of buffer storage/maria/ma_checkpoint.c: <= is enough storage/maria/ma_commit.c: new prototype of translog_write_record() storage/maria/ma_create.c: new prototype of translog_write_record() storage/maria/ma_delete.c: The row's checksum must be computed before calling(*delete_record)(), not after, because it must be known inside _ma_delete_block_record() (to update the table's live checksum when writing UNDO_ROW_DELETE). If deleting from a transactional table, live checksum was already updated when writing UNDO_ROW_DELETE. storage/maria/ma_delete_all.c: @todo is now done (in ma_loghandler.c) storage/maria/ma_delete_table.c: new prototype of translog_write_record() storage/maria/ma_loghandler.c: * in-write hooks move to ma_blockrec.c. 
* translog_write_record() gets a new argument 'hook_arg' which is passed down to pre|inwrite_rec_hook. It is more useful than 'parts' for those hooks, because when those hooks are called, 'parts' has possibly been mangled (like with LSN compression) and is therefore unpredictable. * fix for compiler warning (unused buffer_start when compiling without debug support) * Because checksum delta is stored into UNDO_ROW_INSERT|UPDATE|DELETE and CLR_END, but only if the table has live checksum, these records are not PSEUDOFIXEDLENGTH anymore, they are now VARIABLE_LENGTH (their length is X if no live checksum and X+4 otherwise). * add an inwrite_rec_hook for UNDO_ROW_UPDATE, which updates the table's live checksum. Update it also in hooks of UNDO_ROW_INSERT|DELETE and REDO_DELETE_ALL and CLR_END. * Bugfix: when reading a record in translog_read_record(), it happened that "length" became negative, because the function assumed that the record extended beyond the page's end, whereas it may be shorter. storage/maria/ma_loghandler.h: * Instead of hard-coded "1" and "4", use symbols and macros to store/retrieve the type of record which the CLR_END corresponds to, and the checksum variation caused by the operation which logs the record * translog_write_record() gets a new argument 'hook_arg' which is passed down to pre|inwrite_rec_hook. It is more useful than 'parts' for those hooks, because when those hooks are called, 'parts' has possibly been mangled (like with LSN compression) and is therefore unpredictable. storage/maria/ma_open.c: fix for "empty body in if() statement" (when compiling without safemutex) storage/maria/ma_pagecache.c: <= is enough storage/maria/ma_recovery.c: * print the time that each recovery phase (REDO/UNDO/flush) took; this is enabled only when recovering from ha_maria. It is printed in seconds with a fractional part of one digit (like 123.4 seconds). 
* In the REDO phase, update the table's live checksum by using the checksum delta stored in UNDO_ROW_INSERT|DELETE|UPDATE and CLR_END. Update it too when seeing REDO_DELETE_ALL. * In the UNDO phase, when executing UNDO_ROW_INSERT, if the table does not have live checksum then reading the record's header (as done by the master loop of run_undo_phase()) is enough; otherwise we do a translog_read_record() to have the checksum delta ready for _ma_apply_undo_row_insert(). * When at the end of the REDO phase we notice that there is an unfinished group of REDOs, don't assert in debug binaries, as I verified that it can happen in real life (with kill -9) * removing ' in #error as it confuses gcc3 storage/maria/ma_rename.c: new prototype of translog_write_record() storage/maria/ma_test_recovery.expected: Change in output of ma_test_recovery: now all live checksums of original tables equal those of tables recreated by the REDO phase and those of tables fixed by the UNDO phase. I.e. recovery of the live checksum looks like working (which was after all the only goal of this changeset). I checked by hand that it's not just all live checksums which are now 0 and that's why they match. They are the old values like 3757530372. maria.test has hard-coded checksum values in its result file so checks this too. storage/maria/ma_update.c: * It's useless to put up HA_STATE_CHANGED in 'key_changed', as we put up HA_STATE_CHANGED in info->update anyway. * We need to compute the old and new rows' checksum before calling (*update_record)(), as checksum delta must be known when logging UNDO_ROW_UPDATE which is done by _ma_update_block_record(). Note that some functions change the 'newrec' record (at least _ma_check_unique() does) so we cannot move the checksum computation too early in the function. storage/maria/ma_write.c: If inserting into a transactional table, live's checksum was already updated when writing UNDO_ROW_INSERT. The multiplication is a trick to save an if(). 
storage/maria/unittest/ma_test_loghandler-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_first_lsn-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_max_lsn-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_multigroup-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_multithread-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_noflush-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_pagecache-t.c: new prototype of translog_write_record() storage/maria/unittest/ma_test_loghandler_purge-t.c: new prototype of translog_write_record() storage/myisam/sort.c: fix for compiler warnings in pushbuild (write_merge_key* functions didn't have their declaration match MARIA_HA::write_key). |
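The checksum-delta idea from this commit (a DELETE taking the live checksum from 123 to 456 stores 333 in 4 bytes) can be illustrated as follows. This is a sketch with assumed details: the types, helper names and little-endian byte order are illustrative, and LSNs are simplified to plain integers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t checksum_t;
typedef uint64_t lsn_t;       /* simplification: LSNs as plain integers */

/* Variation caused by an operation; unsigned math wraps modulo 2^32. */
static checksum_t checksum_delta(checksum_t before, checksum_t after)
{
  return after - before;
}

/* Store the delta in 4 bytes (byte order is an assumption). */
static void delta_store(unsigned char *dst, checksum_t delta)
{
  for (int i = 0; i < 4; i++)
    dst[i] = (unsigned char) (delta >> (8 * i));
}

static checksum_t delta_read(const unsigned char *src)
{
  checksum_t d = 0;
  for (int i = 0; i < 4; i++)
    d |= (checksum_t) src[i] << (8 * i);
  return d;
}

/* Hypothetical subset of the table state. */
struct table_state { checksum_t checksum; lsn_t is_of_lsn; };

/* REDO phase: apply the delta only if the state is older than the record. */
static void redo_apply_delta(struct table_state *s, lsn_t rec_lsn,
                             const unsigned char *stored)
{
  if (s->is_of_lsn < rec_lsn)
    s->checksum += delta_read(stored);
}
```

Because the arithmetic is modulo 2^32, a "negative" variation (456 back to 123) is stored and applied the same way as a positive one.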
||
unknown
|
8b5dddbc00 |
WL#3072 Maria recovery
Progress reports on stderr if doing recovery from ha_maria; don't do checkpoints if activity since last checkpoint < 2MB (no change in fact as background thread is disabled for now); recovery trace is now produced only if EXTRA_DEBUG (better for benchmarks). storage/maria/ma_checkpoint.c: don't do checkpoints if activity (log writes plus page flushes) since last checkpoint was < 2MB. storage/maria/ma_recovery.c: progress reports in recovery (10%, transactions left to rollback etc); that is only if from ha_maria and is displayed on stderr. Recovery trace is now created only if EXTRA_DEBUG. storage/maria/ma_test_recovery.expected: update (--debug gone) storage/maria/ma_test_recovery: don't use --debug, as it can be absent from the binary |
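The "< 2MB of activity" rule can be expressed as a one-line predicate. This is only a sketch: the function and macro names are hypothetical, and reading "2MB" as 2 * 1024 * 1024 bytes is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed threshold: 2MB taken as 2 * 1024 * 1024 bytes. */
#define MIN_ACTIVITY_BYTES (2u * 1024u * 1024u)

/* Activity is log writes plus page flushes since the last checkpoint. */
static int checkpoint_is_worthwhile(uint64_t log_bytes_written,
                                    uint64_t page_bytes_flushed)
{
  return log_bytes_written + page_bytes_flushed >= MIN_ACTIVITY_BYTES;
}
```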
||
unknown
|
95420b947e |
fix for non-debug compilation errors.
Note that non-debug build fails in log handler functions, mail sent. storage/maria/ma_blockrec.c: fix for compiler warning storage/maria/ma_checkpoint.c: Debug build does not catch this situation: static int f(); ... f(2); ... static int f(int a, int b); Maybe this is because it believes the declaration is K&R. Non-debug build catches it. Adding (void) as a habit to avoid such errors. storage/maria/ma_checkpoint.h: adding (void) storage/maria/ma_recovery.c: adding (void) storage/maria/ma_recovery.h: adding (void) |
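The "(void)" habit mentioned in this commit is sketched below. Before C23, a declaration with empty parentheses says nothing about the parameters, so calls are not checked against it; `(void)` turns it into a real prototype. The function name and the value it returns are made up for the illustration.

```c
#include <assert.h>

/* With "static int checkpoint_level();" a pre-C23 compiler would accept
   a call such as checkpoint_level(2) here without complaint. With the
   (void) prototype below, that call is rejected at compile time. */
static int checkpoint_level(void);

static int current_level = 2;   /* hypothetical state, for illustration */

static int checkpoint_level(void)
{
  return current_level;
}
```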
||
unknown
|
9c2ff270fa |
WL#3072 Maria Recovery
* recovery from ha_maria now skips replaying DDLs (too dangerous) * maria_read_log still replays DDLs, print warning about issues * fixes to replaying of REDO_RENAME * don't replay DDLs on corrupted tables (safer) * print a one-line message when really doing a recovery (applies to ha_maria, not maria_read_log) i.e. some REDOs or UNDOs are read. storage/maria/ma_checkpoint.c: fix for assertion failure storage/maria/ma_recovery.c: * Recovery from ha_maria now skips replaying DDLs (as the initial plan said) as this is unsafe in case of crashes during the DDL; applying the records may do harm (destroy important files) so we prefer to leave the "mess" of files untouched. A proper recovery of DDLs requires very careful thinking, probably testing separately the existence of the data and index file instead of using maria_open() which tests the existence of both, and maybe storing create_rename_lsn in the data file too. * maria_read_log still replays DDLs, we print a warning about dangers (due to ALTER TABLE not logging insertions into the tmp table; we will maybe need an option to have logging of those insertions). * fixes to replaying of REDO_RENAME (test create_rename_lsn of 'new_name' table if it exists; if that table exists and is more recent than the record, remove the 'old_name' table). * don't replay DDLs on corrupted tables (play safe) * fail also in non-debug builds if table is open when it should not be (when creating it for example, it should not be already open). * when the trace file is not stdout (i.e. when this is ha_maria), if really doing a recovery (reading REDOs or UNDOs), print a one-line message to stderr to inform about start and end of recovery (useful to know what mysqld is doing, especially if it takes long or crashes). storage/maria/ma_recovery.h: parameter to replay DDLs or not storage/maria/maria_read_log.c: replay DDLs in maria_read_log, to be able to recreate tables from scratch. |
||
unknown
|
a303f5b2c8 |
Fixes of the empty log problem.
storage/maria/ma_checkpoint.c: New macro for easier LSN printing added. storage/maria/ma_loghandler.c: The assertion is restored. New macro for easier LSN printing added. storage/maria/ma_loghandler_lsn.h: New macro for easier LSN printing added. storage/maria/ma_pagecache.c: New macro for easier LSN printing added. storage/maria/ma_recovery.c: Recovery checks empty log state. RECHEADER_READ_ERROR means some real error. storage/maria/maria_read_log.c: Read log starts from the real beginning of the log and processes error and empty log states. New macro for easier LSN printing added. storage/maria/unittest/ma_test_loghandler-t.c: New macro for easier LSN printing added. storage/maria/unittest/ma_test_loghandler_first_lsn-t.c: New macro for easier LSN printing added. storage/maria/unittest/ma_test_loghandler_max_lsn-t.c: New macro for easier LSN printing added. storage/maria/unittest/ma_test_loghandler_multigroup-t.c: New macro for easier LSN printing added. storage/maria/unittest/ma_test_loghandler_multithread-t.c: New macro for easier LSN printing added. storage/maria/unittest/ma_test_loghandler_noflush-t.c: New macro for easier LSN printing added. |
||
unknown
|
20d871e5de |
fix for pushbuild test failure (my_realloc() failed => checkpoint
failed => Maria didn't start => tables were created as MyISAM). storage/maria/ma_checkpoint.c: safemalloc complains if my_realloc() is passed NULL and MY_ALLOW_ZERO_PTR is not used. |
||
unknown
|
a5f4e79db9 |
WL#3072 Maria Recovery
* added replaying of REDO_REPAIR_TABLE, but disabled it as mysterious linker errors appear. * after replaying RENAME/REPAIR, we must bump create_rename_lsn for idempotency of maria_read_log. sql/mysqld.cc: typo storage/maria/ma_checkpoint.c: silence compiler warning storage/maria/ma_recovery.c: * added replaying of REDO_REPAIR_TABLE, but disabled it as mysterious linker errors appear. * after replaying RENAME/REPAIR, we must bump create_rename_lsn for idempotency of maria_read_log. |
||
unknown
|
cec8ac3e07 |
WL#3071 Maria checkpoint
Finally this is the real checkpoint code. It however exhibits instabilities when a checkpoint runs concurrently with data-modifying clients (table corruption, transaction log's assertions) so for now a checkpoint is taken only at startup after recovery and at shutdown, i.e. not in concurrent situations. Later we will let it run periodically, as well as flush dirty pages periodically (almost all needed code is there already, only pagecache code is written but not committed). WL#3072 Maria recovery * replacing UNDO_ROW_PURGE with CLR_END; testing of those CLR_END via ma_test2 which has INSERTs failing with duplicate keys. * replaying of REDO_RENAME_TABLE Now, off to test Recovery in ha_maria :) BitKeeper/deleted/.del-ma_least_recently_dirtied.c: Delete: storage/maria/ma_least_recently_dirtied.c BitKeeper/deleted/.del-ma_least_recently_dirtied.h: Delete: storage/maria/ma_least_recently_dirtied.h storage/maria/Makefile.am: compile Checkpoint module storage/maria/ha_maria.cc: When ha_maria starts, do a recovery from last checkpoint. Take a checkpoint when that recovery has ended and when ha_maria shuts down cleanly. storage/maria/ma_blockrec.c: * even if my_sync() fails we have to my_close() (otherwise we leak a descriptor) * UNDO_ROW_PURGE is replaced by a simple CLR_END for UNDO_ROW_INSERT, as promised in the old comment; it gives us skipping during the UNDO phase. storage/maria/ma_check.c: All REDOs before create_rename_lsn are ignored by Recovery. So create_rename_lsn must be set only after all data/index has been flushed and forced to disk. We thus move write_log_record_for_repair() to after _ma_flush_tables_files_after_repair(). storage/maria/ma_checkpoint.c: Checkpoint module. storage/maria/ma_checkpoint.h: optional argument if caller wants a thread to periodically take checkpoints and flush dirty pages. storage/maria/ma_create.c: * no need to init some vars as the initial bzero(share) takes care of this. 
* update to new function's name * even if we fail in my_sync() we have to my_close() storage/maria/ma_extra.c: Checkpoint reads share->last_version under intern_lock, so we make maria_extra() update it under intern_lock. THR_LOCK_maria still needed because of _ma_test_if_reopen(). storage/maria/ma_init.c: destroy checkpoint module when Maria shuts down. storage/maria/ma_loghandler.c: * UNDO_ROW_PURGE gone (see ma_blockrec.c) * we need to remember the LSN of the LOGREC_FILE_ID for a share, because this LSN is needed into the checkpoint record (Recovery wants to know the validity domain of an id->name mapping) * translog_get_horizon_no_lock() needed for Checkpoint * comment about failing assertion (Sanja knows) * translog_init_reader_data() thought that translog_read_record_header_scan() returns 0 in case of error, but 0 just means "0-length header". * translog_assign_id_to_share() now needs the MARIA_HA because LOGREC_FILE_ID uses a log-write hook. * Verify that (de)assignment of share->id happens only under intern_lock, as Checkpoint reads this id with intern_lock. * translog_purge() can accept TRANSLOG_ADDRESS, not necessarily a real LSN. storage/maria/ma_loghandler.h: prototype updates storage/maria/ma_open.c: no need to initialize "res" storage/maria/ma_pagecache.c: When taking a checkpoint, we don't need to know the maximum rec_lsn of dirty pages; this LSN was intended to be used in the two-checkpoint rule, but last_checkpoint_lsn is as good. 4 bytes for stored_list_size is enough as PAGECACHE::blocks (number of blocks which the pagecache can contain) is int. storage/maria/ma_pagecache.h: new prototype storage/maria/ma_recovery.c: * added replaying of REDO_RENAME_TABLE * UNDO_ROW_PURGE gone (see ma_blockrec.c), replaced by CLR_END * Recovery from the last checkpoint record now possible * In new_table() we skip the table if the id->name mapping is older than create_rename_lsn (mapping dates from lsn_of_file_id). 
* in get_MARIA_HA_from_REDO_record() we skip the record if the id->name mapping is newer than the record (can happen if processing a record which is before the checkpoint record). * parse_checkpoint_record() has to return a LSN, that's what caller expects storage/maria/ma_rename.c: new function's name; log end zeroes of tables' names (ease recovery) storage/maria/ma_test2.c: * equivalent of ma_test1's --test-undo added (named -u here). * -t=1 now stops right after creating the table, so that we can test undoing of INSERTs with duplicate keys (which tests the CLR_END logged by _ma_write_abort_block_record()). storage/maria/ma_test_recovery.expected: Result of testing undoing of INSERTs with duplicate keys; there are some differences in maria_chk -dvv but they are normal (removing records does not shrink data/index file, does not put back the "analyzed, optimized keys"(etc) index state. storage/maria/ma_test_recovery: Test undoing of INSERTs with duplicate keys, using ma_test2; when such INSERT happens, it logs REDO_INSERT, UNDO_INSERT, REDO_DELETE, CLR_END; we abort after that, and test that CLR_END causes recovery to jump over UNDO_INSERT. storage/maria/ma_write.c: comment storage/maria/maria_chk.c: comment storage/maria/maria_def.h: * a new bit in MARIA_SHARE::in_checkpoint, used to build a list of unique shares during Checkpoint. * MARIA_SHARE::lsn_of_file_id added: the LSN of the last LOGREC_FILE_ID for this share; needed to know to which LSN domain the mappings found in the Checkpoint record apply (new mappings should not apply to old REDOs). storage/maria/trnman.c: * small changes to how trnman_collect_transactions() fills its buffer; it also uses a non-dummy lsn_read_non_atomic() found in ma_checkpoint.h |
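The two validity rules for id->name mappings described for ma_recovery.c (new_table() and get_MARIA_HA_from_REDO_record()) amount to LSN comparisons. The sketch below is an illustration, not the recovery code: the helper names are hypothetical, LSNs are simplified to plain integers, and treating equal LSNs as usable is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* simplification: LSNs as plain integers */

/* new_table(): the table is skipped if the id->name mapping is older
   than the table's create_rename_lsn. */
static int mapping_usable_for_table(lsn_t lsn_of_file_id,
                                    lsn_t create_rename_lsn)
{
  return lsn_of_file_id >= create_rename_lsn;
}

/* get_MARIA_HA_from_REDO_record(): the record is skipped if the mapping
   is newer than the record, which can happen for records located before
   the checkpoint record. */
static int mapping_usable_for_record(lsn_t lsn_of_file_id, lsn_t record_lsn)
{
  return lsn_of_file_id <= record_lsn;
}
```

Together the two checks bound the range of LSNs in which a mapping applies: new mappings should not apply to old REDOs, and mappings that predate a create or rename should not apply to the new table.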
||
unknown
|
4cf6756eb0 |
First LSN calls added for transaction log.
storage/maria/ma_checkpoint.c: Definitions of LSN should be collected in the one file (ma_loghandler_lsn.h) storage/maria/ma_loghandler.c: New calls to get first theoretical and first real LSN. storage/maria/ma_loghandler.h: New calls to get first theoretical and first real LSN. storage/maria/ma_loghandler_lsn.h: Defined yet another impossible LSN to indicate error. storage/maria/ma_recovery.c: The first LSN call changed. storage/maria/maria_read_log.c: The first LSN call changed. storage/maria/unittest/Makefile.am: New unittest added. storage/maria/unittest/ma_test_loghandler_first_lsn-t.c: New BitKeeper file ``storage/maria/unittest/ma_test_loghandler_first_lsn-t.c'' |
||
unknown
|
631ecaabea |
Merged with mysql-5.1 main tree.
BUILD/compile-pentium-debug-max: Added definition after macro was removed from main tree. This will be fixed back in main tree later. |
||
unknown
|
46922b5125 |
GPL license update (same change as was done for all files in 5.1).
storage/maria/Makefile.am: GPL license update storage/maria/ft_maria.c: GPL license update storage/maria/ha_maria.cc: GPL license update storage/maria/ha_maria.h: GPL license update storage/maria/lockman.c: GPL license update storage/maria/lockman.h: GPL license update storage/maria/ma_bitmap.c: GPL license update storage/maria/ma_blockrec.c: GPL license update storage/maria/ma_blockrec.h: GPL license update storage/maria/ma_cache.c: GPL license update storage/maria/ma_changed.c: GPL license update storage/maria/ma_check.c: GPL license update storage/maria/ma_checkpoint.c: GPL license update storage/maria/ma_checkpoint.h: GPL license update storage/maria/ma_checksum.c: GPL license update storage/maria/ma_close.c: GPL license update storage/maria/ma_control_file.c: GPL license update storage/maria/ma_control_file.h: GPL license update storage/maria/ma_create.c: GPL license update storage/maria/ma_dbug.c: GPL license update storage/maria/ma_delete.c: GPL license update storage/maria/ma_delete_all.c: GPL license update storage/maria/ma_delete_table.c: GPL license update storage/maria/ma_dynrec.c: GPL license update storage/maria/ma_extra.c: GPL license update storage/maria/ma_ft_boolean_search.c: GPL license update storage/maria/ma_ft_eval.c: GPL license update storage/maria/ma_ft_eval.h: GPL license update storage/maria/ma_ft_nlq_search.c: GPL license update storage/maria/ma_ft_parser.c: GPL license update storage/maria/ma_ft_stem.c: GPL license update storage/maria/ma_ft_test1.c: GPL license update storage/maria/ma_ft_test1.h: GPL license update storage/maria/ma_ft_update.c: GPL license update storage/maria/ma_ftdefs.h: GPL license update storage/maria/ma_fulltext.h: GPL license update storage/maria/ma_info.c: GPL license update storage/maria/ma_init.c: GPL license update storage/maria/ma_key.c: GPL license update storage/maria/ma_keycache.c: GPL license update storage/maria/ma_least_recently_dirtied.c: GPL license update storage/maria/ma_least_recently_dirtied.h: 
GPL license update storage/maria/ma_locking.c: GPL license update storage/maria/ma_open.c: GPL license update storage/maria/ma_packrec.c: GPL license update storage/maria/ma_page.c: GPL license update storage/maria/ma_panic.c: GPL license update storage/maria/ma_preload.c: GPL license update storage/maria/ma_range.c: GPL license update storage/maria/ma_recovery.c: GPL license update storage/maria/ma_recovery.h: GPL license update storage/maria/ma_rename.c: GPL license update storage/maria/ma_rfirst.c: GPL license update storage/maria/ma_rkey.c: GPL license update storage/maria/ma_rlast.c: GPL license update storage/maria/ma_rnext.c: GPL license update storage/maria/ma_rnext_same.c: GPL license update storage/maria/ma_rprev.c: GPL license update storage/maria/ma_rrnd.c: GPL license update storage/maria/ma_rsame.c: GPL license update storage/maria/ma_rsamepos.c: GPL license update storage/maria/ma_rt_index.c: GPL license update storage/maria/ma_rt_index.h: GPL license update storage/maria/ma_rt_key.c: GPL license update storage/maria/ma_rt_key.h: GPL license update storage/maria/ma_rt_mbr.c: GPL license update storage/maria/ma_rt_mbr.h: GPL license update storage/maria/ma_rt_split.c: GPL license update storage/maria/ma_rt_test.c: GPL license update storage/maria/ma_scan.c: GPL license update storage/maria/ma_search.c: GPL license update storage/maria/ma_sort.c: GPL license update storage/maria/ma_sp_defs.h: GPL license update storage/maria/ma_sp_key.c: GPL license update storage/maria/ma_sp_test.c: GPL license update storage/maria/ma_static.c: GPL license update storage/maria/ma_statrec.c: GPL license update storage/maria/ma_test1.c: GPL license update storage/maria/ma_test2.c: GPL license update storage/maria/ma_test3.c: GPL license update storage/maria/ma_unique.c: GPL license update storage/maria/ma_update.c: GPL license update storage/maria/ma_write.c: GPL license update storage/maria/maria_chk.c: GPL license update storage/maria/maria_def.h: GPL license update 
storage/maria/maria_ftdump.c: GPL license update storage/maria/maria_pack.c: GPL license update storage/maria/tablockman.c: GPL license update storage/maria/tablockman.h: GPL license update storage/maria/trnman.c: GPL license update storage/maria/trnman.h: GPL license update |
||
unknown
|
b635df555a |
very minor comments and merges from MyISAM into Maria.
storage/maria/ma_checkpoint.c: comments
storage/maria/ma_close.c: comments
storage/maria/ma_write.c: merge from myisam
storage/maria/maria_def.h: typo
storage/myisam/mi_delete.c: unneeded {}, making it identical to Maria |
||
unknown
|
649b3b4605 |
WL#3071 - Maria checkpoint:
A function that stores information about transactions into buffers is added to the transaction manager and called by the Checkpoint module.
storage/maria/ma_checkpoint.c: "collecting info about transactions" moves to trnman.c
storage/maria/trnman.c: a function to store information about the active transactions list and the committed transactions list into buffers, for use by the Checkpoint module. This function needs to know how many trns there are in the committed list, so we introduce a counter, trnman_committed_transactions. m_string.h is needed for LEX_STRING.
storage/maria/trnman.h: a function to store information about the active transactions list and the committed transactions list into buffers, for use by the Checkpoint module.
storage/maria/unittest/trnman-t.c: trnman.h needs LEX_STRING, so m_string.h is included |
||
unknown
|
7199c90559 |
WL#3071 Maria checkpoint
- cleanups, simplifications
- moving the construction of the "dirty pages table" into the pagecache where it belongs (because it's the pagecache which knows dirty pages). TODO: do the same soon for the "transactions table".
- fix for a small bug in the pagecache (decrementation of "blocks_changed")
include/pagecache.h: prototype
mysys/mf_pagecache.c: m_string.h moves up for LEX_STRING to be known for pagecache.h. In pagecache_delete_page(), we must decrement "blocks_changed" even if we just delete the page without flushing it. A new function pagecache_collect_changed_blocks_with_LSN() (used by the Checkpoint module), which stores information about the changed blocks (a.k.a. "the dirty pages table") into a LEX_STRING. This function is not tested yet; it will be once there is a Checkpoint.
storage/maria/ma_checkpoint.c: refining the checkpoint code: factoring functions, moving the construction of the "dirty pages table" into mf_pagecache.c (I'll do the same with the construction of the "transactions table" once Serg tells me what's the best way to do it).
storage/maria/ma_least_recently_dirtied.c: Simplifying the thread which does background flushing of least-recently-dirtied pages:
- in the first version that thread will not flush, just do checkpoints
- in the 2nd version, flushing should re-use existing page cache functions like flush_pagecache_blocks().
unittest/mysys/test_file.h: m_string.h moves up for LEX_STRING to be known in pagecache.h |
||
unknown
|
71b404973c |
WL#3071 - Maria checkpoint. Correcting comment about a bad problem.
storage/maria/ma_checkpoint.c: I was too optimistic; problem 1) is really a bad problem. |
||
unknown
|
fa05e9c9f4 |
WL#3071 - Maria checkpoint
Adding rec_lsn to Maria's page cache. Misc fixes to Checkpoint.
mysys/mf_pagecache.c: adding rec_lsn, the LSN when a page first became dirty. It is set when unlocking a page (TODO: should also be set when the unlocking is an implicit part of pagecache_write()). It is reset in link_to_file_list() and free_block() (one of which is used every time we flush a block). It is a ulonglong and not LSN, because its destination is comparisons for which ulonglong is better than a struct.
storage/maria/ma_checkpoint.c: misc fixes to Checkpoint (updates now that the transaction manager and the page cache are more known)
storage/maria/ma_close.c: an important note for the future.
storage/maria/ma_least_recently_dirtied.c: comment |
||
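The rec_lsn bookkeeping described in the commit above can be illustrated with a minimal sketch. It is not the code from mysys/mf_pagecache.c: the struct and the `sketch_*` names are hypothetical, a value of 0 is assumed to mean "no rec_lsn", and the helper that takes the minimum over dirty blocks is an assumed example of the kind of integer comparison the commit message alludes to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block descriptor (not the real pagecache block). */
typedef struct {
  uint64_t rec_lsn; /* LSN at which the page first became dirty; 0 if clean */
  int dirty;        /* non-zero while the page has unflushed changes */
} sketch_block;

#define SKETCH_LSN_NONE 0ULL /* assumption: 0 is never a valid LSN here */

/* On unlock of a modified page: record the LSN only on the first dirtying. */
static void sketch_mark_dirty(sketch_block *b, uint64_t lsn)
{
  assert(b != NULL);
  if (!b->dirty) {
    b->dirty = 1;
    b->rec_lsn = lsn;
  }
}

/* After the block has been flushed or freed: forget the recorded LSN. */
static void sketch_mark_flushed(sketch_block *b)
{
  assert(b != NULL);
  b->dirty = 0;
  b->rec_lsn = SKETCH_LSN_NONE;
}

/* Assumed consumer: smallest rec_lsn among dirty blocks, or SKETCH_LSN_NONE
   if nothing is dirty. Plain integers make this comparison trivial. */
static uint64_t sketch_min_rec_lsn(const sketch_block *blocks, size_t n)
{
  uint64_t min = SKETCH_LSN_NONE;
  size_t i;
  assert(blocks != NULL || n == 0);
  for (i = 0; i < n; i++) {
    if (blocks[i].dirty && (min == SKETCH_LSN_NONE || blocks[i].rec_lsn < min))
      min = blocks[i].rec_lsn;
  }
  return min;
}
```

Later modifications of an already dirty page leave rec_lsn unchanged in this sketch, which matches the "first became dirty" wording of the commit message.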
unknown
|
cdf831cf94 |
WL#3071 Maria checkpoint:
changing pseudocode to use the structures of the Maria pagecache ("pagecache->changed_blocks" etc) and other Maria structures inherited from MyISAM (THR_LOCK_maria etc).
mysys/mf_pagecache.c: comment
storage/maria/ma_checkpoint.c: changing pseudocode to use the structures of the Maria pagecache ("pagecache->changed_blocks" etc) and other Maria structures inherited from MyISAM (THR_LOCK_maria etc).
storage/maria/ma_checkpoint.h: copyright
storage/maria/ma_control_file.c: copyright
storage/maria/ma_control_file.h: copyright
storage/maria/ma_least_recently_dirtied.c: copyright
storage/maria/ma_least_recently_dirtied.h: copyright
storage/maria/ma_recovery.c: copyright
storage/maria/ma_recovery.h: copyright
storage/maria/unittest/Makefile.am: copyright |
||
unknown
|
a1f25544d5 |
WL#3234 "Maria - control file manager"
- fixes to the control file module
- unit test for it
- renames of all Maria files I created to start with ma_
storage/maria/ma_checkpoint.c: Rename: storage/maria/checkpoint.c -> storage/maria/ma_checkpoint.c
storage/maria/ma_checkpoint.h: Rename: storage/maria/checkpoint.h -> storage/maria/ma_checkpoint.h
storage/maria/ma_least_recently_dirtied.c: Rename: storage/maria/least_recently_dirtied.c -> storage/maria/ma_least_recently_dirtied.c
storage/maria/ma_least_recently_dirtied.h: Rename: storage/maria/least_recently_dirtied.h -> storage/maria/ma_least_recently_dirtied.h
storage/maria/ma_recovery.c: Rename: storage/maria/recovery.c -> storage/maria/ma_recovery.c
storage/maria/ma_recovery.h: Rename: storage/maria/recovery.h -> storage/maria/ma_recovery.h
storage/maria/Makefile.am: control file module and its unit test program
storage/maria/ma_control_file.c: DBUG_ tags. Fix for gcc warnings. log_no -> logno (I felt "_no" sounded like a standalone "No" word). ma_ prefix for some functions. last_checkpoint_lsn_at_startup -> last_checkpoint_lsn (no need to make special vars for the values at startup). Same for last_logno. ma_control_file_write_and_force() now updates last_checkpoint_lsn and last_logno, the idea being that they belong to the module and others should not update them. Thus, when the module shuts down, it zeroes those vars.
storage/maria/ma_control_file.h: importing structs from Sanja to get the control file module to compile; we'll remove that when Sanja pushes the log handler. CONTROL_FILE_IMPOSSIBLE_LOGNO is 0, not FFFFFFFF.
storage/maria/ma_control_file_test.c: Unit test program for the Maria control file module. Modelled after other ma_test* files in this directory (so, does not follow the unit test framework recently introduced with libtap; TODO as a task on all ma_test* programs). We test that writing to the control file and re-reading from it both work; we check (by reading the file ourselves) that its content on disk is correct, and that a corrupted control file is detected. |
Renamed from storage/maria/checkpoint.c