/* Copyright (C) 2007 Michael Widenius

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; version 2 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

/*
  Bitmap handling (for records in block)

  The data file starts with a bitmap page, followed by as many data
  pages as the bitmap can cover. After this there is a new bitmap page
  and more data pages etc.

  The bitmap code assumes there is always an active bitmap page and thus
  that there is at least one bitmap page in the file.

  Structure of bitmap page:

  Fixed size records (to be implemented later):

  2 bits are used to indicate:

  0      Empty
  1      0-75 % full  (at least room for 2 records)
  2      75-100 % full (at least room for one record)
  3      100 % full   (no more room for records)

  Assuming 8K pages, this will allow us to map:
  8192 (bytes per page) * 4 (pages mapped per byte) * 8192 (page size)= 256M

  (For Maria this will be 7*4 * 8192 = 224K smaller because of LSN)
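
The capacity figures above are just (bitmap bytes times 8, divided by the bits used per page) data pages, each of page-size bytes. A small sketch that recomputes them; `bytes_mapped()` is a hypothetical helper, and it uses integer division, so a partial page at the end is dropped. Removing 7 bytes from the bitmap reproduces the 224K difference.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: bytes of data file that one bitmap page can map
// when every data page uses `bits_per_page` bits of the bitmap.
static uint64_t bytes_mapped(uint64_t bitmap_bytes, unsigned bits_per_page,
                             uint64_t page_size)
{
  assert(bits_per_page > 0 && bits_per_page <= 8);
  uint64_t pages = bitmap_bytes * 8 / bits_per_page;  // whole pages only
  return pages * page_size;
}
```

With 3 bits per page (the dynamic format described next) the same helper gives 178,954,240 bytes, about 170.7M.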

  Note that for fixed size rows, we can't add more columns without doing
  a full reorganization of the table. The user can always force a dynamic
  size row format by specifying ROW_FORMAT=dynamic.


  Dynamic size records:

  3 bits are used to indicate:

  0      Empty page
  1      0-30 % full  (at least room for 3 records)
  2      30-60 % full (at least room for 2 records)
  3      60-90 % full (at least room for one record)
  4      100 % full   (no more room for records)
  5      Tail page,  0-40 % full
  6      Tail page,  40-80 % full
  7      Full tail page or full blob page

  Assuming 8K pages, this will allow us to map:
  8192 (bytes per page) * 8 bits/byte / 3 bits/page * 8192 (page size)= 170.7M
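
Since 3 does not divide 8, a page's pattern can straddle a byte boundary; six bytes (48 bits) hold exactly 16 patterns, which is why the initialization code works in 6-byte units. A minimal sketch of reading and writing one pattern, assuming pattern n occupies bits 3n to 3n+2 of the map, least significant bit first; `get_pattern()` and `set_pattern()` are hypothetical names, not Maria's own accessors, and the real on-disk bit order may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical accessors for the 3-bit pattern of data page `page`
// (numbered from 0 after the bitmap page).
static unsigned get_pattern(const uint8_t *map, size_t map_len, unsigned page)
{
  size_t bit = (size_t) page * 3;
  size_t byte = bit / 8;
  unsigned shift = (unsigned) (bit % 8);
  unsigned v;

  assert(byte < map_len);
  v = map[byte];
  if (shift > 5)                      // pattern spills into the next byte
  {
    assert(byte + 1 < map_len);
    v |= (unsigned) map[byte + 1] << 8;
  }
  return (v >> shift) & 7;
}

static void set_pattern(uint8_t *map, size_t map_len, unsigned page,
                        unsigned pattern)
{
  size_t bit = (size_t) page * 3;
  size_t byte = bit / 8;
  unsigned shift = (unsigned) (bit % 8);
  unsigned v;

  assert(pattern <= 7 && byte < map_len);
  v = map[byte];
  if (shift > 5)
  {
    assert(byte + 1 < map_len);
    v |= (unsigned) map[byte + 1] << 8;
  }
  v = (v & ~(7u << shift)) | (pattern << shift);  // replace only our 3 bits
  map[byte] = (uint8_t) v;
  if (shift > 5)
    map[byte + 1] = (uint8_t) (v >> 8);
}
```

Page 2, for example, starts at bit 6 and so uses the top two bits of byte 0 and the lowest bit of byte 1.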

  Note that values 1-3 may be adjusted for each individual table based on
  'min record length'. Tail pages are for overflow data which can be of
  any size and thus don't have to be adjusted for different tables.
  If we add more columns to the table, some of the originally calculated
  'cut off' points may not be optimal, but they shouldn't be 'drastically
  wrong'.

  When allocating data from the bitmap, we are trying to do it in a
  'best fit' manner. Blobs and varchar blocks are given out in large
  contiguous extents to allow fast access to these. Before allowing a
  row to 'flow over' to other blocks, we will compact the page and use
  all space on it. If there are many rows in the page, we will ensure
  there are *LEFT_TO_GROW_ON_SPLIT* bytes left on the page to allow other
  rows to grow.
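
In its simplest form, 'best fit' picks the page whose free space is the smallest that still holds the data, leaving larger holes for larger rows. The sketch below shows only that idea over a plain array of free sizes; it ignores extents, compaction and LEFT_TO_GROW_ON_SPLIT, and `best_fit_page()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical best-fit chooser: free_bytes[i] is the free space of
// candidate page i. Returns the index of the page with the least free
// space that still fits `needed`, or -1 if no page is large enough.
static long best_fit_page(const unsigned *free_bytes, size_t count,
                          unsigned needed)
{
  long best = -1;

  assert(free_bytes != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
  {
    if (free_bytes[i] < needed)
      continue;                       // too small for this data
    if (best < 0 || free_bytes[i] < free_bytes[best])
      best = (long) i;                // tighter fit found
  }
  return best;
}
```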

  The bitmap format allows us to extend the row file in big chunks, if needed.

  When calculating the size for a packed row, we will calculate the following
  things separately:
  - Row header + null_bits + empty_bits fixed size segments etc.
  - Size of all char/varchar fields
  - Size of each blob field

  The bitmap handler will get all the above information and return
  either one page or a set of pages in which to put the different parts.
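
One way to picture the information handed to the bitmap handler is a small struct with the three parts listed above. The sketch below is hypothetical (`struct row_parts`, `row_total()` and `fits_on_one_page()` are not Maria names); it only shows the first decision, whether everything fits on a single page or the row has to be spread over a set of pages.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical description of a packed row, split as listed above.
struct row_parts
{
  unsigned head_size;          // row header + null/empty bits + fixed part
  unsigned varchar_size;       // all char/varchar fields together
  size_t blob_count;
  const unsigned *blob_sizes;  // one entry per blob field
};

// Total bytes the row needs if everything were stored together.
static unsigned long row_total(const struct row_parts *row)
{
  unsigned long total = (unsigned long) row->head_size + row->varchar_size;

  assert(row->blob_count == 0 || row->blob_sizes != NULL);
  for (size_t i = 0; i < row->blob_count; i++)
    total += row->blob_sizes[i];
  return total;
}

// 1 if the row fits on one page with `page_free` free bytes; otherwise
// the handler would have to return a set of pages.
static int fits_on_one_page(const struct row_parts *row, unsigned page_free)
{
  return row_total(row) <= page_free;
}
```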

  Bitmaps are read on demand in response to insert/delete/update operations.
  The following bitmap pointers will be cached and stored on disk on close:
  - Current insert_bitmap: when inserting new data, we will first try to
    fill this one.
  - First bitmap which is not completely full. This is updated when we
    free data with an update or delete.

  While flushing out bitmaps, we will cache the status of the bitmap in memory
  to avoid having to read a bitmap for insert of new data that will not
  be of any use:
  - Total empty space
  - Largest number of contiguous pages
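
The two cached values can be derived in one pass over the decoded patterns. A minimal sketch, assuming one pattern per data page with 0 meaning an empty page; it counts empty pages as a simplified stand-in for "total empty space", and `struct bitmap_summary` and `summarize()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-bitmap summary kept in memory after a flush.
struct bitmap_summary
{
  size_t empty_pages;      // simplified stand-in for total empty space
  size_t longest_run;      // largest number of contiguous empty pages
};

// patterns[i] is the decoded 3-bit value of data page i (0 = empty page).
static struct bitmap_summary summarize(const uint8_t *patterns, size_t count)
{
  struct bitmap_summary s = {0, 0};
  size_t run = 0;

  assert(patterns != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
  {
    if (patterns[i] == 0)
    {
      s.empty_pages++;
      if (++run > s.longest_run)
        s.longest_run = run;
    }
    else
      run = 0;               // a used page ends the current run
  }
  return s;
}
```

An insert that needs N contiguous pages could then skip a bitmap whose cached `longest_run` is below N without reading it.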

  Bitmap ONLY goes to disk in the following scenarios:
  - The file is closed (and we flush all changes to disk)
  - On checkpoint
    (I.e., when we do a checkpoint, we have to ensure that all bitmaps are
    put on disk even if they are not in the page cache).
  - When explicitly requested (for example on backup or after recovery,
    to simplify things)

  The flow of writing a row is:
  - Lock the bitmap
  - Decide which data pages we will write to
  - Mark them full in the bitmap page so that other threads do not try to
    use the same data pages as us
  - Unlock the bitmap
  - Write the data pages
  - Lock the bitmap
  - Correct the bitmap page with the true final occupation of the data
    pages (that is, we marked pages full but when we are done we realize
    we didn't fill them)
  - Unlock the bitmap.
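
The steps above can be sketched with a toy in-memory bitmap and a POSIX mutex. This assumes a single reserved page per row and one decoded pattern per page; `write_row()`, `write_data_page()` and the globals are hypothetical, and `FULL_HEAD` mirrors the value 4 ("no more room") from the list above. Marking the page full before unlocking is what lets the slow data write run without holding the bitmap lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define PAGES 16
#define FULL_HEAD 4   // pattern for "no more room", as in the list above

// Hypothetical in-memory bitmap: one decoded pattern per data page.
static uint8_t patterns[PAGES];
static pthread_mutex_t bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

// Stand-in for the real page write; runs without holding bitmap_lock.
static void write_data_page(int page) { (void) page; }

// Returns the page used, or -1 if no empty page exists. `final_pattern`
// is the real occupation, known only after the data has been written.
static int write_row(uint8_t final_pattern)
{
  int page = -1;

  assert(final_pattern >= 1 && final_pattern <= FULL_HEAD);
  pthread_mutex_lock(&bitmap_lock);            // lock the bitmap
  for (int i = 0; i < PAGES && page < 0; i++)  // decide which page to use
    if (patterns[i] == 0)
      page = i;
  if (page >= 0)
    patterns[page] = FULL_HEAD;                // reserve: mark it full
  pthread_mutex_unlock(&bitmap_lock);          // unlock the bitmap
  if (page < 0)
    return -1;

  write_data_page(page);                       // write the data page

  pthread_mutex_lock(&bitmap_lock);            // lock the bitmap again
  patterns[page] = final_pattern;              // correct to true occupation
  pthread_mutex_unlock(&bitmap_lock);          // unlock the bitmap
  return page;
}
```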
*/

#include "maria_def.h"
#include "ma_blockrec.h"

/* Number of pages to store blob parts */
#define BLOB_SEGMENT_MIN_SIZE 128

#define FULL_HEAD_PAGE 4
#define FULL_TAIL_PAGE 7

/* If we don't have page checksum enabled, the bitmap page ends with this */
uchar maria_bitmap_marker[4]=
{(uchar) 255, (uchar) 255, (uchar) 255, (uchar) 254};
uchar maria_normal_page_marker[4]=
{(uchar) 255, (uchar) 255, (uchar) 255, (uchar) 255};
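
The two markers differ only in their last byte (254 versus 255), so without page checksums a reader can tell a bitmap page from a normal page by its final 4 bytes. A minimal check, assuming the marker occupies exactly the last 4 bytes of the block as the comment above says; `page_has_marker()` is a hypothetical helper and uses `unsigned char` instead of Maria's `uchar` so it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
  Hypothetical helper: returns 1 if the last 4 bytes of a page of
  block_size bytes equal marker (for example maria_bitmap_marker).
*/
static int page_has_marker(const unsigned char *page, size_t block_size,
                           const unsigned char marker[4])
{
  assert(block_size >= 4);
  return memcmp(page + block_size - 4, marker, 4) == 0;
}
```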

static my_bool _ma_read_bitmap_page(MARIA_SHARE *share,
                                    MARIA_FILE_BITMAP *bitmap,
                                    ulonglong page);


/* Write bitmap page to page cache */

static inline my_bool write_changed_bitmap(MARIA_SHARE *share,
                                           MARIA_FILE_BITMAP *bitmap)
{
  DBUG_ASSERT(share->pagecache->block_size == bitmap->block_size);
  return (pagecache_write(share->pagecache,
                          &bitmap->file, bitmap->page, 0,
                          (uchar*) bitmap->map, PAGECACHE_PLAIN_PAGE,
                          PAGECACHE_LOCK_LEFT_UNLOCKED,
                          PAGECACHE_PIN_LEFT_UNPINNED,
                          PAGECACHE_WRITE_DELAY, 0,
                          LSN_IMPOSSIBLE));
}

/*
  Initialize bitmap variables in share

  SYNOPSIS
    _ma_bitmap_init()
    share               Share handler
    file                data file handler

  NOTES
    This is called the first time a file is opened.

  RETURN
    0   ok
    1   error
*/

my_bool _ma_bitmap_init(MARIA_SHARE *share, File file)
{
  uint aligned_bit_blocks;
  uint max_page_size;
  MARIA_FILE_BITMAP *bitmap= &share->bitmap;
  uint size= share->block_size;
#ifndef DBUG_OFF
  /* We want to have a copy of the bitmap to be able to print differences */
  size*= 2;
#endif

  if (!(bitmap->map= (uchar*) my_malloc(size, MYF(MY_WME))))
    return 1;

  bitmap->file.file= file;
  bitmap->changed= 0;
  bitmap->block_size= share->block_size;
  /* Size needs to be aligned on 6 */
  aligned_bit_blocks= (share->block_size - PAGE_SUFFIX_SIZE) / 6;
  bitmap->total_size= aligned_bit_blocks * 6;
  /*
    In each 6 bytes, we have 6*8/3 = 16 pages covered
    The +1 is to add the bitmap page, as this doesn't have to be covered
  */
  bitmap->pages_covered= aligned_bit_blocks * 16 + 1;

  /* Update size for bits */
  /* TODO: Make this dependent on the row size */
  max_page_size= share->block_size - PAGE_OVERHEAD_SIZE;
  bitmap->sizes[0]= max_page_size;                         /* Empty page */
  bitmap->sizes[1]= max_page_size - max_page_size * 30 / 100;
  bitmap->sizes[2]= max_page_size - max_page_size * 60 / 100;
  bitmap->sizes[3]= max_page_size - max_page_size * 90 / 100;
  bitmap->sizes[4]= 0;                                     /* Full page */
  bitmap->sizes[5]= max_page_size - max_page_size * 40 / 100;
  bitmap->sizes[6]= max_page_size - max_page_size * 80 / 100;
  bitmap->sizes[7]= 0;

  pthread_mutex_init(&share->bitmap.bitmap_lock, MY_MUTEX_INIT_SLOW);

  /*
    We can't read a page yet, as in some cases we don't have an active
    page cache yet.
    Pretend we have a dummy, full and not changed bitmap page in memory.
  */

  bitmap->page= ~(ulonglong) 0;
  bitmap->used_size= bitmap->total_size;
  bfill(bitmap->map, share->block_size, 255);
#ifndef DBUG_OFF
  memcpy(bitmap->map + bitmap->block_size, bitmap->map, bitmap->block_size);
#endif
  if (share->state.first_bitmap_with_space == ~(ulonglong) 0)
  {
    /* Start scanning for free space from start of file */
    share->state.first_bitmap_with_space= 0;
  }
  return 0;
}
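
The geometry and thresholds set up by _ma_bitmap_init() can be checked in isolation. The sketch below restates that arithmetic as plain functions. `suffix_size` stands in for PAGE_SUFFIX_SIZE and `max_page_size` for `block_size - PAGE_OVERHEAD_SIZE`, whose actual values are not shown in this file (the test's 4 and 1000 are arbitrary). `head_pattern_for_free_size()` is a hypothetical illustration of how the sizes[] table could map a page's free space back to a head-page pattern; it is not taken from this file.

```c
#include <assert.h>

/* Bitmap geometry as computed in _ma_bitmap_init() */
struct bitmap_geometry
{
  unsigned total_size;     /* usable bitmap bytes, a multiple of 6 */
  unsigned pages_covered;  /* data pages mapped, plus the bitmap page */
};

static struct bitmap_geometry compute_geometry(unsigned block_size,
                                               unsigned suffix_size)
{
  struct bitmap_geometry g;
  unsigned aligned_bit_blocks;

  assert(block_size > suffix_size);
  aligned_bit_blocks = (block_size - suffix_size) / 6;
  g.total_size = aligned_bit_blocks * 6;
  g.pages_covered = aligned_bit_blocks * 16 + 1;  /* 6 bytes = 16 patterns */
  return g;
}

/* Same thresholds as bitmap->sizes[]: free space each pattern guarantees */
static void fill_sizes(unsigned max_page_size, unsigned sizes[8])
{
  sizes[0] = max_page_size;                          /* empty page */
  sizes[1] = max_page_size - max_page_size * 30 / 100;
  sizes[2] = max_page_size - max_page_size * 60 / 100;
  sizes[3] = max_page_size - max_page_size * 90 / 100;
  sizes[4] = 0;                                      /* full head page */
  sizes[5] = max_page_size - max_page_size * 40 / 100;
  sizes[6] = max_page_size - max_page_size * 80 / 100;
  sizes[7] = 0;                                      /* full tail/blob page */
}

/* Hypothetical: first head pattern (0-3) whose guarantee still holds, else 4 */
static unsigned head_pattern_for_free_size(const unsigned sizes[8],
                                           unsigned free_size)
{
  unsigned i;

  for (i = 0; i < 4; i++)
    if (free_size >= sizes[i])
      return i;
  return 4;                /* same value as FULL_HEAD_PAGE */
}
```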


/*
  Free data allocated by _ma_bitmap_init

  SYNOPSIS
    _ma_bitmap_end()
    share               Share handler
*/

my_bool _ma_bitmap_end(MARIA_SHARE *share)
{
  my_bool res= _ma_flush_bitmap(share);
  pthread_mutex_destroy(&share->bitmap.bitmap_lock);
  my_free((uchar*) share->bitmap.map, MYF(MY_ALLOW_ZERO_PTR));
  share->bitmap.map= 0;
  return res;
}


/*
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Call-able from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scracth).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00
|
|
|
Send updated bitmap to the page cache
|
2007-04-19 12:18:56 +02:00
|
|
|
|
|
|
|
SYNOPSIS
|
|
|
|
_ma_flush_bitmap()
|
|
|
|
share Share handler
|
|
|
|
|
|
|
|
NOTES
|
|
|
|
In the future, _ma_flush_bitmap() will be called to flush changes don't
|
|
|
|
by this thread (ie, checking the changed flag is ok). The reason we
|
|
|
|
check it again in the mutex is that if someone else did a flush at the
|
|
|
|
same time, we don't have to do the write.
|
|
|
|
|
|
|
|
RETURN
|
|
|
|
0 ok
|
|
|
|
1 error
|
2007-01-18 20:38:14 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
my_bool _ma_flush_bitmap(MARIA_SHARE *share)
|
|
|
|
{
|
|
|
|
my_bool res= 0;
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
we thus should avoid to read it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, we disable it (if want to re-enable it, it shouldn't
for transactional tables as state on disk may be obsolete (such tables
does not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
|
|
|
DBUG_ENTER("_ma_flush_bitmap");
|
2007-01-18 20:38:14 +01:00
|
|
|
if (share->bitmap.changed)
|
|
|
|
{
|
|
|
|
pthread_mutex_lock(&share->bitmap.bitmap_lock);
|
|
|
|
if (share->bitmap.changed)
|
|
|
|
{
|
|
|
|
res= write_changed_bitmap(share, &share->bitmap);
|
|
|
|
share->bitmap.changed= 0;
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&share->bitmap.bitmap_lock);
|
|
|
|
}
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
we thus should avoid to read it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
  DBUG_RETURN(res);
}


/*
  Initialize bitmap in memory to a zero bitmap

  SYNOPSIS
    _ma_bitmap_delete_all()
    share       Share handler

  NOTES
    This is called on maria_delete_all_rows (truncate data file).
*/

void _ma_bitmap_delete_all(MARIA_SHARE *share)
{
  MARIA_FILE_BITMAP *bitmap= &share->bitmap;
  if (bitmap->map)                              /* Not in create */
  {
    bzero(bitmap->map, bitmap->block_size);
    memcpy(bitmap->map + bitmap->block_size - sizeof(maria_bitmap_marker),
           maria_bitmap_marker, sizeof(maria_bitmap_marker));
    bitmap->changed= 1;
    bitmap->page= 0;
    bitmap->used_size= bitmap->total_size;
  }
}
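
/*
  Illustration only; a hypothetical helper that is not part of Maria.
  A minimal sketch of what _ma_bitmap_delete_all() does to the in-memory
  bitmap page, assuming a 2-byte end marker "bm" in place of the real
  maria_bitmap_marker (its actual value is defined elsewhere in this file).
  Every 3-bit pattern becomes 0 ("empty"), so all data pages covered by
  the bitmap are reported as free, and the signature is written back at
  the end of the page so that the page still verifies when read again.
  The real function additionally marks the bitmap as changed and resets
  bitmap->page and bitmap->used_size.
*/
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char example_reset_marker[2]= { 'b', 'm' }; /* assumed */

static void example_bitmap_reset(unsigned char *map, size_t block_size)
{
  assert(block_size >= sizeof(example_reset_marker));
  memset(map, 0, block_size);                   /* all pages "empty" */
  memcpy(map + block_size - sizeof(example_reset_marker),
         example_reset_marker, sizeof(example_reset_marker));
}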


/*
  Return bitmap pattern for the smallest head block that can hold 'size'

  SYNOPSIS
    size_to_head_pattern()
    bitmap      Bitmap
    size        Requested size

  RETURN
    0-3         For a description of the bitmap sizes, see the header
*/

static uint size_to_head_pattern(MARIA_FILE_BITMAP *bitmap, uint size)
{
  if (size <= bitmap->sizes[3])
    return 3;
  if (size <= bitmap->sizes[2])
    return 2;
  if (size <= bitmap->sizes[1])
    return 1;
  DBUG_ASSERT(size <= bitmap->sizes[0]);
  return 0;
}


/*
  Return bitmap pattern for a head block where there are 'size' bytes free

  SYNOPSIS
    _ma_free_size_to_head_pattern()
    bitmap      Bitmap
    size        Number of free bytes on the page

  RETURN
    0-4         (Possible bitmap patterns for head block)
*/

uint _ma_free_size_to_head_pattern(MARIA_FILE_BITMAP *bitmap, uint size)
{
  if (size < bitmap->sizes[3])
    return 4;
  if (size < bitmap->sizes[2])
    return 3;
  if (size < bitmap->sizes[1])
    return 2;
  return (size < bitmap->sizes[0]) ? 1 : 0;
}
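
/*
  Illustration only; hypothetical copies of the two head functions above,
  run against an assumed sizes[] table for a page with 1000 usable bytes.
  The thresholds follow the bits_to_txt[] descriptions (pattern 1 is at
  most 30% full, i.e. at least 700 bytes free, and so on); the real
  values are computed elsewhere in this file.
  - size_to_head_pattern() maps a requested size to the fullest (highest
    numbered) pattern that still guarantees room for it, using '<='.
  - _ma_free_size_to_head_pattern() maps the free space of a page to the
    emptiest (lowest numbered) pattern whose minimum free space sizes[p]
    the page still meets, using strict '<'.
  Together they give the guarantee checked in the test: if the pattern of
  a page is <= the pattern required for a request, the request fits.
*/
#include <assert.h>

static const unsigned int example_head_sizes[8]=
{ 1000, 700, 400, 100, 0, 600, 200, 0 };        /* assumed thresholds */

static unsigned int example_size_to_head_pattern(unsigned int size)
{
  if (size <= example_head_sizes[3])
    return 3;
  if (size <= example_head_sizes[2])
    return 2;
  if (size <= example_head_sizes[1])
    return 1;
  assert(size <= example_head_sizes[0]);
  return 0;
}

static unsigned int example_free_size_to_head_pattern(unsigned int size)
{
  if (size < example_head_sizes[3])
    return 4;                                   /* full */
  if (size < example_head_sizes[2])
    return 3;
  if (size < example_head_sizes[1])
    return 2;
  return (size < example_head_sizes[0]) ? 1 : 0;
}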


/*
  Return bitmap pattern for the smallest tail block that can hold 'size'

  SYNOPSIS
    size_to_tail_pattern()
    bitmap      Bitmap
    size        Requested size

  RETURN
    0, 5 or 6   For a description of the bitmap sizes, see the header
*/

static uint size_to_tail_pattern(MARIA_FILE_BITMAP *bitmap, uint size)
{
  if (size <= bitmap->sizes[6])
    return 6;
  if (size <= bitmap->sizes[5])
    return 5;
  DBUG_ASSERT(size <= bitmap->sizes[0]);
  return 0;
}


/*
  Return bitmap pattern for a tail block where there are 'size' bytes free

  SYNOPSIS
    free_size_to_tail_pattern()
    bitmap      Bitmap
    size        Number of free bytes on the page

  RETURN
    0, 5, 6, 7  For a description of the bitmap sizes, see the header
*/

static uint free_size_to_tail_pattern(MARIA_FILE_BITMAP *bitmap, uint size)
{
  if (size >= bitmap->sizes[0])
    return 0;                                   /* Revert to empty page */
  if (size < bitmap->sizes[6])
    return 7;
  if (size < bitmap->sizes[5])
    return 6;
  return 5;
}
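
/*
  Illustration only; hypothetical copies of the two tail functions above,
  run against an assumed sizes[] table for a page with 1000 usable bytes
  (sizes[0]= 1000 for an empty page, sizes[5]= 600 and sizes[6]= 200 for
  the "tail 00-40 % full" and "tail 40-80 % full" patterns). Tail pages
  use only patterns 0 (empty), 5, 6 and 7, from emptiest to fullest. In
  this sketch, a page with free-space pattern p is taken to guarantee room
  for a tail whose size maps to pattern q when p is 0, or when q is not 0
  and p <= q; a request that maps to 0 is larger than sizes[5] and needs
  an empty page. example_tail_fits() is a hypothetical name.
*/
#include <assert.h>

static const unsigned int example_tail_sizes[8]=
{ 1000, 700, 400, 100, 0, 600, 200, 0 };        /* assumed thresholds */

static unsigned int example_size_to_tail_pattern(unsigned int size)
{
  if (size <= example_tail_sizes[6])
    return 6;
  if (size <= example_tail_sizes[5])
    return 5;
  assert(size <= example_tail_sizes[0]);
  return 0;
}

static unsigned int example_free_size_to_tail_pattern(unsigned int size)
{
  if (size >= example_tail_sizes[0])
    return 0;                                   /* back to an empty page */
  if (size < example_tail_sizes[6])
    return 7;
  if (size < example_tail_sizes[5])
    return 6;
  return 5;
}

/* Does a page with tail pattern 'page_pattern' guarantee room for 'size'? */
static int example_tail_fits(unsigned int page_pattern, unsigned int size)
{
  unsigned int needed= example_size_to_tail_pattern(size);
  if (page_pattern == 0)
    return 1;
  return needed != 0 && page_pattern <= needed;
}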


/*
  Return size guaranteed to be available on a page

  SYNOPSIS
    pattern_to_size()
    bitmap      Bitmap
    pattern     Pattern (0-7)

  RETURN
    0 - block_size
*/

static inline uint pattern_to_size(MARIA_FILE_BITMAP *bitmap, uint pattern)
{
  DBUG_ASSERT(pattern <= 7);
  return bitmap->sizes[pattern];
}


/*
  Print changes in the bitmap for debugging

  SYNOPSIS
    _ma_print_bitmap_changes()
    bitmap      Bitmap to print

  IMPLEMENTATION
    Prints all changed bits since the last call to
    _ma_print_bitmap_changes().
    This is done by keeping a copy of the last bitmap in
    bitmap->map+bitmap->block_size.
*/

#ifndef DBUG_OFF

const char *bits_to_txt[]=
{
  "empty", "00-30% full", "30-60% full", "60-90% full", "full",
  "tail 00-40 % full", "tail 40-80 % full", "tail/blob full"
};

static void _ma_print_bitmap_changes(MARIA_FILE_BITMAP *bitmap)
{
  uchar *pos, *end, *org_pos;
  ulong page;

  end= bitmap->map + bitmap->used_size;
  DBUG_LOCK_FILE;
  fprintf(DBUG_FILE,"\nBitmap page changes at page %lu\n",
          (ulong) bitmap->page);

  page= (ulong) bitmap->page+1;
  for (pos= bitmap->map, org_pos= bitmap->map + bitmap->block_size ;
       pos < end ;
       pos+= 6, org_pos+= 6)
  {
    ulonglong bits= uint6korr(pos);    /* 6 bytes = 6*8/3= 16 patterns */
    ulonglong org_bits= uint6korr(org_pos);
    uint i;

    /*
      Test if there are any changes in the next 16 page patterns (to not
      have to loop through all bits if we know they are the same)
    */
    if (bits != org_bits)
    {
      for (i= 0; i < 16 ; i++, bits>>= 3, org_bits>>= 3)
      {
        if ((bits & 7) != (org_bits & 7))
          fprintf(DBUG_FILE, "Page: %8lu %s -> %s\n", page+i,
                  bits_to_txt[org_bits & 7], bits_to_txt[bits & 7]);
      }
    }
    page+= 16;
  }
  fputc('\n', DBUG_FILE);
  DBUG_UNLOCK_FILE;
  memcpy(bitmap->map + bitmap->block_size, bitmap->map, bitmap->block_size);
}
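
/*
  Illustration only; a hypothetical variant of _ma_print_bitmap_changes()
  that returns the changed page numbers instead of printing them. As in
  the function above, the buffer is assumed to hold two copies of the
  bitmap back to back: the live page in map[0 .. block_size-1] and the
  snapshot from the previous call in map[block_size .. 2*block_size-1].
  Whole 6-byte groups (16 patterns) are compared first, so an unchanged
  run of 16 pages costs one comparison; afterwards the live page is copied
  over the snapshot, so the next call reports only newer changes.
  used_size is assumed to be a multiple of 6.
*/
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for uint6korr(): 6 bytes, least significant first */
static uint64_t example_diff_le48(const unsigned char *p)
{
  uint64_t v= 0;
  int i;
  for (i= 5; i >= 0; i--)
    v= (v << 8) | p[i];
  return v;
}

static unsigned int example_collect_changes(unsigned char *map,
                                            size_t block_size,
                                            size_t used_size,
                                            unsigned long first_page,
                                            unsigned long *changed,
                                            unsigned int max_changed)
{
  const unsigned char *pos, *org_pos, *end= map + used_size;
  unsigned long page= first_page;     /* bitmap->page + 1 in the real code */
  unsigned int count= 0;

  for (pos= map, org_pos= map + block_size; pos < end;
       pos+= 6, org_pos+= 6, page+= 16)
  {
    uint64_t bits= example_diff_le48(pos);
    uint64_t org_bits= example_diff_le48(org_pos);
    unsigned int i;

    if (bits == org_bits)
      continue;                       /* none of these 16 pages changed */
    for (i= 0; i < 16; i++, bits>>= 3, org_bits>>= 3)
    {
      if ((bits & 7) != (org_bits & 7) && count < max_changed)
        changed[count++]= page + i;
    }
  }
  memcpy(map + block_size, map, block_size);   /* take a new snapshot */
  return count;
}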


/* Print content of bitmap for debugging */

void _ma_print_bitmap(MARIA_FILE_BITMAP *bitmap, uchar *data,
                      ulonglong page)
{
  uchar *pos, *end;
  char llbuff[22];

  DBUG_LOCK_FILE;
  fprintf(DBUG_FILE,"\nDump of bitmap page at %s\n", llstr(page, llbuff));

  page++;                                       /* Skip bitmap page */
  for (pos= data, end= pos + bitmap->total_size;
       pos < end ;
       pos+= 6)
  {
    ulonglong bits= uint6korr(pos);    /* 6 bytes = 6*8/3= 16 patterns */

    /*
      Skip groups where all 16 patterns are 0 (empty pages), so that only
      pages with a non-empty pattern are printed
    */
    if (bits)
    {
      uint i;
      for (i= 0; i < 16 ; i++, bits>>= 3)
      {
        if (bits & 7)
          fprintf(DBUG_FILE, "Page: %8s %s\n", llstr(page+i, llbuff),
                  bits_to_txt[bits & 7]);
      }
    }
    page+= 16;
  }
  fputc('\n', DBUG_FILE);
  DBUG_UNLOCK_FILE;
}
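
/*
  Illustration only; hypothetical helpers that are not part of Maria.
  A minimal sketch of the packing that the two dump loops above rely on:
  every 6 bytes (48 bits) hold 16 patterns of 3 bits, read in little-endian
  byte order (as uint6korr() does), with pattern i of a group in bits
  3*i .. 3*i+2. Page n, counted from 0 for the first data page after the
  bitmap page, is therefore pattern n%16 of group n/16. Because 48 is a
  multiple of 3, this is the same as reading the whole map as one
  little-endian bit string with page n at bit offset 3*n, so one pattern
  can also be fetched with a 2-byte load; that load needs one spare byte
  after the last pattern in the buffer.
*/
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for uint6korr(): 6 bytes, least significant first */
static uint64_t example_read_le48(const unsigned char *p)
{
  uint64_t v= 0;
  int i;
  for (i= 5; i >= 0; i--)
    v= (v << 8) | p[i];
  return v;
}

/* Group-wise decoding, as in the dump loops */
static unsigned int example_pattern_by_group(const unsigned char *map,
                                             unsigned long n)
{
  uint64_t bits= example_read_le48(map + (n / 16) * 6);
  return (unsigned int) ((bits >> (3 * (n % 16))) & 7);
}

/* Same pattern via bit offset 3*n and a 2-byte little-endian load */
static unsigned int example_pattern_by_offset(const unsigned char *map,
                                              unsigned long n)
{
  unsigned long offset= n * 3;
  const unsigned char *pos= map + offset / 8;
  unsigned int two= (unsigned int) pos[0] | ((unsigned int) pos[1] << 8);
  return (two >> (offset % 8)) & 7;
}

/* Overwrite the 3 bits of page n, leaving the neighbours untouched */
static void example_set_pattern(unsigned char *map, unsigned long n,
                                unsigned int pattern)
{
  unsigned long offset= n * 3;
  unsigned char *pos= map + offset / 8;
  unsigned int two= (unsigned int) pos[0] | ((unsigned int) pos[1] << 8);
  two&= ~(7U << (offset % 8));
  two|= (pattern & 7) << (offset % 8);
  pos[0]= (unsigned char) two;
  pos[1]= (unsigned char) (two >> 8);
}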


#endif /* DBUG_OFF */


/***************************************************************************
  Reading & writing bitmap pages
***************************************************************************/

/*
  Read a given bitmap page

  SYNOPSIS
    _ma_read_bitmap_page()
    share       Share handler
    bitmap      Bitmap handler
    page        Page to read

  TODO
    Update 'bitmap->used_size' to real size of used bitmap

  NOTE
    We don't always have share->bitmap.bitmap_lock here
    (when called from _ma_check_bitmap_data(), for example).

  RETURN
    0  ok
    1  error  (Error writing old bitmap or reading bitmap page)
*/

static my_bool _ma_read_bitmap_page(MARIA_SHARE *share,
                                    MARIA_FILE_BITMAP *bitmap,
                                    ulonglong page)
{
dummy_transaction_object, REDO for INSERT/UPDATE/DELETE are always
about real transactions, can assert this.
A function for Recovery to assign a short id to a table.
storage/maria/ma_loghandler.h:
new function
storage/maria/ma_loghandler_lsn.h:
maria_chk tags repaired tables with this LSN
storage/maria/ma_open.c:
* enforce that DMLs on transactional tables use real transactions
and not dummy_transaction_object.
* test if table was repaired with maria_chk (which has to be
seen as an import of an external table into the server), test
validity of create_rename_lsn (header corruption detection)
* comments.
storage/maria/ma_recovery.c:
* preparations for the UNDO phase: recreate TRNs
* preparations for Checkpoint: list of dirty pages, testing
of rec_lsn to know if page should be skipped during Recovery
(unused in this patch as no Checkpoint module pushed yet)
* reworking all around (less duplication)
storage/maria/ma_recovery.h:
a parameter to say if the UNDO phase should be skipped
storage/maria/maria_chk.c:
tag repaired tables with a special LSN
storage/maria/maria_read_log.c:
* update to new prototype
* no UNDO phase in maria_read_log for now
storage/maria/trnman.c:
* a function for Recovery to create a transaction (TRN), needed
in the UNDO phase
* a function for Recovery to grab an existing transaction, needed
in the UNDO phase (rollback all existing transactions)
storage/maria/trnman_public.h:
new functions
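The crash-resistance argument in the ma_bitmap.c note above reduces to two checks: whether the whole page lies inside the file (`end_of_page <= data_file_length`, the negation of the test `_ma_read_bitmap_page()` uses to decide whether to create the page) and whether an existing page ends with its marker. The following is a minimal standalone sketch of those checks, not Maria code: the 8192-byte page size and the 2-byte "bm" marker are assumptions taken from these commit messages, and `SKETCH_BLOCK_SIZE`, `sketch_marker`, `page_on_disk()`, `init_empty_bitmap()` and `has_marker()` are hypothetical names.

```c
#include <string.h>

/* Assumed for illustration: 8 KB pages and the 2-byte "bm" signature
   mentioned in the commit messages; not Maria's real definitions. */
#define SKETCH_BLOCK_SIZE 8192ULL
static const unsigned char sketch_marker[2]= { 'b', 'm' };

/* Hypothetical: 1 if page `page` lies entirely within a data file of
   `data_file_length` bytes, i.e. end_of_page <= data_file_length. */
int page_on_disk(unsigned long long page, unsigned long long data_file_length)
{
  unsigned long long end_of_page= (page + 1) * SKETCH_BLOCK_SIZE;
  return end_of_page <= data_file_length;
}

/* Hypothetical: build an empty bitmap page in memory that ends with the
   marker, like the bzero() + memcpy() pair in _ma_read_bitmap_page(). */
void init_empty_bitmap(unsigned char *map)
{
  memset(map, 0, SKETCH_BLOCK_SIZE);
  memcpy(map + (SKETCH_BLOCK_SIZE - sizeof(sketch_marker)),
         sketch_marker, sizeof(sketch_marker));
}

/* Hypothetical: 1 if a page image carries the end marker. */
int has_marker(const unsigned char *map)
{
  return memcmp(map + (SKETCH_BLOCK_SIZE - sizeof(sketch_marker)),
                sketch_marker, sizeof(sketch_marker)) == 0;
}
```

With these helpers, `page_on_disk(0, 8190)` returns 0, so a file cut short after `chsize()` to 8190 bytes makes page 0 count as nonexistent, and the page is rebuilt as `init_empty_bitmap()` does. Under the old scheme, `page_on_disk(0, 8192)` returns 1 while `has_marker()` on the all-zero page returns 0, which is the "exists but looks corrupted" case the note describes.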
2007-08-29 16:43:01 +02:00
  my_off_t end_of_page= (page + 1) * bitmap->block_size;
2007-01-18 20:38:14 +01:00
  my_bool res;
  DBUG_ENTER("_ma_read_bitmap_page");
  DBUG_ASSERT(page % bitmap->pages_covered == 0);

  bitmap->page= page;
2007-08-29 16:43:01 +02:00
  if (end_of_page > share->state.state.data_file_length)
2007-01-18 20:38:14 +01:00
  {
2007-08-29 16:43:01 +02:00
    /*
      Nonexistent or half-created page (could be a crash in the middle of
      _ma_bitmap_create_first(), before appending maria_bitmap_marker).
    */
2007-11-11 15:27:07 +01:00
    /**
      @todo RECOVERY BUG
      We are updating data_file_length before writing any log record for the
      row operation. What if the state is then flushed by a checkpoint with
      the new value, and a crash happens before the checkpoint record is
      written? Recovery may not even open the table (no log records) and so
      will not fix data_file_length ("WAL violation").
      Scenario: assume share->id==0, then:
      thread 1 (here)                 thread 2 (checkpoint)
      update data_file_length
                                      copy state to memory, flush log
      set share->id and write FILE_ID (not flushed)
                                      see share->id!=0 so flush state
      crash
      FILE_ID will be missing, Recovery will not open the table and will not
      fix data_file_length. This bug should be fixed with the other
      "checkpoint vs bitmap" bugs.
      One possibility would be logging a standalone LOGREC_CREATE_BITMAP in a
      separate transaction (using dummy_transaction_object).
2007-11-09 23:30:31 +01:00
    */
2007-08-29 16:43:01 +02:00
    share->state.state.data_file_length= end_of_page;
2007-01-18 20:38:14 +01:00
    bzero(bitmap->map, bitmap->block_size);
2007-07-26 11:56:21 +02:00
    memcpy(bitmap->map + bitmap->block_size - sizeof(maria_bitmap_marker),
           maria_bitmap_marker, sizeof(maria_bitmap_marker));
2007-01-18 20:38:14 +01:00
    bitmap->used_size= 0;
2007-06-04 13:07:18 +02:00
#ifndef DBUG_OFF
    memcpy(bitmap->map + bitmap->block_size, bitmap->map, bitmap->block_size);
#endif
2007-01-18 20:38:14 +01:00
    DBUG_RETURN(0);
  }
  bitmap->used_size= bitmap->total_size;
2007-04-04 22:37:09 +02:00
  DBUG_ASSERT(share->pagecache->block_size == bitmap->block_size);
2007-10-09 20:09:50 +02:00
  res= pagecache_read(share->pagecache,
2007-07-26 11:56:21 +02:00
|
|
|
                      (PAGECACHE_FILE*)&bitmap->file, page, 0,
                      (uchar*) bitmap->map,
                      PAGECACHE_PLAIN_PAGE,
                      PAGECACHE_LOCK_LEFT_UNLOCKED, 0) == NULL;

  /*
    We can't check for maria_bitmap_marker here: if the bitmap page
    previously had a real checksum and the user then switched to running
    without checksums, the last bytes of the page may have any value except
    maria_normal_page_marker.

    Checking against maria_normal_page_marker gives us protection against
    bugs when running without any checksums.
  */
  if (!res && !(share->options & HA_OPTION_PAGE_CHECKSUM) &&
      !memcmp(bitmap->map + bitmap->block_size -
              sizeof(maria_normal_page_marker),
              maria_normal_page_marker,
              sizeof(maria_normal_page_marker)))
  {
    res= 1;
    my_errno= HA_ERR_WRONG_IN_RECORD;           /* File crashed */
  }
#ifndef DBUG_OFF
  if (!res)
    memcpy(bitmap->map + bitmap->block_size, bitmap->map, bitmap->block_size);
#endif
  DBUG_RETURN(res);
}
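
The sanity check at the end of _ma_read_bitmap_page() compares the trailing bytes of the page with a marker. The sketch below shows only that comparison. It assumes a caller-supplied marker, because the real value of maria_normal_page_marker is not shown in this section. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
  Hypothetical helper (not part of Maria): returns 1 if a bitmap page read
  without page checksums ends with the marker that normal data pages carry,
  which would mean the page is not really a bitmap page.
*/
static int bitmap_page_looks_crashed(const unsigned char *page,
                                     size_t block_size,
                                     const unsigned char *marker,
                                     size_t marker_len,
                                     int page_checksums_enabled)
{
  if (page_checksums_enabled || marker_len > block_size)
    return 0;                         /* check only applies without checksums */
  return memcmp(page + block_size - marker_len, marker, marker_len) == 0;
}
```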


/*
  Change to another bitmap page

  SYNOPSIS
    _ma_change_bitmap_page()
    info                Maria handler
    bitmap              Bitmap handler
    page                Bitmap page to read

  NOTES
    If the old bitmap was changed, it is written out before the new one is
    read.
    An empty bitmap is returned if the page is outside of the file.

  RETURN
    0  ok
    1  error  (Error writing old bitmap or reading bitmap page)
*/

static my_bool _ma_change_bitmap_page(MARIA_HA *info,
                                      MARIA_FILE_BITMAP *bitmap,
                                      ulonglong page)
{
  DBUG_ENTER("_ma_change_bitmap_page");

  if (bitmap->changed)
  {
    /**
      @todo RECOVERY BUG this may flush the bitmap page to disk while it is
      over-allocated and no complete REDO-UNDO group has been logged yet
      (WAL violation: there is no way to undo the over-allocation after a
      crash). See also collect_tables().
    */
    if (write_changed_bitmap(info->s, bitmap))
      DBUG_RETURN(1);
    bitmap->changed= 0;
  }
  DBUG_RETURN(_ma_read_bitmap_page(info->s, bitmap, page));
}


/*
  Read next suitable bitmap

  SYNOPSIS
    move_to_next_bitmap()
    info                Maria handler
    bitmap              Bitmap handle

  NOTES
    The found bitmap may be full, so the calling function may need to call
    this repeatedly until it finds enough space.

  TODO
    Add a cache of bitmaps so that we don't read bitmaps that are not usable

  RETURN
    0  ok
    1  error (either couldn't save old bitmap or read new one)
*/

static my_bool move_to_next_bitmap(MARIA_HA *info, MARIA_FILE_BITMAP *bitmap)
{
  ulonglong page= bitmap->page;
  MARIA_STATE_INFO *state= &info->s->state;
  DBUG_ENTER("move_to_next_bitmap");

  if (state->first_bitmap_with_space != ~(ulonglong) 0 &&
      state->first_bitmap_with_space != page)
  {
    page= state->first_bitmap_with_space;
    state->first_bitmap_with_space= ~(ulonglong) 0;
  }
  else
    page+= bitmap->pages_covered;
  DBUG_RETURN(_ma_change_bitmap_page(info, bitmap, page));
}
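
move_to_next_bitmap() steps from one bitmap page to the next by adding bitmap->pages_covered. fill_block() turns a pattern index into a page number as bitmap->page + 1 + index. The sketch below shows that arithmetic. It assumes bitmap pages sit at multiples of pages_covered, with page 0 being the first bitmap page (as the file header describes). The helper names are hypothetical.

```c
#include <stdint.h>

/*
  Hypothetical helpers (not part of Maria). Bitmap pages are assumed to be
  at 0, pages_covered, 2*pages_covered, ...; every other page is a data
  page described by the closest preceding bitmap page.
*/
static uint64_t covering_bitmap_page(uint64_t page, uint64_t pages_covered)
{
  return page - page % pages_covered;
}

/* 0-based pattern index of a data page (page must not be a bitmap page) */
static uint64_t pattern_index(uint64_t page, uint64_t pages_covered)
{
  return page - covering_bitmap_page(page, pages_covered) - 1;
}

/* Next bitmap page, as move_to_next_bitmap() computes it */
static uint64_t next_bitmap_page(uint64_t bitmap_page, uint64_t pages_covered)
{
  return bitmap_page + pages_covered;
}
```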


/****************************************************************************
  Allocate data in bitmaps
****************************************************************************/

/*
  Store data in 'block' and mark the place used in the bitmap

  SYNOPSIS
    fill_block()
    bitmap              Bitmap handle
    block               Store data about what we found
    best_data           Pointer to best 6-byte aligned area in bitmap->map
    best_pos            Which 3-bit pattern in *best_data the area starts at
                        0 = first bit pattern, 1 second bit pattern etc
    best_bits           The original value of the bits at best_pos
    fill_pattern        Bitmap pattern to store in best_data[best_pos]

  NOTES
    We mark all pages to be 'TAIL's, which means that
    block->page_count is really a row position inside the page.
*/

static void fill_block(MARIA_FILE_BITMAP *bitmap,
                       MARIA_BITMAP_BLOCK *block,
                       uchar *best_data, uint best_pos, uint best_bits,
                       uint fill_pattern)
{
  uint page, offset, tmp;
  uchar *data;

  /* For each 6 bytes we have 6*8/3= 16 patterns */
  page= (best_data - bitmap->map) / 6 * 16 + best_pos;
  block->page= bitmap->page + 1 + page;
  block->page_count= 1 + TAIL_BIT;
  block->empty_space= pattern_to_size(bitmap, best_bits);
  block->sub_blocks= 1;
  block->org_bitmap_value= best_bits;
  block->used= BLOCKUSED_TAIL; /* See _ma_bitmap_release_unused() */

  /*
    Mark place used by reading/writing 2 bytes at a time to handle
    patterns that straddle a byte boundary
  */
  best_pos*= 3;
  data= best_data+ best_pos / 8;
  offset= best_pos & 7;
  tmp= uint2korr(data);

  /* we turn off the 3 bits and replace them with fill_pattern */
  tmp= (tmp & ~(7 << offset)) | (fill_pattern << offset);
  int2store(data, tmp);
  bitmap->changed= 1;
  DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
}
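
A 3-bit pattern does not always fit inside one byte, so fill_block() reads and writes the two bytes around it. The sketch below applies the same bit arithmetic to a single 6-byte group. It assumes the little-endian layout that uint6korr() implies: pattern i sits at bits 3*i..3*i+2. The helper names are hypothetical. Unlike fill_block(), this version touches the second byte only when the field actually crosses into it, so it never reads past a bare 6-byte buffer.

```c
/*
  Hypothetical helpers (not Maria's API). Pattern 'pos' (0-15) of a 6-byte
  group occupies bits 3*pos..3*pos+2, counted from the least significant
  bit of byte 0. Fields at bit offsets 6 and 7 straddle two bytes.
*/
static void set_pattern(unsigned char *group, unsigned pos, unsigned pattern)
{
  unsigned bit= pos * 3;
  unsigned char *data= group + bit / 8;
  unsigned offset= bit & 7;
  unsigned tmp= data[0];

  if (offset > 5)                       /* field crosses into data[1] */
    tmp|= (unsigned) data[1] << 8;
  /* turn off the 3 bits and replace them with the new pattern */
  tmp= (tmp & ~(7u << offset)) | ((pattern & 7u) << offset);
  data[0]= (unsigned char) (tmp & 0xff);
  if (offset > 5)
    data[1]= (unsigned char) (tmp >> 8);
}

static unsigned get_pattern(const unsigned char *group, unsigned pos)
{
  unsigned bit= pos * 3;
  const unsigned char *data= group + bit / 8;
  unsigned offset= bit & 7;
  unsigned tmp= data[0];

  if (offset > 5)
    tmp|= (unsigned) data[1] << 8;
  return (tmp >> offset) & 7u;
}
```

For example, setting pattern 2 to 7 in an all-zero group sets bits 6-8. The group then holds 0xC0 in byte 0 and 0x01 in byte 1.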


/*
  Allocate data for head block

  SYNOPSIS
    allocate_head()
    bitmap              bitmap
    size                Size of data region we need to store
    block               Store found information here

  IMPLEMENTATION
    Find the best-fit page for a region of 'size'.
    This is defined as the first page of the set of pages
    with the smallest free space that can hold 'size'.

  RETURN
    0  ok    (block is updated)
    1  error (no space in bitmap; block is not touched)
*/

static my_bool allocate_head(MARIA_FILE_BITMAP *bitmap, uint size,
                             MARIA_BITMAP_BLOCK *block)
{
  uint min_bits= size_to_head_pattern(bitmap, size);
  uchar *data= bitmap->map, *end= data + bitmap->used_size;
  uchar *best_data= 0;
  uint best_bits= (uint) -1, best_pos;
  DBUG_ENTER("allocate_head");

  LINT_INIT(best_pos);
  DBUG_ASSERT(size <= FULL_PAGE_SIZE(bitmap->block_size));

  for (; data < end; data += 6)
  {
    ulonglong bits= uint6korr(data);    /* 6 bytes = 6*8/3= 16 patterns */
    uint i;

    /*
      Skip common patterns
      We can skip empty pages (if we already found a match) or any
      6-byte group in which every pattern has bit 2 set (values 4-7),
      as each of those pages is either a full page or a tail page
    */
    if ((!bits && best_data) ||
        ((bits & LL(04444444444444444)) == LL(04444444444444444)))
      continue;
    for (i= 0; i < 16 ; i++, bits >>= 3)
    {
      uint pattern= bits & 7;
      if (pattern <= min_bits)
      {
        /* There is enough space here */
        if (pattern == min_bits)
        {
          /* There is exactly enough space here, return this page */
          best_bits= min_bits;
          best_data= data;
          best_pos= i;
          goto found;
        }
        if ((int) pattern > (int) best_bits)
        {
          /*
            There is more than enough space here and it's better than what
            we have found so far. Remember it, as we will choose it if we
            don't find anything in this bitmap page.
          */
          best_bits= pattern;
          best_data= data;
          best_pos= i;
        }
      }
    }
  }
  if (!best_data)                               /* Found no place */
  {
    if (bitmap->used_size == bitmap->total_size)
      DBUG_RETURN(1);                           /* No space in bitmap */
    /* Allocate data at end of bitmap */
    bitmap->used_size+= 6;
    best_data= data;
    best_pos= best_bits= 0;
  }

found:
  fill_block(bitmap, block, best_data, best_pos, best_bits, FULL_HEAD_PAGE);
  DBUG_RETURN(0);
}
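
The best-fit rule in allocate_head() can be tried out on a plain byte array. The sketch below is a hypothetical simplification: best_fit_head() returns a 0-based pattern index instead of filling a block. It omits the "skip empty groups once a candidate exists" shortcut and the allocation at the end of the bitmap. It keeps the exact-match early exit, the "largest pattern still <= min_bits" fallback and the one-test skip of groups whose patterns are all 4-7.

```c
#include <stdint.h>

/* Little-endian 48-bit load, equivalent to what uint6korr() does */
static uint64_t load48(const unsigned char *p)
{
  uint64_t v= 0;
  int i;
  for (i= 5; i >= 0; i--)
    v= (v << 8) | p[i];
  return v;
}

/*
  Hypothetical simplification of allocate_head(): returns the index of the
  first page whose pattern equals min_bits, else the page with the largest
  pattern still <= min_bits (least free space that is big enough), else -1.
*/
static long best_fit_head(const unsigned char *map, unsigned used_size,
                          unsigned min_bits)
{
  long best= -1;
  int best_bits= -1;
  unsigned off;

  for (off= 0; off + 6 <= used_size; off+= 6)
  {
    uint64_t bits= load48(map + off);
    unsigned i;
    /* every pattern is 4-7: full head pages or tail pages */
    if ((bits & 04444444444444444ULL) == 04444444444444444ULL)
      continue;
    for (i= 0; i < 16; i++, bits>>= 3)
    {
      unsigned pattern= (unsigned) (bits & 7);
      if (pattern > min_bits)
        continue;                       /* not enough free space */
      if (pattern == min_bits)
        return (long) (off / 6 * 16 + i);
      if ((int) pattern > best_bits)
      {
        best_bits= (int) pattern;
        best= (long) (off / 6 * 16 + i);
      }
    }
  }
  return best;
}
```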


/*
  Allocate data for tail block

  SYNOPSIS
    allocate_tail()
    bitmap              bitmap
    size                Size of block we need to find
    block               Store found information here

  RETURN
    0  ok    (block is updated)
    1  error (no space in bitmap; block is not touched)
*/

static my_bool allocate_tail(MARIA_FILE_BITMAP *bitmap, uint size,
                             MARIA_BITMAP_BLOCK *block)
{
  uint min_bits= size_to_tail_pattern(bitmap, size);
  uchar *data= bitmap->map, *end= data + bitmap->used_size;
  uchar *best_data= 0;
  uint best_bits= (uint) -1, best_pos;
  DBUG_ENTER("allocate_tail");
  DBUG_PRINT("enter", ("size: %u", size));

  LINT_INIT(best_pos);
  DBUG_ASSERT(size <= FULL_PAGE_SIZE(bitmap->block_size));

  for (; data < end; data += 6)
  {
    ulonglong bits= uint6korr(data);    /* 6 bytes = 6*8/3= 16 patterns */
    uint i;

    /*
      Skip common patterns
      We can skip empty pages (if we already found a match) or
      the following patterns: 1-4 (head pages, not suitable for tail) or
      7 (full tail page). See 'Dynamic size records' comment at start of file.

      At the moment we only skip groups where all 16 pages are full tail
      pages (all bits set), as this is easy to detect with one simple test
      and is quite common when we have blobs.
    */

    if ((!bits && best_data) || bits == LL(0xffffffffffff))
      continue;
    for (i= 0; i < 16; i++, bits >>= 3)
    {
      uint pattern= bits & 7;
      if (pattern <= min_bits && (!pattern || pattern >= 5))
      {
        if (pattern == min_bits)
        {
          best_bits= min_bits;
          best_data= data;
          best_pos= i;
          goto found;
        }
        if ((int) pattern > (int) best_bits)
        {
          best_bits= pattern;
          best_data= data;
          best_pos= i;
        }
      }
    }
  }
  if (!best_data)
  {
    if (bitmap->used_size == bitmap->total_size)
      DBUG_RETURN(1);
    /* Allocate data at end of bitmap */
    best_data= end;
    bitmap->used_size+= 6;
    best_pos= best_bits= 0;
  }

found:
  fill_block(bitmap, block, best_data, best_pos, best_bits, FULL_TAIL_PAGE);
  DBUG_RETURN(0);
}
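
allocate_tail() decides which pages may take a tail with two filters: a per-pattern condition and a one-test skip for a group made up entirely of full tail pages. The sketch below restates both as small predicates. The names are hypothetical. The pattern meanings are the ones given in the comment inside allocate_tail(): 0 is empty, 1-4 are head pages and 7 is a full tail page. min_bits is assumed to come from size_to_tail_pattern().

```c
#include <stdint.h>

/* Hypothetical predicate: empty page, or tail page with enough room */
static int tail_pattern_usable(unsigned pattern, unsigned min_bits)
{
  return pattern <= min_bits && (pattern == 0 || pattern >= 5);
}

/* Hypothetical predicate: all 16 patterns of a 6-byte group are 7 */
static int group_is_all_full_tails(uint64_t bits48)
{
  return bits48 == 0xffffffffffffULL;
}
```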


/*
  Allocate data for full blocks

  SYNOPSIS
    allocate_full_pages()
    bitmap              bitmap
    pages_needed        Total number of pages we would like to have
    block               Store found information here
    full_page           1 if we are not allowed to split extent

  IMPLEMENTATION
    We will return the smallest area >= pages_needed. If there is no such
    block, we will return the biggest area that satisfies
    area_size >= min(BLOB_SEGMENT_MIN_SIZE, pages_needed)
    (if full_page is set, the area must hold all of pages_needed).

    To speed up searches, we will only consider areas that have at least 16
    free pages starting on a 6-byte (16-pattern) boundary. When finding such
    an area, we will extend it with all previous and following free pages.
    This will ensure we don't get holes between areas.

  RETURN
    #  Blocks used
    0  error (no space in bitmap; block is not touched)
*/

static ulong allocate_full_pages(MARIA_FILE_BITMAP *bitmap,
                                 ulong pages_needed,
                                 MARIA_BITMAP_BLOCK *block, my_bool full_page)
{
  uchar *data= bitmap->map, *data_end= data + bitmap->used_size;
  uchar *page_end= data + bitmap->total_size;
  uchar *best_data= 0;
  uint min_size;
  uint best_area_size, best_prefix_area_size, best_suffix_area_size;
  uint page, size;
  ulonglong best_prefix_bits;
  DBUG_ENTER("allocate_full_pages");
  DBUG_PRINT("enter", ("pages_needed: %lu", pages_needed));

  /* Following variables are only used if best_data is set */
  LINT_INIT(best_prefix_bits);
  LINT_INIT(best_prefix_area_size);
  LINT_INIT(best_suffix_area_size);

  min_size= pages_needed;
  if (!full_page && min_size > BLOB_SEGMENT_MIN_SIZE)
    min_size= BLOB_SEGMENT_MIN_SIZE;
  best_area_size= ~(uint) 0;

  for (; data < page_end; data+= 6)
  {
    ulonglong bits= uint6korr(data);    /* 6 bytes = 6*8/3= 16 patterns */
    uchar *data_start;
    ulonglong prefix_bits= 0;
    uint area_size, prefix_area_size, suffix_area_size;

    /* Find area with at least 16 free pages */
    if (bits)
      continue;
    data_start= data;
    /* Find size of area */
    for (data+=6 ; data < data_end ; data+= 6)
    {
      if ((bits= uint6korr(data)))
        break;
    }
    area_size= (data - data_start) / 6 * 16;
    if (area_size >= best_area_size)
      continue;
    prefix_area_size= suffix_area_size= 0;
    if (!bits)
    {
      /*
        End of the used part of the bitmap: all the remaining patterns on
        the bitmap page are free and part of the area. This is needed
        because bitmap->used_size only covers the part of the bitmap that
        has set bits.
      */
      area_size+= (page_end - data) / 6 * 16;
      if (area_size >= best_area_size)
        break;
      data= page_end;
    }
    else
    {
      /* Add free pages at the start of the following 6-byte group */
      for (; !(bits & 7); bits >>= 3)
        suffix_area_size++;
      area_size+= suffix_area_size;
    }
    if (data_start != bitmap->map)
    {
      /* Add free pages at the end of the preceding 6-byte group */
      bits= prefix_bits= uint6korr(data_start - 6);
      DBUG_ASSERT(bits != 0);
      /* 111 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 */
      if (!(bits & LL(07000000000000000)))
      {
        data_start-= 6;
        do
        {
          prefix_area_size++;
          bits<<= 3;
        } while (!(bits & LL(07000000000000000)));
        area_size+= prefix_area_size;
        /* Calculate offset to page from data_start */
        prefix_area_size= 16 - prefix_area_size;
      }
    }
    if (area_size >= min_size && area_size <= best_area_size)
    {
      best_data= data_start;
      best_area_size= area_size;
      best_prefix_bits= prefix_bits;
      best_prefix_area_size= prefix_area_size;
      best_suffix_area_size= suffix_area_size;

      /* Prefer to put data in biggest possible area */
      if (area_size <= pages_needed)
        min_size= area_size;
      else
        min_size= pages_needed;
    }
  }
  if (!best_data)
    DBUG_RETURN(0);                             /* No room on page */

  /*
    Now allocate min(pages_needed, area_size), starting at pattern
    best_prefix_area_size of the 6-byte group at best_data
  */
  if (best_area_size > pages_needed)
    best_area_size= pages_needed;

  /* For each 6 bytes we have 6*8/3= 16 patterns */
  page= ((best_data - bitmap->map) * 8) / 3 + best_prefix_area_size;
  block->page= bitmap->page + 1 + page;
  block->page_count= best_area_size;
  block->empty_space= 0;
  block->sub_blocks= 1;
  block->org_bitmap_value= 0;
  block->used= 0;
  DBUG_PRINT("info", ("page: %lu page_count: %u",
                      (ulong) block->page, block->page_count));

  if (best_prefix_area_size)
  {
    ulonglong tmp;
    /* Convert offset back to bits */
    best_prefix_area_size= 16 - best_prefix_area_size;
    if (best_area_size < best_prefix_area_size)
    {
      tmp= (LL(1) << best_area_size*3) - 1;
      best_area_size= best_prefix_area_size;    /* for easy end test */
    }
    else
      tmp= (LL(1) << best_prefix_area_size*3) - 1;
    tmp<<= (16 - best_prefix_area_size) * 3;
    DBUG_ASSERT((best_prefix_bits & tmp) == 0);
    best_prefix_bits|= tmp;
    int6store(best_data, best_prefix_bits);
    if (!(best_area_size-= best_prefix_area_size))
    {
      DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
      DBUG_RETURN(block->page_count);
    }
    best_data+= 6;
  }
  best_area_size*= 3;                           /* Bits to set */
  size= best_area_size/8;                       /* Bytes to set */
  bfill(best_data, size, 255);
  best_data+= size;
  if ((best_area_size-= size * 8))
  {
    /* fill last byte */
    *best_data|= (uchar) ((1 << best_area_size) -1);
    best_data++;
  }
  if (data_end < best_data)
    bitmap->used_size= (uint) (best_data - bitmap->map);
  bitmap->changed= 1;
  DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
  DBUG_RETURN(block->page_count);
}
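
The last step of allocate_full_pages() turns every pattern in the run into 7 (binary 111). Marking N pages full from a byte-aligned position therefore means setting N*3 consecutive bits. The sketch below reproduces that step and the byte-offset-to-page conversion. The helper names are hypothetical, and it uses memset() where Maria uses bfill().

```c
#include <string.h>

/*
  Hypothetical sketch: set npages*3 consecutive bits starting at the
  byte-aligned position p (whole bytes first, then the low bits of one more
  byte). Returns a pointer just past the last byte touched.
*/
static unsigned char *mark_full_pages(unsigned char *p, unsigned npages)
{
  unsigned nbits= npages * 3;             /* bits to set */
  unsigned nbytes= nbits / 8;             /* whole bytes to set */

  memset(p, 0xff, nbytes);
  p+= nbytes;
  if ((nbits-= nbytes * 8))
  {
    *p|= (unsigned char) ((1u << nbits) - 1);   /* fill last byte */
    p++;
  }
  return p;
}

/*
  0-based pattern index of the first pattern in the 6-byte group that
  starts byte_offset bytes into the map; equals byte_offset / 6 * 16 when
  byte_offset is a multiple of 6.
*/
static unsigned first_page_of_group(unsigned byte_offset)
{
  return byte_offset * 8 / 3;
}
```

With npages = 5, 15 bits are set: byte 0 becomes 0xff and byte 1 becomes 0x7f. The top bit of byte 1 belongs to the sixth page and stays clear.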


/****************************************************************************
  Find the right bitmaps in which to store data
****************************************************************************/

/*
  Find the right bitmap and position for the head block

  SYNOPSIS
    find_head()
    info                Maria handler
    length              Size of data region we need to store
    position            Position in bitmap_blocks where to store the
                        information for the head block.

  RETURN
    0  ok
    1  error
*/

static my_bool find_head(MARIA_HA *info, uint length, uint position)
{
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  MARIA_BITMAP_BLOCK *block;
  /*
    There is always place for the head block in bitmap_blocks as these are
    preallocated at _ma_init_block_record().
  */
  block= dynamic_element(&info->bitmap_blocks, position, MARIA_BITMAP_BLOCK *);

  while (allocate_head(bitmap, length, block))
    if (move_to_next_bitmap(info, bitmap))
      return 1;
  return 0;
}


/*
  Find the right bitmap and position for a tail

  SYNOPSIS
    find_tail()
    info                Maria handler
    length              Size of data region we need to store
    position            Position in bitmap_blocks where to store the
                        information for the tail block.

  RETURN
    0  ok
    1  error
*/

static my_bool find_tail(MARIA_HA *info, uint length, uint position)
{
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  MARIA_BITMAP_BLOCK *block;
  DBUG_ENTER("find_tail");

  /* Needed, as there is no error checking in dynamic_element */
  if (allocate_dynamic(&info->bitmap_blocks, position))
    DBUG_RETURN(1);
  block= dynamic_element(&info->bitmap_blocks, position, MARIA_BITMAP_BLOCK *);

  while (allocate_tail(bitmap, length, block))
    if (move_to_next_bitmap(info, bitmap))
      DBUG_RETURN(1);
  DBUG_RETURN(0);
}


/*
  Find the right bitmap and position for full blocks in one extent

  SYNOPSIS
    find_mid()
    info                Maria handler.
    pages               How many pages to allocate.
    position            Position in bitmap_blocks where to store the
                        information for the extent.

  NOTES
    This is used to allocate the main extent after the 'head' block
    (i.e., the middle part of the head-middle-tail entry).

  RETURN
    0  ok
    1  error
*/

static my_bool find_mid(MARIA_HA *info, ulong pages, uint position)
{
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  MARIA_BITMAP_BLOCK *block;
  block= dynamic_element(&info->bitmap_blocks, position, MARIA_BITMAP_BLOCK *);

  while (!allocate_full_pages(bitmap, pages, block, 1))
  {
    if (move_to_next_bitmap(info, bitmap))
      return 1;
  }
  return 0;
}


/*
  Find the right bitmap and position for storing a blob

  SYNOPSIS
    find_blob()
    info                Maria handler.
    length              Length of the blob

  NOTES
    The extents are stored last in info->bitmap_blocks

  IMPLEMENTATION
    Allocate all full pages for the block + optionally one tail

  RETURN
    0  ok
    1  error
*/

static my_bool find_blob(MARIA_HA *info, ulong length)
{
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  uint full_page_size= FULL_PAGE_SIZE(info->s->block_size);
  ulong pages;
  uint rest_length, used;
  uint first_block_pos;
  MARIA_BITMAP_BLOCK *first_block= 0;
  DBUG_ENTER("find_blob");
  DBUG_PRINT("enter", ("length: %lu", length));
  LINT_INIT(first_block_pos);

  pages= length / full_page_size;
  rest_length= (uint) (length - pages * full_page_size);
  if (rest_length >= MAX_TAIL_SIZE(info->s->block_size))
  {
    pages++;
    rest_length= 0;
  }

  if (pages)
  {
    MARIA_BITMAP_BLOCK *block;
    if (allocate_dynamic(&info->bitmap_blocks,
                         info->bitmap_blocks.elements +
                         pages / BLOB_SEGMENT_MIN_SIZE + 2))
      DBUG_RETURN(1);
    first_block_pos= info->bitmap_blocks.elements;
    block= dynamic_element(&info->bitmap_blocks, info->bitmap_blocks.elements,
                           MARIA_BITMAP_BLOCK*);
    first_block= block;
    do
    {
      used= allocate_full_pages(bitmap,
                                (pages >= 65535 ? 65535 : (uint) pages), block,
                                0);
      if (!used)
      {
        if (move_to_next_bitmap(info, bitmap))
          DBUG_RETURN(1);
      }
      else
      {
        pages-= used;
        info->bitmap_blocks.elements++;
        block++;
      }
    } while (pages != 0);
  }
  if (rest_length && find_tail(info, rest_length,
                               info->bitmap_blocks.elements++))
    DBUG_RETURN(1);
  if (first_block)
    first_block->sub_blocks= info->bitmap_blocks.elements - first_block_pos;
  DBUG_RETURN(0);
}
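
/*
  Illustrative sketch, not part of the Maria code: the length arithmetic at
  the start of find_blob(), plus the lower bound on the number of extents
  that follows from asking allocate_full_pages() for at most 65535 pages at
  a time. full_page_size and max_tail_size are parameters because
  FULL_PAGE_SIZE() and MAX_TAIL_SIZE() are defined outside this excerpt;
  blob_split and split_blob() are hypothetical names, and the sizes used in
  the note are made up.

```c
#include <assert.h>

typedef struct {
  unsigned long full_pages;  // pages to allocate as full-page extents
  unsigned int tail_length;  // bytes to put on a tail page (0 = no tail)
  unsigned long segments;    // minimum number of full-page extents
} blob_split;

static blob_split split_blob(unsigned long length, unsigned int full_page_size,
                             unsigned int max_tail_size)
{
  blob_split s;
  s.full_pages= length / full_page_size;
  s.tail_length= (unsigned int) (length - s.full_pages * full_page_size);
  if (s.tail_length >= max_tail_size)
  {
    // Remainder too large for a tail page: give it a full page instead
    s.full_pages++;
    s.tail_length= 0;
  }
  // At most 65535 pages are requested per extent, so at least this many
  // extents are needed (more if free space is fragmented)
  s.segments= (s.full_pages + 65534) / 65535;
  return s;
}
```

  With full_page_size 8000 and max_tail_size 6000, a 23000-byte blob becomes
  3 full pages and no tail, because its 7000-byte remainder is too big for a
  tail page.
*/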


/*
  Find pages for ALL blobs of the row

  SYNOPSIS
    allocate_blobs()
    info                Maria handler
    row                 Information about what is in the row (from
                        calc_record_size())

  RETURN
    0  ok
    1  error
*/

static my_bool allocate_blobs(MARIA_HA *info, MARIA_ROW *row)
{
  ulong *length, *end;
  uint elements;
  /*
    On entry, info->bitmap_blocks holds only the slots reserved for the
    main part of the row (head block, one extent, tail block). Remember
    that count so that the number of extents added for the blobs can be
    computed afterwards.
  */
  elements= info->bitmap_blocks.elements;
  for (length= row->blob_lengths, end= length + info->s->base.blobs;
       length < end; length++)
  {
    if (*length && find_blob(info, *length))
      return 1;
  }
  row->extents_count= (info->bitmap_blocks.elements - elements);
  return 0;
}
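
/*
  Illustrative sketch, not part of the Maria code: how allocate_blobs()
  turns blob lengths into row->extents_count, and how
  _ma_bitmap_find_place() then turns that count into extents_length. The
  model assumes an unfragmented file, where each allocate_full_pages() call
  gets its whole request, so the real count can be higher.
  TOY_ROW_EXTENT_SIZE = 7 (a 5-byte page number plus a 2-byte page count) is
  an assumption about ROW_EXTENT_SIZE, which is defined elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ROW_EXTENT_SIZE 7     // assumption, see lead-in
#define TOY_MAX_SEGMENT 65535UL   // pages requested per extent at most

// Hypothetical model of allocate_blobs(): every non-empty blob adds one
// extent per 65535-page segment plus one for its tail, if it has one.
static unsigned int toy_count_blob_extents(const unsigned long *lengths,
                                           size_t blobs,
                                           unsigned int full_page_size,
                                           unsigned int max_tail_size)
{
  unsigned int extents= 0;
  size_t i;
  for (i= 0; i < blobs; i++)
  {
    unsigned long pages, rest;
    if (!lengths[i])
      continue;                   // empty blobs need no space
    pages= lengths[i] / full_page_size;
    rest= lengths[i] % full_page_size;
    if (rest >= max_tail_size)
    {
      pages++;                    // remainder goes on a full page
      rest= 0;
    }
    extents+= (unsigned int) ((pages + TOY_MAX_SEGMENT - 1) / TOY_MAX_SEGMENT);
    if (rest)
      extents++;                  // tail extent
  }
  return extents;
}
```

  For blobs of 0, 20000, 23000 and 100 bytes (full_page_size 8000,
  max_tail_size 6000) the model gives 2 + 1 + 1 = 4 extents, i.e. 28 bytes
  of extent information in the head.
*/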


/*
  Store in the bitmap the new size for a head page

  SYNOPSIS
    use_head()
    info                Maria handler
    page                Page number to update
                        (Note that caller guarantees this is in the active
                        bitmap)
    size                How much free space is left on the page
    block_position      Index in info->bitmap_blocks of the information
                        about the head block.

  NOTES
    This is used on update, when an existing head page is reused
*/

static void use_head(MARIA_HA *info, ulonglong page, uint size,
                     uint block_position)
{
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  MARIA_BITMAP_BLOCK *block;
  uchar *data;
  uint offset, tmp, offset_page;

  block= dynamic_element(&info->bitmap_blocks, block_position,
                         MARIA_BITMAP_BLOCK*);
  block->page= page;
  block->page_count= 1 + TAIL_BIT;
  block->empty_space= size;
  block->sub_blocks= 1;
  block->used= BLOCKUSED_TAIL;

  /*
    Mark the page as used. Read/write 2 bytes at a time, as the 3-bit
    pattern for a page may span two bytes
  */
  offset_page= (uint) (page - bitmap->page - 1) * 3;
  offset= offset_page & 7;
  data= bitmap->map + offset_page / 8;
  tmp= uint2korr(data);
  block->org_bitmap_value= (tmp >> offset) & 7;
  tmp= (tmp & ~(7 << offset)) | (FULL_HEAD_PAGE << offset);
  int2store(data, tmp);
  bitmap->changed= 1;
  DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
}
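
/*
  Illustrative sketch, not part of the Maria code: the 2-byte
  read-modify-write used by use_head() (and set_page_bits()) to replace one
  3-bit pattern. Data page idx owns bits idx*3 .. idx*3+2 of the map, so its
  pattern can straddle a byte boundary; loading 16 bits covers both bytes.
  load_le16() and store_le16() are local helpers written to behave like the
  little-endian uint2korr() and int2store() macros; swap_pattern() is a
  hypothetical name.

```c
#include <assert.h>

// Little-endian 16-bit load/store, like uint2korr()/int2store().
static unsigned int load_le16(const unsigned char *p)
{
  return (unsigned int) p[0] | ((unsigned int) p[1] << 8);
}

static void store_le16(unsigned char *p, unsigned int v)
{
  p[0]= (unsigned char) (v & 0xff);
  p[1]= (unsigned char) ((v >> 8) & 0xff);
}

// Replace the pattern of data page 'idx' (0 = first page after the bitmap
// page) and return the previous one, as use_head() does for
// org_bitmap_value. 'map' needs one spare byte after the last pattern.
static unsigned int swap_pattern(unsigned char *map, unsigned int idx,
                                 unsigned int pattern)
{
  unsigned int bit= idx * 3;
  unsigned int offset= bit & 7;
  unsigned char *data= map + bit / 8;
  unsigned int tmp= load_le16(data);
  unsigned int old= (tmp >> offset) & 7;
  tmp= (tmp & ~(7U << offset)) | ((pattern & 7) << offset);
  store_le16(data, tmp);
  return old;
}
```

  For idx 2 the pattern occupies bits 6-8: two bits in map[0] and one in
  map[1]; neighbouring patterns are left untouched.
*/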


/*
  Find out where to split the row (i.e., what goes in head, middle, tail etc)

  SYNOPSIS
    find_where_to_split_row()
    share           Maria share
    row             Information about what is in the row (from
                    calc_record_size())
    extents_length  Number of bytes needed to store all extents
    split_size      Free size on the page (the head length must not
                    exceed this)

  RETURN
    row_length for the head block.
*/

static uint find_where_to_split_row(MARIA_SHARE *share, MARIA_ROW *row,
                                    uint extents_length, uint split_size)
{
  uint row_length= row->base_length;
  uint *lengths, *lengths_end;

  DBUG_ASSERT(row_length < split_size);
  /*
    Store first in all_field_lengths the different parts that are written
    to the row. This needs to be in the same order as in
    ma_blockrec.c::write_block_record()
  */
  row->null_field_lengths[-3]= extents_length;
  row->null_field_lengths[-2]= share->base.fixed_not_null_fields_length;
  row->null_field_lengths[-1]= row->field_lengths_length;
  for (lengths= row->null_field_lengths - EXTRA_LENGTH_FIELDS,
       lengths_end= (lengths + share->base.pack_fields - share->base.blobs +
                     EXTRA_LENGTH_FIELDS); lengths < lengths_end; lengths++)
  {
    if (row_length + *lengths > split_size)
      break;
    row_length+= *lengths;
  }
  return row_length;
}
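
/*
  Illustrative sketch, not part of the Maria code: the greedy loop of
  find_where_to_split_row() over a plain array. In the real function the
  array starts EXTRA_LENGTH_FIELDS entries before row->null_field_lengths
  (extents, fixed fields, field lengths, then the variable-length fields);
  here it is just a hypothetical list of part sizes in write order.

```c
#include <assert.h>
#include <stddef.h>

// Start from the fixed base length and append each part, in write order,
// until the next part would make the head exceed split_size.
static unsigned int toy_split_row(unsigned int base_length,
                                  const unsigned int *lengths, size_t count,
                                  unsigned int split_size)
{
  unsigned int row_length= base_length;
  size_t i;
  assert(row_length < split_size);
  for (i= 0; i < count; i++)
  {
    if (row_length + lengths[i] > split_size)
      break;                      // this part and the rest go after the head
    row_length+= lengths[i];
  }
  return row_length;
}
```

  The loop stops at the first part that does not fit, even if a later,
  smaller part would: with parts {100, 200, 300, 50}, base 10 and
  split_size 350 the head gets 310 bytes and the 50-byte part is not pulled
  forward.
*/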


/*
  Find where to write the middle parts of the row and the tail

  SYNOPSIS
    write_rest_of_head()
    info            Maria handler
    position        Position in bitmap_blocks. Is 0 for rows that need
                    full blocks (i.e., that have a head, a middle part and
                    an optional tail)
    rest_length     How many bytes of the row remain to be written after
                    the head block.

  RETURN
    0  ok
    1  error
*/

static my_bool write_rest_of_head(MARIA_HA *info, uint position,
                                  ulong rest_length)
{
  MARIA_SHARE *share= info->s;
  uint full_page_size= FULL_PAGE_SIZE(share->block_size);
  MARIA_BITMAP_BLOCK *block;
  DBUG_ENTER("write_rest_of_head");
  DBUG_PRINT("enter", ("position: %u  rest_length: %lu", position,
                       rest_length));

  if (position == 0)
  {
    /* Write out full pages */
    uint pages= rest_length / full_page_size;

    rest_length%= full_page_size;
    if (rest_length >= MAX_TAIL_SIZE(share->block_size))
    {
      /* Put tail on a full page */
      pages++;
      rest_length= 0;
    }
    if (find_mid(info, pages, 1))
      DBUG_RETURN(1);
    /*
      Insert empty block after full pages, to allow write_block_record() to
      split segment into used + free page
    */
    block= dynamic_element(&info->bitmap_blocks, 2, MARIA_BITMAP_BLOCK*);
    block->page_count= 0;
    block->used= 0;
  }
  if (rest_length)
  {
    if (find_tail(info, rest_length, ELEMENTS_RESERVED_FOR_MAIN_PART - 1))
      DBUG_RETURN(1);
  }
  else
  {
    /* Empty tail block */
    block= dynamic_element(&info->bitmap_blocks,
                           ELEMENTS_RESERVED_FOR_MAIN_PART - 1,
                           MARIA_BITMAP_BLOCK *);
    block->page_count= 0;
    block->used= 0;
  }
  DBUG_RETURN(0);
}
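
/*
  Illustrative sketch, not part of the Maria code: which bitmap_blocks slot
  plays which role once _ma_bitmap_find_place() and write_rest_of_head()
  have run, as a function of the head block's position. TOY_RESERVED = 4 is
  an assumption for ELEMENTS_RESERVED_FOR_MAIN_PART, inferred from the four
  blocks (head, full pages, marker, tail) listed in _ma_bitmap_find_place();
  the enum and function names are hypothetical.

```c
#include <assert.h>

#define TOY_RESERVED 4   // assumption, see lead-in

enum slot_role { SLOT_UNUSED, SLOT_HEAD, SLOT_FULL_PAGES, SLOT_MARKER,
                 SLOT_TAIL };

static enum slot_role toy_slot_role(unsigned int position, unsigned int slot)
{
  if (slot < position)
    return SLOT_UNUSED;        // skipped: blocks->block points at position
  if (slot == position)
    return SLOT_HEAD;
  if (slot == TOY_RESERVED - 1)
    return SLOT_TAIL;          // may be an empty block (page_count == 0)
  if (slot == 1)
    return SLOT_FULL_PAGES;    // filled by find_mid(info, pages, 1)
  return SLOT_MARKER;          // index 2: empty block after the full pages
}

// Value stored in the head block's sub_blocks
static unsigned int toy_main_part_blocks(unsigned int position)
{
  return TOY_RESERVED - position;
}
```

  Position 3 is used when the head holds the whole main part, 2 for head
  plus tail, and 0 when full pages are needed.
*/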


/*
  Find where to store one row

  SYNOPSIS
    _ma_bitmap_find_place()
    info                  Maria handler
    row                   Information about row to write
    blocks                Store data about allocated places here

  RETURN
    0  ok
       row->space_on_head_page contains minimum number of bytes we
       expect to put on the head page.
    1  error
       my_errno is set to error
*/

my_bool _ma_bitmap_find_place(MARIA_HA *info, MARIA_ROW *row,
                              MARIA_BITMAP_BLOCKS *blocks)
{
  MARIA_SHARE *share= info->s;
  my_bool res= 1;
  uint full_page_size, position, max_page_size;
  uint head_length, row_length, rest_length, extents_length;
  DBUG_ENTER("_ma_bitmap_find_place");

  blocks->count= 0;
  blocks->tail_page_skipped= blocks->page_skipped= 0;
  row->extents_count= 0;

  /*
    Reserve place for the following blocks:
     - Head block
     - Full page block
     - Marker block to allow write_block_record() to split full page blocks
       into full and free part
     - Tail block
  */

  info->bitmap_blocks.elements= ELEMENTS_RESERVED_FOR_MAIN_PART;
  max_page_size= (share->block_size - PAGE_OVERHEAD_SIZE);

  pthread_mutex_lock(&share->bitmap.bitmap_lock);

  if (row->total_length <= max_page_size)
  {
    /* Row fits in one page */
    position= ELEMENTS_RESERVED_FOR_MAIN_PART - 1;
    if (find_head(info, (uint) row->total_length, position))
      goto abort;
    row->space_on_head_page= row->total_length;
    goto end;
  }

  /*
    First allocate all blobs (so that we can find out the needed size for
    the main block).
  */
  if (row->blob_length && allocate_blobs(info, row))
    goto abort;

  extents_length= row->extents_count * ROW_EXTENT_SIZE;
  if ((head_length= (row->head_length + extents_length)) <= max_page_size)
  {
    /* Main row part fits into one page */
    position= ELEMENTS_RESERVED_FOR_MAIN_PART - 1;
    if (find_head(info, head_length, position))
      goto abort;
    row->space_on_head_page= head_length;
    goto end;
  }

  /* Allocate enough space */
  head_length+= ELEMENTS_RESERVED_FOR_MAIN_PART * ROW_EXTENT_SIZE;

  /* The first segment size is stored in 'row_length' */
  row_length= find_where_to_split_row(share, row, extents_length,
                                      max_page_size);

  full_page_size= FULL_PAGE_SIZE(share->block_size);
  position= 0;
  if (head_length - row_length <= full_page_size)
    position= ELEMENTS_RESERVED_FOR_MAIN_PART -2;    /* Only head and tail */
  if (find_head(info, row_length, position))
    goto abort;
  row->space_on_head_page= row_length;
  rest_length= head_length - row_length;
  if (write_rest_of_head(info, position, rest_length))
    goto abort;

end:
  blocks->block= dynamic_element(&info->bitmap_blocks, position,
                                 MARIA_BITMAP_BLOCK*);
  blocks->block->sub_blocks= ELEMENTS_RESERVED_FOR_MAIN_PART - position;
  /* First block's page_count is for all blocks */
  blocks->count= info->bitmap_blocks.elements - position;
  res= 0;

abort:
  pthread_mutex_unlock(&share->bitmap.bitmap_lock);
  DBUG_RETURN(res);
}
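
/*
  Illustrative sketch, not part of the Maria code: the decision
  _ma_bitmap_find_place() makes about the head block's position, without
  any bitmap access. TOY_RESERVED and TOY_ROW_EXTENT_SIZE are assumed values
  for ELEMENTS_RESERVED_FOR_MAIN_PART and ROW_EXTENT_SIZE, row_length
  stands for what find_where_to_split_row() would return, and the page
  sizes in the checks are made up.

```c
#include <assert.h>

#define TOY_RESERVED 4          // assumption: ELEMENTS_RESERVED_FOR_MAIN_PART
#define TOY_ROW_EXTENT_SIZE 7   // assumption: ROW_EXTENT_SIZE

// Returns the index of the head block in bitmap_blocks.
static unsigned int choose_position(unsigned long total_length,
                                    unsigned int head_length_no_extents,
                                    unsigned int extents_count,
                                    unsigned int row_length,
                                    unsigned int max_page_size,
                                    unsigned int full_page_size)
{
  unsigned int head_length;
  if (total_length <= max_page_size)
    return TOY_RESERVED - 1;            // whole row on one page
  head_length= head_length_no_extents + extents_count * TOY_ROW_EXTENT_SIZE;
  if (head_length <= max_page_size)
    return TOY_RESERVED - 1;            // blobs elsewhere, main part fits
  head_length+= TOY_RESERVED * TOY_ROW_EXTENT_SIZE;
  if (head_length - row_length <= full_page_size)
    return TOY_RESERVED - 2;            // head + tail only
  return 0;                             // head + full pages + tail
}
```

  _ma_bitmap_find_new_place() follows the same steps with free_size in
  place of max_page_size, except that it skips the "whole row fits" test
  and reuses the existing head page through use_head() instead of calling
  find_head().
*/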


/*
  Find where to put row on update (when head page is already defined)

  SYNOPSIS
    _ma_bitmap_find_new_place()
    info                  Maria handler
    row                   Information about row to write
    page                  On which page original row was stored
    free_size             Free size on head page
    blocks                Store data about allocated places here

  NOTES
    This function is only called when the new row can't fit in the space of
    the old row in the head page.

    This is essentially the same as _ma_bitmap_find_place() except that
    we don't call find_head() to search in bitmaps where to put the page.

  RETURN
    0  ok
    1  error
*/

my_bool _ma_bitmap_find_new_place(MARIA_HA *info, MARIA_ROW *row,
                                  ulonglong page, uint free_size,
                                  MARIA_BITMAP_BLOCKS *blocks)
{
  MARIA_SHARE *share= info->s;
  my_bool res= 1;
  uint full_page_size, position;
  uint head_length, row_length, rest_length, extents_length;
  ulonglong bitmap_page;
  DBUG_ENTER("_ma_bitmap_find_new_place");

  blocks->count= 0;
  blocks->tail_page_skipped= blocks->page_skipped= 0;
  row->extents_count= 0;
  info->bitmap_blocks.elements= ELEMENTS_RESERVED_FOR_MAIN_PART;

  pthread_mutex_lock(&share->bitmap.bitmap_lock);
  bitmap_page= page / share->bitmap.pages_covered;
  bitmap_page*= share->bitmap.pages_covered;

  if (share->bitmap.page != bitmap_page &&
      _ma_change_bitmap_page(info, &share->bitmap, bitmap_page))
    goto abort;

  /*
    First allocate all blobs (so that we can find out the needed size for
    the main block).
  */
  if (row->blob_length && allocate_blobs(info, row))
    goto abort;

  extents_length= row->extents_count * ROW_EXTENT_SIZE;
  if ((head_length= (row->head_length + extents_length)) <= free_size)
  {
    /* Main row part fits into one page */
    position= ELEMENTS_RESERVED_FOR_MAIN_PART - 1;
    use_head(info, page, head_length, position);
    goto end;
  }

  /* Allocate enough space */
  head_length+= ELEMENTS_RESERVED_FOR_MAIN_PART * ROW_EXTENT_SIZE;

  /* The first segment size is stored in 'row_length' */
  row_length= find_where_to_split_row(share, row, extents_length, free_size);

  full_page_size= FULL_PAGE_SIZE(share->block_size);
  position= 0;
  if (head_length - row_length <= full_page_size)
    position= ELEMENTS_RESERVED_FOR_MAIN_PART -2;    /* Only head and tail */
  use_head(info, page, row_length, position);
  rest_length= head_length - row_length;

  if (write_rest_of_head(info, position, rest_length))
    goto abort;

end:
  blocks->block= dynamic_element(&info->bitmap_blocks, position,
                                 MARIA_BITMAP_BLOCK*);
  blocks->block->sub_blocks= ELEMENTS_RESERVED_FOR_MAIN_PART - position;
  /* First block's page_count is for all blocks */
  blocks->count= info->bitmap_blocks.elements - position;
  res= 0;

abort:
  pthread_mutex_unlock(&share->bitmap.bitmap_lock);
  DBUG_RETURN(res);
}
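
/*
  Illustrative sketch, not part of the Maria code: the two equivalent ways
  this file rounds a page number down to the bitmap page that covers it
  (divide-and-multiply here, subtract-the-remainder in set_page_bits() and
  friends), and the index of the page's pattern inside that bitmap; the -1
  skips the bitmap page itself. The function names and the pages_covered
  value in the note are hypothetical.

```c
#include <assert.h>

static unsigned long long bitmap_page_div(unsigned long long page,
                                          unsigned long long pages_covered)
{
  return page / pages_covered * pages_covered;
}

static unsigned long long bitmap_page_mod(unsigned long long page,
                                          unsigned long long pages_covered)
{
  return page - page % pages_covered;
}

// Pattern index of a data page within its bitmap. Only meaningful for
// data pages: for a bitmap page the subtraction would wrap around.
static unsigned long long pattern_index(unsigned long long page,
                                        unsigned long long pages_covered)
{
  return page - bitmap_page_mod(page, pages_covered) - 1;
}
```

  With pages_covered 1000, page 2345 is described by bitmap page 2000 and
  uses pattern index 344.
*/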


/****************************************************************************
  Clear and reset bits
****************************************************************************/

/*
  Set fill pattern for a page

  SYNOPSIS
    set_page_bits()
    info          Maria handler
    bitmap        Bitmap handler
    page          Page number
    fill_pattern  Pattern (not size) for page

  NOTES
    Page may not be part of active bitmap

  RETURN
    0  ok
    1  error
*/

static my_bool set_page_bits(MARIA_HA *info, MARIA_FILE_BITMAP *bitmap,
                             ulonglong page, uint fill_pattern)
{
  ulonglong bitmap_page;
  uint offset_page, offset, tmp, org_tmp;
  uchar *data;
  DBUG_ENTER("set_page_bits");

  bitmap_page= page - page % bitmap->pages_covered;
  if (bitmap_page != bitmap->page &&
      _ma_change_bitmap_page(info, bitmap, bitmap_page))
    DBUG_RETURN(1);

  /* Find page number from start of bitmap */
  offset_page= page - bitmap->page - 1;
  /*
    Update the page's pattern. Read/write 2 bytes at a time, as the 3-bit
    pattern for a page may span two bytes
  */
  offset_page*= 3;
  offset= offset_page & 7;
  data= bitmap->map + offset_page / 8;
  org_tmp= tmp= uint2korr(data);
  tmp= (tmp & ~(7 << offset)) | (fill_pattern << offset);
  if (tmp == org_tmp)
    DBUG_RETURN(0);                             /* No changes */
  int2store(data, tmp);

  bitmap->changed= 1;
  DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
  if (fill_pattern != FULL_HEAD_PAGE && fill_pattern != 7)
    set_if_smaller(info->s->state.first_bitmap_with_space, bitmap_page);
  /*
    Note that if the condition above is false (the page is full), all pages
    of this bitmap are now full and this bitmap page was
    first_bitmap_with_space, we leave first_bitmap_with_space unchanged:
    its value still tells us where to start our search for a bitmap with
    space (which is certainly after this full one).
    This means that first_bitmap_with_space is only a lower bound.
  */
  DBUG_RETURN(0);
}
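
/*
  Illustrative sketch, not part of the Maria code: why it is safe for
  first_bitmap_with_space to be only a lower bound. The hint is moved down
  when a page gets a pattern that still has room and is never moved when a
  page becomes full, so a search that starts at the hint and skips full
  bitmaps still finds the first bitmap with space. Both functions are
  hypothetical, and they work on bitmap indexes rather than page numbers.

```c
#include <assert.h>

// Lower the hint when a page in bitmap 'bitmap_index' gets a pattern that
// still has room; leave it alone when the page became full.
static void update_space_hint(long *hint, long bitmap_index,
                              int pattern_is_full)
{
  if (!pattern_is_full && bitmap_index < *hint)
    *hint= bitmap_index;            // same effect as set_if_smaller()
}

// Start at the hinted bitmap and skip bitmaps without space. A stale (too
// low) hint costs extra iterations but never gives a wrong answer.
static long find_bitmap_with_space(const int *has_space, long count,
                                   long hint)
{
  long i;
  for (i= hint; i < count; i++)
  {
    if (has_space[i])
      return i;
  }
  return -1;                        // no bitmap with space in the model
}
```
*/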


/*
  Get bitmap pattern for a given page

  SYNOPSIS
    get_page_bits()
    info          Maria handler
    bitmap        Bitmap handler
    page          Page number

  RETURN
    0-7    Bitmap pattern
    ~0     Error (couldn't read page)
*/

static uint get_page_bits(MARIA_HA *info, MARIA_FILE_BITMAP *bitmap,
                          ulonglong page)
{
  ulonglong bitmap_page;
  uint offset_page, offset, tmp;
  uchar *data;
  DBUG_ENTER("get_page_bits");

  bitmap_page= page - page % bitmap->pages_covered;
  if (bitmap_page != bitmap->page &&
      _ma_change_bitmap_page(info, bitmap, bitmap_page))
    DBUG_RETURN(~ (uint) 0);

  /* Find page number from start of bitmap */
  offset_page= page - bitmap->page - 1;
  /*
    Read 2 bytes at a time, as the 3-bit pattern for a page may span
    two bytes
  */
  offset_page*= 3;
  offset= offset_page & 7;
  data= bitmap->map + offset_page / 8;
  tmp= uint2korr(data);
  DBUG_RETURN((tmp >> offset) & 7);
}
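
/*
  Illustrative sketch, not part of the Maria code: clearing a run of
  consecutive 3-bit patterns the way _ma_reset_full_page_bits() below does:
  a partial first byte, whole bytes zeroed with memset() (bzero() in the
  original), then a partial last byte. Bit 0 is taken to be the least
  significant bit of map[0], matching the shifts used in this file;
  clear_bit_range() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

// Clear bit_count bits starting at bit_start. Assumes bit_count > 0.
static void clear_bit_range(unsigned char *map, unsigned int bit_start,
                            unsigned int bit_count)
{
  unsigned char *data= map + bit_start / 8;
  unsigned int offset= bit_start & 7;
  unsigned int mask= 255U << offset;       // bits to clear in first byte
  unsigned int fill;

  if (bit_count + offset < 8)
    mask^= 255U << (offset + bit_count);   // stop before offset+bit_count
  *data&= (unsigned char) ~mask;

  if (bit_count + offset <= 8)
    return;                                // range ended in the first byte
  bit_count-= 8 - offset;
  data++;
  fill= (bit_count - 1) / 8;               // whole bytes, keep the last apart
  memset(data, 0, fill);
  data+= fill;
  bit_count-= fill * 8;                    // 1..8 bits left
  *data&= (unsigned char) ~((1U << bit_count) - 1);
}
```

  Clearing the patterns of pages 2..6 means clearing bits 6..20: two bits
  of map[0], all of map[1] and five bits of map[2].
*/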


/*
  Mark all pages in a region as free

  SYNOPSIS
    _ma_reset_full_page_bits()
    info                Maria handler
    bitmap              Bitmap handler
    page                Start page
    page_count          Number of pages

  NOTES
    We assume that all pages in the region are covered by the same bitmap
    The caller must hold info->s->bitmap.bitmap_lock

  RETURN
    0  ok
    1  Error (when reading bitmap)
*/

my_bool _ma_reset_full_page_bits(MARIA_HA *info, MARIA_FILE_BITMAP *bitmap,
                                 ulonglong page, uint page_count)
{
  ulonglong bitmap_page;
  uint offset, bit_start, bit_count, tmp;
  uchar *data;
  DBUG_ENTER("_ma_reset_full_page_bits");
  DBUG_PRINT("enter", ("page: %lu  page_count: %u", (ulong) page, page_count));
  safe_mutex_assert_owner(&info->s->bitmap.bitmap_lock);

  bitmap_page= page - page % bitmap->pages_covered;
  if (bitmap_page != bitmap->page &&
      _ma_change_bitmap_page(info, bitmap, bitmap_page))
    DBUG_RETURN(1);

  /* Find page number from start of bitmap */
  page= page - bitmap->page - 1;

  /* Clear bits from 'page * 3' -> '(page + page_count) * 3' */
  bit_start= page * 3;
  bit_count= page_count * 3;

  data= bitmap->map + bit_start / 8;
  offset= bit_start & 7;

  tmp= (255 << offset);                         /* Bits to clear */
  if (bit_count + offset < 8)
  {
    /* Only clear bits between 'offset' and 'offset+bit_count-1' */
    tmp^= (255 << (offset + bit_count));
  }
  *data&= ~tmp;

  if ((int) (bit_count-= (8 - offset)) > 0)
  {
    uint fill;
    data++;
    /*
      -1 is here to avoid one 'if' statement and to let the following code
      handle the last byte
    */
    if ((fill= (bit_count - 1) / 8))
    {
      bzero(data, fill);
      data+= fill;
    }
    bit_count-= fill * 8;                       /* Bits left to clear */
    tmp= (1 << bit_count) - 1;
    *data&= ~tmp;
  }
|
WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elswhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves in bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Callable from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scratch).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
  set_if_smaller(info->s->state.first_bitmap_with_space, bitmap_page);
  bitmap->changed= 1;
  DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
  DBUG_RETURN(0);
}


/*
  Set all pages in a region as used

  SYNOPSIS
    _ma_set_full_page_bits()
    info            Maria handler
    bitmap          Bitmap handler
    page            Start page
    page_count      Number of pages

  NOTES
    We assume that all pages in the region are covered by the same bitmap.
    One must have a lock on info->s->bitmap.bitmap_lock

  RETURN
    0  ok
    1  Error (when reading bitmap)
*/

my_bool _ma_set_full_page_bits(MARIA_HA *info, MARIA_FILE_BITMAP *bitmap,
                               ulonglong page, uint page_count)
{
  ulonglong bitmap_page;
  uint offset, bit_start, bit_count, tmp;
  uchar *data;
  DBUG_ENTER("_ma_set_full_page_bits");
  DBUG_PRINT("enter", ("page: %lu page_count: %u", (ulong) page, page_count));
  safe_mutex_assert_owner(&info->s->bitmap.bitmap_lock);

  bitmap_page= page - page % bitmap->pages_covered;
  if (bitmap_page != bitmap->page &&
      _ma_change_bitmap_page(info, bitmap, bitmap_page))
    DBUG_RETURN(1);

  /* Find page number from start of bitmap */
  page= page - bitmap->page - 1;

  /* Set bits from 'page * 3' -> '(page + page_count) * 3' */
  bit_start= page * 3;
  bit_count= page_count * 3;

  data= bitmap->map + bit_start / 8;
  offset= bit_start & 7;

  tmp= (255 << offset);                         /* Bits to set */
  if (bit_count + offset < 8)
  {
    /* Only set bits between 'offset' and 'offset+bit_count-1' */
    tmp^= (255 << (offset + bit_count));
  }
  *data|= tmp;

  if ((int) (bit_count-= (8 - offset)) > 0)
  {
    uint fill;
    data++;
    /*
      -1 is here to avoid one 'if' statement and to let the following code
      handle the last byte
    */
    if ((fill= (bit_count - 1) / 8))
    {
      bfill(data, fill, 255);
      data+= fill;
    }
    bit_count-= fill * 8;                       /* Bits left to set */
    tmp= (1 << bit_count) - 1;
    *data|= tmp;
  }
  bitmap->changed= 1;
  DBUG_EXECUTE("bitmap", _ma_print_bitmap_changes(bitmap););
  DBUG_RETURN(0);
}
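
/*
  Illustrative sketch (not part of Maria): the byte-level loop of
  _ma_set_full_page_bits() lifted into a standalone helper so that it can
  be checked against a naive bit-by-bit loop. set_bit_range() and
  set_full_pages() are hypothetical names, and memset() stands in for
  bfill(). The only layout assumption is the one in the code above: the
  page N pages after its bitmap page owns bits N*3 .. N*3+2. An explicit
  'bit_count + offset > 8' test replaces the (int) cast of a possibly
  wrapped unsigned value. Example: set_full_pages(map, 1, 2) sets bits
  3..8, i.e. map[0] |= 0xF8 and map[1] |= 0x01.
*/
```c
#include <assert.h>
#include <string.h>

/*
  Set every bit in [bit_start, bit_start + bit_count) of 'map':
  partial first byte, whole middle bytes with memset(), partial last byte.
*/
static void set_bit_range(unsigned char *map, unsigned bit_start,
                          unsigned bit_count)
{
  unsigned char *data= map + bit_start / 8;
  unsigned offset= bit_start & 7;
  unsigned tmp;

  if (bit_count == 0)
    return;
  tmp= 255u << offset;                      /* bits offset..7 */
  if (bit_count + offset < 8)
    tmp^= 255u << (offset + bit_count);     /* only offset..offset+count-1 */
  *data|= (unsigned char) tmp;              /* cast drops bits above 7 */

  if (bit_count + offset > 8)
  {
    unsigned fill;
    bit_count-= 8 - offset;                 /* bits still to set, > 0 */
    data++;
    fill= (bit_count - 1) / 8;              /* whole bytes; last one below */
    if (fill)
    {
      memset(data, 255, fill);
      data+= fill;
    }
    bit_count-= fill * 8;                   /* 1..8 bits left */
    *data|= (unsigned char) ((1u << bit_count) - 1);
  }
}

/* Set all 3 bits (pattern 7) for 'page_count' pages starting at 'page' */
static void set_full_pages(unsigned char *map, unsigned page,
                           unsigned page_count)
{
  set_bit_range(map, page * 3, page_count * 3);
}
```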


/*
  Correct bitmap pages to reflect the true allocation

  SYNOPSIS
    _ma_bitmap_release_unused()
    info            Maria handle
    blocks          Bitmap blocks

  IMPLEMENTATION
    If block->used & BLOCKUSED_TAIL is set:
       If block->used & BLOCKUSED_USED is set, then the bits for the
       corresponding page are set according to block->empty_space
       If block->used & BLOCKUSED_USED is not set, then the bits for
       the corresponding page are set to org_bitmap_value;

    If block->used & BLOCKUSED_TAIL is not set:
       if block->used is not set, the bits for the corresponding page are
       cleared

    For the first block (head block) the logic is the same as for a tail
    block

    Note that we may have 'filler blocks' that are used to split a block
    in half; these can be recognized by their page_count == 0.

  RETURN
    0  ok
    1  error (Couldn't write or read bitmap page)
*/

my_bool _ma_bitmap_release_unused(MARIA_HA *info, MARIA_BITMAP_BLOCKS *blocks)
{
  MARIA_BITMAP_BLOCK *block= blocks->block, *end= block + blocks->count;
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  uint bits, current_bitmap_value;
  DBUG_ENTER("_ma_bitmap_release_unused");

  /*
    We can skip FULL_HEAD_PAGE (4) as the page was marked as 'full'
    when we allocated space in the page
  */
  current_bitmap_value= FULL_HEAD_PAGE;

  pthread_mutex_lock(&info->s->bitmap.bitmap_lock);

  /* First handle head block */
  if (block->used & BLOCKUSED_USED)
  {
    DBUG_PRINT("info", ("head empty_space: %u", block->empty_space));
    bits= _ma_free_size_to_head_pattern(bitmap, block->empty_space);
    if (block->used & BLOCKUSED_USE_ORG_BITMAP)
      current_bitmap_value= block->org_bitmap_value;
  }
  else
    bits= block->org_bitmap_value;
  if (bits != current_bitmap_value &&
      set_page_bits(info, bitmap, block->page, bits))
    goto err;

  /* Handle all full pages and tail pages (for head page and blob) */
  for (block++; block < end; block++)
  {
    uint page_count;
    if (!block->page_count)
      continue;                               /* Skip 'filler blocks' */

    page_count= block->page_count;
    if (block->used & BLOCKUSED_TAIL)
    {
      /* A tail occupies only one page in the bitmap */
      page_count= 1;
      if (block->used & BLOCKUSED_USED)
      {
        DBUG_PRINT("info", ("tail empty_space: %u", block->empty_space));
        bits= free_size_to_tail_pattern(bitmap, block->empty_space);
      }
      else
        bits= block->org_bitmap_value;

      /*
        The page has all bits set; the following test is an optimization
        that avoids setting the bits to the value they already have.
      */
      if (bits != FULL_TAIL_PAGE &&
          set_page_bits(info, bitmap, block->page, bits))
        goto err;
    }
    if (!(block->used & BLOCKUSED_USED) &&
        _ma_reset_full_page_bits(info, bitmap,
                                 block->page, page_count))
      goto err;
  }
  pthread_mutex_unlock(&info->s->bitmap.bitmap_lock);
  DBUG_RETURN(0);

err:
  pthread_mutex_unlock(&info->s->bitmap.bitmap_lock);
  DBUG_RETURN(1);
}

/*
|
This patch is a collection of patches from from Sanja, Sergei and Monty.
Added logging and pinning of pages to block format.
Integration of transaction manager, log handler.
Better page cache integration
Split trnman.h into two files, so that we don't have to include my_atomic.h into C++ programs.
Renaming of structures, more comments, more debugging etc.
Fixed problem with small head block + long varchar.
Added extra argument to delete_record() and update_record() (needed for UNDO logging)
Small changes to interface of pagecache and log handler.
Change initialization of log_record_type_descriptors to not be depending on enum order.
Use array of LEX_STRING's to send data to log handler
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
include/lf.h:
Interface fixes
Rename of structures
(Patch from Sergei via Sanja)
include/my_atomic.h:
More comments
include/my_global.h:
Added MY_ERRPTR
include/pagecache.h:
Added undo LSN when unlocking pages
mysql-test/r/maria.result:
Updated results
mysql-test/t/maria.test:
Added autocommit around lock tables
(Patch from Sanja)
mysys/lf_alloc-pin.c:
Post-review fixes, simple optimizations
More comments
Struct slot renames
Check amount of memory on stack
(Patch from Sergei)
mysys/lf_dynarray.c:
More comments
mysys/lf_hash.c:
More comments
After review fixes
(Patch from Sergei)
storage/maria/ha_maria.cc:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
Move out all dereferencing of the transaction structure.
Transaction manager integrated (Patch from Sergei)
storage/maria/ha_maria.h:
Added prototype for start_stmt()
storage/maria/lockman.c:
Function call rename
storage/maria/ma_bitmap.c:
Mark deleted pages free from page cache
storage/maria/ma_blockrec.c:
Offset -> rownr
More debugging
Fixed problem with small head block + long varchar
Added logging of changed pages
Added logging of undo (Including only logging of changed fields in case of update)
Added pinning/unpinning of all changed pages
More comments
Added free_full_pages() as the same code was used in several places.
fill_rows_parts() renamed as fill_insert_undo_parts()
offset -> rownr
Added some optimization of not transactional tables
_ma_update_block_record() has new parameter, as we need original row to do efficient undo for update
storage/maria/ma_blockrec.h:
Added ROW_EXTENTS_ON_STACK
Changed prototype for update and delete of row
storage/maria/ma_check.c:
Added original row to delete_record() call
storage/maria/ma_control_file.h:
Added ifdefs for C++
storage/maria/ma_delete.c:
Added original row to delete_record() call
(Needed for efficient undo logging)
storage/maria/ma_dynrec.c:
Added extra argument to delete_record() and update_record()
Removed not used variable
storage/maria/ma_init.c:
Initialize log handler
storage/maria/ma_loghandler.c:
Removed not used variable
Change initialization of log_record_type_descriptors to not be depending on enum order
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_loghandler.h:
New defines
Use array of LEX_STRING's to send data to log handler
storage/maria/ma_open.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
storage/maria/ma_pagecache.c:
Don't decrease number of readers when using pagecache_write()/pagecache_read()
In pagecache_write() decrement request count if page was left pinned
Added pagecache_delete_pages()
Removed some casts
Make trace output consistent with rest of code
Simplify calling of DBUG_ASSERT(0)
Only update LSN if the LSN is bigger than what's already on the page
Added LSN parameter pagecache_unpin_page(), pagecache_unpin(), and pagecache_unlock()
(Part of patch from Sanja)
storage/maria/ma_static.c:
Added 'dummy' transaction option to MARIA_INFO so that we can always assume 'trn' exists.
Added default page cache
storage/maria/ma_statrec.c:
Added extra argument to delete_record() and update_record()
storage/maria/ma_test1.c:
Added option -T for transactions
storage/maria/ma_test2.c:
Added option -T for transactions
storage/maria/ma_test_all.sh:
Test with transactions
storage/maria/ma_update.c:
Changed prototype for update of row
storage/maria/maria_def.h:
Changed prototype for update & delete of row as block records need to access the old row
Store in MARIA_SHARE->page_type if pages will have up to date LSN's
Added MARIA_MAX_TREE_LEVELS to allow us to calculate the number of possible pinned pages we may need.
Removed not used 'empty_bits_buffer'
Added pointer to transaction object
Added array for pinned pages
Added log_row_parts array for logging of field data.
Added MARIA_PINNED_PAGE to store pinned pages
storage/maria/trnman.c:
Added accessor functions to transaction object
Added missing DBUG_RETURN()
More debugging
More comments
Changed // comment of code to #ifdef NOT_USED
Transaction manager integrated.
Post review fixes
Part of patch originally from Sergei
storage/maria/trnman.h:
Split trnman.h into two files, so that we don't have to include my_atomic.h into the .cc program.
(Temporary fix to avoid bug in gcc)
storage/maria/unittest/ma_pagecache_single.c:
Added missing argument
Added SKIP_BIG_TESTS
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
Test logging with new LEX_STRING parameter
(Patch from Sanja)
storage/maria/unittest/trnman-t.c:
Stack overflow detection
(Patch from Sergei)
unittest/unit.pl:
Command-line options --big and --verbose
(Patch from Sergei)
unittest/mytap/tap.c:
Detect --big
(Patch from Sergei)
unittest/mytap/tap.h:
Skip_big_tests and SKIP_BIG_TESTS
(Patch from Sergei)
storage/maria/trnman_public.h:
New BitKeeper file ``storage/maria/trnman_public.h''

  Free full pages from bitmap and pagecache

  SYNOPSIS
    _ma_bitmap_free_full_pages()
    info            Maria handle
    extents         Extents (as stored on disk)
    count           Number of extents

  IMPLEMENTATION
    Mark all full pages (not tails) from extents as free, both in bitmap
    and page cache.

  RETURN
    0  ok
    1  error (Couldn't write or read bitmap page)
*/

my_bool _ma_bitmap_free_full_pages(MARIA_HA *info, const uchar *extents,
                                   uint count)
{
  DBUG_ENTER("_ma_bitmap_free_full_pages");

  pthread_mutex_lock(&info->s->bitmap.bitmap_lock);
  for (; count--; extents+= ROW_EXTENT_SIZE)
  {
    ulonglong page=  uint5korr(extents);
    uint page_count= uint2korr(extents + ROW_EXTENT_PAGE_SIZE);
    if (!(page_count & TAIL_BIT))
    {
      if (page == 0 && page_count == 0)
        continue;                               /* Not used extent */
      if (pagecache_delete_pages(info->s->pagecache, &info->dfile, page,
                                 page_count, PAGECACHE_LOCK_WRITE, 1) ||
          _ma_reset_full_page_bits(info, &info->s->bitmap, page, page_count))
      {
        pthread_mutex_unlock(&info->s->bitmap.bitmap_lock);
        DBUG_RETURN(1);
      }
    }
  }
  pthread_mutex_unlock(&info->s->bitmap.bitmap_lock);
  DBUG_RETURN(0);
}
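
/*
  Illustrative sketch (not part of Maria): how the extent loop in
  _ma_bitmap_free_full_pages() decides which extents name full pages.
  The extent layout is an assumption modelled on ROW_EXTENT_PAGE_SIZE,
  ROW_EXTENT_SIZE and TAIL_BIT from ma_blockrec.h: a 5-byte little-endian
  page number followed by a 2-byte little-endian page count, with 0x8000
  flagging a tail; check the real header before relying on these values.
  read5()/read2() are hypothetical stand-ins for uint5korr()/uint2korr(),
  and collect_full_ranges() only collects the ranges instead of freeing
  them in the page cache and bitmap.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring ma_blockrec.h */
#define SK_EXTENT_PAGE_SIZE 5
#define SK_EXTENT_SIZE      7
#define SK_TAIL_BIT         0x8000u

/* Little-endian readers, same effect as uint5korr() / uint2korr() */
static uint64_t read5(const unsigned char *p)
{
  return (uint64_t) p[0] | ((uint64_t) p[1] << 8) | ((uint64_t) p[2] << 16) |
         ((uint64_t) p[3] << 24) | ((uint64_t) p[4] << 32);
}

static unsigned read2(const unsigned char *p)
{
  return (unsigned) p[0] | ((unsigned) p[1] << 8);
}

typedef struct { uint64_t page; unsigned page_count; } full_range;

/*
  Store in 'out' the full-page ranges that would be freed: tail extents
  and unused extents (page 0, count 0) are skipped.
  Returns the number of ranges stored.
*/
static unsigned collect_full_ranges(const unsigned char *extents,
                                    unsigned count, full_range *out)
{
  unsigned n= 0;
  for (; count--; extents+= SK_EXTENT_SIZE)
  {
    uint64_t page= read5(extents);
    unsigned page_count= read2(extents + SK_EXTENT_PAGE_SIZE);
    if (page_count & SK_TAIL_BIT)
      continue;                             /* tail, not a full page */
    if (page == 0 && page_count == 0)
      continue;                             /* unused extent */
    out[n].page= page;
    out[n].page_count= page_count;
    n++;
  }
  return n;
}
```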


/*
  Mark in the bitmap how much free space there is on a page

  SYNOPSIS
    _ma_bitmap_set()
    info            Maria handler
    page            Page number
    head            1 if page is a head page, 0 if tail page
    empty_space     How much empty space there is on page

  RETURN
    0  ok
    1  error
*/

my_bool _ma_bitmap_set(MARIA_HA *info, ulonglong page, my_bool head,
                       uint empty_space)
{
  MARIA_FILE_BITMAP *bitmap= &info->s->bitmap;
  uint bits;
  my_bool res;
  DBUG_ENTER("_ma_bitmap_set");

  pthread_mutex_lock(&info->s->bitmap.bitmap_lock);
  bits= (head ?
         _ma_free_size_to_head_pattern(bitmap, empty_space) :
         free_size_to_tail_pattern(bitmap, empty_space));
  res= set_page_bits(info, bitmap, page, bits);
  pthread_mutex_unlock(&info->s->bitmap.bitmap_lock);
  DBUG_RETURN(res);
}
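
/*
  Illustrative sketch (not part of Maria): reading and writing the 3-bit
  pattern that _ma_bitmap_set() stores for one page. It assumes the layout
  used by _ma_set_full_page_bits(): the page N pages after its bitmap page
  owns bits N*3 .. N*3+2 of the map, so a field can straddle two bytes and
  is accessed here through a 16-bit little-endian window. get_pattern(),
  set_pattern() and relative_page() are hypothetical; the real
  get_page_bits()/set_page_bits() are defined elsewhere in this file.
*/
```c
#include <assert.h>
#include <stdint.h>

/*
  Pattern of the page 'rel_page' pages after its bitmap page.
  The map must have one spare byte after the last field (window read).
*/
static unsigned get_pattern(const unsigned char *map, unsigned rel_page)
{
  unsigned bit= rel_page * 3;
  const unsigned char *data= map + bit / 8;
  unsigned window= (unsigned) data[0] | ((unsigned) data[1] << 8);
  return (window >> (bit & 7)) & 7;
}

/* Replace the 3-bit field of 'rel_page', leaving neighbours untouched */
static void set_pattern(unsigned char *map, unsigned rel_page, unsigned bits)
{
  unsigned bit= rel_page * 3;
  unsigned char *data= map + bit / 8;
  unsigned offset= bit & 7;
  unsigned window= (unsigned) data[0] | ((unsigned) data[1] << 8);
  window= (window & ~(7u << offset)) | ((bits & 7) << offset);
  data[0]= (unsigned char) window;
  data[1]= (unsigned char) (window >> 8);
}

/*
  Page number -> position after its bitmap page, as computed in
  _ma_set_full_page_bits(). 'page' must not be a bitmap page itself.
*/
static unsigned relative_page(uint64_t page, uint64_t pages_covered)
{
  uint64_t bitmap_page= page - page % pages_covered;
  return (unsigned) (page - bitmap_page - 1);
}
```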


/*
  Check that bitmap pattern is correct for a page

  NOTES
    Used in maria_chk

  SYNOPSIS
    _ma_check_bitmap_data()
    info            Maria handler
    page_type       What kind of page this is
    page            Page number
    empty_space     Empty space on page
    bitmap_pattern  Store here the pattern that was in the bitmap for the
                    page. This is always updated.

  RETURN
    0  ok
    1  error (the pattern in the bitmap does not match the page)
*/

my_bool _ma_check_bitmap_data(MARIA_HA *info,
                              enum en_page_type page_type, ulonglong page,
                              uint empty_space, uint *bitmap_pattern)
{
  uint bits;
  switch (page_type) {
  case UNALLOCATED_PAGE:
  case MAX_PAGE_TYPE:
    bits= 0;
    break;
  case HEAD_PAGE:
    bits= _ma_free_size_to_head_pattern(&info->s->bitmap, empty_space);
    break;
  case TAIL_PAGE:
    bits= free_size_to_tail_pattern(&info->s->bitmap, empty_space);
    break;
  case BLOB_PAGE:
    bits= FULL_TAIL_PAGE;
    break;
  default:
    bits= 0; /* to satisfy compiler */
    DBUG_ASSERT(0);
  }
  return (*bitmap_pattern= get_page_bits(info, &info->s->bitmap, page)) !=
    bits;
}


/*
2007-04-19 12:18:56 +02:00
  Check if the page type matches the one that we have in the bitmap

  SYNOPSIS
    _ma_check_if_right_bitmap_type()
    info                Maria handler
    page_type           What kind of page this is
    page                Address to page
    bitmap_pattern      Store here the pattern that was in the bitmap for the
                        page. This is always updated.
2007-01-18 20:38:14 +01:00

  NOTES
    Used in maria_chk

  RETURN
    0  ok
    1  error
*/

my_bool _ma_check_if_right_bitmap_type(MARIA_HA *info,
                                       enum en_page_type page_type,
                                       ulonglong page,
                                       uint *bitmap_pattern)
{
  if ((*bitmap_pattern= get_page_bits(info, &info->s->bitmap, page)) > 7)
    return 1;                                   /* Couldn't read page */
  switch (page_type) {
  case HEAD_PAGE:
    return *bitmap_pattern < 1 || *bitmap_pattern > 4;
  case TAIL_PAGE:
    return *bitmap_pattern < 5;
  case BLOB_PAGE:
    return *bitmap_pattern != 7;
  default:
    break;
  }
  DBUG_ASSERT(0);
  return 1;
}

WL#3072 - Maria recovery
Unit test for recovery: runs ma_test1 and ma_test2 (both only with
INSERTs and DELETEs; UPDATEs disabled as not handled by recovery)
then moves the tables elsewhere; recreates tables from the log, and
compares and fails if there is a difference. Passes now.
Most of maria_read_log.c moved to ma_recovery.c, as it will be re-used
for recovery-from-ha_maria.
Bugfixes of applying of REDO_INSERT, REDO_PURGE_ROW.
Applying of REDO_PURGE_BLOCKS, REDO_DELETE_ALL, REDO_DROP_TABLE,
UNDO_ROW_INSERT (in REDO phase only, i.e. just doing records++),
UNDO_ROW_DELETE, UNDO_ROW_PURGE.
Code cleanups.
Monty: please look for "QQ". Sanja: please look for "Sanja".
Future tasks: recovery of the bitmap (easy), recovery of the state
(make it idempotent), more REDOs (Monty to work on
REDO_UPDATE?), UNDO phase...
Pushing this cset as it looks safe, contains test and bugfixes which
will help Monty implement applying of REDO_UPDATE.
sql/handler.cc:
typo
storage/maria/Makefile.am:
Adding ma_test_recovery (which ma_test_all invokes, and which can
also be run alone). Most of maria_read_log.c moved to ma_recovery.c
storage/maria/ha_maria.cc:
comments
storage/maria/ma_bitmap.c:
fixing comments. 2 -> sizeof(maria_bitmap_marker).
Bitmap-related part of _ma_initialize_datafile() moves into the bitmap module.
Now putting the "bm" signature when creating the first bitmap page
(it used to happen only at next open, but that
caused an annoying difference when testing Recovery if the original
run didn't open the table, and it looks more
logical like this: it goes to disk only with its signature correct);
see the "QQ" comment towards the _ma_initialize_data_file() call
in ma_create.c for more).
When reading a bitmap page, verify its signature (happens when normally
using the table or when CHECKing it; not when REPAIRing it).
storage/maria/ma_blockrec.c:
* no need to sync the data file if table is not transactional
* Comments, code cleanup (log-related data moved to log-related code
block, int5store->page_store).
* Store the table's short id into LOGREC_UNDO_ROW_PURGE, like we
do for other records (though this record will soon be replaced
with a CLR).
* If "page" is 1 it means the page which extends from byte
page*block_size+1 to (page+1)*block_size (byte number 1 being
the first byte of the file). The last byte of the file is
data_file_length (same convention).
A new page needs to be created if the last byte of the page is
beyond the last byte of the file, i.e.
(page+1)*block_size+1 > data_file_length, so we correct the test
(bug found when testing log applying for ma_test1 -M -T --skip-update).
* update the page's LSN when removing a row from it during
execution of a REDO_PURGE_ROW record (bug found when testing log
applying for ma_test1 -M -T --skip-update).
* applying of REDO_PURGE_BLOCKs (limited to a one-page range for now).
storage/maria/ma_blockrec.h:
new functions. maria_bitmap_marker does not need to be exported.
storage/maria/ma_close.c:
we can always flush the table's state when closing the last instance
of the table. And it is needed for maria_read_log (as it does
not use maria_lock_database()).
storage/maria/ma_control_file.c:
when in Recovery, some assertions should not be used.
storage/maria/ma_control_file.h:
double-inclusion safe
storage/maria/ma_create.c:
during recovery, don't log records. Comments.
Moving the creation of the first bitmap page to ma_bitmap.c
storage/maria/ma_delete_table.c:
during recovery, don't log records. Log the end-zero of the dropped
table's name, so that recovery can use the string in place without
extending it to fit an end zero.
storage/maria/ma_loghandler.c:
* inwrite_rec_hook also needs access to the MARIA_SHARE, like
prewrite_rec_hook. This will be needed to update
share->records_diff (in the upcoming patch "recovery of the state").
* LOG_DESC::record_ends_group changed to an enum.
* LOG_DESC for LOGREC_REDO_PURGE_BLOCKS and LOGREC_UNDO_ROW_PURGE
corrected
* Sanja please see the @todo LOG BUG
* avoiding DBUG_RETURN(func()) as it gives confusing debug traces.
storage/maria/ma_loghandler.h:
- log write hooks called while the log's lock is held (inwrite_rec_hook)
now need the MARIA_SHARE, like prewrite_rec_hook already had
- instead of a bool saying if this record's type ends groups or not,
we refine: it may not end a group, it may end a group, or it may
be a group in itself. Imagine that we had a physical write failure
to a table before we log the UNDO, we still end up in
external_lock(F_UNLCK) and then we log a COMMIT: we don't want
to consider this COMMIT as ending the group of REDOs (don't want
to execute those REDOs during Recovery), that's why we say "COMMIT
is a group in itself, it aborts any previous group". This also
gives one more sanity check in maria_read_log.
storage/maria/ma_recovery.c:
New Recovery code, replacing the old pseudocode.
Most of maria_read_log moved here.
Callable from ha_maria, but not enabled yet.
Compared to the previous version of maria_read_log, some bugs have
been fixed, debugging output can go to stdout or a disk file (for now
it's useful for me, later it can be changed), execution of
REDO_DROP_TABLE, REDO_DELETE_ALL, REDO_PURGE_BLOCKS has been added. Duplicate code
has been factored into functions. We abort an unfinished group
of records if we see a record which is a group in itself (like COMMIT).
No need for maria_panic() after a bug (which caused tables to not
be closed) was fixed; if there is yet another bug I prefer to see it.
When opening a table for Recovery, set data_file_length
and key_file_length to their real physical value (these are the
easiest state members to restore :). Warn us if the last page
was truncated (but Recovery handles it).
MARIA_SHARE::state::state::records is now partly recovered (not
idempotent, but works if recreating tables from scratch).
When applying a REDO to a page, stamp it with the UNDO's LSN
(current_group_end_lsn), not with the REDO's LSN; it makes
the table more identical to the original table (easier to compare
the two tables in the end).
Big thing missing: some types of REDOs are not handled,
and the UNDO phase does not exist (missing functions to execute UNDOs
to actually rollback). So for now tests are only inserting/deleting
a few 100 rows, closing the table and seeing if the log is applied ok;
it works. UPDATE not handled.
storage/maria/ma_recovery.h:
new functions: ma_recover() for recovery from inside ha_maria;
_ma_apply_log() for maria_read_log (ma_recover() calls _ma_apply_log()).
Btw, we need to not use the word "recover" for REPAIR/maria_chk anymore.
storage/maria/ma_rename.c:
don't write log records during recovery
storage/maria/ma_test2.c:
- fail if maria_info() or other subtests find some wrong information
- new option -g to skip updates.
- init the translog before creating the table, so that log applying
can work.
- in "#if 0" you'll see some fixed bugs (will be removed).
storage/maria/ma_test_all.sh:
cleanup files. Test log applying.
storage/maria/maria_read_log.c:
most of the logic moves to ma_recovery.c to be shared between
maria_read_log and recovery-from-inside-mysqld.
See ma_recovery.c for additional changes made to the moved code.
storage/maria/ma_test_recovery:
unit test for Recovery. Tests insert and delete,
REDO_UPDATE not yet coded.
Script is called from ma_test_all. Can run standalone.
2007-07-26 11:56:21 +02:00

/**
  @brief create the first bitmap page of a freshly created data file

  @param  share           table's share

  @return Operation status
    @retval 0      OK
    @retval !=0    Error
*/

int _ma_bitmap_create_first(MARIA_SHARE *share)
2007-10-19 23:24:22 +02:00
{
2007-07-26 11:56:21 +02:00
  uint block_size= share->bitmap.block_size;
  File file= share->bitmap.file.file;
2007-10-09 20:09:50 +02:00
  char marker[sizeof(maria_bitmap_marker)];

  if (share->options & HA_OPTION_PAGE_CHECKSUM)
    bzero(marker, sizeof(marker));
  else
    bmove(marker, maria_bitmap_marker, sizeof(marker));

WL#3072 Maria recovery
* create page cache before initializing engine and not after, because
Maria's recovery needs a page cache
* make the creation of a bitmap page more crash-resistant
* bugfix (see ma_blockrec.c)
* back to old way: create an 8k bitmap page when creating table
* preparations for the UNDO phase: recreate TRNs
* preparations for Checkpoint: list of dirty pages, testing
of rec_lsn to know if page should be skipped during Recovery
(unused in this patch as no Checkpoint module pushed yet)
* maria_chk tags repaired table with a special LSN
* reworking all around in ma_recovery.c (less duplication)
mysys/my_realloc.c:
noted an issue in my_realloc()
sql/mysqld.cc:
page cache needs to be created before engines are initialized,
because Maria's initialization may do a recovery which needs
the page cache.
storage/maria/ha_maria.cc:
update to new prototype
storage/maria/ma_bitmap.c:
when creating the first bitmap page we used chsize to 8192 bytes then
pwrite (overwrite) the last 2 bytes (8191-8192). If crash between
the two operations, this leaves a bitmap page full without its end
marker. A later recovery may try to read this page and find it
exists and misses a marker and conclude it's corrupted and fail.
Changing the chsize to only 8190 bytes: recovery will then find
the page is too short and recreate it entirely.
storage/maria/ma_blockrec.c:
Fix for a bug: when executing a REDO, if the data page is created,
data_file_length was increased before _ma_bitmap_set():
_ma_bitmap_set() called _ma_read_bitmap_page() which, due to the
increased data_file_length, expected to find a bitmap page on disk
with a correct end marker; if the bitmap page didn't exist already
in fact, this failed. Fixed by increasing data_file_length only after
_ma_read_bitmap_page() has created the new bitmap page correctly.
This bug could happen every time a REDO is about creating a new
bitmap page.
storage/maria/ma_check.c:
empty data file has a bitmap page
storage/maria/ma_control_file.c:
useless parameter to ma_control_file_create_or_open(), just
test if this is recovery.
storage/maria/ma_control_file.h:
new prototype
storage/maria/ma_create.c:
Back to how it was before: maria_create() creates an 8k bitmap page.
Thus (bugfix) data_file_length needs to reflect this instead of being 0.
storage/maria/ma_loghandler.c:
as ma_test1 and ma_test2 now use real transactions and not
dummy_transaction_object, REDO for INSERT/UPDATE/DELETE are always
about real transactions, can assert this.
A function for Recovery to assign a short id to a table.
storage/maria/ma_loghandler.h:
new function
storage/maria/ma_loghandler_lsn.h:
maria_chk tags repaired tables with this LSN
storage/maria/ma_open.c:
* enforce that DMLs on transactional tables use real transactions
and not dummy_transaction_object.
* test if table was repaired with maria_chk (which has to be
seen as an import of an external table into the server), test
validity of create_rename_lsn (header corruption detection)
* comments.
storage/maria/ma_recovery.c:
* preparations for the UNDO phase: recreate TRNs
* preparations for Checkpoint: list of dirty pages, testing
of rec_lsn to know if page should be skipped during Recovery
(unused in this patch as no Checkpoint module pushed yet)
* reworking all around (less duplication)
storage/maria/ma_recovery.h:
a parameter to say if the UNDO phase should be skipped
storage/maria/maria_chk.c:
tag repaired tables with a special LSN
storage/maria/maria_read_log.c:
* update to new prototype
* no UNDO phase in maria_read_log for now
storage/maria/trnman.c:
* a function for Recovery to create a transaction (TRN), needed
in the UNDO phase
* a function for Recovery to grab an existing transaction, needed
in the UNDO phase (rollback all existing transactions)
storage/maria/trnman_public.h:
new functions
2007-08-29 16:43:01 +02:00
  if (my_chsize(file, block_size - sizeof(maria_bitmap_marker),
                0, MYF(MY_WME)) ||
2007-10-09 20:09:50 +02:00
      my_pwrite(file, marker, sizeof(maria_bitmap_marker),
2007-07-26 11:56:21 +02:00
                block_size - sizeof(maria_bitmap_marker),
                MYF(MY_NABP | MY_WME)))
    return 1;
  share->state.state.data_file_length= block_size;
  _ma_bitmap_delete_all(share);
  return 0;
}