2006-04-11 15:45:10 +02:00
/* Copyright (C) 2006 MySQL AB & MySQL Finland AB & TCX DataKonsult AB

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; version 2 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

/*
  Locking of isam-tables.
  Reads info from an isam-table. Must be the first request before doing any
  further calls to any isam function. Is used to allow many processes to use
  the same isam database.
*/

#include "ma_ftdefs.h"

/* lock table by F_UNLCK, F_RDLCK or F_WRLCK */

int maria_lock_database(MARIA_HA *info, int lock_type)
{
  int error;
  uint count;
  MARIA_SHARE *share=info->s;
  DBUG_ENTER("maria_lock_database");
  DBUG_PRINT("enter",("lock_type: %d old lock %d r_locks: %u w_locks: %u "
                      "global_changed: %d open_count: %u name: '%s'",
                      lock_type, info->lock_type, share->r_locks,
                      share->w_locks,
                      share->global_changed, share->state.open_count,
                      share->index_file_name));
  if (share->options & HA_OPTION_READ_ONLY_DATA ||
      info->lock_type == lock_type)
    DBUG_RETURN(0);
  if (lock_type == F_EXTRA_LCK) /* Used by TMP tables */
  {
    ++share->w_locks;
    ++share->tot_locks;
    info->lock_type= lock_type;
    DBUG_RETURN(0);
  }

  error=0;
  pthread_mutex_lock(&share->intern_lock);
  if (share->kfile.file >= 0) /* May only be false on windows */
  {
    switch (lock_type) {
    case F_UNLCK:
      maria_ftparser_call_deinitializer(info);
      if (info->lock_type == F_RDLCK)
      {
        count= --share->r_locks;
        _ma_restore_status(info);
      }
      else
      {
        count= --share->w_locks;
        _ma_update_status(info);
      }
      --share->tot_locks;
      if (info->lock_type == F_WRLCK && !share->w_locks)
      {
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not be re-enabled for transactional tables, as state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
        /* pages of transactional tables get flushed at Checkpoint */
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
For transactional tables, shift record number in keys up with 1 bit to have place to indicate if transid follows
Checksum for MyISAM now ignores NULL and not used part of VARCHAR
Renamed some variables that caused shadow compiler warnings
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory to go around this bug.
pagecache_unlock_by_ulink() now has extra argument to say if page was changed.
Give error message if we fail to open control file
Mark page cache variables as not flushable
include/maria.h:
Made min page cache larger (needed for pinning key page)
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
Added write_comp_flag to move some runtime code to maria_open()
include/my_base.h:
Added new error message to be used when handler initialization failed
include/my_global.h:
Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables
include/my_handler.h:
Added const to some parameters
mysys/array.c:
More DBUG
mysys/my_error.c:
Fixed indentation
mysys/my_handler.c:
Added const to some parameters
Added missing error messages
sql/field.h:
Renamed variables to avoid variable shadowing
sql/handler.h:
Renamed parameter to avoid variable name conflict
sql/item.h:
Renamed variables to avoid variable shadowing
sql/log_event_old.h:
Renamed variables to avoid variable shadowing
sql/set_var.h:
Renamed variables to avoid variable shadowing
sql/sql_delete.cc:
Removed maria hack for temporary tables
Fixed indentation
sql/sql_table.cc:
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use.
sql/table.cc:
Copy page_checksum from share
Removed Maria hack
storage/maria/Makefile.am:
Added new files
storage/maria/ha_maria.cc:
Renamed records -> record_count and info -> create_info to avoid variable name conflicts
Mark page cache variables as not flushable
storage/maria/ma_blockrec.c:
Moved _ma_unpin_all_pages() to ma_key_recover.c
Moved init of info->pinned_pages to ma_open.c
Moved _ma_finalize_row() to maria_key_recover.h
Renamed some variables to avoid variable name conflicts
Mark page_link.changed for blocks we change directly
Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index)
storage/maria/ma_blockrec.h:
Removed extra empty line
storage/maria/ma_checkpoint.c:
Remove not needed trnman.h
storage/maria/ma_close.c:
Free pinned pages (which are now always allocated)
storage/maria/ma_control_file.c:
Give error message if we fail to open control file
storage/maria/ma_delete.c:
Changes for redo logging (first part, logging of underflow not yet done)
- Log undo-key-delete
- Log delete of key
- Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert()
- Added new arguments to some functions to be able to write redo information
- Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED
Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway
Changed 2 bmove_upp() to bmove() as this made code easier to understand
More function comments
Indentation fixes
storage/maria/ma_ft_update.c:
New arguments to _ma_write_keypage()
storage/maria/ma_loghandler.c:
Fixed some DBUG_PRINT messages
Simplify code
Added new log entries for key page redo
Renamed some variables to avoid variable name shadowing
storage/maria/ma_loghandler.h:
Moved some defines here
Added define for storing key number on key pages
Added new translog record types
Added enum for type of operations in LOGREC_REDO_INDEX
storage/maria/ma_open.c:
Always allocate info.pinned_pages (we need now also for normal key page usage)
Update keyinfo->key_nr
Added virtual functions to convert record position to number to be stored on key pages
Update keyinfo->write_comp_flag to value of search flag to be used when writing key
storage/maria/ma_page.c:
Added redo for key pages
- Extended _ma_fetch_keypage() with type of lock to put on page and address to used MARIA_PINNED_PAGE
- _ma_fetch_keypage() now pins pages if needed
- Extended _ma_write_keypage() with type of locks to be used
- ma_dispose() now locks info->s->state.key_del from other threads
- ma_dispose() writes redo log record
- ma_new() locks info->s->state.key_del from other threads if it was used
- ma_new() now pins read page
Other things:
- Removed some not needed arguments from _ma_new() and _ma_dispose()
- Added some new variables to simplify code
- If EXTRA_DEBUG is used, do crc on full page to catch uninitialized bytes
storage/maria/ma_pagecache.h:
Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed
Added some defines for pagecache priority levels that one can use
storage/maria/ma_range.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_recovery.c:
- Added hooks for new translog types:
REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and
UNDO_KEY_DELETE_WITH_ROOT.
- Moved variable declarations to start of function (portability fixes)
- Removed some not needed initializations
- Set only relevant state changes for each redo/undo entry
storage/maria/lockman.c:
Removed end space
storage/maria/ma_check.c:
Removed end space
storage/maria/ma_create.c:
Removed end space
storage/maria/ma_locking.c:
Removed end space
storage/maria/ma_packrec.c:
Removed end space
storage/maria/ma_pagecache.c:
Removed end space
storage/maria/ma_panic.c:
Removed end space
storage/maria/ma_rt_index.c:
Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new()
Fixed indentation
storage/maria/ma_rt_key.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_rt_split.c:
Added new arguments for call to _ma_new()
Use new keypage header
Added new arguments for call to _ma_write_keypage()
storage/maria/ma_search.c:
Updated comments & indentation
Added new arguments for call to _ma_fetch_keypage()
Made some variables and arguments const
Added virtual functions for converting row position to number to be stored in key
use MARIA_RECORD_POS of record position instead of my_off_t
Record in MARIA_KEY_PARAM how page was changed on key insert (needed for REDO)
storage/maria/ma_sort.c:
Removed end space
storage/maria/ma_statrec.c:
Updated arguments for call to _ma_rec_pos()
storage/maria/ma_test1.c:
Fixed too small buffer to init_pagecache()
Fixed bug when using insert_count and test_flag
storage/maria/ma_test2.c:
Use more reasonable pagecache size
Remove not used code
Reset blob_length to fix wrong output message
storage/maria/ma_test_all.sh:
Fixed wrong test
storage/maria/ma_write.c:
Lots of new code to handle REDO of key pages
No logic changes because of REDO code, mostly adding new arguments and adding new code for logging
Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions
Move setting of comp_flag in ma_ck_wrte_btree() from runtime to maria_open()
Zerofill new used pages for:
- To remove possible sensitive data left in buffer
- To get identical data on pages after running redo
- Better compression of pages if archived
storage/maria/maria_chk.c:
Added information if table is crash safe
storage/maria/maria_def.h:
New virtual function to convert between record position on key and normal record position
Added mutex and extra variables to handle locking of share->state.key_del
Moved some structure variables to get things more aligned
Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert
Added argument to MARIA_PINNED_PAGE to indicate if page was changed
Updated prototypes for functions
Added some structures for signaling changes in REDO handling
storage/maria/unittest/ma_pagecache_single.c:
Updated arguments for changed function calls
storage/myisam/mi_check.c:
Made calc_check_checksum virtual
storage/myisam/mi_checksum.c:
Update checksums to ignore null columns
storage/myisam/mi_create.c:
Mark if table has null column (to know when we have to use mi_checksum())
storage/myisam/mi_open.c:
Added virtual function for calculating checksum to be able to easily ignore NULL fields
storage/myisam/mi_test2.c:
Fixed bug
storage/myisam/myisamdef.h:
Added virtual function for calculating checksum during check table
Removed ha_key_cmp() as this is in handler.h
storage/maria/ma_key_recover.c:
New BitKeeper file ``storage/maria/ma_key_recover.c''
storage/maria/ma_key_recover.h:
New BitKeeper file ``storage/maria/ma_key_recover.h''
storage/maria/ma_key_redo.c:
New BitKeeper file ``storage/maria/ma_key_redo.c''
        if (!share->base.born_transactional && !share->temporary &&
            _ma_flush_table_files(info,
                                  share->delay_key_write ? MARIA_FLUSH_DATA :
                                  MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
                                  FLUSH_KEEP, FLUSH_KEEP))
          error= my_errno;
      }
      if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED))
      {
        if (end_io_cache(&info->rec_cache))
        {
          error=my_errno;
          maria_print_error(info->s, HA_ERR_CRASHED);
          maria_mark_crashed(info);
        }
      }
      if (!count)
      {
        DBUG_PRINT("info",("changed: %u w_locks: %u",
                           (uint) share->changed, share->w_locks));
        if (share->changed && !share->w_locks)
        {
#ifdef HAVE_MMAP
          if ((info->s->mmaped_length !=
               info->s->state.state.data_file_length) &&
              (info->s->nonmmaped_inserts > MAX_NONMAPPED_INSERTS))
          {
            if (info->s->concurrent_insert)
              rw_wrlock(&info->s->mmap_lock);
            _ma_remap_file(info, info->s->state.state.data_file_length);
            info->s->nonmmaped_inserts= 0;
            if (info->s->concurrent_insert)
              rw_unlock(&info->s->mmap_lock);
          }
#endif
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
For transactional tables, shift record number in keys up with 1 bit to have place to indicate if transid follows
Checksum for MyISAM now ignores NULL and not used part of VARCHAR
Renamed some variables that caused shadow compiler warnings
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory to go around this bug.
pagecache_unlock_by_ulink() now has extra argument to say if page was changed.
Give error message if we fail to open control file
Mark page cache variables as not flushable
include/maria.h:
Made min page cache larger (needed for pinning key page)
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
Added write_comp_flag to move some runtime code to maria_open()
include/my_base.h:
Added new error message to be used when handler initialization failed
include/my_global.h:
Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables
include/my_handler.h:
Added const to some parameters
mysys/array.c:
More DBUG
mysys/my_error.c:
Fixed indentation
mysys/my_handler.c:
Added const to some parameters
Added missing error messages
sql/field.h:
Renamed variables to avoid variable shadowing
sql/handler.h:
Renamed parameter to avoid variable name conflict
sql/item.h:
Renamed variables to avoid variable shadowing
sql/log_event_old.h:
Renamed variables to avoid variable shadowing
sql/set_var.h:
Renamed variables to avoid variable shadowing
sql/sql_delete.cc:
Removed maria hack for temporary tables
Fixed indentation
sql/sql_table.cc:
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use.
sql/table.cc:
Copy page_checksum from share
Removed Maria hack
storage/maria/Makefile.am:
Added new files
storage/maria/ha_maria.cc:
Renamed records -> record_count and info -> create_info to avoid variable name conflicts
Mark page cache variables as not flushable
storage/maria/ma_blockrec.c:
Moved _ma_unpin_all_pages() to ma_key_recover.c
Moved init of info->pinned_pages to ma_open.c
Moved _ma_finalize_row() to maria_key_recover.h
Renamed some variables to avoid variable name conflicts
Mark page_link.changed for blocks we change directly
Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index)
storage/maria/ma_blockrec.h:
Removed extra empty line
storage/maria/ma_checkpoint.c:
Remove not needed trnman.h
storage/maria/ma_close.c:
Free pinned pages (which are now always allocated)
storage/maria/ma_control_file.c:
Give error message if we fail to open control file
storage/maria/ma_delete.c:
Changes for redo logging (first part, logging of underflow not yet done)
- Log undo-key-delete
- Log delete of key
- Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert()
- Added new arguments to some functions to be able to write redo information
- Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED
Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway
Changed 2 bmove_upp() to bmove() as this made code easer to understand
More function comments
Indentation fixes
storage/maria/ma_ft_update.c:
New arguments to _ma_write_keypage()
storage/maria/ma_loghandler.c:
Fixed some DBUG_PRINT messages
Simplify code
Added new log entrys for key page redo
Renamed some variables to avoid variable name shadowing
storage/maria/ma_loghandler.h:
Moved some defines here
Added define for storing key number on key pages
Added new translog record types
Added enum for type of operations in LOGREC_REDO_INDEX
storage/maria/ma_open.c:
Always allocate info.pinned_pages (we need now also for normal key page usage)
Update keyinfo->key_nr
Added virtual functions to convert record position o number to be stored on key pages
Update keyinfo->write_comp_flag to value of search flag to be used when writing key
storage/maria/ma_page.c:
Added redo for key pages
- Extended _ma_fetch_keypage() with type of lock to put on page and address to used MARIA_PINNED_PAGE
- _ma_fetch_keypage() now pin's pages if needed
- Extended _ma_write_keypage() with type of locks to be used
- ma_dispose() now locks info->s->state.key_del from other threads
- ma_dispose() writes redo log record
- ma_new() locks info->s->state.key_del from other threads if it was used
- ma_new() now pins read page
Other things:
- Removed some not needed arguments from _ma_new() and _ma_dispose)
- Added some new variables to simplify code
- If EXTRA_DEBUG is used, do crc on full page to catch not unitialized bytes
storage/maria/ma_pagecache.h:
Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed
Added some defines for pagecache priority levels that one can use
storage/maria/ma_range.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_recovery.c:
- Added hooks for new translog types:
REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and
UNDO_KEY_DELETE_WITH_ROOT.
- Moved variable declarations to start of function (portability fixes)
- Removed some not needed initializations
- Set only relevant state changes for each redo/undo entry
storage/maria/lockman.c:
Removed end space
storage/maria/ma_check.c:
Removed end space
storage/maria/ma_create.c:
Removed end space
storage/maria/ma_locking.c:
Removed end space
storage/maria/ma_packrec.c:
Removed end space
storage/maria/ma_pagecache.c:
Removed end space
storage/maria/ma_panic.c:
Removed end space
storage/maria/ma_rt_index.c:
Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new()
Fixed indentation
storage/maria/ma_rt_key.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_rt_split.c:
Added new arguments for call to _ma_new()
Use new keypage header
Added new arguments for call to _ma_write_keypage()
storage/maria/ma_search.c:
Updated comments & indentation
Added new arguments for call to _ma_fetch_keypage()
Made some variables and arguments const
Added virtual functions for converting row position to number to be stored in key
use MARIA_RECORD_POS of record position instead of my_off_t
Record in MARIA_KEY_PARAM how page was changed one key insert (needed for REDO)
storage/maria/ma_sort.c:
Removed end space
storage/maria/ma_statrec.c:
Updated arguments for call to _ma_rec_pos()
storage/maria/ma_test1.c:
Fixed too small buffer to init_pagecache()
Fixed bug when using insert_count and test_flag
storage/maria/ma_test2.c:
Use more resonable pagecache size
Remove not used code
Reset blob_length to fix wrong output message
storage/maria/ma_test_all.sh:
Fixed wrong test
storage/maria/ma_write.c:
Lots of new code to handle REDO of key pages
No logic changes because of REDO code, mostly adding new arguments and adding new code for logging
Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions
Move setting of comp_flag in ma_ck_write_btree() from runtime to maria_open()
Zerofill new used pages for:
- To remove possible sensitive data left in buffer
- To get identical data on pages after running redo
- Better compression of pages if archived
storage/maria/maria_chk.c:
Added information if table is crash safe
storage/maria/maria_def.h:
New virtual function to convert between record position on key and normal record position
Added mutex and extra variables to handle locking of share->state.key_del
Moved some structure variables to get things more aligned
Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert
Added argument to MARIA_PINNED_PAGE to indicate if page was changed
Updated prototypes for functions
Added some structures for signaling changes in REDO handling
storage/maria/unittest/ma_pagecache_single.c:
Updated arguments for changed function calls
storage/myisam/mi_check.c:
Made calc_check_checksum virtual
storage/myisam/mi_checksum.c:
Update checksums to ignore null columns
storage/myisam/mi_create.c:
Mark if table has null column (to know when we have to use mi_checksum())
storage/myisam/mi_open.c:
Added virtual function for calculating checksum to be able to easily ignore NULL fields
storage/myisam/mi_test2.c:
Fixed bug
storage/myisam/myisamdef.h:
Added virtual function for calculating checksum during check table
Removed ha_key_cmp() as this is in handler.h
storage/maria/ma_key_recover.c:
New BitKeeper file ``storage/maria/ma_key_recover.c''
storage/maria/ma_key_recover.h:
New BitKeeper file ``storage/maria/ma_key_recover.h''
storage/maria/ma_key_redo.c:
New BitKeeper file ``storage/maria/ma_key_redo.c''
2007-11-14 18:08:06 +01:00
#ifdef EXTERNAL_LOCKING
2006-04-11 15:45:10 +02:00
share->state.process= share->last_process=share->this_process;
share->state.unique= info->last_unique= info->this_unique;
share->state.update_count= info->last_loop= ++info->this_loop;
2007-11-14 18:08:06 +01:00
#endif
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not run for transactional tables, as the state on disk may be obsolete:
such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
/* transactional tables rather flush their state at Checkpoint */
if (!share->base.born_transactional)
{
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock; see ma_recovery.c,
look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
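The recovery of state.records described in the ma_recovery.c entry above can be sketched as follows. This is a simplified model, not the real implementation: lsn_t, state_info and both function names are hypothetical, and the assumption that UNDO_ROW_DELETE and UNDO_ROW_PURGE decrement the count is an interpretation of "Same for" in the message. The point it shows is idempotency: a record whose LSN is not newer than is_of_lsn is already reflected in the state and is skipped.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lsn_t;     /* hypothetical stand-in for Maria's LSN type */

enum undo_type { UNDO_ROW_INSERT, UNDO_ROW_DELETE, UNDO_ROW_PURGE };

/* Hypothetical subset of the state stored in the index file header. */
struct state_info
{
  uint64_t records;         /* count of records */
  lsn_t    is_of_lsn;       /* state reflects the log up to this LSN */
};

/* REDO phase sees an UNDO record: adjust records only if it is newer. */
static void redo_phase_see_undo(struct state_info *state,
                                enum undo_type type, lsn_t lsn)
{
  if (lsn <= state->is_of_lsn)
    return;                 /* already counted in the state */
  switch (type)
  {
  case UNDO_ROW_INSERT:
    state->records++;
    break;
  case UNDO_ROW_DELETE:     /* assumption: both of these remove a row */
  case UNDO_ROW_PURGE:
    assert(state->records > 0);
    state->records--;
    break;
  }
}

/* Closing a table during Recovery: state is at least as new as the log. */
static void close_table_in_recovery(struct state_info *state,
                                    lsn_t current_lsn)
{
  if (current_lsn > state->is_of_lsn)
    state->is_of_lsn= current_lsn;
}
```

Running the same log a second time after close_table_in_recovery() leaves records unchanged, because every replayed LSN is now at or below is_of_lsn.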
2007-09-07 15:02:30 +02:00
if (_ma_state_info_write_sub(share->kfile.file, &share->state, 1))
2007-09-06 16:53:26 +02:00
error= my_errno;
else
{
/* A value of 0 below means "state flushed" */
share->changed= 0;
}
}
2006-04-11 15:45:10 +02:00
if (maria_flush)
{
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
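The rename-with-undo behaviour described in the ma_rename.c entry above can be sketched as follows. This is a minimal illustration using only the standard rename(); rename_table_files() is a hypothetical name, and the syncing of files, links and directories that the commit adds is deliberately left out. As in the message, the undo is only a try and may itself fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
  Rename the index file, then the data file. If the second rename fails,
  try to move the index file back so both files keep the old name.
  Returns 0 on success, non-zero (the errno of the failure) otherwise.
*/
static int rename_table_files(const char *old_index, const char *new_index,
                              const char *old_data, const char *new_data)
{
  assert(old_index && new_index && old_data && new_data);
  if (rename(old_index, new_index))
    return errno ? errno : -1;
  if (rename(old_data, new_data))
  {
    int saved_errno= errno;
    /* Best effort only: the result of the undo is ignored on purpose. */
    (void) rename(new_index, old_index);
    return saved_errno ? saved_errno : -1;
  }
  return 0;
}
```

If the data file is missing, the call fails and the index file is found under its old name again, which is the consistent state the commit aims for.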
2006-11-27 22:01:29 +01:00
if (_ma_sync_table_files(info))
2006-04-11 15:45:10 +02:00
error= my_errno;
}
else
share->not_flushed=1;
if (error)
{
maria_print_error(info->s, HA_ERR_CRASHED);
maria_mark_crashed(info);
}
}
}
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
info->lock_type= F_UNLCK;
WL#3072 Maria recovery
Misc changes:
- fix for benign Valgrind error, compiler warnings
- fix for a segfault in execution of maria_delete_all_rows() and one
when taking multiple checkpoints
- fix for too paranoid assertion
- adding ability to take checkpoints at the end of the REDO phase
and at the end of recovery.
- other minor changes
storage/maria/ha_maria.cc:
The checkpoint done after Recovery is finished, is moved to
maria_recover().
storage/maria/ma_bitmap.c:
fix for Valgrind error: the "shadow debug copy" of the bitmap page
started uninitialized and so ma_print_bitmap() would use it uninitialized
storage/maria/ma_checkpoint.c:
* reset pointers to NULL after freeing them, or we segfault at
next checkpoint in my_realloc().
* fix for compiler warnings.
storage/maria/ma_delete_all.c:
info->trn is NULL for non-transactional tables
storage/maria/ma_locking.c:
correct assertion: it fired wrongly in execution of REDO_DROP_TABLE
due to the maria_extra(HA_EXTRA_PREPARE_FOR_DROP)->_ma_decrement_open_count()
->maria_lock_database(F_UNLCK) call chain. Another solution would have
been to not call _ma_decrement_open_count() (it's ok to have a wrong open
count in a table which we are dropping), but the same problem
would still exist for REDO_RENAME_TABLE.
storage/maria/ma_loghandler.c:
fail early if UNRECOVERABLE_ERROR
storage/maria/ma_recovery.c:
* new argument to maria_apply_log(): should it take checkpoints
(at end of REDO phase and at the very end) or not.
* moving the call to translog_next_LSN() into
parse_checkpoint_record() ("hide the details").
* Refining an error detection for something which could happen
if there is a checkpoint record in the log.
* Using close_one_table() instead of maria_extra(HA_EXTRA_PREPARE_FOR_DROP|RENAME),
as it looks safer, and also changing how close_one_table() works:
it now limits itself to scanning all_tables[], thus having one loop
instead of two, which should be faster (as a result, it does not
close tables not registered in this array, which is ok as there
should not be any).
storage/maria/ma_recovery.h:
new parameter
storage/maria/maria_read_log.c:
update to new prototype
2007-10-08 19:08:25 +02:00
      /*
        Verify that user of the table cleaned up after itself. Not in
        recovery, as for example maria_extra(HA_EXTRA_PREPARE_FOR_RENAME) may
        call us here, with transactionality temporarily disabled.
      */
      DBUG_ASSERT(maria_in_recovery ||
                  share->now_transactional == share->base.born_transactional);
2006-04-11 15:45:10 +02:00
      break;
    case F_RDLCK:
      if (info->lock_type == F_WRLCK)
      {
        /*
          Change RW to READONLY

          mysqld does not turn write locks to read locks,
          so we're never here in mysqld.
        */
        share->w_locks--;
        share->r_locks++;
        info->lock_type=lock_type;
        break;
      }
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- having removed the data page writes above, we must put them back
at the start of some statements like check, repair and
delete_all. This was in fact already necessary (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing the data file. This was needed
even when we flushed data pages at the end of each statement, because we
did not do so under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it. If we want to re-enable it, it should
not run for transactional tables, as their state on disk may be obsolete
(such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
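
The ma_delete_all.c note above turns on the order of flush and truncate. A minimal sketch of that ordering problem follows, assuming a toy model in which the file holds 8192 bytes before the flush and one dirty 8192-byte page is cached just past that point; every name here (toy_file, toy_flush, toy_truncate, toy_delete_all_fixed, toy_delete_all_buggy) is hypothetical and none of it is Maria code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 8192UL

/* Hypothetical model: one data file and at most one dirty cached page. */
struct toy_file
{
  unsigned long size;          /* current file length in bytes */
  bool          page_dirty;    /* is a dirty page waiting in the cache? */
  unsigned long page_offset;   /* where that page would be written */
};

/* Write the cached page (if any); writing past EOF extends the file. */
void toy_flush(struct toy_file *f)
{
  if (!f->page_dirty)
    return;
  if (f->page_offset + TOY_PAGE_SIZE > f->size)
    f->size= f->page_offset + TOY_PAGE_SIZE;
  f->page_dirty= false;
}

/* Stand-in for chsize(): set the file length to new_size. */
void toy_truncate(struct toy_file *f, unsigned long new_size)
{
  f->size= new_size;
}

/* Delete-all with the flush done first, the order the note asks for. */
unsigned long toy_delete_all_fixed(struct toy_file f)
{
  toy_flush(&f);
  toy_truncate(&f, TOY_PAGE_SIZE);
  toy_flush(&f);               /* later flush at unlock: nothing left */
  return f.size;
}

/* Delete-all without the early flush: the stale page is written late. */
unsigned long toy_delete_all_buggy(struct toy_file f)
{
  toy_truncate(&f, TOY_PAGE_SIZE);
  toy_flush(&f);               /* obsolete page lands after the truncate */
  return f.size;
}
```

Starting from a file of 8192 bytes with a dirty page at offset 8192, the buggy order ends at 16384 bytes and the fixed order at 8192, which matches the sizes quoted in the check table message.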
2007-09-06 16:53:26 +02:00
#ifdef MARIA_EXTERNAL_LOCKING
2006-04-11 15:45:10 +02:00
      if (!share->r_locks && !share->w_locks)
      {
2007-09-06 16:53:26 +02:00
        /* note that a transactional table should not do this */
2007-09-03 11:05:17 +02:00
        if (_ma_state_info_read_dsk(share->kfile.file, &share->state))
2006-04-11 15:45:10 +02:00
        {
          error=my_errno;
          break;
        }
      }
2007-09-06 16:53:26 +02:00
#endif
2006-04-11 15:45:10 +02:00
      VOID(_ma_test_if_changed(info));
      share->r_locks++;
      share->tot_locks++;
      info->lock_type=lock_type;
      break;
    case F_WRLCK:
      if (info->lock_type == F_RDLCK)
      {                                         /* Change READONLY to RW */
        if (share->r_locks == 1)
        {
          share->r_locks--;
          share->w_locks++;
          info->lock_type=lock_type;
          break;
        }
      }
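
The two in-place conversions in the F_RDLCK and F_WRLCK cases above only move a lock between the read and write counters and leave tot_locks alone. A minimal sketch of that bookkeeping, assuming hypothetical stand-in types (toy_share, toy_handle, toy_convert_lock) rather than MARIA_SHARE and MARIA_HA, might look like this; it does not model what the real function does when no conversion applies.

```c
#include <assert.h>

/* Hypothetical stand-ins; only the counters used by conversions exist. */
enum toy_lock { TOY_UNLCK, TOY_RDLCK, TOY_WRLCK };

struct toy_share
{
  unsigned r_locks;    /* handles holding a read lock */
  unsigned w_locks;    /* handles holding a write lock */
  unsigned tot_locks;  /* all locks held, untouched by a conversion */
};

struct toy_handle
{
  struct toy_share *share;
  enum toy_lock     lock_type;
};

/* Returns 1 if the lock was converted in place, 0 otherwise. */
int toy_convert_lock(struct toy_handle *h, enum toy_lock wanted)
{
  struct toy_share *s= h->share;

  if (wanted == TOY_RDLCK && h->lock_type == TOY_WRLCK)
  {
    /* Change RW to READONLY */
    s->w_locks--;
    s->r_locks++;
    h->lock_type= wanted;
    return 1;
  }
  if (wanted == TOY_WRLCK && h->lock_type == TOY_RDLCK && s->r_locks == 1)
  {
    /* Change READONLY to RW: only when we are the sole reader */
    s->r_locks--;
    s->w_locks++;
    h->lock_type= wanted;
    return 1;
  }
  return 0;            /* not an in-place conversion; not modelled here */
}
```

The r_locks == 1 test is what keeps an upgrade from silently stepping over other readers: with two readers the sketch reports that no conversion took place and leaves every counter as it was.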
2007-09-06 16:53:26 +02:00
#ifdef MARIA_EXTERNAL_LOCKING
2006-04-11 15:45:10 +02:00
      if (!(share->options & HA_OPTION_READ_ONLY_DATA))
      {
        if (!share->w_locks)
        {
          if (!share->r_locks)
          {
2007-09-06 16:53:26 +02:00
            /*
              Note that transactional tables should not do this.
              If we enabled this code, we should make sure to skip it if
              born_transactional is true. We should not test
              now_transactional to decide if we can call
              _ma_state_info_read_dsk(), because it can temporarily be 0
              (TRUNCATE on a partitioned table) and thus it would make a state
              modification below without mutex, confusing a concurrent
              checkpoint running.
              Even if this code was enabled only for non-transactional tables:
              in scenario LOCK TABLE t1 WRITE; INSERT INTO t1; DELETE FROM t1;
              state on disk read by DELETE is obsolete as it was not flushed
              at the end of INSERT. MyISAM same. It however causes no issue as
              maria_delete_all_rows() calls _ma_reset_status() thus is not
              influenced by the obsolete read values.
            */
2007-09-03 11:05:17 +02:00
            if (_ma_state_info_read_dsk(share->kfile.file, &share->state))
2006-04-11 15:45:10 +02:00
            {
              error=my_errno;
              break;
            }
          }
        }
      }
2007-09-06 16:53:26 +02:00
#endif /* defined(MARIA_EXTERNAL_LOCKING) */
2006-04-11 15:45:10 +02:00
      VOID(_ma_test_if_changed(info));

      info->lock_type=lock_type;
      info->invalidator=info->s->invalidator;
      share->w_locks++;
      share->tot_locks++;
      break;
    default:
2007-06-07 21:51:11 +02:00
      DBUG_ASSERT(0);
2006-04-11 15:45:10 +02:00
      break; /* Impossible */
    }
  }
2006-10-11 18:30:16 +02:00
#ifdef __WIN__
  else
  {
    /*
      Check for bad file descriptors if this table is part
      of a merge union. Failing to capture this may cause
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
For transactional tables, shift the record number in keys up by 1 bit to make room for a flag indicating whether a transid follows
Checksum for MyISAM now ignores NULL and not used part of VARCHAR
Renamed some variables that caused shadow compiler warnings
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory to go around this bug.
pagecache_unlock_by_ulink() now has extra argument to say if page was changed.
Give error message if we fail to open control file
Mark page cache variables as not flushable
include/maria.h:
Made min page cache larger (needed for pinning key page)
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
Added write_comp_flag to move some runtime code to maria_open()
include/my_base.h:
Added new error message to be used when handler initialization failed
include/my_global.h:
Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables
include/my_handler.h:
Added const to some parameters
mysys/array.c:
More DBUG
mysys/my_error.c:
Fixed indentation
mysys/my_handler.c:
Added const to some parameters
Added missing error messages
sql/field.h:
Renamed variables to avoid variable shadowing
sql/handler.h:
Renamed parameter to avoid variable name conflict
sql/item.h:
Renamed variables to avoid variable shadowing
sql/log_event_old.h:
Renamed variables to avoid variable shadowing
sql/set_var.h:
Renamed variables to avoid variable shadowing
sql/sql_delete.cc:
Removed maria hack for temporary tables
Fixed indentation
sql/sql_table.cc:
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use.
sql/table.cc:
Copy page_checksum from share
Removed Maria hack
storage/maria/Makefile.am:
Added new files
storage/maria/ha_maria.cc:
Renamed records -> record_count and info -> create_info to avoid variable name conflicts
Mark page cache variables as not flushable
storage/maria/ma_blockrec.c:
Moved _ma_unpin_all_pages() to ma_key_recover.c
Moved init of info->pinned_pages to ma_open.c
Moved _ma_finalize_row() to maria_key_recover.h
Renamed some variables to avoid variable name conflicts
Mark page_link.changed for blocks we change directly
Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index)
storage/maria/ma_blockrec.h:
Removed extra empty line
storage/maria/ma_checkpoint.c:
Remove not needed trnman.h
storage/maria/ma_close.c:
Free pinned pages (which are now always allocated)
storage/maria/ma_control_file.c:
Give error message if we fail to open control file
storage/maria/ma_delete.c:
Changes for redo logging (first part, logging of underflow not yet done)
- Log undo-key-delete
- Log delete of key
- Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert()
- Added new arguments to some functions to be able to write redo information
- Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED
Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway
Changed 2 bmove_upp() to bmove() as this made code easier to understand
More function comments
Indentation fixes
storage/maria/ma_ft_update.c:
New arguments to _ma_write_keypage()
storage/maria/ma_loghandler.c:
Fixed some DBUG_PRINT messages
Simplify code
Added new log entries for key page redo
Renamed some variables to avoid variable name shadowing
storage/maria/ma_loghandler.h:
Moved some defines here
Added define for storing key number on key pages
Added new translog record types
Added enum for type of operations in LOGREC_REDO_INDEX
storage/maria/ma_open.c:
Always allocate info.pinned_pages (we need now also for normal key page usage)
Update keyinfo->key_nr
Added virtual functions to convert record position to number to be stored on key pages
Update keyinfo->write_comp_flag to value of search flag to be used when writing key
storage/maria/ma_page.c:
Added redo for key pages
- Extended _ma_fetch_keypage() with type of lock to put on page and address of the MARIA_PINNED_PAGE to use
- _ma_fetch_keypage() now pins pages if needed
- Extended _ma_write_keypage() with type of locks to be used
- ma_dispose() now locks info->s->state.key_del from other threads
- ma_dispose() writes redo log record
- ma_new() locks info->s->state.key_del from other threads if it was used
- ma_new() now pins read page
Other things:
- Removed some not needed arguments from _ma_new() and _ma_dispose()
- Added some new variables to simplify code
- If EXTRA_DEBUG is used, do crc on full page to catch uninitialized bytes
storage/maria/ma_pagecache.h:
Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed
Added some defines for pagecache priority levels that one can use
storage/maria/ma_range.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_recovery.c:
- Added hooks for new translog types:
REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and
UNDO_KEY_DELETE_WITH_ROOT.
- Moved variable declarations to start of function (portability fixes)
- Removed some not needed initializations
- Set only relevant state changes for each redo/undo entry
storage/maria/lockman.c:
Removed end space
storage/maria/ma_check.c:
Removed end space
storage/maria/ma_create.c:
Removed end space
storage/maria/ma_locking.c:
Removed end space
storage/maria/ma_packrec.c:
Removed end space
storage/maria/ma_pagecache.c:
Removed end space
storage/maria/ma_panic.c:
Removed end space
storage/maria/ma_rt_index.c:
Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new()
Fixed indentation
storage/maria/ma_rt_key.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_rt_split.c:
Added new arguments for call to _ma_new()
Use new keypage header
Added new arguments for call to _ma_write_keypage()
storage/maria/ma_search.c:
Updated comments & indentation
Added new arguments for call to _ma_fetch_keypage()
Made some variables and arguments const
Added virtual functions for converting row position to number to be stored in key
use MARIA_RECORD_POS of record position instead of my_off_t
Record in MARIA_KEY_PARAM how page was changed on key insert (needed for REDO)
storage/maria/ma_sort.c:
Removed end space
storage/maria/ma_statrec.c:
Updated arguments for call to _ma_rec_pos()
storage/maria/ma_test1.c:
Fixed too small buffer to init_pagecache()
Fixed bug when using insert_count and test_flag
storage/maria/ma_test2.c:
Use more reasonable pagecache size
Remove not used code
Reset blob_length to fix wrong output message
storage/maria/ma_test_all.sh:
Fixed wrong test
storage/maria/ma_write.c:
Lots of new code to handle REDO of key pages
No logic changes because of REDO code, mostly adding new arguments and adding new code for logging
Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions
Move setting of comp_flag in ma_ck_write_btree() from runtime to maria_open()
Zerofill new used pages for:
- To remove possible sensitive data left in buffer
- To get identical data on pages after running redo
- Better compression of pages if archived
storage/maria/maria_chk.c:
Added information if table is crash safe
storage/maria/maria_def.h:
New virtual function to convert between record position on key and normal record position
Added mutex and extra variables to handle locking of share->state.key_del
Moved some structure variables to get things more aligned
Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert
Added argument to MARIA_PINNED_PAGE to indicate if page was changed
Updated prototypes for functions
Added some structures for signaling changes in REDO handling
storage/maria/unittest/ma_pagecache_single.c:
Updated arguments for changed function calls
storage/myisam/mi_check.c:
Made calc_check_checksum virtual
storage/myisam/mi_checksum.c:
Update checksums to ignore null columns
storage/myisam/mi_create.c:
Mark if table has null column (to know when we have to use mi_checksum())
storage/myisam/mi_open.c:
Added virtual function for calculating checksum to be able to easily ignore NULL fields
storage/myisam/mi_test2.c:
Fixed bug
storage/myisam/myisamdef.h:
Added virtual function for calculating checksum during check table
Removed ha_key_cmp() as this is in handler.h
storage/maria/ma_key_recover.c:
New BitKeeper file ``storage/maria/ma_key_recover.c''
storage/maria/ma_key_recover.h:
New BitKeeper file ``storage/maria/ma_key_recover.h''
storage/maria/ma_key_redo.c:
New BitKeeper file ``storage/maria/ma_key_redo.c''
2007-11-14 18:08:06 +01:00
      a crash on Windows if the table is renamed and
2006-10-11 18:30:16 +02:00
      later on referenced by the merge table.
    */
2007-04-04 22:37:09 +02:00
    if( info->owned_by_merge && (info->s)->kfile.file < 0 )
2006-10-11 18:30:16 +02:00
    {
      error = HA_ERR_NO_SUCH_TABLE;
    }
  }
#endif
2006-04-11 15:45:10 +02:00
  pthread_mutex_unlock(&share->intern_lock);
  DBUG_RETURN(error);
} /* maria_lock_database */
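
The counter bookkeeping done by maria_lock_database() can be sketched in isolation. This is a simplified illustration, not the engine's code: toy_share, toy_handle and toy_lock() are hypothetical names, upgrades and downgrades between read and write locks are deliberately rejected, and the decrements on F_UNLCK are an assumption about the part of the function not shown here.

```c
#include <assert.h>
#include <fcntl.h>   /* F_RDLCK, F_WRLCK, F_UNLCK */

/* Hypothetical, trimmed-down stand-ins for MARIA_SHARE and MARIA_HA. */
typedef struct { unsigned r_locks, w_locks, tot_locks; } toy_share;
typedef struct { toy_share *s; int lock_type; } toy_handle;

/*
  Move one handle to the requested lock state and keep the shared
  counters in step. Returns 0 on success, -1 for a transition this
  sketch does not model.
*/
static int toy_lock(toy_handle *info, int lock_type)
{
  toy_share *share= info->s;
  if (info->lock_type == lock_type)
    return 0;                        /* already in the requested state */
  if (lock_type == F_UNLCK)
  {
    if (info->lock_type == F_RDLCK)
      share->r_locks--;
    else
      share->w_locks--;
    share->tot_locks--;              /* assumption: mirrors the increments */
  }
  else if (info->lock_type != F_UNLCK)
    return -1;                       /* upgrade/downgrade left out on purpose */
  else if (lock_type == F_RDLCK)
  {
    share->r_locks++;
    share->tot_locks++;
  }
  else if (lock_type == F_WRLCK)
  {
    share->w_locks++;                /* same two increments as the code above */
    share->tot_locks++;
  }
  else
    return -1;
  info->lock_type= lock_type;
  return 0;
}
```

Two handles on the same share, one taking F_RDLCK and the other F_WRLCK, leave the counters at r_locks=1, w_locks=1, tot_locks=2; unlocking both brings all three back to zero.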

/****************************************************************************
  The following functions are called by thr_lock() in threaded applications
****************************************************************************/

/*
  Create a copy of the current status for the table

  SYNOPSIS
    _ma_get_status()
    param               Pointer to Maria handler
    concurrent_insert   Set to 1 if we are going to do concurrent inserts
                        (THR_WRITE_CONCURRENT_INSERT was used)
*/

void _ma_get_status(void* param, int concurrent_insert)
{
  MARIA_HA *info=(MARIA_HA*) param;
  DBUG_ENTER("_ma_get_status");
  DBUG_PRINT("info",("key_file: %ld data_file: %ld concurrent_insert: %d",
                     (long) info->s->state.state.key_file_length,
                     (long) info->s->state.state.data_file_length,
                     concurrent_insert));
#ifndef DBUG_OFF
  if (info->state->key_file_length > info->s->state.state.key_file_length ||
      info->state->data_file_length > info->s->state.state.data_file_length)
    DBUG_PRINT("warning",("old info: key_file: %ld data_file: %ld",
                          (long) info->state->key_file_length,
                          (long) info->state->data_file_length));
#endif
  info->save_state=info->s->state.state;
  info->state= &info->save_state;
  info->append_insert_at_end= concurrent_insert;
  DBUG_VOID_RETURN;
}
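
The snapshot taken by _ma_get_status() pairs with a guarded write-back in _ma_update_status(). A minimal sketch of that pattern follows, using hypothetical toy_* types in place of MARIA_HA and MARIA_SHARE. The write-back step itself is an assumption inferred from the guard `info->state == &info->save_state`; only the snapshot half is taken directly from the function above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the status struct, MARIA_SHARE and MARIA_HA. */
typedef struct { long key_file_length, data_file_length; } toy_status;
typedef struct { toy_status state; } toy_share;
typedef struct {
  toy_share  *s;
  toy_status *state;        /* points at save_state or at the shared state */
  toy_status  save_state;   /* private snapshot owned by this handle */
  int         append_insert_at_end;
} toy_handle;

/* Same three assignments as _ma_get_status(): snapshot, repoint, flag. */
static void toy_get_status(toy_handle *info, int concurrent_insert)
{
  info->save_state= info->s->state;
  info->state= &info->save_state;
  info->append_insert_at_end= concurrent_insert;
}

/*
  Publish the snapshot only if this handle still points at its own copy.
  Assumption: the copy-back and repointing shown here are one plausible
  body for the guarded block, not a quote of the real function.
*/
static void toy_update_status(toy_handle *info)
{
  if (info->state == &info->save_state)
  {
    info->s->state= *info->state;
    info->state= &info->s->state;
  }
}
```

A writer that grows data_file_length in its snapshot makes the change visible to the share only when toy_update_status() runs; a second call is a no-op because the handle no longer points at save_state.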


void _ma_update_status(void* param)
{
  MARIA_HA *info=(MARIA_HA*) param;
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
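
The rule described above for recovering state.records can be sketched as an LSN gate: a logged row change is applied to the count only if its LSN is newer than state.is_of_lsn, which is what makes replaying the same log twice harmless. The toy_* names below are hypothetical, UNDO_ROW_PURGE is left out, and treating UNDO_ROW_DELETE as a decrement is a reading of "same for UNDO_ROW_DELETE", not something the message spells out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_lsn;            /* hypothetical stand-in for LSN */

typedef struct {
  uint64_t records;                  /* row count kept in the index header */
  toy_lsn  is_of_lsn;                /* state already reflects the log up to here */
} toy_state;

enum toy_rec_type { TOY_UNDO_ROW_INSERT, TOY_UNDO_ROW_DELETE };

/* REDO phase: touch the count only for log records newer than the state. */
static void toy_redo_records(toy_state *state, enum toy_rec_type type,
                             toy_lsn lsn)
{
  if (lsn <= state->is_of_lsn)
    return;                          /* effect is already in the state */
  if (type == TOY_UNDO_ROW_INSERT)
    state->records++;
  else
    state->records--;                /* assumption: a delete decrements */
}

/* Closing a table during recovery: the state is as new as the current record. */
static void toy_close_during_recovery(toy_state *state, toy_lsn current_lsn)
{
  if (current_lsn > state->is_of_lsn)
    state->is_of_lsn= current_lsn;
}
```

Starting from records=10 and is_of_lsn=5, replaying an insert at LSN 4, an insert at LSN 6 and a delete at LSN 7 leaves records at 10; after the close raises is_of_lsn to 7, a second replay of the same three records changes nothing.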
2007-09-07 15:02:30 +02:00
  MARIA_SHARE *share= info->s;
2006-04-11 15:45:10 +02:00
  /*
    Because someone may have closed the table we point at, we only
    update the state if it is our own state. This isn't a problem as
    we are always pointing at our own lock or at a read lock.
    (This is enforced by thr_multi_lock.c)
  */
  if (info->state == &info->save_state)
  {
#ifndef DBUG_OFF
    DBUG_PRINT("info",("updating status: key_file: %ld data_file: %ld",
                       (long) info->state->key_file_length,
                       (long) info->state->data_file_length));
2007-09-07 15:02:30 +02:00
    if (info->state->key_file_length < share->state.state.key_file_length ||
        info->state->data_file_length < share->state.state.data_file_length)
2006-04-11 15:45:10 +02:00
      DBUG_PRINT("warning",("old info: key_file: %ld data_file: %ld",
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
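
The state.records recovery rule described in the ma_recovery.c and ma_open.c notes above can be sketched roughly as follows. This is only an illustration under stated assumptions: `lsn_t`, `struct state_sketch` and the three helper functions are hypothetical names, not Maria's real types or functions, and the decrement on UNDO_ROW_DELETE is inferred from the ma_delete.c note (UNDO_ROW_PURGE is left out).

```c
#include <stdint.h>

typedef uint64_t lsn_t;            /* hypothetical stand-in for Maria's LSN type */

enum undo_kind { UNDO_ROW_INSERT, UNDO_ROW_DELETE };

struct state_sketch                /* hypothetical, not MARIA_STATE_INFO */
{
  uint64_t records;                /* row count kept in the index header */
  lsn_t is_of_lsn;                 /* the state is known to be "as of" this LSN */
  lsn_t create_rename_lsn;
};

/*
  REDO phase: apply an UNDO record to the row count only if the record is
  newer than the state. Returns 1 if applied, 0 if skipped. Skipping older
  records is what makes a repeated recovery idempotent.
*/
static int redo_apply_records(struct state_sketch *st, enum undo_kind kind,
                              lsn_t rec_lsn)
{
  if (rec_lsn <= st->is_of_lsn)
    return 0;                      /* state already reflects this record */
  if (kind == UNDO_ROW_INSERT)
    st->records++;
  else
    st->records--;                 /* assumption: a delete decrements */
  return 1;
}

/* Closing a table during Recovery: the state is at least as new as the
   log record currently being processed. */
static void close_during_recovery(struct state_sketch *st, lsn_t current_lsn)
{
  if (current_lsn > st->is_of_lsn)
    st->is_of_lsn= current_lsn;
}

/* Open-time sanity check: is_of_lsn must not be below create_rename_lsn. */
static int header_looks_sane(const struct state_sketch *st)
{
  return st->is_of_lsn >= st->create_rename_lsn;
}
```

Replaying the same log twice leaves `records` unchanged the second time, because `is_of_lsn` has already been raised past the replayed records when the table was closed.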
2007-09-07 15:02:30 +02:00
|
|
|
(long) share->state.state.key_file_length,
|
|
|
|
(long) share->state.state.data_file_length));
|
2006-04-11 15:45:10 +02:00
|
|
|
#endif
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not run for transactional tables, as state on disk may be obsolete: such
tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
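
The unlock-time policy described in the ma_locking.c notes above might be summarized by a sketch like the one below. It is a simplified illustration, not the actual code: `struct share_sketch` and `write_state_at_unlock()` are hypothetical names, and resetting `changed` after a flush of a non-transactional table is inferred from the remark that "share->changed=0" cannot be done for transactional tables.

```c
#include <stdbool.h>

struct share_sketch                /* hypothetical, not MARIA_SHARE */
{
  bool transactional;              /* table is covered by the log and Recovery */
  bool changed;                    /* in-memory state differs from disk */
};

/* Returns true when the state should be written to disk at unlock time. */
static bool write_state_at_unlock(struct share_sketch *share)
{
  if (share->transactional)
    return false;                  /* Checkpoint writes it later; keep 'changed' */
  if (!share->changed)
    return false;                  /* nothing to flush (the optimization) */
  share->changed= false;           /* assumption: reset once flushed */
  return true;
}
```

The point of keeping `changed` set for transactional tables is that the state on disk stays stale after unlock, so a later close still knows it has something to write.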
2007-09-06 16:53:26 +02:00
|
|
|
/*
|
|
|
|
we are going to modify the state without the log's lock, this would break
|
|
|
|
recovery if done with a transactional table.
|
|
|
|
*/
|
|
|
|
DBUG_ASSERT(!info->s->base.born_transactional);
|
2007-09-07 15:02:30 +02:00
|
|
|
share->state.state= *info->state;
|
|
|
|
info->state= &share->state.state;
|
2006-04-11 15:45:10 +02:00
|
|
|
}
|
|
|
|
info->append_insert_at_end= 0;
|
|
|
|
}
|
|
|
|
|
2007-03-01 18:23:58 +01:00
|
|
|
|
|
|
|
void _ma_restore_status(void *param)
{
  MARIA_HA *info= (MARIA_HA*) param;
  info->state= &info->s->state.state;
  info->append_insert_at_end= 0;
}
|
|
|
|
|
|
|
|
|
2006-04-11 15:45:10 +02:00
|
|
|
void _ma_copy_status(void* to,void *from)
{
  ((MARIA_HA*) to)->state= &((MARIA_HA*) from)->save_state;
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
  Check if should allow concurrent inserts

  IMPLEMENTATION
    Allow concurrent inserts if we don't have a hole in the table or
    if there is no active write lock and there are active read locks and
    maria_concurrent_insert == 2. In this last case the new
    row(s) are inserted at end of file instead of filling up the hole.

    The last case is to allow one to insert into a heavily read-used table
    even if there are holes.

  NOTES
    If there are any rtree indexes in the table, concurrent inserts are
    disabled in maria_open()

  RETURN
    0  ok to use concurrent inserts
    1  not ok
*/
|
|
|
|
|
|
|
|
my_bool _ma_check_status(void *param)
{
  MARIA_HA *info=(MARIA_HA*) param;
  /*
    The test for w_locks == 1 is here because this thread has already done an
    external lock (in other words: w_locks == 1 means no other thread has
    a write lock)
  */
  DBUG_PRINT("info",("dellink: %ld r_locks: %u w_locks: %u",
                     (long) info->s->state.dellink, (uint) info->s->r_locks,
                     (uint) info->s->w_locks));
  return (my_bool) !(info->s->state.dellink == HA_OFFSET_ERROR ||
                     (maria_concurrent_insert == 2 && info->s->r_locks &&
                      info->s->w_locks == 1));
}
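
The decision made by `_ma_check_status()` can be tried in isolation with a small stand-alone sketch. The names below are hypothetical: `NO_HOLE` stands in for `HA_OFFSET_ERROR`, and the parameters replace the fields read from `info->s` and the global `maria_concurrent_insert`; only the boolean logic is taken from the function above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_HOLE UINT64_MAX         /* hypothetical stand-in for HA_OFFSET_ERROR */

/*
  Same convention as the original: false (0) means concurrent inserts are
  allowed, true (1) means they are not.
*/
static bool check_status_sketch(uint64_t dellink, unsigned concurrent_insert,
                                unsigned r_locks, unsigned w_locks)
{
  /* No deleted-row chain: appending at end of file cannot skip a hole. */
  bool no_hole= (dellink == NO_HOLE);
  /* Mode 2: readers are active and the only write lock is our own. */
  bool readers_only= (concurrent_insert == 2 && r_locks > 0 && w_locks == 1);
  return !(no_hole || readers_only);
}
```

With a hole present (`dellink` pointing at a deleted row), the call returns 0 only when `concurrent_insert` is 2, at least one read lock is held, and `w_locks` is exactly 1.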
|
|
|
|
|
|
|
|
|
|
|
|
/****************************************************************************
 ** functions to read / write the state
****************************************************************************/
|
|
|
|
|
2007-09-06 16:53:26 +02:00
|
|
|
int _ma_readinfo(register MARIA_HA *info __attribute__ ((unused)),
|
|
|
|
int lock_type __attribute__ ((unused)),
|
|
|
|
int check_keybuffer __attribute__ ((unused)))
|
2006-04-11 15:45:10 +02:00
|
|
|
{
|
2007-09-06 16:53:26 +02:00
|
|
|
#ifdef MARIA_EXTERNAL_LOCKING
|
2006-04-11 15:45:10 +02:00
|
|
|
DBUG_ENTER("_ma_readinfo");
|
|
|
|
|
|
|
|
if (info->lock_type == F_UNLCK)
|
|
|
|
{
|
|
|
|
MARIA_SHARE *share=info->s;
|
|
|
|
if (!share->tot_locks)
|
|
|
|
{
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
we thus should avoid to read it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it
shouldn't run for transactional tables, as the state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
|
|
|
/* should not be done for transactional tables */
|
2007-09-03 11:05:17 +02:00
|
|
|
if (_ma_state_info_read_dsk(share->kfile.file, &share->state))
|
2006-04-11 15:45:10 +02:00
|
|
|
{
|
2007-12-04 22:23:42 +01:00
|
|
|
if (!my_errno)
|
|
|
|
my_errno= HA_ERR_FILE_TOO_SHORT;
|
2006-04-11 15:45:10 +02:00
|
|
|
DBUG_RETURN(1);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (check_keybuffer)
|
|
|
|
VOID(_ma_test_if_changed(info));
|
|
|
|
info->invalidator=info->s->invalidator;
|
|
|
|
}
|
|
|
|
else if (lock_type == F_WRLCK && info->lock_type == F_RDLCK)
|
|
|
|
{
|
|
|
|
my_errno=EACCES; /* Not allowed to change */
|
|
|
|
DBUG_RETURN(-1); /* when have read_lock() */
|
|
|
|
}
|
|
|
|
DBUG_RETURN(0);
|
2007-09-06 16:53:26 +02:00
|
|
|
#else
|
|
|
|
return 0;
|
|
|
|
#endif /* defined(MARIA_EXTERNAL_LOCKING) */
|
2006-04-11 15:45:10 +02:00
|
|
|
} /* _ma_readinfo */
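The conditions that guard the state read in _ma_readinfo() and the state write in _ma_writeinfo() can be summarised in a small standalone sketch. This is only an illustration under stated assumptions: struct sketch_share and both helper functions are hypothetical names, not part of Maria, and the transactional check in the read helper follows the "should not be done for transactional tables" comment rather than a test that exists in the code above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the MARIA_SHARE fields used here. */
struct sketch_share
{
  unsigned tot_locks;       /* number of locks currently held on the table */
  bool born_transactional;  /* table was created as transactional */
};

/*
  Read side: re-reading the state from disk only makes sense when external
  locking is compiled in, this handle holds no lock, nobody else holds a
  lock, and (assumption taken from the source comment) the table is not
  transactional, because its on-disk state may be obsolete.
*/
static bool sketch_must_read_state(const struct sketch_share *share,
                                   bool external_locking,
                                   bool handle_unlocked)
{
  assert(share);
  if (!external_locking || !handle_unlocked)
    return false;
  return share->tot_locks == 0 && !share->born_transactional;
}

/*
  Write side: mirrors the test in _ma_writeinfo(). Only non-transactional
  tables write their state at the last unlock; transactional tables leave
  that to Checkpoint.
*/
static bool sketch_must_write_state(const struct sketch_share *share,
                                    unsigned operation)
{
  assert(share);
  return share->tot_locks == 0 && !share->born_transactional &&
         operation != 0;
}
```

Both helpers are pure predicates, so they can be checked without opening a table; the real functions additionally perform the I/O and set my_errno.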
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
Every isam-function that updates the isam-database MUST end with this
|
|
|
|
request
|
2007-09-09 18:15:10 +02:00
|
|
|
|
|
|
|
NOTES
|
|
|
|
my_errno is not changed if this succeeds!
|
2006-04-11 15:45:10 +02:00
|
|
|
*/
|
|
|
|
|
|
|
|
int _ma_writeinfo(register MARIA_HA *info, uint operation)
|
|
|
|
{
|
|
|
|
int error,olderror;
|
2007-01-18 20:38:14 +01:00
|
|
|
MARIA_SHARE *share= info->s;
|
2006-04-11 15:45:10 +02:00
|
|
|
DBUG_ENTER("_ma_writeinfo");
|
|
|
|
DBUG_PRINT("info",("operation: %u tot_locks: %u", operation,
|
|
|
|
share->tot_locks));
|
|
|
|
|
|
|
|
error=0;
|
2007-09-06 16:53:26 +02:00
|
|
|
if (share->tot_locks == 0 && !share->base.born_transactional)
|
2006-04-11 15:45:10 +02:00
|
|
|
{
|
2007-09-06 16:53:26 +02:00
|
|
|
/* transactional tables flush their state at Checkpoint */
|
2006-04-11 15:45:10 +02:00
|
|
|
if (operation)
|
|
|
|
{ /* Two threads can't be here */
|
2007-01-18 20:38:14 +01:00
|
|
|
olderror= my_errno; /* Remember last error */
|
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
For transactional tables, shift record number in keys up with 1 bit to have place to indicate if transid follows
Checksum for MyISAM now ignores NULL and not used part of VARCHAR
Renamed some variables that caused shadow compiler warnings
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory to go around this bug.
pagecache_unlock_by_ulink() now has extra argument to say if page was changed.
Give error message if we fail to open control file
Mark page cache variables as not flushable
include/maria.h:
Made min page cache larger (needed for pinning key page)
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
Added write_comp_flag to move some runtime code to maria_open()
include/my_base.h:
Added new error message to be used when handler initialization failed
include/my_global.h:
Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables
include/my_handler.h:
Added const to some parameters
mysys/array.c:
More DBUG
mysys/my_error.c:
Fixed indentation
mysys/my_handler.c:
Added const to some parameters
Added missing error messages
sql/field.h:
Renamed variables to avoid variable shadowing
sql/handler.h:
Renamed parameter to avoid variable name conflict
sql/item.h:
Renamed variables to avoid variable shadowing
sql/log_event_old.h:
Renamed variables to avoid variable shadowing
sql/set_var.h:
Renamed variables to avoid variable shadowing
sql/sql_delete.cc:
Removed maria hack for temporary tables
Fixed indentation
sql/sql_table.cc:
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use.
sql/table.cc:
Copy page_checksum from share
Removed Maria hack
storage/maria/Makefile.am:
Added new files
storage/maria/ha_maria.cc:
Renamed records -> record_count and info -> create_info to avoid variable name conflicts
Mark page cache variables as not flushable
storage/maria/ma_blockrec.c:
Moved _ma_unpin_all_pages() to ma_key_recover.c
Moved init of info->pinned_pages to ma_open.c
Moved _ma_finalize_row() to maria_key_recover.h
Renamed some variables to avoid variable name conflicts
Mark page_link.changed for blocks we change directly
Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index)
storage/maria/ma_blockrec.h:
Removed extra empty line
storage/maria/ma_checkpoint.c:
Remove not needed trnman.h
storage/maria/ma_close.c:
Free pinned pages (which are now always allocated)
storage/maria/ma_control_file.c:
Give error message if we fail to open control file
storage/maria/ma_delete.c:
Changes for redo logging (first part, logging of underflow not yet done)
- Log undo-key-delete
- Log delete of key
- Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert()
- Added new arguments to some functions to be able to write redo information
- Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED
Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway
Changed 2 bmove_upp() to bmove() as this made code easier to understand
More function comments
Indentation fixes
storage/maria/ma_ft_update.c:
New arguments to _ma_write_keypage()
storage/maria/ma_loghandler.c:
Fixed some DBUG_PRINT messages
Simplify code
Added new log entries for key page redo
Renamed some variables to avoid variable name shadowing
storage/maria/ma_loghandler.h:
Moved some defines here
Added define for storing key number on key pages
Added new translog record types
Added enum for type of operations in LOGREC_REDO_INDEX
storage/maria/ma_open.c:
Always allocate info.pinned_pages (we need now also for normal key page usage)
Update keyinfo->key_nr
Added virtual functions to convert record position to number to be stored on key pages
Update keyinfo->write_comp_flag to value of search flag to be used when writing key
storage/maria/ma_page.c:
Added redo for key pages
- Extended _ma_fetch_keypage() with type of lock to put on page and address to used MARIA_PINNED_PAGE
- _ma_fetch_keypage() now pin's pages if needed
- Extended _ma_write_keypage() with type of locks to be used
- ma_dispose() now locks info->s->state.key_del from other threads
- ma_dispose() writes redo log record
- ma_new() locks info->s->state.key_del from other threads if it was used
- ma_new() now pins read page
Other things:
- Removed some unneeded arguments from _ma_new() and _ma_dispose()
- Added some new variables to simplify code
- If EXTRA_DEBUG is used, do crc on full page to catch uninitialized bytes
storage/maria/ma_pagecache.h:
Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed
Added some defines for pagecache priority levels that one can use
storage/maria/ma_range.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_recovery.c:
- Added hooks for new translog types:
REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and
UNDO_KEY_DELETE_WITH_ROOT.
- Moved variable declarations to start of function (portability fixes)
- Removed some not needed initializations
- Set only relevant state changes for each redo/undo entry
storage/maria/lockman.c:
Removed end space
storage/maria/ma_check.c:
Removed end space
storage/maria/ma_create.c:
Removed end space
storage/maria/ma_locking.c:
Removed end space
storage/maria/ma_packrec.c:
Removed end space
storage/maria/ma_pagecache.c:
Removed end space
storage/maria/ma_panic.c:
Removed end space
storage/maria/ma_rt_index.c:
Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new()
Fixed indentation
storage/maria/ma_rt_key.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_rt_split.c:
Added new arguments for call to _ma_new()
Use new keypage header
Added new arguments for call to _ma_write_keypage()
storage/maria/ma_search.c:
Updated comments & indentation
Added new arguments for call to _ma_fetch_keypage()
Made some variables and arguments const
Added virtual functions for converting row position to number to be stored in key
use MARIA_RECORD_POS of record position instead of my_off_t
Record in MARIA_KEY_PARAM how page was changed on key insert (needed for REDO)
storage/maria/ma_sort.c:
Removed end space
storage/maria/ma_statrec.c:
Updated arguments for call to _ma_rec_pos()
storage/maria/ma_test1.c:
Fixed too small buffer to init_pagecache()
Fixed bug when using insert_count and test_flag
storage/maria/ma_test2.c:
Use more reasonable pagecache size
Remove not used code
Reset blob_length to fix wrong output message
storage/maria/ma_test_all.sh:
Fixed wrong test
storage/maria/ma_write.c:
Lots of new code to handle REDO of key pages
No logic changes because of REDO code, mostly adding new arguments and adding new code for logging
Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions
Move setting of comp_flag in ma_ck_wrte_btree() from runtime to maria_open()
Zerofill new used pages for:
- To remove possible sensitive data left in buffer
- To get identical data on pages after running redo
- Better compression of pages if archived
storage/maria/maria_chk.c:
Added information if table is crash safe
storage/maria/maria_def.h:
New virtual function to convert between record position on key and normal record position
Added mutex and extra variables to handle locking of share->state.key_del
Moved some structure variables to get things more aligned
Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert
Added argument to MARIA_PINNED_PAGE to indicate if page was changed
Updated prototypes for functions
Added some structures for signaling changes in REDO handling
storage/maria/unittest/ma_pagecache_single.c:
Updated arguments for changed function calls
storage/myisam/mi_check.c:
Made calc_check_checksum virtual
storage/myisam/mi_checksum.c:
Update checksums to ignore null columns
storage/myisam/mi_create.c:
Mark if table has null column (to know when we have to use mi_checksum())
storage/myisam/mi_open.c:
Added virtual function for calculating checksum to be able to easily ignore NULL fields
storage/myisam/mi_test2.c:
Fixed bug
storage/myisam/myisamdef.h:
Added virtual function for calculating checksum during check table
Removed ha_key_cmp() as this is in handler.h
storage/maria/ma_key_recover.c:
New BitKeeper file ``storage/maria/ma_key_recover.c''
storage/maria/ma_key_recover.h:
New BitKeeper file ``storage/maria/ma_key_recover.h''
storage/maria/ma_key_redo.c:
New BitKeeper file ``storage/maria/ma_key_redo.c''
2007-11-14 18:08:06 +01:00
|
|
|
|
|
|
|
#ifdef EXTERNAL_LOCKING
|
|
|
|
/*
|
|
|
|
The following only makes sense if we want to allow two different
|
|
|
|
processes to access the same table at the same time
|
|
|
|
*/
|
2006-04-11 15:45:10 +02:00
|
|
|
share->state.process= share->last_process= share->this_process;
|
|
|
|
share->state.unique= info->last_unique= info->this_unique;
|
|
|
|
share->state.update_count= info->last_loop= ++info->this_loop;
|
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
For transactional tables, shift the record number in keys up by 1 bit to make room for a flag that indicates whether a transid follows
Checksum for MyISAM now ignores NULL and the unused part of VARCHAR
Renamed some variables that caused shadow compiler warnings
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory that worked around this bug.
pagecache_unlock_by_ulink() now has extra argument to say if page was changed.
Give error message if we fail to open control file
Mark page cache variables as not flushable
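The record-number change listed above (shifting the position up by one bit so that the low bit can flag a following transid) can be sketched roughly as below. The helper names pack_rec_pos(), rec_pos_of() and has_transid() are hypothetical and not taken from the Maria sources, and the real code may lay out the bits differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not the Maria API. */

/* Store rec_pos in the upper 63 bits; the low bit says "transid follows". */
static uint64_t pack_rec_pos(uint64_t rec_pos, int transid_follows)
{
  assert(rec_pos <= (UINT64_MAX >> 1));      /* top bit must be free */
  return (rec_pos << 1) | (transid_follows ? 1u : 0u);
}

/* Recover the original record position. */
static uint64_t rec_pos_of(uint64_t packed)
{
  return packed >> 1;
}

/* Non-zero if the stored key is followed by a transid. */
static int has_transid(uint64_t packed)
{
  return (int) (packed & 1);
}
```

The cost of this layout is one bit of address range, which is why the sketch checks that the top bit of the position is unused before shifting.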
include/maria.h:
Made min page cache larger (needed for pinning key page)
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
Added write_comp_flag to move some runtime code to maria_open()
include/my_base.h:
Added new error message to be used when handler initialization failed
include/my_global.h:
Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables
include/my_handler.h:
Added const to some parameters
mysys/array.c:
More DBUG
mysys/my_error.c:
Fixed indentation
mysys/my_handler.c:
Added const to some parameters
Added missing error messages
sql/field.h:
Renamed variables to avoid variable shadowing
sql/handler.h:
Renamed parameter to avoid variable name conflict
sql/item.h:
Renamed variables to avoid variable shadowing
sql/log_event_old.h:
Renamed variables to avoid variable shadowing
sql/set_var.h:
Renamed variables to avoid variable shadowing
sql/sql_delete.cc:
Removed maria hack for temporary tables
Fixed indentation
sql/sql_table.cc:
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use.
sql/table.cc:
Copy page_checksum from share
Removed Maria hack
storage/maria/Makefile.am:
Added new files
storage/maria/ha_maria.cc:
Renamed records -> record_count and info -> create_info to avoid variable name conflicts
Mark page cache variables as not flushable
storage/maria/ma_blockrec.c:
Moved _ma_unpin_all_pages() to ma_key_recover.c
Moved init of info->pinned_pages to ma_open.c
Moved _ma_finalize_row() to maria_key_recover.h
Renamed some variables to avoid variable name conflicts
Mark page_link.changed for blocks we change directly
Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index)
storage/maria/ma_blockrec.h:
Removed extra empty line
storage/maria/ma_checkpoint.c:
Remove not needed trnman.h
storage/maria/ma_close.c:
Free pinned pages (which are now always allocated)
storage/maria/ma_control_file.c:
Give error message if we fail to open control file
storage/maria/ma_delete.c:
Changes for redo logging (first part, logging of underflow not yet done)
- Log undo-key-delete
- Log delete of key
- Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert()
- Added new arguments to some functions to be able to write redo information
- Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED
Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway
Changed 2 bmove_upp() to bmove() as this made the code easier to understand
More function comments
Indentation fixes
storage/maria/ma_ft_update.c:
New arguments to _ma_write_keypage()
storage/maria/ma_loghandler.c:
Fixed some DBUG_PRINT messages
Simplify code
Added new log entries for key page redo
Renamed some variables to avoid variable name shadowing
storage/maria/ma_loghandler.h:
Moved some defines here
Added define for storing key number on key pages
Added new translog record types
Added enum for type of operations in LOGREC_REDO_INDEX
storage/maria/ma_open.c:
Always allocate info.pinned_pages (now also needed for normal key page usage)
Update keyinfo->key_nr
Added virtual functions to convert record position to number to be stored on key pages
Update keyinfo->write_comp_flag to value of search flag to be used when writing key
storage/maria/ma_page.c:
Added redo for key pages
- Extended _ma_fetch_keypage() with type of lock to put on page and address of the used MARIA_PINNED_PAGE
- _ma_fetch_keypage() now pins pages if needed
- Extended _ma_write_keypage() with type of locks to be used
- ma_dispose() now locks info->s->state.key_del from other threads
- ma_dispose() writes redo log record
- ma_new() locks info->s->state.key_del from other threads if it was used
- ma_new() now pins read page
Other things:
- Removed some not needed arguments from _ma_new() and _ma_dispose()
- Added some new variables to simplify code
- If EXTRA_DEBUG is used, do crc on full page to catch uninitialized bytes
storage/maria/ma_pagecache.h:
Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed
Added some defines for pagecache priority levels that one can use
storage/maria/ma_range.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_recovery.c:
- Added hooks for new translog types:
REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and
UNDO_KEY_DELETE_WITH_ROOT.
- Moved variable declarations to start of function (portability fixes)
- Removed some not needed initializations
- Set only relevant state changes for each redo/undo entry
storage/maria/lockman.c:
Removed end space
storage/maria/ma_check.c:
Removed end space
storage/maria/ma_create.c:
Removed end space
storage/maria/ma_locking.c:
Removed end space
storage/maria/ma_packrec.c:
Removed end space
storage/maria/ma_pagecache.c:
Removed end space
storage/maria/ma_panic.c:
Removed end space
storage/maria/ma_rt_index.c:
Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new()
Fixed indentation
storage/maria/ma_rt_key.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_rt_split.c:
Added new arguments for call to _ma_new()
Use new keypage header
Added new arguments for call to _ma_write_keypage()
storage/maria/ma_search.c:
Updated comments & indentation
Added new arguments for call to _ma_fetch_keypage()
Made some variables and arguments const
Added virtual functions for converting row position to number to be stored in key
use MARIA_RECORD_POS of record position instead of my_off_t
Record in MARIA_KEY_PARAM how page was changed on key insert (needed for REDO)
storage/maria/ma_sort.c:
Removed end space
storage/maria/ma_statrec.c:
Updated arguments for call to _ma_rec_pos()
storage/maria/ma_test1.c:
Fixed too small buffer to init_pagecache()
Fixed bug when using insert_count and test_flag
storage/maria/ma_test2.c:
Use more reasonable pagecache size
Remove not used code
Reset blob_length to fix wrong output message
storage/maria/ma_test_all.sh:
Fixed wrong test
storage/maria/ma_write.c:
Lots of new code to handle REDO of key pages
No logic changes because of REDO code, mostly adding new arguments and adding new code for logging
Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions
Move setting of comp_flag in ma_ck_write_btree() from runtime to maria_open()
Zerofill newly used pages in order to:
- Remove possibly sensitive data left in the buffer
- Get identical data on pages after running redo
- Get better compression of pages if archived
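As a rough illustration of the zerofill idea above: clearing the unused tail of a page means two runs that produce the same logical content also produce byte-identical pages. zerofill_page_tail() is a hypothetical helper, not a function from ma_write.c, and the real page layout is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
  Hypothetical helper: clear bytes [used_length, page_size) so that no
  stale buffer content ends up on disk after the used part of the page.
*/
static void zerofill_page_tail(unsigned char *page, size_t used_length,
                               size_t page_size)
{
  assert(used_length <= page_size);
  memset(page + used_length, 0, page_size - used_length);
}
```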
storage/maria/maria_chk.c:
Added information if table is crash safe
storage/maria/maria_def.h:
New virtual function to convert between record position on key and normal record position
Added mutex and extra variables to handle locking of share->state.key_del
Moved some structure variables to get things more aligned
Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert
Added argument to MARIA_PINNED_PAGE to indicate if page was changed
Updated prototypes for functions
Added some structures for signaling changes in REDO handling
storage/maria/unittest/ma_pagecache_single.c:
Updated arguments for changed function calls
storage/myisam/mi_check.c:
Made calc_check_checksum virtual
storage/myisam/mi_checksum.c:
Update checksums to ignore null columns
storage/myisam/mi_create.c:
Mark if table has null column (to know when we have to use mi_checksum())
storage/myisam/mi_open.c:
Added virtual function for calculating checksum to be able to easily ignore NULL fields
storage/myisam/mi_test2.c:
Fixed bug
storage/myisam/myisamdef.h:
Added virtual function for calculating checksum during check table
Removed ha_key_cmp() as this is in handler.h
storage/maria/ma_key_recover.c:
New BitKeeper file ``storage/maria/ma_key_recover.c''
storage/maria/ma_key_recover.h:
New BitKeeper file ``storage/maria/ma_key_recover.h''
storage/maria/ma_key_redo.c:
New BitKeeper file ``storage/maria/ma_key_redo.c''
2007-11-14 18:08:06 +01:00
|
|
|
#endif
|
|
|
|
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c,
look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
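The hand-off described above between Checkpoint and maria_close() can be sketched as follows. This is a simplified model and not the actual Maria code: DEMO_SHARE, its two flags and all demo_* functions are hypothetical, and the list building under THR_LOCK_maria is left out, so only the "who frees the share" decision is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the share; not the real MARIA_SHARE. */
typedef struct demo_share
{
  pthread_mutex_t intern_lock;
  int in_checkpoint;      /* set while Checkpoint holds a pointer to us */
  int close_requested;    /* set by close when it leaves the free to Checkpoint */
} DEMO_SHARE;

static DEMO_SHARE *demo_share_new(void)
{
  DEMO_SHARE *s= calloc(1, sizeof(*s));
  if (s)
    pthread_mutex_init(&s->intern_lock, NULL);
  return s;
}

static void demo_share_free(DEMO_SHARE *s)
{
  pthread_mutex_destroy(&s->intern_lock);
  free(s);
}

/* Checkpoint: mark the share while collecting it into its list. */
static void demo_checkpoint_mark(DEMO_SHARE *s)
{
  pthread_mutex_lock(&s->intern_lock);
  s->in_checkpoint= 1;
  pthread_mutex_unlock(&s->intern_lock);
}

/* Close: returns 1 if the share was freed here, 0 if left to Checkpoint. */
static int demo_close(DEMO_SHARE *s)
{
  int deferred;
  pthread_mutex_lock(&s->intern_lock);
  assert(!s->close_requested);          /* closing twice would be a bug */
  deferred= s->in_checkpoint;
  if (deferred)
    s->close_requested= 1;
  pthread_mutex_unlock(&s->intern_lock);
  if (!deferred)
    demo_share_free(s);
  return !deferred;
}

/* Checkpoint: done with the share; returns 1 if it had to free it. */
static int demo_checkpoint_done(DEMO_SHARE *s)
{
  int must_free;
  pthread_mutex_lock(&s->intern_lock);
  s->in_checkpoint= 0;
  must_free= s->close_requested;
  pthread_mutex_unlock(&s->intern_lock);
  if (must_free)
    demo_share_free(s);
  return must_free;
}
```

Both sides read and write the flags only under intern_lock, so exactly one of them ends up freeing the share.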
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
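The idempotence argument above can be illustrated with a small model. Everything here is hypothetical (DEMO_STATE, the enum, both functions), and it assumes that a delete record decrements the count; the entry above only says the same LSN test applies to UNDO_ROW_DELETE and UNDO_ROW_PURGE.

```c
#include <stdint.h>

typedef uint64_t demo_lsn;

enum demo_rec_type { DEMO_UNDO_ROW_INSERT, DEMO_UNDO_ROW_DELETE };

/* Hypothetical model of the part of the state that Recovery fixes. */
typedef struct demo_state
{
  uint64_t records;     /* row count stored in the index header */
  demo_lsn is_of_lsn;   /* the state reflects the log up to this LSN */
} DEMO_STATE;

/* REDO phase: only records newer than the stored state change the count. */
static void demo_redo_apply(DEMO_STATE *state, enum demo_rec_type type,
                            demo_lsn lsn)
{
  if (lsn <= state->is_of_lsn)
    return;                             /* already reflected in the state */
  if (type == DEMO_UNDO_ROW_INSERT)
    state->records++;
  else
    state->records--;                   /* assumption: delete decrements */
}

/* Closing during Recovery: the state is now as new as the current record. */
static void demo_close_during_recovery(DEMO_STATE *state, demo_lsn current)
{
  if (current > state->is_of_lsn)
    state->is_of_lsn= current;
}
```

Replaying the same log a second time after the close leaves records unchanged, because every record's LSN is then at or below is_of_lsn.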
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
if ((error= _ma_state_info_write_sub(share->kfile.file,
|
|
|
|
&share->state, 1)))
|
2006-04-11 15:45:10 +02:00
|
|
|
olderror=my_errno;
|
|
|
|
#ifdef __WIN__
|
|
|
|
if (maria_flush)
|
|
|
|
{
|
2007-04-04 22:37:09 +02:00
|
|
|
_commit(share->kfile.file);
|
|
|
|
_commit(info->dfile.file);
|
2006-04-11 15:45:10 +02:00
|
|
|
}
|
|
|
|
#endif
|
2007-01-18 20:38:14 +01:00
|
|
|
my_errno=olderror;
|
2006-04-11 15:45:10 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
else if (operation)
|
|
|
|
share->changed= 1; /* Mark keyfile changed */
|
|
|
|
DBUG_RETURN(error);
|
|
|
|
} /* _ma_writeinfo */
|
|
|
|
|
|
|
|
|
2007-11-14 18:08:06 +01:00
|
|
|
/*
|
|
|
|
Test if an external process has changed the database
|
|
|
|
(Should be called after readinfo)
|
|
|
|
*/
|
2006-04-11 15:45:10 +02:00
|
|
|
|
|
|
|
int _ma_test_if_changed(register MARIA_HA *info)
|
|
|
|
{
|
First part of redo/undo for key pages
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
For transactional tables, shift the record number in keys up by 1 bit to make room for a flag that indicates whether a transid follows
Checksum for MyISAM now ignores NULL and the unused part of VARCHAR
Renamed some variables that caused shadow compiler warnings
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
Fixed crashing bugs when using Maria TEMPORARY tables with TRUNCATE. Removed 'hack' code in sql directory that worked around this bug.
pagecache_unlock_by_ulink() now has extra argument to say if page was changed.
Give error message if we fail to open control file
Mark page cache variables as not flushable
include/maria.h:
Made min page cache larger (needed for pinning key page)
Added key_nr to st_maria_keydef for faster keyinfo->keynr conversion
Added write_comp_flag to move some runtime code to maria_open()
include/my_base.h:
Added new error message to be used when handler initialization failed
include/my_global.h:
Renamed dummy to swap_dummy to avoid conflicts with local 'dummy' variables
include/my_handler.h:
Added const to some parameters
mysys/array.c:
More DBUG
mysys/my_error.c:
Fixed indentation
mysys/my_handler.c:
Added const to some parameters
Added missing error messages
sql/field.h:
Renamed variables to avoid variable shadowing
sql/handler.h:
Renamed parameter to avoid variable name conflict
sql/item.h:
Renamed variables to avoid variable shadowing
sql/log_event_old.h:
Renamed variables to avoid variable shadowing
sql/set_var.h:
Renamed variables to avoid variable shadowing
sql/sql_delete.cc:
Removed maria hack for temporary tables
Fixed indentation
sql/sql_table.cc:
Moved extra() call when waiting for tables to not be used to after tables are removed from cache.
This was needed to ensure we don't do a PREPARE_FOR_DROP or similar call while the table is still in use.
sql/table.cc:
Copy page_checksum from share
Removed Maria hack
storage/maria/Makefile.am:
Added new files
storage/maria/ha_maria.cc:
Renamed records -> record_count and info -> create_info to avoid variable name conflicts
Mark page cache variables as not flushable
storage/maria/ma_blockrec.c:
Moved _ma_unpin_all_pages() to ma_key_recover.c
Moved init of info->pinned_pages to ma_open.c
Moved _ma_finalize_row() to maria_key_recover.h
Renamed some variables to avoid variable name conflicts
Mark page_link.changed for blocks we change directly
Simplify handling of undo link when writing LOGREC_UNDO_ROW_INSERT (old code crashed when having redo for index)
storage/maria/ma_blockrec.h:
Removed extra empty line
storage/maria/ma_checkpoint.c:
Remove not needed trnman.h
storage/maria/ma_close.c:
Free pinned pages (which are now always allocated)
storage/maria/ma_control_file.c:
Give error message if we fail to open control file
storage/maria/ma_delete.c:
Changes for redo logging (first part, logging of underflow not yet done)
- Log undo-key-delete
- Log delete of key
- Updated arguments to _ma_fetch_keypage(), _ma_dispose(), _ma_write_keypage(), _ma_insert()
- Added new arguments to some functions to be able to write redo information
- Mark key pages as changed when we write with PAGECACHE_LOCK_LEFT_WRITELOCKED
Remove one not needed _ma_write_keypage() in d_search() when upper level will do the write anyway
Changed 2 bmove_upp() to bmove() as this made the code easier to understand
More function comments
Indentation fixes
storage/maria/ma_ft_update.c:
New arguments to _ma_write_keypage()
storage/maria/ma_loghandler.c:
Fixed some DBUG_PRINT messages
Simplify code
Added new log entries for key page redo
Renamed some variables to avoid variable name shadowing
storage/maria/ma_loghandler.h:
Moved some defines here
Added define for storing key number on key pages
Added new translog record types
Added enum for type of operations in LOGREC_REDO_INDEX
storage/maria/ma_open.c:
Always allocate info.pinned_pages (now also needed for normal key page usage)
Update keyinfo->key_nr
Added virtual functions to convert record position to number to be stored on key pages
Update keyinfo->write_comp_flag to value of search flag to be used when writing key
storage/maria/ma_page.c:
Added redo for key pages
- Extended _ma_fetch_keypage() with type of lock to put on page and address of the used MARIA_PINNED_PAGE
- _ma_fetch_keypage() now pins pages if needed
- Extended _ma_write_keypage() with type of locks to be used
- ma_dispose() now locks info->s->state.key_del from other threads
- ma_dispose() writes redo log record
- ma_new() locks info->s->state.key_del from other threads if it was used
- ma_new() now pins read page
Other things:
- Removed some not needed arguments from _ma_new() and _ma_dispose()
- Added some new variables to simplify code
- If EXTRA_DEBUG is used, do crc on full page to catch uninitialized bytes
storage/maria/ma_pagecache.h:
Applied patch from Sanja to add extra argument to pagecache_unlock_by_ulink() to mark if page was changed
Added some defines for pagecache priority levels that one can use
storage/maria/ma_range.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_recovery.c:
- Added hooks for new translog types:
REDO_INDEX, REDO_INDEX_NEW_PAGE, REDO_INDEX_FREE_PAGE, UNDO_KEY_INSERT, UNDO_KEY_DELETE and
UNDO_KEY_DELETE_WITH_ROOT.
- Moved variable declarations to start of function (portability fixes)
- Removed some not needed initializations
- Set only relevant state changes for each redo/undo entry
storage/maria/lockman.c:
Removed end space
storage/maria/ma_check.c:
Removed end space
storage/maria/ma_create.c:
Removed end space
storage/maria/ma_locking.c:
Removed end space
storage/maria/ma_packrec.c:
Removed end space
storage/maria/ma_pagecache.c:
Removed end space
storage/maria/ma_panic.c:
Removed end space
storage/maria/ma_rt_index.c:
Added new arguments for call to _ma_fetch_keypage(), _ma_write_keypage(), _ma_dispose() and _ma_new()
Fixed indentation
storage/maria/ma_rt_key.c:
Added new arguments for call to _ma_fetch_keypage()
storage/maria/ma_rt_split.c:
Added new arguments for call to _ma_new()
Use new keypage header
Added new arguments for call to _ma_write_keypage()
storage/maria/ma_search.c:
Updated comments & indentation
Added new arguments for call to _ma_fetch_keypage()
Made some variables and arguments const
Added virtual functions for converting row position to number to be stored in key
use MARIA_RECORD_POS of record position instead of my_off_t
Record in MARIA_KEY_PARAM how page was changed on key insert (needed for REDO)
storage/maria/ma_sort.c:
Removed end space
storage/maria/ma_statrec.c:
Updated arguments for call to _ma_rec_pos()
storage/maria/ma_test1.c:
Fixed too small buffer to init_pagecache()
Fixed bug when using insert_count and test_flag
storage/maria/ma_test2.c:
Use more reasonable pagecache size
Remove not used code
Reset blob_length to fix wrong output message
storage/maria/ma_test_all.sh:
Fixed wrong test
storage/maria/ma_write.c:
Lots of new code to handle REDO of key pages
No logic changes because of REDO code, mostly adding new arguments and adding new code for logging
Added new arguments for calls to _ma_fetch_keypage(), _ma_write_keypage() and similar functions
Move setting of comp_flag in ma_ck_wrte_btree() from runtime to maria_open()
Zerofill new used pages for:
- To remove possible sensitive data left in buffer
- To get identical data on pages after running redo
- Better compression of pages if archived
storage/maria/maria_chk.c:
Added information if table is crash safe
storage/maria/maria_def.h:
New virtual function to convert between record position on key and normal record position
Added mutex and extra variables to handle locking of share->state.key_del
Moved some structure variables to get things more aligned
Added extra arguments to MARIA_KEY_PARAM to be able to remember what was changed on key page on key insert
Added argument to MARIA_PINNED_PAGE to indicate if page was changed
Updated prototypes for functions
Added some structures for signaling changes in REDO handling
storage/maria/unittest/ma_pagecache_single.c:
Updated arguments for changed function calls
storage/myisam/mi_check.c:
Made calc_check_checksum virtual
storage/myisam/mi_checksum.c:
Update checksums to ignore null columns
storage/myisam/mi_create.c:
Mark if table has null column (to know when we have to use mi_checksum())
storage/myisam/mi_open.c:
Added virtual function for calculating checksum to be able to easily ignore NULL fields
storage/myisam/mi_test2.c:
Fixed bug
storage/myisam/myisamdef.h:
Added virtual function for calculating checksum during check table
Removed ha_key_cmp() as this is in handler.h
storage/maria/ma_key_recover.c:
New BitKeeper file ``storage/maria/ma_key_recover.c''
storage/maria/ma_key_recover.h:
New BitKeeper file ``storage/maria/ma_key_recover.h''
storage/maria/ma_key_redo.c:
New BitKeeper file ``storage/maria/ma_key_redo.c''
2007-11-14 18:08:06 +01:00
#ifdef EXTERNAL_LOCKING
2006-04-11 15:45:10 +02:00
  MARIA_SHARE *share=info->s;
  if (share->state.process != share->last_process ||
      share->state.unique != info->last_unique ||
      share->state.update_count != info->last_loop)
  {                                     /* Keyfile has changed */
    DBUG_PRINT("info",("index file changed"));
    if (share->state.process != share->this_process)
2007-04-04 22:37:09 +02:00
      VOID(flush_pagecache_blocks(share->pagecache, &share->kfile,
                                  FLUSH_RELEASE));
2006-04-11 15:45:10 +02:00
    share->last_process=share->state.process;
    info->last_unique= share->state.unique;
    info->last_loop= share->state.update_count;
    info->update|= HA_STATE_WRITTEN;    /* Must use file on next */
    info->data_changed= 1;              /* For maria_is_changed */
    return 1;
  }
2007-11-14 18:08:06 +01:00
#endif
2006-04-11 15:45:10 +02:00
  return (!(info->update & HA_STATE_AKTIV) ||
          (info->update & (HA_STATE_WRITTEN | HA_STATE_DELETED |
                           HA_STATE_KEY_CHANGED)));
} /* _ma_test_if_changed */
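
The change test above compares stamps cached from the last access against the current shared state, and refreshes the cache when they differ. A minimal sketch of that idea follows; `demo_state`, `demo_handle` and `demo_index_changed` are hypothetical names invented for illustration and are not part of Maria (in the real code `last_process` lives in the share, not in the handle).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the fields compared above. */
struct demo_state  { unsigned long process, unique, update_count; };
struct demo_handle { unsigned long last_process, last_unique, last_loop; };

/*
  Returns true, and refreshes the cached stamps, when the shared state
  no longer matches what this handle saw on its previous access.
*/
static bool demo_index_changed(struct demo_handle *h,
                               const struct demo_state *s)
{
  if (s->process == h->last_process &&
      s->unique == h->last_unique &&
      s->update_count == h->last_loop)
    return false;                       /* Nothing changed since last look */
  h->last_process= s->process;
  h->last_unique=  s->unique;
  h->last_loop=    s->update_count;
  return true;                          /* Caller must re-read from file */
}
```

Refreshing the stamps inside the check means that a second call without an intervening update reports no change, which mirrors how the original updates `last_unique` and `last_loop` before returning 1.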


/*
  Put a mark in the .MAI file that someone is updating the table


  DOCUMENTATION

  state.open_count in the .MAI file is used the following way:
  - For the first change of the .MAI file in this process open_count is
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not apply to transactional tables, as their state on disk may be obsolete:
such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
    incremented by _ma_mark_file_changed(). (We have a write lock on the file
2006-04-11 15:45:10 +02:00
    when this happens)
  - In maria_close() it's decremented by _ma_decrement_open_count() if it
    was incremented in the same process.

  This means that if we are the only process using the file, the open_count
  tells us if the MARIA file wasn't properly closed. (This is true if
  my_disable_locking is set).
2007-09-06 16:53:26 +02:00

  open_count is not maintained on disk for transactional or temporary tables.
2006-04-11 15:45:10 +02:00
*/
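
The open_count protocol documented above can be illustrated with a small in-memory model: the first change in a process increments the counter once, a clean close decrements it, and a non-zero value seen by a sole user means the file was not properly closed. This is only a sketch under simplifying assumptions; `demo_share`, `demo_mark_changed`, `demo_close` and `demo_needs_check` are hypothetical names, and no disk I/O or locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two fields involved in the protocol. */
struct demo_share
{
  unsigned open_count;                  /* Models state.open_count */
  bool global_changed;                  /* Set once per process on first change */
};

/* First change in this process: bump open_count exactly once. */
static void demo_mark_changed(struct demo_share *s)
{
  if (!s->global_changed)
  {
    s->global_changed= true;
    s->open_count++;
  }
}

/* Clean close: undo the increment made by this process, if any. */
static void demo_close(struct demo_share *s)
{
  if (s->global_changed)
  {
    s->global_changed= false;
    if (s->open_count > 0)
      s->open_count--;
  }
}

/* A sole user that finds a non-zero count knows a close was missed. */
static bool demo_needs_check(const struct demo_share *s)
{
  return s->open_count != 0;
}
```

Skipping `demo_close()` after `demo_mark_changed()` models an unclean stop: the counter stays at 1, so the next opener would flag the table for checking.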


int _ma_mark_file_changed(MARIA_HA *info)
{
2007-10-04 19:33:42 +02:00
  uchar buff[3];
2006-04-11 15:45:10 +02:00
  register MARIA_SHARE *share=info->s;
  DBUG_ENTER("_ma_mark_file_changed");

  if (!(share->state.changed & STATE_CHANGED) || ! share->global_changed)
  {
    share->state.changed|=(STATE_CHANGED | STATE_NOT_ANALYZED |
                           STATE_NOT_OPTIMIZED_KEYS);
    if (!share->global_changed)
    {
      share->global_changed=1;
      share->state.open_count++;
    }
2007-09-06 16:53:26 +02:00
    /*
      temp tables don't need an open_count as they are removed on crash;
      transactional tables are fixed by log-based recovery, so don't need an
      open_count either (and we thus avoid the disk write below).
    */
    if (!(share->temporary | share->base.born_transactional))
2006-04-11 15:45:10 +02:00
    {
      mi_int2store(buff,share->state.open_count);
      buff[2]=1;                        /* Mark that it's changed */
2007-04-04 22:37:09 +02:00
      DBUG_RETURN(my_pwrite(share->kfile.file, buff, sizeof(buff),
2006-04-11 15:45:10 +02:00
                            sizeof(share->state.header),
                            MYF(MY_NABP)));
    }
  }
  DBUG_RETURN(0);
}
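
The three bytes written by the function above hold the 16-bit open_count followed by a one-byte "changed" flag. The sketch below shows one way to build and read such a buffer, assuming that `mi_int2store()` stores the high byte first; `demo_encode_marker` and `demo_decode_open_count` are hypothetical helpers written for illustration, not functions from the Maria sources.

```c
#include <assert.h>
#include <stdint.h>

/*
  Hypothetical helper: fill a 3-byte marker with a 16-bit count stored
  high byte first (assumed to match mi_int2store()) plus a changed flag.
*/
static void demo_encode_marker(uint8_t buff[3], uint16_t open_count)
{
  buff[0]= (uint8_t) (open_count >> 8);    /* High byte */
  buff[1]= (uint8_t) (open_count & 0xFF);  /* Low byte */
  buff[2]= 1;                              /* Mark that it's changed */
}

/* Hypothetical helper: read the 16-bit count back from the marker. */
static uint16_t demo_decode_open_count(const uint8_t buff[3])
{
  return (uint16_t) ((buff[0] << 8) | buff[1]);
}
```

Storing the bytes explicitly rather than copying a native integer keeps the on-disk layout independent of host endianness, which matters when a table file is moved between machines.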


/*
  This is only called by close or by extra(HA_FLUSH) if the OS has the pwrite()
  call. In these contexts the following code should be safe!
*/

int _ma_decrement_open_count(MARIA_HA *info)
{
2007-10-04 19:33:42 +02:00
  uchar buff[2];
2006-04-11 15:45:10 +02:00
  register MARIA_SHARE *share=info->s;
  int lock_error=0,write_error=0;
  if (share->global_changed)
  {
    uint old_lock=info->lock_type;
    share->global_changed=0;
    lock_error=maria_lock_database(info,F_WRLCK);
    /* It's not fatal even if we couldn't get the lock! */
    if (share->state.open_count > 0)
    {
      share->state.open_count--;
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it
should not apply to transactional tables, as state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
|
|
|
if (!(share->temporary | share->base.born_transactional))
|
|
|
|
{
|
|
|
|
mi_int2store(buff,share->state.open_count);
|
|
|
|
write_error= my_pwrite(share->kfile.file, buff, sizeof(buff),
|
|
|
|
sizeof(share->state.header),
|
2007-04-04 22:37:09 +02:00
|
|
|
MYF(MY_NABP));
|
2007-09-06 16:53:26 +02:00
|
|
|
}
|
2006-04-11 15:45:10 +02:00
|
|
|
}
|
|
|
|
if (!lock_error)
|
|
|
|
lock_error=maria_lock_database(info,old_lock);
|
|
|
|
}
|
|
|
|
return test(lock_error || write_error);
|
|
|
|
}
|