2006-04-11 16:45:10 +03:00
|
|
|
/* Copyright (C) 2006 MySQL AB & MySQL Finland AB & TCX DataKonsult AB
|
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
2007-03-02 11:20:23 +01:00
|
|
|
the Free Software Foundation; version 2 of the License.
|
2006-04-11 16:45:10 +03:00
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program; if not, write to the Free Software
|
|
|
|
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
|
|
|
|
|
|
|
|
#include "maria_def.h"
|
|
|
|
#ifdef HAVE_SYS_MMAN_H
|
|
|
|
#include <sys/mman.h>
|
|
|
|
#endif
|
Fix for three bugs:
number 1: "./mtr --mysqld=--default-storage-engine=maria backup"
restored no rows (forgot to flush data pages before my_copy(),
and also the maria_repair() used by ha_maria::restore() needed
a correct data_file_length to not miss rows). [note that BACKUP
TABLE will be removed anyway in 5.2]
number 2: "./mtr --mysqld=--default-storage-engine=maria bootstrap"
caused segfault (uninitialized variable)
number 3: "./mtr --mysqld=--default-storage-engine=maria check"
showed warning in CHECK TABLE (maria_create() created a non-empty
data file with data_file_length==0).
storage/maria/ha_maria.cc:
in ha_maria::backup, need to flush the data file before copying it,
otherwise data is missing from the copy (bug 1)
storage/maria/ma_bitmap.c:
when allocating data at the end of the bitmap, best_data is at "end",
should not be left to 0 (bug 2)
storage/maria/ma_check.c:
_ma_scan_block_record() is used in QUICK repair. It relies on
data_file_length. RESTORE TABLE mixes the MAI of an empty table
(so, data_file_length==0) with a non-empty MAD, and does a
QUICK repair; that got fooled (thought it had hit EOF immediately,
so found no records) (bug 1)
storage/maria/ma_create.c:
At the end of maria_create() we have, in the index file,
data_file_length==0, while the data file has a bitmap page (8192).
This inconsistency makes CHECK TABLE rightly complain.
Fixed by not creating a first bitmap page during maria_create()
(also saves disk space) (bug 3) Question for Monty.
storage/maria/ma_extra.c:
A function to flush the data and index files before one can
use OS syscalls (reads, writes) on those files. For example,
ha_maria::backup() does a my_copy() of the data file and so
all cached pieces of this file must be sent to the OS (bug 1)
This function will have to be used elsewhere in Maria: several places
were not updated when we added pagecache-ing of the data file
(they still only flush the index file) and are probable bugs.
storage/maria/maria_def.h:
new function. Needs to be visible from ha_maria::backup.
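The flush-before-copy requirement described above can be illustrated with a small sketch. It is an analogy only, not the Maria code: a fully buffered stdio stream stands in for the page cache, and the names size_seen_by_os() and demo_flush_before_copy() are hypothetical. On typical C libraries a fresh reader sees none of the cached bytes until the writer flushes.

```c
#include <stdio.h>

/* Hypothetical helper: file size as seen by a fresh reader, i.e. by the OS. */
static long size_seen_by_os(const char *path)
{
  long size;
  FILE *f= fopen(path, "rb");
  if (!f)
    return -1;
  if (fseek(f, 0, SEEK_END))
  {
    fclose(f);
    return -1;
  }
  size= ftell(f);
  fclose(f);
  return size;
}

/*
  Write one small "row" through a fully buffered stream and report the
  file size visible to other readers before and after the flush.
  Returns 0 on success, 1 on error. The temporary file is removed.
*/
static int demo_flush_before_copy(const char *path, long *before, long *after)
{
  static char cache[8192];              /* stands in for the page cache */
  static const char row[]= "row-1";
  const size_t row_length= sizeof(row) - 1;
  FILE *out= fopen(path, "wb");
  if (!out)
    return 1;
  setvbuf(out, cache, _IOFBF, sizeof(cache));
  if (fwrite(row, 1, row_length, out) != row_length)
  {
    fclose(out);
    remove(path);
    return 1;
  }
  *before= size_seen_by_os(path);       /* row is usually still cached */
  fflush(out);                          /* send cached bytes to the OS */
  *after= size_seen_by_os(path);        /* now a copy would see the row */
  fclose(out);
  remove(path);
  return 0;
}
```

A file copy made at the "before" point would miss the row, which is the same shape of problem as bug 1.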
2007-08-07 16:06:42 +02:00
|
|
|
#include "ma_blockrec.h"
|
2006-04-11 16:45:10 +03:00
|
|
|
|
2007-01-18 21:38:14 +02:00
|
|
|
static void maria_extra_keyflag(MARIA_HA *info,
|
|
|
|
enum ha_extra_function function);
|
2006-04-11 16:45:10 +03:00
|
|
|
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
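The idea of keeping flags in the most significant byte of first_undo_lsn, mentioned above for LSN_WITH_FLAGS, can be sketched as follows. This is a minimal illustration assuming a 64-bit value whose top byte is free; every name here (the sketch_ prefix, the shift, the flag value) is hypothetical and is not taken from ma_loghandler_lsn.h or trnman.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for a type like LSN_WITH_FLAGS. */
typedef uint64_t sketch_lsn_with_flags;

/* Assumption: the top 8 bits hold flags, the low 56 bits hold the LSN. */
#define SKETCH_FLAGS_SHIFT 56
#define SKETCH_LSN_MASK ((UINT64_C(1) << SKETCH_FLAGS_SHIFT) - 1)

/* Hypothetical flag, in the spirit of TRANSACTION_LOGGED_LONG_ID. */
#define SKETCH_LOGGED_LONG_ID (UINT64_C(1) << SKETCH_FLAGS_SHIFT)

/* Strip the flag byte and return only the LSN part. */
static uint64_t sketch_lsn_part(sketch_lsn_with_flags value)
{
  return value & SKETCH_LSN_MASK;
}

/* Return non-zero if the given flag bit is set. */
static int sketch_has_flag(sketch_lsn_with_flags value, uint64_t flag)
{
  return (value & flag) != 0;
}

/* Return the value with the given flag bit set; the LSN part is unchanged. */
static sketch_lsn_with_flags sketch_set_flag(sketch_lsn_with_flags value,
                                             uint64_t flag)
{
  return value | flag;
}
```

Any code that compares or stores the LSN must mask the flag byte first, which is one reason a dedicated type name is useful for such a field.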
2007-06-22 14:49:37 +02:00
|
|
|
/**
|
|
|
|
@brief Set options and buffers to optimize table handling
|
2006-04-11 16:45:10 +03:00
|
|
|
|
2007-06-22 14:49:37 +02:00
|
|
|
|
|
|
|
@param info open table
|
|
|
|
@param function operation
|
|
|
|
@param extra_arg Pointer to extra argument (normally pointer to
|
|
|
|
ulong); used when function is one of:
|
|
|
|
HA_EXTRA_WRITE_CACHE
|
|
|
|
HA_EXTRA_CACHE
|
2006-04-11 16:45:10 +03:00
|
|
|
|
2007-06-22 14:49:37 +02:00
|
|
|
@return Operation status
|
|
|
|
@retval 0 ok
|
|
|
|
@retval !=0 error
|
2006-04-11 16:45:10 +03:00
|
|
|
*/
|
|
|
|
|
2007-01-18 21:38:14 +02:00
|
|
|
int maria_extra(MARIA_HA *info, enum ha_extra_function function,
|
|
|
|
void *extra_arg)
|
2006-04-11 16:45:10 +03:00
|
|
|
{
|
|
|
|
int error=0;
|
|
|
|
ulong cache_size;
|
|
|
|
MARIA_SHARE *share=info->s;
|
2007-04-05 14:38:05 +03:00
|
|
|
my_bool block_records= share->data_file_type == BLOCK_RECORD;
|
|
|
|
|
2006-04-11 16:45:10 +03:00
|
|
|
DBUG_ENTER("maria_extra");
|
|
|
|
DBUG_PRINT("enter",("function: %d",(int) function));
|
|
|
|
|
|
|
|
switch (function) {
|
|
|
|
case HA_EXTRA_RESET_STATE: /* Reset state (don't free buffers) */
|
|
|
|
info->lastinx= 0; /* Use first index as def */
|
2007-01-18 21:38:14 +02:00
|
|
|
info->last_search_keypage= info->cur_row.lastpos= HA_OFFSET_ERROR;
|
2006-04-11 16:45:10 +03:00
|
|
|
info->page_changed=1;
|
|
|
|
/* Next/prev gives first/last */
|
|
|
|
if (info->opt_flag & READ_CACHE_USED)
|
|
|
|
{
|
|
|
|
reinit_io_cache(&info->rec_cache,READ_CACHE,0,
|
|
|
|
(pbool) (info->lock_type != F_UNLCK),
|
|
|
|
(pbool) test(info->update & HA_STATE_ROW_CHANGED)
|
|
|
|
);
|
|
|
|
}
|
|
|
|
info->update= ((info->update & HA_STATE_CHANGED) | HA_STATE_NEXT_FOUND |
|
|
|
|
HA_STATE_PREV_FOUND);
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_CACHE:
|
2007-04-05 14:38:05 +03:00
|
|
|
if (block_records)
|
|
|
|
break; /* Not supported */
|
|
|
|
|
2006-04-11 16:45:10 +03:00
|
|
|
if (info->lock_type == F_UNLCK &&
|
|
|
|
(share->options & HA_OPTION_PACK_RECORD))
|
|
|
|
{
|
|
|
|
error=1; /* Not possible if not locked */
|
|
|
|
my_errno=EACCES;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (info->s->file_map) /* Don't use cache if mmap */
|
|
|
|
break;
|
|
|
|
#if defined(HAVE_MMAP) && defined(HAVE_MADVISE)
|
|
|
|
if ((share->options & HA_OPTION_COMPRESS_RECORD))
|
|
|
|
{
|
|
|
|
pthread_mutex_lock(&share->intern_lock);
|
|
|
|
if (_ma_memmap_file(info))
|
|
|
|
{
|
|
|
|
/* We don't need MADV_SEQUENTIAL if small file */
|
|
|
|
madvise(share->file_map,share->state.state.data_file_length,
|
|
|
|
share->state.state.data_file_length <= RECORD_CACHE_SIZE*16 ?
|
|
|
|
MADV_RANDOM : MADV_SEQUENTIAL);
|
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
if (info->opt_flag & WRITE_CACHE_USED)
|
|
|
|
{
|
|
|
|
info->opt_flag&= ~WRITE_CACHE_USED;
|
|
|
|
if ((error=end_io_cache(&info->rec_cache)))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (!(info->opt_flag &
|
|
|
|
(READ_CACHE_USED | WRITE_CACHE_USED | MEMMAP_USED)))
|
|
|
|
{
|
|
|
|
cache_size= (extra_arg ? *(ulong*) extra_arg :
|
|
|
|
my_default_record_cache_size);
|
2007-04-04 23:37:09 +03:00
|
|
|
if (!(init_io_cache(&info->rec_cache, info->dfile.file,
|
2006-04-11 16:45:10 +03:00
|
|
|
(uint) min(info->state->data_file_length+1,
|
|
|
|
cache_size),
|
|
|
|
READ_CACHE,0L,(pbool) (info->lock_type != F_UNLCK),
|
|
|
|
MYF(share->write_flag & MY_WAIT_IF_FULL))))
|
|
|
|
{
|
|
|
|
info->opt_flag|=READ_CACHE_USED;
|
|
|
|
info->update&= ~HA_STATE_ROW_CHANGED;
|
|
|
|
}
|
|
|
|
if (share->concurrent_insert)
|
|
|
|
info->rec_cache.end_of_file=info->state->data_file_length;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_REINIT_CACHE:
|
|
|
|
if (info->opt_flag & READ_CACHE_USED)
|
|
|
|
{
|
2007-01-18 21:38:14 +02:00
|
|
|
reinit_io_cache(&info->rec_cache, READ_CACHE, info->cur_row.nextpos,
|
2006-04-11 16:45:10 +03:00
|
|
|
(pbool) (info->lock_type != F_UNLCK),
|
|
|
|
(pbool) test(info->update & HA_STATE_ROW_CHANGED));
|
|
|
|
info->update&= ~HA_STATE_ROW_CHANGED;
|
|
|
|
if (share->concurrent_insert)
|
|
|
|
info->rec_cache.end_of_file=info->state->data_file_length;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_WRITE_CACHE:
|
|
|
|
if (info->lock_type == F_UNLCK)
|
|
|
|
{
|
2007-04-05 14:38:05 +03:00
|
|
|
error=1; /* Not possible if not locked */
|
2006-04-11 16:45:10 +03:00
|
|
|
break;
|
|
|
|
}
|
2007-04-05 14:38:05 +03:00
|
|
|
if (block_records)
|
|
|
|
break; /* Not supported */
|
2006-04-11 16:45:10 +03:00
|
|
|
|
|
|
|
cache_size= (extra_arg ? *(ulong*) extra_arg :
|
|
|
|
my_default_record_cache_size);
|
|
|
|
if (!(info->opt_flag &
|
|
|
|
(READ_CACHE_USED | WRITE_CACHE_USED | OPT_NO_ROWS)) &&
|
|
|
|
!share->state.header.uniques)
|
2007-04-04 23:37:09 +03:00
|
|
|
if (!(init_io_cache(&info->rec_cache, info->dfile.file, cache_size,
|
2006-04-11 16:45:10 +03:00
|
|
|
WRITE_CACHE,info->state->data_file_length,
|
|
|
|
(pbool) (info->lock_type != F_UNLCK),
|
|
|
|
MYF(share->write_flag & MY_WAIT_IF_FULL))))
|
|
|
|
{
|
|
|
|
info->opt_flag|=WRITE_CACHE_USED;
|
|
|
|
info->update&= ~(HA_STATE_ROW_CHANGED |
|
|
|
|
HA_STATE_WRITE_AT_END |
|
|
|
|
HA_STATE_EXTEND_BLOCK);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_PREPARE_FOR_UPDATE:
|
|
|
|
if (info->s->data_file_type != DYNAMIC_RECORD)
|
|
|
|
break;
|
|
|
|
/* Remove read/write cache if dynamic rows */
|
|
|
|
case HA_EXTRA_NO_CACHE:
|
|
|
|
if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED))
|
|
|
|
{
|
|
|
|
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
|
|
|
|
error=end_io_cache(&info->rec_cache);
|
|
|
|
/* Sergei will insert full text index caching here */
|
|
|
|
}
|
|
|
|
#if defined(HAVE_MMAP) && defined(HAVE_MADVISE)
|
|
|
|
if (info->opt_flag & MEMMAP_USED)
|
|
|
|
madvise(share->file_map,share->state.state.data_file_length,MADV_RANDOM);
|
|
|
|
#endif
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_FLUSH_CACHE:
|
|
|
|
if (info->opt_flag & WRITE_CACHE_USED)
|
|
|
|
{
|
|
|
|
if ((error=flush_io_cache(&info->rec_cache)))
|
|
|
|
{
|
|
|
|
maria_print_error(info->s, HA_ERR_CRASHED);
|
|
|
|
maria_mark_crashed(info); /* Fatal error found */
|
|
|
|
}
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_NO_READCHECK:
|
|
|
|
info->opt_flag&= ~READ_CHECK_USED; /* No readcheck */
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_READCHECK:
|
|
|
|
info->opt_flag|= READ_CHECK_USED;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_KEYREAD: /* Read only keys to record */
|
|
|
|
case HA_EXTRA_REMEMBER_POS:
|
|
|
|
info->opt_flag |= REMEMBER_OLD_POS;
|
2007-07-02 20:45:15 +03:00
|
|
|
bmove((uchar*) info->lastkey+share->base.max_key_length*2,
|
|
|
|
(uchar*) info->lastkey,info->lastkey_length);
|
2006-04-11 16:45:10 +03:00
|
|
|
info->save_update= info->update;
|
|
|
|
info->save_lastinx= info->lastinx;
|
2007-01-18 21:38:14 +02:00
|
|
|
info->save_lastpos= info->cur_row.lastpos;
|
2006-04-11 16:45:10 +03:00
|
|
|
info->save_lastkey_length=info->lastkey_length;
|
|
|
|
if (function == HA_EXTRA_REMEMBER_POS)
|
|
|
|
break;
|
|
|
|
/* fall through */
|
|
|
|
case HA_EXTRA_KEYREAD_CHANGE_POS:
|
|
|
|
info->opt_flag |= KEY_READ_USED;
|
|
|
|
info->read_record= _ma_read_key_record;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_NO_KEYREAD:
|
|
|
|
case HA_EXTRA_RESTORE_POS:
|
|
|
|
if (info->opt_flag & REMEMBER_OLD_POS)
|
|
|
|
{
|
2007-07-02 20:45:15 +03:00
|
|
|
bmove((uchar*) info->lastkey,
|
|
|
|
(uchar*) info->lastkey+share->base.max_key_length*2,
|
2006-04-11 16:45:10 +03:00
|
|
|
info->save_lastkey_length);
|
|
|
|
info->update= info->save_update | HA_STATE_WRITTEN;
|
|
|
|
info->lastinx= info->save_lastinx;
|
2007-01-18 21:38:14 +02:00
|
|
|
info->cur_row.lastpos= info->save_lastpos;
|
2006-04-11 16:45:10 +03:00
|
|
|
info->lastkey_length=info->save_lastkey_length;
|
|
|
|
}
|
|
|
|
info->read_record= share->read_record;
|
|
|
|
info->opt_flag&= ~(KEY_READ_USED | REMEMBER_OLD_POS);
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_NO_USER_CHANGE: /* Database is somehow locked against changes */
|
|
|
|
info->lock_type= F_EXTRA_LCK; /* Simulate as locked */
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_WAIT_LOCK:
|
|
|
|
info->lock_wait=0;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_NO_WAIT_LOCK:
|
|
|
|
info->lock_wait=MY_DONT_WAIT;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_NO_KEYS:
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state")). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
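The is_of_lsn logic described in the changeset above (ma_recovery.c and ma_open.c entries) can be sketched as follows. This is a simplified model, not the code in ma_recovery.c: all sketch_ names are hypothetical, LSNs are modelled as plain integers, UNDO_ROW_PURGE is left out because the text does not say how it changes the count, and the decrement for UNDO_ROW_DELETE is inferred from the ma_delete.c note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the log record types named in the text. */
enum sketch_rec_type
{
  SKETCH_UNDO_ROW_INSERT,
  SKETCH_UNDO_ROW_DELETE,
  SKETCH_OTHER_RECORD
};

struct sketch_log_record
{
  uint64_t lsn;
  enum sketch_rec_type type;
};

/* Only the two state fields that matter for this sketch. */
struct sketch_state
{
  uint64_t records;    /* row count stored in the index header */
  uint64_t is_of_lsn;  /* the state reflects the log up to this LSN */
};

/* REDO phase: touch the count only for records newer than the state. */
static void sketch_redo_one(struct sketch_state *state,
                            const struct sketch_log_record *rec)
{
  assert(state != NULL && rec != NULL);
  if (rec->lsn <= state->is_of_lsn)
    return;                         /* already reflected in the state */
  switch (rec->type) {
  case SKETCH_UNDO_ROW_INSERT:
    state->records++;
    break;
  case SKETCH_UNDO_ROW_DELETE:
    state->records--;
    break;
  default:
    break;
  }
}

/* Closing a table during Recovery: the state is at least this new. */
static void sketch_close_in_recovery(struct sketch_state *state,
                                     uint64_t current_lsn)
{
  if (current_lsn > state->is_of_lsn)
    state->is_of_lsn= current_lsn;
}

/* Replay a whole log, then close the table at the last record seen. */
static void sketch_replay(struct sketch_state *state,
                          const struct sketch_log_record *log, size_t n)
{
  size_t i;
  for (i= 0; i < n; i++)
    sketch_redo_one(state, &log[i]);
  if (n > 0)
    sketch_close_in_recovery(state, log[n - 1].lsn);
}
```

Because is_of_lsn is raised when the table is closed, replaying the same log a second time skips every record, which is the idempotence the changeset says ma_test_recovery checks.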
2007-09-07 15:02:30 +02:00
|
|
|
/* we're going to modify pieces of the state, stall Checkpoint */
|
|
|
|
pthread_mutex_lock(&share->intern_lock);
|
2006-04-11 16:45:10 +03:00
|
|
|
if (info->lock_type == F_UNLCK)
|
|
|
|
{
|
2007-09-07 15:02:30 +02:00
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
2006-04-11 16:45:10 +03:00
|
|
|
error=1; /* Not possible if not locked */
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if (maria_is_any_key_active(share->state.key_map))
|
|
|
|
{
|
|
|
|
MARIA_KEYDEF *key=share->keyinfo;
|
|
|
|
uint i;
|
|
|
|
for (i=0 ; i < share->base.keys ; i++,key++)
|
|
|
|
{
|
|
|
|
if (!(key->flag & HA_NOSAME) && info->s->base.auto_key != i+1)
|
|
|
|
{
|
|
|
|
maria_clear_key_active(share->state.key_map, i);
|
|
|
|
info->update|= HA_STATE_CHANGED;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!share->changed)
|
|
|
|
{
|
|
|
|
share->state.changed|= STATE_CHANGED | STATE_NOT_ANALYZED;
|
|
|
|
share->changed=1; /* Update on close */
|
|
|
|
if (!share->global_changed)
|
|
|
|
{
|
|
|
|
share->global_changed=1;
|
|
|
|
share->state.open_count++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
share->state.state= *info->state;
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we did not
do it under LOCK TABLES anyway: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it
shouldn't run for transactional tables, as the state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
|
|
|
/*
|
|
|
|
That state write to disk must be done, even for transactional tables;
|
|
|
|
indeed the table's share is going to be lost (there was a
|
|
|
|
HA_EXTRA_FORCE_REOPEN before, which set share->last_version to
|
|
|
|
0), and so the only way it leaves information (share->state.key_map)
|
|
|
|
for posterity is by writing it to disk.
|
|
|
|
*/
|
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update it's is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
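The maria_def.h note above describes a hand-off: Checkpoint can mark a table as "don't free this", and maria_close() can answer "ok then you will free it". A minimal sketch of that protocol follows. All names here (toy_share, in_checkpoint, close_pending and the toy_* functions) are hypothetical and not taken from the Maria source, and locking is left out on purpose; the real code would do these steps under its mutexes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for MARIA_SHARE; field names are illustrative only. */
typedef struct toy_share {
  bool in_checkpoint;  /* set by Checkpoint: "don't free this" */
  bool close_pending;  /* set by close: "ok then you will free it" */
} toy_share;

static int toy_frees = 0;  /* counts how many shares were freed */

toy_share *toy_share_new(void)
{
  toy_share *s = calloc(1, sizeof(*s));
  assert(s != NULL);
  return s;
}

static void toy_share_free(toy_share *s)
{
  free(s);
  toy_frees++;
}

/* Checkpoint, step 1: mark the share while building the list of shares. */
void toy_checkpoint_mark(toy_share *s)
{
  s->in_checkpoint = true;
}

/* Close: frees the share unless Checkpoint has marked it.
   Returns true if the share was freed here. */
bool toy_close(toy_share *s)
{
  if (s->in_checkpoint)
  {
    s->close_pending = true;  /* delegate the free to Checkpoint */
    return false;
  }
  toy_share_free(s);
  return true;
}

/* Checkpoint, step 2: done with the share; free it if close asked us to.
   Returns true if the share was freed here. */
bool toy_checkpoint_done(toy_share *s)
{
  bool must_free = s->close_pending;
  s->in_checkpoint = false;
  if (must_free)
    toy_share_free(s);
  return must_free;
}
```

Whichever side finishes last frees the share, so it is freed exactly once and Checkpoint never follows a dangling pointer.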
2007-09-07 15:02:30 +02:00
DBUG_ASSERT(!maria_in_recovery);
error= _ma_state_info_write(share, 1|2);
2006-04-11 16:45:10 +03:00
}
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock; see ma_recovery.c,
look for "Recovery of the state"). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees an UNDO_ROW_INSERT
with an LSN > state.is_of_lsn, it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
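The recovery rule for state.records described in the commit message above can be sketched as follows. This is a toy model, not the code in ma_recovery.c: the toy_* types and function are hypothetical, and treating UNDO_ROW_DELETE and UNDO_ROW_PURGE as decrements is an assumption based on the ma_delete.c note. It shows why the is_of_lsn guard makes replay idempotent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_lsn;

typedef enum {
  TOY_UNDO_ROW_INSERT,
  TOY_UNDO_ROW_DELETE,
  TOY_UNDO_ROW_PURGE,
  TOY_OTHER
} toy_rec_type;

typedef struct {
  toy_lsn lsn;
  toy_rec_type type;
} toy_log_record;

/* Hypothetical subset of the state kept in the index file's header. */
typedef struct {
  uint64_t records;   /* row count */
  toy_lsn is_of_lsn;  /* the state already reflects the log up to this LSN */
} toy_state;

/* Replays the log once. Records at or below is_of_lsn are skipped,
   because the state on disk already accounts for them. */
void toy_redo_phase(toy_state *state, const toy_log_record *log, size_t n)
{
  toy_lsn last_seen = 0;
  for (size_t i = 0; i < n; i++)
  {
    last_seen = log[i].lsn;
    if (log[i].lsn <= state->is_of_lsn)
      continue;
    switch (log[i].type)
    {
    case TOY_UNDO_ROW_INSERT:
      state->records++;
      break;
    case TOY_UNDO_ROW_DELETE:  /* assumption: a delete decrements */
    case TOY_UNDO_ROW_PURGE:   /* assumption: a purge decrements */
      state->records--;
      break;
    default:
      break;
    }
  }
  /* Closing the table: the state is at least as new as the last record
     we looked at, so is_of_lsn may advance to it. */
  if (last_seen > state->is_of_lsn)
    state->is_of_lsn = last_seen;
}
```

Running toy_redo_phase() a second time over the same log changes nothing, since every record's LSN is then at or below is_of_lsn.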
2007-09-07 15:02:30 +02:00
pthread_mutex_unlock(&share->intern_lock);
2006-04-11 16:45:10 +03:00
break;
case HA_EXTRA_FORCE_REOPEN:
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing the data file. This was needed even when
we flushed data pages at the end of each statement, because we did not
do it under LOCK TABLES anyway; the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not apply to transactional tables, as their state on disk may be obsolete:
such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
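The ma_delete_all.c bug described in the commit message above comes down to ordering: a dirty cached page that is written after the truncation makes the file grow again. Below is a toy model of that, assuming an 8192-byte block, a single cached dirty page, and a truncation target of one block (taken from the "Should be: 8192" message). The toy_* names are hypothetical and the real page cache is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLOCK_SIZE 8192UL

/* Hypothetical model: a data file plus one cached page. */
typedef struct {
  unsigned long file_length;    /* bytes currently on disk */
  bool dirty;                   /* is there a dirty cached page? */
  unsigned long dirty_page_no;  /* which page it is */
} toy_table;

/* Writes the cached dirty page, extending the file if needed. */
static void toy_flush(toy_table *t)
{
  unsigned long end;
  if (!t->dirty)
    return;
  end = (t->dirty_page_no + 1) * TOY_BLOCK_SIZE;
  if (end > t->file_length)
    t->file_length = end;
  t->dirty = false;
}

/* Stand-in for chsize(): sets the file length. */
static void toy_chsize(toy_table *t, unsigned long len)
{
  t->file_length = len;
}

/* Old ordering: truncate first; the obsolete page goes to disk later,
   at unlock time, and the file grows again. */
unsigned long toy_delete_all_without_flush(toy_table *t)
{
  toy_chsize(t, TOY_BLOCK_SIZE);
  toy_flush(t);  /* happens at unlock */
  return t->file_length;
}

/* Fixed ordering: flush before truncating, so nothing is left to write. */
unsigned long toy_delete_all_with_flush(toy_table *t)
{
  toy_flush(t);
  toy_chsize(t, TOY_BLOCK_SIZE);
  toy_flush(t);  /* at unlock: no dirty page remains */
  return t->file_length;
}
```

With a dirty page number 1 in the cache, the first function leaves a 16384-byte file and the second an 8192-byte file, matching the sizes in the CHECK TABLE message quoted above.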
2007-09-06 16:53:26 +02:00
/*
  Normally MySQL uses this case when it is going to close all open
  instances of the table, thus going to flush all data/index/state.
  We however do a flush here for additional safety.
*/
/** @todo consider porting these flush-es to MyISAM */
error= _ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
                             FLUSH_FORCE_WRITE, FLUSH_FORCE_WRITE) ||
2007-09-07 15:02:30 +02:00
_ma_state_info_write(share, 1|2|4);
#ifdef ASK_MONTY
|| (share->changed= 0);
#endif
/**
  @todo RECOVERY BUG
  Though we flushed the state, IF some other thread may have the same
  table (same MARIA_SHARE) open at this time then it may have a
  more recent state to flush when it closes, thus we don't set
  share->changed to 0 here. On the other hand, this means that when our
  thread closes its table, it will flush the state again, then it would
  overwrite any state written by yet another thread which may have opened
  the table (new MARIA_SHARE) and done some updates.
  ASK_MONTY about the IF above. See also same tag in
  HA_EXTRA_PREPARE_FOR_DROP|RENAME.
*/
2006-04-11 16:45:10 +03:00
pthread_mutex_lock(&THR_LOCK_maria);
2007-09-06 16:53:26 +02:00
/* this makes the share not be re-used next time the table is opened */
2006-04-11 16:45:10 +03:00
share->last_version= 0L;                    /* Impossible version */
pthread_mutex_unlock(&THR_LOCK_maria);
break;
2007-09-03 12:05:17 +03:00
case HA_EXTRA_PREPARE_FOR_DROP:
case HA_EXTRA_PREPARE_FOR_RENAME:
2007-09-06 16:53:26 +02:00
{
  my_bool do_flush= test(function != HA_EXTRA_PREPARE_FOR_DROP);
2006-04-11 16:45:10 +03:00
pthread_mutex_lock(&THR_LOCK_maria);
share->last_version= 0L;                    /* Impossible version */
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
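The save/restore of transactionality described above for external_lock() can be pictured with a small sketch. Everything named sketch_* below is hypothetical and only stands in for ha_maria and its table; the single point illustrated is that the original setting is remembered at lock time and put back at unlock time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real ha_maria / MARIA_SHARE types. */
struct sketch_table   { int transactional; };
struct sketch_handler { struct sketch_table *table; int save_transactional; };

/*
  is_unlock      non-zero for the F_UNLCK call
  transaction_on mirrors thd->transaction.on (0 when the caller, such as
                 ALTER TABLE, said transactionality can be disabled)
*/
void sketch_external_lock(struct sketch_handler *h, int is_unlock,
                          int transaction_on)
{
  assert(h != NULL && h->table != NULL);
  if (!is_unlock)
  {
    /* remember the real setting, then possibly switch logging off */
    h->save_transactional= h->table->transactional;
    if (!transaction_on)
      h->table->transactional= 0;
  }
  else
    h->table->transactional= h->save_transactional;  /* re-enable */
}
```

Between the two calls no REDO/UNDO would be logged for the table; after the unlock call the table is transactional again.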
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
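The lazy assignment of the 2-byte-id in translog_write_record() can be sketched as follows. The types and the convention that id 0 means "no id yet" are assumptions made for this illustration; the real log handler is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; id == 0 means "no id assigned yet" (assumption). */
struct sketch_share { uint16_t id; };
struct sketch_log
{
  uint16_t next_id;           /* last short id handed out */
  unsigned file_id_records;   /* how many "id->name" records were logged */
  unsigned records;           /* how many ordinary records were logged */
};

void sketch_write_record(struct sketch_log *log, struct sketch_share *share)
{
  assert(log != NULL && share != NULL);
  if (share->id == 0)
  {
    /* first record for this table: give it an id and log the pair once */
    share->id= ++log->next_id;
    log->file_id_records++;   /* stands for writing LOGREC_FILE_ID */
  }
  log->records++;             /* the record itself, carrying share->id */
}
```

Each table pays for one extra log record on its first write only; later records just carry the short id.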
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
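A minimal sketch of keeping flags in the most significant byte of an LSN-sized value follows. It assumes a 64-bit integer whose low 56 bits hold the LSN; the sketch_* names and the chosen flag bit are hypothetical and are not the real LSN_WITH_FLAGS or TRANSACTION_LOGGED_LONG_ID definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit value, low 56 bits are the LSN, top byte is flags. */
typedef uint64_t sketch_lsn_with_flags;

#define SKETCH_FLAGS_SHIFT     56
#define SKETCH_LSN_MASK        ((UINT64_C(1) << SKETCH_FLAGS_SHIFT) - 1)
#define SKETCH_LOGGED_LONG_ID  (UINT64_C(1) << SKETCH_FLAGS_SHIFT)

/* Build a value with no flags set; the LSN must fit in the low 56 bits. */
static inline sketch_lsn_with_flags sketch_make(uint64_t lsn)
{
  assert((lsn & ~SKETCH_LSN_MASK) == 0);
  return lsn;
}

/* Strip the flag byte and return the plain LSN. */
static inline uint64_t sketch_lsn_part(sketch_lsn_with_flags v)
{
  return v & SKETCH_LSN_MASK;
}

static inline int sketch_logged_long_id(sketch_lsn_with_flags v)
{
  return (v & SKETCH_LOGGED_LONG_ID) != 0;
}

static inline sketch_lsn_with_flags
sketch_set_logged_long_id(sketch_lsn_with_flags v)
{
  return v | SKETCH_LOGGED_LONG_ID;
}
```

Any code comparing such values as LSNs would need to mask the flags off first, which is presumably why a distinct type name is useful.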
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
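The shape of such a wrapper, in the order the two steps are listed above, might look like the sketch below. The sketch_* functions are hypothetical stand-ins for writing the commit record and for trnman_commit_trn(); the real signatures are not shown in this message and are not assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction object recording what has happened to it. */
struct sketch_trn { int logged_commit; int committed_in_memory; };

/* Stand-in for writing a LOGREC_COMMIT record; returns 0 on success. */
static int sketch_write_commit_record(struct sketch_trn *trn)
{
  trn->logged_commit= 1;
  return 0;
}

/* Stand-in for trnman_commit_trn(); returns 0 on success. */
static int sketch_trnman_commit(struct sketch_trn *trn)
{
  assert(trn->logged_commit);   /* on-disk commit is expected first */
  trn->committed_in_memory= 1;
  return 0;
}

/* The wrapper: commit on disk, then commit in memory. */
int sketch_ma_commit(struct sketch_trn *trn)
{
  assert(trn != NULL);
  if (sketch_write_commit_record(trn))
    return 1;                   /* log write failed: do not commit in memory */
  return sketch_trnman_commit(trn);
}
```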
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree_is_private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
    /*
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing the data file. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
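The ordering problem can be reproduced with a toy model: one dirty cached page and a file length, with 8192-byte blocks as in the message above. All sketch_* names are hypothetical; this is a simulation of the ordering only, not of the page cache.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLOCK_SIZE 8192

/* Toy table: length of the data file and at most one dirty cached page. */
struct sketch_table
{
  size_t file_length;   /* bytes currently on disk */
  int dirty_page;       /* page number of the dirty cached page, or -1 */
};

/* Write the dirty page out; writing past the end extends the file. */
static void sketch_flush(struct sketch_table *t)
{
  if (t->dirty_page >= 0)
  {
    size_t end= ((size_t) t->dirty_page + 1) * SKETCH_BLOCK_SIZE;
    if (end > t->file_length)
      t->file_length= end;
    t->dirty_page= -1;
  }
}

/* Truncate to one block, then let a later unlock flush what is cached. */
size_t sketch_delete_all(struct sketch_table *t, int flush_first)
{
  assert(t != NULL);
  if (flush_first)
    sketch_flush(t);                    /* the fix: flush before chsize */
  t->file_length= SKETCH_BLOCK_SIZE;    /* the chsize() */
  sketch_flush(t);                      /* what happens at unlock time */
  return t->file_length;
}
```

Without the early flush the model ends at 16384 bytes, with it at 8192, matching the two sizes in the CHECK TABLE message quoted above.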
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them first (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
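The choice between flushing and discarding cached blocks can be condensed into one small function. The enum names below are local to this sketch and only mirror HA_EXTRA_* and FLUSH_* by analogy; the selection logic follows the do_flush test visible in the code itself.

```c
#include <assert.h>

/* Local stand-ins for the HA_EXTRA_* functions discussed above. */
enum sketch_extra
{
  SKETCH_FORCE_REOPEN,
  SKETCH_PREPARE_FOR_RENAME,
  SKETCH_PREPARE_FOR_DROP
};

/* Local stand-ins for FLUSH_RELEASE and FLUSH_IGNORE_CHANGED. */
enum sketch_flush
{
  SKETCH_FLUSH_RELEASE,          /* write dirty blocks, then free them */
  SKETCH_FLUSH_IGNORE_CHANGED    /* drop dirty blocks without writing */
};

enum sketch_flush sketch_flush_type_for(enum sketch_extra function)
{
  int do_flush;
  assert(function <= SKETCH_PREPARE_FOR_DROP);
  /* a table about to be dropped does not need its blocks on disk */
  do_flush= (function != SKETCH_PREPARE_FOR_DROP);
  return do_flush ? SKETCH_FLUSH_RELEASE : SKETCH_FLUSH_IGNORE_CHANGED;
}
```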
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not be re-enabled for transactional tables, as their state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: probably stemming from the index's state being written at
different moments (_ma_writeinfo() used to write the state after every
row write in ma_test* programs, and doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from one wrong value to another wrong value; not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
      This share, having last_version=0, needs to save all its data/index
      blocks to disk if this is not for a DROP TABLE. Otherwise they would be
      invisible to future openers; and they could even go to disk late and
      cancel the work of future openers.
      On Windows, which cannot delete an open file (cannot drop an open table)
      we have to close the table's files.
2007-06-22 14:49:37 +02:00
    */
2007-09-06 16:53:26 +02:00
    if (info->lock_type != F_UNLCK && !info->was_locked)
    {
      info->was_locked= info->lock_type;
      if (maria_lock_database(info, F_UNLCK))
        error= my_errno;
      info->lock_type= F_UNLCK;
    }
    if (share->kfile.file >= 0)
      _ma_decrement_open_count(info);
    pthread_mutex_lock(&share->intern_lock);
    enum flush_type type= do_flush ? FLUSH_RELEASE : FLUSH_IGNORE_CHANGED;
    if (_ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
                              type, type))
2006-04-11 16:45:10 +03:00
    {
      error=my_errno;
      share->changed=1;
    }
    if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED))
    {
      info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing the data file. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
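The ordering problem described for ma_delete_all.c can be illustrated with a small toy model. This is only a sketch, not the Maria code: `toy_table`, `toy_flush()`, `toy_truncate()` and the 8192-byte page size are hypothetical names and values chosen to mirror the sizes quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 8192   /* assumed block size, matching the message */

/* Hypothetical toy model: a file length plus at most one dirty cached page. */
typedef struct {
  size_t file_length;     /* current on-disk length in bytes */
  int    dirty_page_no;   /* page number of a cached dirty page, -1 if none */
} toy_table;

/* Write the cached page out; this extends the file if the page is past EOF. */
void toy_flush(toy_table *t)
{
  if (t->dirty_page_no >= 0)
  {
    size_t end= ((size_t) t->dirty_page_no + 1) * TOY_PAGE_SIZE;
    if (end > t->file_length)
      t->file_length= end;
    t->dirty_page_no= -1;
  }
}

/* Shrink the file to new_length, as chsize() would. */
void toy_truncate(toy_table *t, size_t new_length)
{
  t->file_length= new_length;
}

/* Buggy order: truncate first; the stale page reaches disk later (at unlock). */
size_t delete_all_truncate_first(void)
{
  toy_table t= { TOY_PAGE_SIZE, 1 };
  toy_truncate(&t, TOY_PAGE_SIZE);
  toy_flush(&t);                    /* stale page grows the file again */
  return t.file_length;
}

/* Fixed order: flush first, then truncate; the later flush has nothing to do. */
size_t delete_all_flush_first(void)
{
  toy_table t= { TOY_PAGE_SIZE, 1 };
  toy_flush(&t);
  toy_truncate(&t, TOY_PAGE_SIZE);
  toy_flush(&t);                    /* no dirty page left */
  return t.file_length;
}
```

In this model the truncate-first order ends with a 16384-byte file and the flush-first order with 8192 bytes, which is the discrepancy CHECK TABLE reported.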
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not apply to transactional tables, as state on disk may be obsolete: such
tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
if (end_io_cache(&info->rec_cache))
  error= 1;
2006-04-11 16:45:10 +03:00
}
2007-04-04 23:37:09 +03:00
if (share->kfile.file >= 0)
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
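Keeping flags in the most significant byte of an LSN-sized value can be sketched as follows. The type and function names are hypothetical, and the sketch assumes the LSN itself fits in the low 56 bits of a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for LSN_WITH_FLAGS: low 56 bits LSN, top byte flags. */
typedef uint64_t toy_lsn_with_flags;

#define TOY_FLAG_SHIFT          56
#define TOY_LSN_MASK            ((UINT64_C(1) << TOY_FLAG_SHIFT) - 1)
#define TOY_FLAG_LOGGED_LONG_ID 0x01   /* hypothetical flag bit */

/* Combine an LSN and a flag byte into one value. */
toy_lsn_with_flags toy_lwf_make(uint64_t lsn, uint8_t flags)
{
  return (lsn & TOY_LSN_MASK) | ((uint64_t) flags << TOY_FLAG_SHIFT);
}

/* Extract the LSN, dropping the flag byte. */
uint64_t toy_lwf_lsn(toy_lsn_with_flags v)
{
  return v & TOY_LSN_MASK;
}

/* Extract the flag byte. */
uint8_t toy_lwf_flags(toy_lsn_with_flags v)
{
  return (uint8_t) (v >> TOY_FLAG_SHIFT);
}

/* Set one or more flag bits without disturbing the LSN. */
toy_lsn_with_flags toy_lwf_set_flag(toy_lsn_with_flags v, uint8_t flag)
{
  return v | ((uint64_t) flag << TOY_FLAG_SHIFT);
}
```

Any comparison of such a value against a plain LSN has to mask the flag byte off first, which is one reason to give it a distinct type.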
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (==commit on disk)
- calling trnman_commit_trn() (=commit in memory)
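The shape of such a wrapper can be sketched with stand-in functions. Everything here is hypothetical: `toy_trn`, `toy_write_commit_record()` and `toy_commit_in_memory()` only imitate the two steps listed above, and skipping the in-memory commit when the log write fails is an assumption of this sketch, not something the message states.

```c
#include <assert.h>

/* Hypothetical transaction object recording which steps happened. */
typedef struct {
  int logged;                /* 1 once the commit record is in the log */
  int committed_in_memory;   /* 1 once the in-memory commit is done */
} toy_trn;

/* Stand-in for writing a LOGREC_COMMIT record; non-zero means failure. */
int toy_write_commit_record(toy_trn *trn, int simulate_failure)
{
  if (simulate_failure)
    return 1;
  trn->logged= 1;
  return 0;
}

/* Stand-in for trnman_commit_trn(): the commit in memory. */
int toy_commit_in_memory(toy_trn *trn)
{
  trn->committed_in_memory= 1;
  return 0;
}

/* Wrapper: commit on disk first, then commit in memory. */
int toy_commit(toy_trn *trn, int simulate_log_failure)
{
  if (toy_write_commit_record(trn, simulate_log_failure))
    return 1;
  return toy_commit_in_memory(trn);
}

/* Helper encoding the outcome as error*4 + logged*2 + committed_in_memory. */
int toy_commit_outcome(int simulate_log_failure)
{
  toy_trn trn= { 0, 0 };
  int error= toy_commit(&trn, simulate_log_failure);
  return error * 4 + trn.logged * 2 + trn.committed_in_memory;
}
```

Callers that previously invoked the in-memory commit directly would call the wrapper instead, so that every commit also leaves a record in the log.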
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree-is-private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
{
2007-09-06 16:53:26 +02:00
if (do_flush)
{
  /*
    Save the state so that others can find it from disk.
    We have to sync now, as on Windows we are going to close the file
    (so cannot sync later).
  */
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state")). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
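The hand-off between maria_close() and Checkpoint described above can be sketched as a pair of flags on the share. This is a single-threaded illustration only: the struct, field and function names are hypothetical, and the mutexes (THR_LOCK_maria, intern_lock) that make the real protocol safe are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical share with the two marks used for the hand-off. */
typedef struct {
  int open_handles;    /* number of open handles on this share */
  int in_checkpoint;   /* set by Checkpoint while the share is on its list */
  int close_pending;   /* set by close when freeing is left to Checkpoint */
} toy_share;

/* Close one handle; returns 1 if this call freed the share. */
int toy_close(toy_share *share)
{
  if (--share->open_handles > 0)
    return 0;
  if (share->in_checkpoint)
  {
    share->close_pending= 1;   /* "ok then you will free it" */
    return 0;
  }
  free(share);
  return 1;
}

/* Checkpoint is done with the share; returns 1 if it freed the share. */
int toy_checkpoint_done(toy_share *share)
{
  share->in_checkpoint= 0;
  if (share->close_pending)
  {
    free(share);
    return 1;
  }
  return 0;
}

/* Demo: returns 1 if close freed the share, 2 if Checkpoint did, -1 on OOM. */
int toy_close_demo(int checkpoint_active)
{
  toy_share *share= calloc(1, sizeof(*share));
  if (!share)
    return -1;
  share->open_handles= 1;
  share->in_checkpoint= checkpoint_active;
  if (toy_close(share))
    return 1;
  return toy_checkpoint_done(share) ? 2 : 0;
}
```

Exactly one side frees the share in either case, so Checkpoint never dereferences a pointer that close has already released.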
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees UNDO_ROW_INSERT
which has a LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
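The is_of_lsn rule above is what makes recovery of state.records idempotent. A minimal sketch, assuming a simplified log with only insert and delete UNDO records; all names prefixed with `toy_` are hypothetical and the LSN values are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_UNDO_ROW_INSERT, TOY_UNDO_ROW_DELETE } toy_rec_type;

typedef struct { uint64_t lsn; toy_rec_type type; } toy_log_record;

/* Hypothetical subset of the state stored in the index file's header. */
typedef struct { uint64_t records; uint64_t is_of_lsn; } toy_state;

/* REDO phase: apply only records newer than what the state already reflects. */
void toy_redo_phase(toy_state *state, const toy_log_record *log, size_t n)
{
  size_t i;
  for (i= 0; i < n; i++)
  {
    if (log[i].lsn <= state->is_of_lsn)
      continue;                       /* already included in the state */
    if (log[i].type == TOY_UNDO_ROW_INSERT)
      state->records++;
    else
      state->records--;
  }
}

/* Closing during Recovery: the state is at least as new as current_lsn. */
void toy_close_during_recovery(toy_state *state, uint64_t current_lsn)
{
  if (current_lsn > state->is_of_lsn)
    state->is_of_lsn= current_lsn;
}

/* Run Recovery the given number of times over the same log. */
uint64_t toy_recover(int passes)
{
  const toy_log_record log[]= {
    {  90, TOY_UNDO_ROW_INSERT },     /* older than the state: skipped */
    { 110, TOY_UNDO_ROW_INSERT },
    { 120, TOY_UNDO_ROW_DELETE },
    { 130, TOY_UNDO_ROW_INSERT }
  };
  toy_state state= { 10, 100 };       /* 10 records as of LSN 100 */
  int i;
  for (i= 0; i < passes; i++)
  {
    toy_redo_phase(&state, log, sizeof(log) / sizeof(log[0]));
    toy_close_during_recovery(&state, 130);
  }
  return state.records;
}
```

Running the sketch once or twice over the same log gives the same record count, because the second pass finds every record at or below is_of_lsn and skips it.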
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
if (_ma_state_info_write(share, 1 | 2) ||
2007-09-06 16:53:26 +02:00
    my_sync(share->kfile.file, MYF(0)))
  error= my_errno;
- WL#3072 Maria Recovery:
Recovery of state.records (the count of records which is stored into
the header of the index file). For that, state.is_of_lsn is introduced;
logic is explained in ma_recovery.c (look for "Recovery of the state").
The net gain is that in case of crash, we now recover state.records,
and it is idempotent (ma_test_recovery tests it).
state.checksum is not recovered yet, mail sent for discussion.
- WL#3071 Maria Checkpoint: preparation for it, by protecting
all modifications of the state in memory or on disk with intern_lock
(with the exception of the really-often-modified state.records,
which is now protected with the log's lock, see ma_recovery.c
(look for "Recovery of the state")). Also, if maria_close() sees that
Checkpoint is looking at this table it will not my_free() the share.
- don't compute row's checksum twice in case of UPDATE (correction
to a bugfix I made yesterday).
storage/maria/ha_maria.cc:
protect state write with intern_lock (against Checkpoint)
storage/maria/ma_blockrec.c:
* don't reset trn->rec_lsn in _ma_unpin_all_pages(), because it
should wait until we have corrected the allocation in the bitmap
(as the REDO can serve to correct the allocation during Recovery);
introducing _ma_finalize_row() for that.
* In a changeset yesterday I moved computation of the checksum
into write_block_record(), to fix a bug in UPDATE. Now I notice
that maria_update() already computes the checksum, it's just that
it puts it into info->cur_row while _ma_update_block_record()
uses info->new_row; so, removing the checksum computation from
write_block_record(), putting it back into allocate_and_write_block_record()
(which is called only by INSERT and UNDO_DELETE), and copying
cur_row->checksum into new_row->checksum in _ma_update_block_record().
storage/maria/ma_check.c:
new prototypes, they will take intern_lock when writing the state;
also take intern_lock when changing share->kfile. In both cases
this is to protect against Checkpoint reading/writing the state or reading
kfile at the same time.
Not updating create_rename_lsn directly at end of write_log_record_for_repair()
as it wouldn't have intern_lock.
storage/maria/ma_close.c:
Checkpoint builds a list of shares (under THR_LOCK_maria), then it
handles each such share (under intern_lock) (doing flushing etc);
if maria_close() freed this share between the two, Checkpoint
would see a bad pointer. To avoid this, when building the list Checkpoint
marks each share, so that maria_close() knows it should not free it
and Checkpoint will free it itself.
Extending the zone covered by intern_lock to protect against
Checkpoint reading kfile, writing state.
storage/maria/ma_create.c:
When we update create_rename_lsn, we also update is_of_lsn to
the same value: it is logical, and allows us to test in maria_open()
that the former is not bigger than the latter (the contrary is a sign
of index header corruption, or severe logging bug which hinders
Recovery, table needs a repair).
_ma_update_create_rename_lsn_on_disk() also writes is_of_lsn;
it now operates under intern_lock (protect against Checkpoint),
a shortcut function is available for cases where acquiring
intern_lock is not needed (table's creation or first open).
storage/maria/ma_delete.c:
if table is transactional, "records" is already decremented
when logging UNDO_ROW_DELETE.
storage/maria/ma_delete_all.c:
comments
storage/maria/ma_extra.c:
Protect modifications of the state, in memory and/or on disk,
with intern_lock, against a concurrent Checkpoint.
When state goes to disk, update its is_of_lsn (by calling
the new _ma_state_info_write()).
In HA_EXTRA_FORCE_REOPEN, don't set share->changed to 0 (undoing
a change I made a few days ago) and ASK_MONTY
storage/maria/ma_locking.c:
no real code change here.
storage/maria/ma_loghandler.c:
Log-write-hooks for updating "state.records" under log's mutex
when writing/updating/deleting a row or deleting all rows.
storage/maria/ma_loghandler_lsn.h:
merge (make LSN_ERROR and LSN_REPAIRED_BY_MARIA_CHK different)
storage/maria/ma_open.c:
When opening a table verify that is_of_lsn >= create_rename_lsn; if
false the header must be corrupted.
_ma_state_info_write() is split in two: _ma_state_info_write_sub()
which is the old _ma_state_info_write(), and _ma_state_info_write()
which additionally takes intern_lock if requested (to protect
against Checkpoint) and updates is_of_lsn.
_ma_open_keyfile() should change kfile.file under intern_lock
to protect Checkpoint from reading a wrong kfile.file.
storage/maria/ma_recovery.c:
Recovery of state.records: when the REDO phase sees an UNDO_ROW_INSERT
which has an LSN > state.is_of_lsn it increments state.records.
Same for UNDO_ROW_DELETE and UNDO_ROW_PURGE.
When closing a table during Recovery, we know its state is at least
as new as the current log record we are looking at, so increase
is_of_lsn to the LSN of the current log record.
storage/maria/ma_rename.c:
update for new behaviour of _ma_update_create_rename_lsn_on_disk().
storage/maria/ma_test1.c:
update to new prototype
storage/maria/ma_test2.c:
update to new prototype (actually prototype was changed days ago,
but compiler does not complain about the extra argument??)
storage/maria/ma_test_recovery.expected:
new result file of ma_test_recovery. Improvements: record
count read from index's header is now always correct.
storage/maria/ma_test_recovery:
"rm" fails if file does not exist. Redirect stderr of script.
storage/maria/ma_write.c:
if table is transactional, "records" is already incremented when
logging UNDO_ROW_INSERT. Comments.
storage/maria/maria_chk.c:
update is_of_lsn too
storage/maria/maria_def.h:
- MARIA_STATE_INFO::is_of_lsn which is used by Recovery. It is stored
into the index file's header.
- Checkpoint can now mark a table as "don't free this", and maria_close()
can reply "ok then you will free it".
- new functions
storage/maria/maria_pack.c:
update for new name
2007-09-07 15:02:30 +02:00
|
|
|
#ifdef ASK_MONTY /* see same tag in HA_EXTRA_FORCE_REOPEN */
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing the data file. Was needed even when
we flushed data pages at the end of each statement, because we did not
do it under LOCK TABLES anyway: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should not
be enabled for transactional tables, as their state on disk may be obsolete:
such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
|
|
|
else
|
|
|
|
share->changed= 0;
|
2007-09-07 15:02:30 +02:00
|
|
|
#endif
|
2007-09-06 16:53:26 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* be sure that state is not tried for write as file may be closed */
|
|
|
|
share->changed= 0;
|
|
|
|
}
|
|
|
|
#ifdef __WIN__
|
|
|
|
if (my_close(share->kfile, MYF(0)))
|
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and
DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's
create_rename_lsn, which is needed for the correctness of
Recovery (see function comment of _ma_repair_write_log_record()
in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table
like inside ALTER TABLE (I expect this to be a big win). There was
already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table
and log the "id->name" pair (LOGREC_FILE_ID); log
LOGREC_LONG_TRANSACTION_ID; automatically store
the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint
when some dirty pages are unknown; capturing trn->rec_lsn,
trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.
storage/maria/Makefile.am:
more files to build
storage/maria/ha_maria.cc:
- logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
- ha_maria::data_file_type does not have to be set in every info()
call, just do it once in open().
- if caller said that transactionality can be disabled (like if
caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we
temporarily disable transactionality of the table in external_lock();
that will ensure that no REDOs/UNDOs are logged for this possibly
massive write operation (they are not needed, as if any write fails,
the table will be dropped). We re-enable in external_lock(F_UNLCK),
which in ALTER TABLE happens before the tmp table replaces the original
one (which is good, as thus the final table will have a REDO RENAME
and a correct create_rename_lsn).
- when we commit we also have to write a log record, so
trnman_commit_trn() calls become ma_commit() calls
- at end of engine's initialization, we are potentially entering a
multi-threaded dangerous world (clients are going to be accepted)
and so some assertions of mutex-owning become enforceable, for that
we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
- fixing comments according to discussion with Monty
- if a table is transactional but temporarily non-transactional
(like in ALTER TABLE), we need to give a sensible LSN to the pages
(and, if we give 0, pagecache asserts).
- translog_write_record() now takes care of storing the share's
2-byte-id in the log record
storage/maria/ma_blockrec.h:
fixing comment according to discussion with Monty
storage/maria/ma_check.c:
When REPAIR/OPTIMIZE modify the data/index file, if this is a
transactional table, they must sync it; if they remove files or rename
files, they must sync the directory, so that everything is durable.
This is just applying to REPAIR/OPTIMIZE the logic already implemented
in CREATE/DROP/RENAME a few months ago.
Adding a function to write a LOGREC_REPAIR_TABLE at end of
REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and
to update the table's create_rename_lsn.
storage/maria/ma_close.c:
fix for a future bug
storage/maria/ma_control_file.c:
ensuring that if Maria is running in multi-threaded mode, anybody
wanting to write to the control file and update
last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
see ma_control_file.c
storage/maria/ma_create.c:
when creating a table:
- sync it and its directory only if this is a transactional table
and there is a log (no point in syncing in maria_chk)
- decouple the two uses of linkname/linkname_ptr (for index file and
for data file) into more variables, as we need to know all links
until the moment we write the LOGREC_CREATE_TABLE.
- set share.data_file_type early so that _ma_initialize_data_file()
knows it (Monty's bugfix so that a table always has at least a bitmap
page when it is created; so data-file is not 0 bytes anymore).
- log a LOGREC_CREATE_TABLE; it contains the bytes which we have
just written to the index file's header. Update table's
create_rename_lsn.
- syncing of kfile had been bugified in a previous merge, correcting
- syncing of dfile is now needed as it's not empty anymore
- in _ma_initialize_data_file(), use share's block_size and not the
global one. This is a gratuitous change, both variables are equal,
just that I find it more future-proof to use share-bound variable
rather than global one.
storage/maria/ma_delete_all.c:
log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows();
update create_rename_lsn then.
storage/maria/ma_delete_table.c:
- logging LOGREC_DROP_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
questions
storage/maria/ma_init.c:
when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
- translog_inited has to be visible to ma_create() (see how it is used
in ma_create())
- checkpoint record will be a single record, not three
- no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will
log a REDO_CREATE)
- adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by
truncating the files), REPAIR.
- MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
- in translog_write_record(), if MARIA_SHARE does not yet have a
2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically
store this short id into log records.
- in translog_write_record(), if transaction has not logged its
long trid, log LOGREC_LONG_TRANSACTION_ID.
- For Checkpoint, we need to know the current end-of-log: adding
translog_get_horizon().
- For Control File, adding an assertion that the thread owns the
log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
Changes in log records (see ma_loghandler.c).
new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn,
where the most significant byte is used for flags.
storage/maria/ma_open.c:
storing the create_rename_lsn in the index file's header (in the
state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
- my set_if_bigger was wrong, correcting it
- if the first_in_switch list is not empty, it means that
changed_blocks misses some dirty pages, so Checkpoint cannot run and
needs to wait. A variable missing_blocks_in_changed_list is added to
tell that (should it be named missing_blocks_in_changed_blocks?)
- pagecache_collect_changed_blocks_with_lsn() now also tells the
minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
see ma_pagecache.c
storage/maria/ma_panic.c:
comment
storage/maria/ma_range.c:
comment
storage/maria/ma_rename.c:
- logging LOGREC_RENAME_TABLE; knowing if this is needed, requires
knowing if the table is transactional, which requires opening the
table.
- update create_rename_lsn
- we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
comment
storage/maria/ma_test_all.sh:
- tip for Valgrind-ing ma_test_all
- do "export maria_path=somepath" before calling ma_test_all,
if you want to run ma_test_all out of storage/maria (useful
to have parallel runs, like one normal and one Valgrind, they
must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
- state now contains, in memory and on disk, the create_rename_lsn
- share now contains a 2-byte-id
storage/maria/trnman.c:
preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn;
minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
using most significant byte of first_undo_lsn to hold miscellaneous
flags, for now TRANSACTION_LOGGED_LONG_ID.
dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
dummy_transaction_object was declared in all files including
trnman_public.h, while in fact it's a single object.
new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
update for new prototype
storage/maria/ma_commit.c:
function which wraps:
- writing a LOGREC_COMMIT record (== commit on disk)
- calling trnman_commit_trn() (== commit in memory)
storage/maria/ma_commit.h:
new header file
.tree-is-private:
this file is now needed to keep our tree private (don't push it
to public trees). When 5.1 is merged into mysql-maria, we can abandon
our maria-specific post-commit trigger; .tree-is-private will take
care of keeping commit mails private. Don't push this file to public
trees.
2007-06-22 14:49:37 +02:00
|
|
|
error=my_errno;
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- having removed the data page writes above, we must put the flush
back at the start of some statements like check, repair and
delete_all. It was in fact already necessary (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (in theory, we could lose
index pages when doing a DROP TABLE under Windows).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing the data file. This was needed
even when we flushed data pages at the end of each statement, because we
did not do so under LOCK TABLES anyway; the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed; we nevertheless add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything, only to
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them first (a requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Also flush the data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk).
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
stay disabled for transactional tables, as their state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
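The ma_delete_all.c note above describes an ordering bug: a dirty page left in the cache is written after the file has been truncated, so the file grows back to 16384 bytes instead of staying at 8192. The sketch below reproduces that effect with plain POSIX calls and a toy one-page cache. It assumes an 8192-byte page and a temporary file under /tmp; none of the names (sketch_cache, sketch_flush, sketch_delete_all) come from Maria.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for pwrite(), ftruncate(), mkstemp() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGE_SIZE 8192

/* Hypothetical one-page cache: a dirty page waiting to be written. */
struct sketch_cache
{
  int dirty;
  off_t offset;
  char page[SKETCH_PAGE_SIZE];
};

/* Write the cached page to disk if it is dirty. Returns 0 on success. */
static int sketch_flush(int fd, struct sketch_cache *cache)
{
  if (!cache->dirty)
    return 0;
  if (pwrite(fd, cache->page, SKETCH_PAGE_SIZE, cache->offset) !=
      (ssize_t) SKETCH_PAGE_SIZE)
    return -1;
  cache->dirty= 0;
  return 0;
}

/*
  Simulate "insert one row, delete all rows, unlock".
  flush_first != 0: flush the cache before truncating (the fixed order).
  flush_first == 0: truncate first, flush later (the buggy order).
  Returns the final file size, or -1 on error.
*/
static off_t sketch_delete_all(int flush_first)
{
  char path[]= "/tmp/sketch_delete_all_XXXXXX";
  struct sketch_cache cache;
  struct stat st;
  off_t size= -1;
  int fd= mkstemp(path);
  if (fd < 0)
    return -1;
  unlink(path);                          /* file goes away once closed */

  memset(&cache, 0, sizeof(cache));
  /* The first page is already on disk. */
  if (pwrite(fd, cache.page, SKETCH_PAGE_SIZE, 0) !=
      (ssize_t) SKETCH_PAGE_SIZE)
    goto end;
  /* The insert dirtied the second page, which exists only in the cache. */
  cache.dirty= 1;
  cache.offset= SKETCH_PAGE_SIZE;

  if (flush_first && sketch_flush(fd, &cache))
    goto end;
  if (ftruncate(fd, SKETCH_PAGE_SIZE))   /* plays the role of chsize() */
    goto end;
  if (sketch_flush(fd, &cache))          /* the late flush at unlock time */
    goto end;
  if (fstat(fd, &st) == 0)
    size= st.st_size;
end:
  close(fd);
  return size;
}
```

With flush_first set, the late flush finds nothing dirty and the file keeps its truncated size. Without it, the stale page is written past the new end of file and extends the file again, which is the mismatch CHECK TABLE reported.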
2007-09-06 16:53:26 +02:00
share->kfile.file= -1;
#endif
2007-06-22 14:49:37 +02:00
}
2007-09-06 16:53:26 +02:00
if (share->data_file_type == BLOCK_RECORD &&
share->bitmap.file.file >= 0)
{
if (do_flush && my_sync(share->bitmap.file.file, MYF(0)))
error= my_errno;
#ifdef __WIN__
if (my_close(share->bitmap.file.file, MYF(0)))
error= my_errno;
share->bitmap.file.file= -1;
#endif
}
#ifdef __WIN__
2006-04-11 16:45:10 +03:00
{
LIST *list_element ;
for (list_element=maria_open_list ;
list_element ;
list_element=list_element->next)
{
MARIA_HA *tmpinfo=(MARIA_HA*) list_element->data;
if (tmpinfo->s == info->s)
{
2007-09-06 16:53:26 +02:00
if (share->data_file_type != BLOCK_RECORD &&
tmpinfo->dfile.file >= 0 &&
2007-04-04 23:37:09 +03:00
my_close(tmpinfo->dfile.file, MYF(0)))
2006-04-11 16:45:10 +03:00
error = my_errno;
2007-04-04 23:37:09 +03:00
tmpinfo->dfile.file= -1;
2006-04-11 16:45:10 +03:00
}
}
}
#endif
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
we thus should avoid to read it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which read the state from disk is relevant only with
external locking, we disable it (if want to re-enable it, it shouldn't
for transactional tables as state on disk may be obsolete (such tables
does not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
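The state-writing rules described above for ma_locking.c can be sketched as two small predicates. This is a minimal sketch, not the actual Maria code: toy_share and both function names are hypothetical, and the close-time condition (transactional and changed) is an assumption drawn from the ma_recovery.c note about MARIA_SHARE::changed.

```c
#include <assert.h>

/* Hypothetical, simplified view of a table share; not the real MARIA_SHARE. */
typedef struct
{
  int transactional;   /* table is covered by the log and by Recovery */
  int changed;         /* state was modified since it was last written */
} toy_share;

/* Should the state be written when the last lock is released? */
static int write_state_at_unlock(const toy_share *share)
{
  if (share->transactional)
    return 0;                 /* never at unlock for transactional tables */
  return share->changed;      /* non-transactional: only if changed */
}

/* Should the state be written when the table is closed? (assumed rule) */
static int write_state_at_close(const toy_share *share)
{
  return share->transactional && share->changed;
}
```

Under these assumptions a changed transactional table writes nothing at unlock and writes its state once at close, while a changed non-transactional table writes its state at unlock.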
2007-09-06 16:53:26 +02:00
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
2006-04-11 16:45:10 +03:00
|
|
|
pthread_mutex_unlock(&THR_LOCK_maria);
|
|
|
|
break;
|
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk,
so we should avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in maria_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed; we nevertheless add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them first (a requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Also flush the data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it should
not be re-enabled for transactional tables, as the state on disk may be
obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
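The ma_delete_all.c bug described in this message comes down to the order of "flush cached pages" and "truncate the file". The following is a toy model only: toy_file, toy_cache and every function name are hypothetical, and the file is reduced to a length counter. It shows why a dirty page written after the chsize() makes the file grow back.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 8192

/* Hypothetical stand-ins for a data file and a one-page cache. */
typedef struct { size_t length; } toy_file;
typedef struct { int dirty; size_t page_no; } toy_cache;

/* Writing a cached page extends the file if the page lies past its end. */
static void toy_flush(toy_cache *cache, toy_file *file)
{
  if (cache->dirty)
  {
    size_t end= (cache->page_no + 1) * TOY_PAGE_SIZE;
    if (end > file->length)
      file->length= end;
    cache->dirty= 0;
  }
}

static void toy_truncate(toy_file *file, size_t new_length)
{
  file->length= new_length;
}

/* Old order: truncate first; the dirty page reaches disk later (at unlock). */
static size_t delete_all_truncate_first(void)
{
  toy_file file= { TOY_PAGE_SIZE };
  toy_cache cache= { 1, 1 };          /* dirty data page number 1 */
  toy_truncate(&file, TOY_PAGE_SIZE);
  toy_flush(&cache, &file);           /* obsolete page written after chsize */
  return file.length;
}

/* Fixed order: flush the data pages, then truncate. */
static size_t delete_all_flush_first(void)
{
  toy_file file= { TOY_PAGE_SIZE };
  toy_cache cache= { 1, 1 };
  toy_flush(&cache, &file);
  toy_truncate(&file, TOY_PAGE_SIZE);
  return file.length;
}
```

In this model the first order leaves a 16384-byte file and the second an 8192-byte one, which mirrors the sizes in the CHECK TABLE warning quoted above.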
2007-09-06 16:53:26 +02:00
|
|
|
}
|
2006-04-11 16:45:10 +03:00
|
|
|
case HA_EXTRA_FLUSH:
|
|
|
|
if (!share->temporary)
|
2007-09-06 16:53:26 +02:00
|
|
|
error= _ma_flush_table_files(info, MARIA_FLUSH_DATA | MARIA_FLUSH_INDEX,
|
|
|
|
FLUSH_KEEP, FLUSH_KEEP);
|
2006-04-11 16:45:10 +03:00
|
|
|
#ifdef HAVE_PWRITE
|
|
|
|
_ma_decrement_open_count(info);
|
|
|
|
#endif
|
|
|
|
if (share->not_flushed)
|
|
|
|
{
|
|
|
|
share->not_flushed=0;
|
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of a non-temporary table), DROP, RENAME and
TRUNCATE; sync directories and symlinks (for the first three commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
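The "try to undo the rename of the index" behaviour described for ma_rename.c can be sketched with the standard C rename() call. This is only an illustration under assumptions: rename_table_files() and the two small file helpers are hypothetical names, and the real code also syncs files, links and directories, which is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: rename the index file, then the data file.
   If the data rename fails, try to put the index file back. */
static int rename_table_files(const char *old_index, const char *new_index,
                              const char *old_data, const char *new_data)
{
  if (rename(old_index, new_index))
    return -1;                          /* nothing was changed yet */
  if (rename(old_data, new_data))
  {
    /* Best effort only: this undo may fail too. */
    (void) rename(new_index, old_index);
    return -1;
  }
  return 0;
}

/* Small helpers used only to exercise the sketch. */
static int file_exists(const char *path)
{
  FILE *f= fopen(path, "rb");
  if (!f)
    return 0;
  fclose(f);
  return 1;
}

static int create_empty_file(const char *path)
{
  FILE *f= fopen(path, "wb");
  if (!f)
    return -1;
  return fclose(f);
}
```

If only the index file exists, the data rename fails and the index file should end up under its old name again, so the pair stays consistent.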
2006-11-27 22:01:29 +01:00
|
|
|
if (_ma_sync_table_files(info))
|
2006-04-11 16:45:10 +03:00
|
|
|
error= my_errno;
|
|
|
|
if (error)
|
|
|
|
{
|
|
|
|
share->changed=1;
|
|
|
|
maria_print_error(info->s, HA_ERR_CRASHED);
|
|
|
|
maria_mark_crashed(info); /* Fatal error found */
|
|
|
|
}
|
|
|
|
}
|
2007-01-18 21:38:14 +02:00
|
|
|
if (share->base.blobs && info->rec_buff_size >
|
|
|
|
share->base.default_rec_buff_size)
|
|
|
|
{
|
|
|
|
info->rec_buff_size= 1; /* Force realloc */
|
|
|
|
_ma_alloc_buffer(&info->rec_buff, &info->rec_buff_size,
|
|
|
|
share->base.default_rec_buff_size);
|
|
|
|
}
|
2006-04-11 16:45:10 +03:00
|
|
|
break;
|
|
|
|
case HA_EXTRA_NORMAL: /* These aren't in use */
|
|
|
|
info->quick_mode=0;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_QUICK:
|
|
|
|
info->quick_mode=1;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_NO_ROWS:
|
|
|
|
if (!share->state.header.uniques)
|
|
|
|
info->opt_flag|= OPT_NO_ROWS;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_PRELOAD_BUFFER_SIZE:
|
|
|
|
info->preload_buff_size= *((ulong *) extra_arg);
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_CHANGE_KEY_TO_UNIQUE:
|
|
|
|
case HA_EXTRA_CHANGE_KEY_TO_DUP:
|
|
|
|
maria_extra_keyflag(info, function);
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_MMAP:
|
|
|
|
#ifdef HAVE_MMAP
|
2007-04-05 14:38:05 +03:00
|
|
|
if (block_records)
|
|
|
|
break; /* Not supported */
|
2006-04-11 16:45:10 +03:00
|
|
|
pthread_mutex_lock(&share->intern_lock);
|
2007-03-01 18:23:58 +01:00
|
|
|
/*
|
2007-07-27 12:06:39 +02:00
|
|
|
Memory map the data file if it is not already mapped. It is safe
|
|
|
|
to memory map a file while other threads are using file I/O on it.
|
|
|
|
Assigning a new address to a function pointer is an atomic
|
|
|
|
operation. intern_lock prevents that two or more mappings are done
|
|
|
|
at the same time.
|
2007-03-01 18:23:58 +01:00
|
|
|
*/
|
2007-07-27 12:06:39 +02:00
|
|
|
if (!share->file_map)
|
2006-04-11 16:45:10 +03:00
|
|
|
{
|
|
|
|
if (_ma_dynmap_file(info, share->state.state.data_file_length))
|
|
|
|
{
|
|
|
|
DBUG_PRINT("warning",("mmap failed: errno: %d",errno));
|
|
|
|
error= my_errno= errno;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
share->file_read= _ma_mmap_pread;
|
|
|
|
share->file_write= _ma_mmap_pwrite;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
|
|
|
#endif
|
|
|
|
break;
|
2006-09-07 17:07:17 +02:00
|
|
|
case HA_EXTRA_MARK_AS_LOG_TABLE:
|
|
|
|
pthread_mutex_lock(&share->intern_lock);
|
|
|
|
share->is_log_table= TRUE;
|
|
|
|
pthread_mutex_unlock(&share->intern_lock);
|
|
|
|
break;
|
2006-04-11 16:45:10 +03:00
|
|
|
case HA_EXTRA_KEY_CACHE:
|
|
|
|
case HA_EXTRA_NO_KEY_CACHE:
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
{
|
|
|
|
char tmp[1];
|
|
|
|
tmp[0]=function;
|
|
|
|
}
|
|
|
|
DBUG_RETURN(error);
|
|
|
|
} /* maria_extra */
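The HA_EXTRA_MMAP case above maps the file at most once under intern_lock and only then switches the read and write function pointers. A minimal sketch of that pattern follows, assuming a toy share structure: toy_share, toy_pread, toy_mmap_pread and toy_enable_mmap are all hypothetical, and a static buffer stands in for the real mmap() call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*toy_read_fn)(const char *src, char *dst, size_t len);

/* Hypothetical stand-ins for the plain and the memory-mapped read paths. */
static size_t toy_pread(const char *src, char *dst, size_t len)
{
  memcpy(dst, src, len);
  return len;
}

static size_t toy_mmap_pread(const char *src, char *dst, size_t len)
{
  memcpy(dst, src, len);
  return len;
}

typedef struct
{
  pthread_mutex_t intern_lock;
  const char *file_map;       /* NULL until the file is mapped */
  toy_read_fn file_read;      /* current read path */
  int mappings_done;          /* counts how often we mapped */
} toy_share;

static char toy_file_contents[]= "row data";

/* Map at most once; switch the read path only after mapping succeeded. */
static void toy_enable_mmap(toy_share *share)
{
  pthread_mutex_lock(&share->intern_lock);
  if (!share->file_map)
  {
    share->file_map= toy_file_contents;   /* stands in for mmap() */
    share->mappings_done++;
    share->file_read= toy_mmap_pread;
  }
  pthread_mutex_unlock(&share->intern_lock);
}
```

Calling toy_enable_mmap() twice maps only once, because the second caller finds file_map already set while holding the lock.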
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
2007-04-05 14:38:05 +03:00
|
|
|
Start/Stop Inserting Duplicates Into a Table, WL#1648.
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void maria_extra_keyflag(MARIA_HA *info,
|
|
|
|
enum ha_extra_function function)
|
2006-04-11 16:45:10 +03:00
|
|
|
{
|
|
|
|
uint idx;
|
|
|
|
|
|
|
|
for (idx= 0; idx< info->s->base.keys; idx++)
|
|
|
|
{
|
|
|
|
switch (function) {
|
|
|
|
case HA_EXTRA_CHANGE_KEY_TO_UNIQUE:
|
|
|
|
info->s->keyinfo[idx].flag|= HA_NOSAME;
|
|
|
|
break;
|
|
|
|
case HA_EXTRA_CHANGE_KEY_TO_DUP:
|
|
|
|
info->s->keyinfo[idx].flag&= ~(HA_NOSAME);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
Completion of merge of mysql-5.1 into mysql-maria.
Manually imported the changes made to MyISAM (include/myisam.h,
storage/myisam/*, sql/ha_myisam.*, mysql-test/t/myisam.test,
mysql-test/t/ps_2myisam.test) over the last
months into Maria (tedious, should be done more frequently in the
future), including those not imported at the previous 5.1->Maria merge
(in the future, please don't forget to apply MyISAM changes to Maria
when you merge 5.1 into Maria).
Note: I didn't try to import anything which could be MyISAM-related
in other tests of mysql-test (I didn't want to dig in all csets),
but as QA is working to make most tests re-usable for other engines
(Falcon), it is likely that we'll benefit from this and just have
to set engine=Maria somewhere to run those tests on Maria.
func_group and partition tests fail but they already do in main 5.1
on my machine. No Valgrind error in t/*maria*.test.
Monty: please see the commit comment of maria.result and check.
BitKeeper/deleted/.del-ha_maria.m4:
Delete: config/ac-macros/ha_maria.m4
configure.in:
fix for the new way of enabling engines
include/maria.h:
importing changes done to MyISAM the last months into Maria
include/my_handler.h:
importing changes done to MyISAM the last months into Maria
include/myisam.h:
importing changes done to MyISAM the last months into Maria
mysql-test/r/maria.result:
identical to myisam.result, except the engine name in some places
AND in the line testing key_block_size=1000000000000000000:
Maria gives a key block size of 8192 while MyISAM gives 4096;
is it explainable by the difference between MARIA_KEY_BLOCK_LENGTH
and the same constant in MyISAM? Monty?
mysql-test/r/ps_maria.result:
identical to ps_2myisam.result (except the engine name in some places)
mysql-test/t/maria.test:
instead of engine=maria everywhere, I use @@storage_engine (reduces
the diff with myisam.test).
importing changes done to MyISAM the last months into Maria
mysys/my_handler.c:
importing changes done to MyISAM the last months into Maria
sql/ha_maria.cc:
importing changes done to MyISAM the last months into Maria
sql/ha_maria.h:
importing changes done to MyISAM the last months into Maria
sql/mysqld.cc:
unneeded
storage/maria/Makefile.am:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_check.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_create.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_delete_table.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_dynrec.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_extra.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ft_boolean_search.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ft_eval.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ft_nlq_search.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ft_parser.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ft_test1.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ft_update.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_ftdefs.h:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_key.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_open.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_page.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_rkey.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_rsamepos.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_rt_index.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_rt_mbr.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_search.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_sort.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_test1.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_test2.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_test3.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_update.c:
importing changes done to MyISAM the last months into Maria
storage/maria/ma_write.c:
importing changes done to MyISAM the last months into Maria
storage/maria/maria_chk.c:
importing changes done to MyISAM the last months into Maria
storage/maria/maria_def.h:
importing changes done to MyISAM the last months into Maria
storage/maria/maria_ftdump.c:
importing changes done to MyISAM the last months into Maria
storage/maria/maria_pack.c:
importing changes done to MyISAM the last months into Maria
2006-08-10 16:36:54 +02:00
|
|
|
|
|
|
|
|
|
|
|
int maria_reset(MARIA_HA *info)
|
|
|
|
{
|
|
|
|
int error= 0;
|
|
|
|
MARIA_SHARE *share=info->s;
|
|
|
|
DBUG_ENTER("maria_reset");
|
|
|
|
/*
|
|
|
|
Free buffers and reset the following flags:
|
|
|
|
EXTRA_CACHE, EXTRA_WRITE_CACHE, EXTRA_KEYREAD, EXTRA_QUICK
|
|
|
|
|
|
|
|
If the row buffer cache is large (for dynamic tables), reduce it
|
|
|
|
to save memory.
|
|
|
|
*/
|
|
|
|
if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED))
|
|
|
|
{
|
|
|
|
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
|
|
|
|
error= end_io_cache(&info->rec_cache);
|
|
|
|
}
|
2007-01-18 21:38:14 +02:00
|
|
|
if (share->base.blobs && info->rec_buff_size >
|
|
|
|
share->base.default_rec_buff_size)
|
|
|
|
{
|
|
|
|
info->rec_buff_size= 1; /* Force realloc */
|
|
|
|
_ma_alloc_buffer(&info->rec_buff, &info->rec_buff_size,
|
|
|
|
share->base.default_rec_buff_size);
|
|
|
|
}
|
2006-08-10 16:36:54 +02:00
|
|
|
#if defined(HAVE_MMAP) && defined(HAVE_MADVISE)
|
|
|
|
if (info->opt_flag & MEMMAP_USED)
|
|
|
|
madvise(share->file_map,share->state.state.data_file_length,MADV_RANDOM);
|
|
|
|
#endif
|
|
|
|
info->opt_flag&= ~(KEY_READ_USED | REMEMBER_OLD_POS);
|
|
|
|
info->quick_mode=0;
|
|
|
|
info->lastinx= 0; /* Use first index as def */
|
2007-01-18 21:38:14 +02:00
|
|
|
info->last_search_keypage= info->cur_row.lastpos= HA_OFFSET_ERROR;
|
2006-08-10 16:36:54 +02:00
  info->page_changed= 1;
  info->update= ((info->update & HA_STATE_CHANGED) | HA_STATE_NEXT_FOUND |
                 HA_STATE_PREV_FOUND);
  DBUG_RETURN(error);
}
WL#3072 Maria Recovery. Making DDLs durable in Maria:
Sync table files after CREATE (of non-temp table), DROP, RENAME,
TRUNCATE, sync directories and symlinks (for the 3 first commands).
Comments for future log records.
In ma_rename(), if rename of index works and then rename of data fails,
try to undo the rename of the index to leave a consistent state.
mysys/my_symlink.c:
sync directory after creation of a symbolic link in it, if asked
mysys/my_sync.c:
comment. Fix for when the file's name has no directory in it.
storage/maria/ma_create.c:
sync files and links and dirs when creating a non-temporary table.
Optimizations of the above to reduce syncs in the common cases:
* if index file and data file have the exact same paths (regular
and link), sync the directories (of regular and link) only once
after creating the last file (the data file).
* don't sync the data file if we didn't write to it (always true
in our builds).
storage/maria/ma_delete_all.c:
sync files after truncating a table
storage/maria/ma_delete_table.c:
sync files and symbolic links and dirs after dropping a table
storage/maria/ma_extra.c:
a function which wraps the sync of the index file and the sync of the
data file.
storage/maria/ma_locking.c:
using a wrapper function
storage/maria/ma_rename.c:
sync files and symbolic links and dirs after renaming a table.
If rename of index works and then rename of data fails, try to undo
the rename of the index to leave a consistent state. That is just a
try, it may fail...
storage/maria/ma_test3.c:
warning to not pay attention to this test.
storage/maria/maria_def.h:
declaration for the function added to ma_extra.c
2006-11-27 22:01:29 +01:00

int _ma_sync_table_files(const MARIA_HA *info)
{
- speed optimization:
minimize writes to transactional Maria tables: don't write
data pages, state, and open_count at the end of each statement.
Data pages will be written by a background thread periodically.
State will be written by Checkpoint periodically.
open_count serves to detect when a table is potentially damaged
due to an unclean mysqld stop, but thanks to recovery an unclean
mysqld stop will be corrected and so open_count becomes useless.
As state is written less often, it is often obsolete on disk;
we should thus avoid reading it from disk.
- by removing the data page writes above, it is necessary to put
it back at the start of some statements like check, repair and
delete_all. It was already necessary in fact (see ma_delete_all.c).
- disabling CACHE INDEX on Maria tables for now (fixes crash
of test 'key_cache' when run with --default-storage-engine=maria).
- correcting some fishy code in ma_extra.c (we possibly could lose
index pages when doing a DROP TABLE under Windows, in theory).
storage/maria/ha_maria.cc:
disable CACHE INDEX in Maria for now (there is a single cache for now),
it crashes and it's not a priority
storage/maria/ma_bitmap.c:
debug message
storage/maria/ma_check.c:
The statement before maria_repair() may not flush state,
so it needs to be done by maria_repair() (indeed this function
uses maria_open(HA_OPEN_COPY) so reads state from disk,
so needs to find it up-to-date on disk).
For safety (but normally this is not needed) we remove index blocks
out of the cache before repairing.
_ma_flush_blocks() becomes _ma_flush_table_files_after_repair():
it now additionally flushes the data file and state and syncs files.
As a side effect, the assertion "no WRITE_CACHE_USED" from
_ma_flush_table_files() fired so we move all end_io_cache() done
at the end of repair to before the calls to _ma_flush_table_files_after_repair().
storage/maria/ma_close.c:
when closing a transactional table, we fsync it. But we need to
do this only after writing its state.
We need to write the state at close time only for transactional
tables (the other tables do that at last unlock).
Putting back the O_RDONLY||crashed condition which I had
removed earlier.
Unmap the file before syncing it (does not matter now as Maria
does not use mmap)
storage/maria/ma_delete_all.c:
need to flush data pages before chsize-ing it. Was needed even when
we flushed data pages at the end of each statement, because we didn't
anyway do it if under LOCK TABLES: the change here thus fixes this bug:
create table t(a int) engine=maria;lock tables t write;
insert into t values(1);delete from t;unlock tables;check table t;
"Size of datafile is: 16384 Should be: 8192"
(an obsolete page went to disk after the chsize(), at unlock time).
storage/maria/ma_extra.c:
When doing share->last_version=0, we make the MARIA_SHARE-in-memory
invisible to future openers, so need to have an up-to-date state
on disk for them. The same way, future openers will reopen the data
and index file, so they will not find our cached blocks, so we
need to flush them to disk.
In HA_EXTRA_FORCE_REOPEN, this probably happens naturally as all
tables normally get closed, we however add a safety flush.
In HA_EXTRA_PREPARE_FOR_RENAME, we need to do the flushing. On
Windows we additionally need to close files.
In HA_EXTRA_PREPARE_FOR_DROP, we don't need to flush anything but
remove dirty cached blocks from memory. On Windows we need to close
files.
Closing files forces us to sync them before (requirement for transactional
tables).
For mutex reasons (don't lock intern_lock twice), we move
maria_lock_database() and _ma_decrement_open_count() first in the list
of operations.
Flush also data file in HA_EXTRA_FLUSH.
storage/maria/ma_locking.c:
For transactional tables:
- don't write data pages / state at unlock time;
as a consequence, "share->changed=0" cannot be done.
- don't write state in _ma_writeinfo()
- don't maintain open_count on disk (Recovery corrects the table in case of crash
anyway, and we gain speed by not writing open_count to disk),
For non-transactional tables, flush the state at unlock only
if the table was changed (optimization).
Code which reads the state from disk is relevant only with
external locking, so we disable it (if we want to re-enable it, it
shouldn't be re-enabled for transactional tables, as state on disk may
be obsolete: such tables do not flush state at unlock anymore).
The comment "We have to flush the write cache" is now wrong because
maria_lock_database(F_UNLCK) now happens before thr_unlock(), and
we are not using external locking.
storage/maria/ma_open.c:
_ma_state_info_read() is only used in ma_open.c, making it static
storage/maria/ma_recovery.c:
set MARIA_SHARE::changed to TRUE when we are going to apply a
REDO/UNDO, so that the state gets flushed at close.
storage/maria/ma_test_recovery.expected:
Changes introduced by this patch:
- good: the "open" (table open, not properly closed) is gone,
it was pointless for a recovered table
- bad: stemming from different moments of writing the index's state
probably (_ma_writeinfo() used to write the state after every row
write in ma_test* programs, doesn't anymore as the table is
transactional): some differences in indexes (not relevant as we don't
yet have recovery for them); some differences in count of records
(changed from a wrong value to another wrong value) (not relevant
as we don't recover this count correctly yet anyway, though
a patch will be pushed soon).
storage/maria/ma_test_recovery:
for repeatable output, no names of varying directories.
storage/maria/maria_chk.c:
function renamed
storage/maria/maria_def.h:
Function became local to ma_open.c. Function renamed.
2007-09-06 16:53:26 +02:00
  return (my_sync(info->dfile.file, MYF(MY_WME)) ||
          my_sync(info->s->kfile.file, MYF(MY_WME)));
}
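The rollback described in the WL#3072 notes for ma_rename.c (undo the index rename if the data rename fails) can be sketched in plain C. This is only an illustration under assumptions: `rename_pair_or_undo` and its parameters are hypothetical names, it calls the standard `rename()` rather than the mysys wrappers, and it leaves out the file, symlink and directory syncs that the real change also performs.

```c
#include <assert.h>
#include <stdio.h>

/*
  Hypothetical helper (not Maria code): renames an index file and a data
  file as a pair. If the index rename works but the data rename fails, it
  tries to rename the index back so that both files keep their old names.
  Returns 0 on success, 1 on error.
*/
int rename_pair_or_undo(const char *old_index, const char *new_index,
                        const char *old_data, const char *new_data)
{
  assert(old_index && new_index && old_data && new_data);
  if (rename(old_index, new_index))
    return 1;                           /* nothing was changed */
  if (rename(old_data, new_data))
  {
    /* Best effort only: this undo can itself fail, which we ignore here */
    (void) rename(new_index, old_index);
    return 1;
  }
  return 0;
}
```

As in the commit note, the undo is "just a try": if it fails too, the pair is left half-renamed and the caller only sees the error return.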
Fix for three bugs:
number 1: "./mtr --mysqld=--default-storage-engine=maria backup"
restored no rows (forgot to flush data pages before my_copy(),
and also the maria_repair() used by ha_maria::restore() needed
a correct data_file_length to not miss rows). [note that BACKUP
TABLE will be removed anyway in 5.2]
number 2: "./mtr --mysqld=--default-storage-engine=maria bootstrap"
caused segfault (uninitialized variable)
number 3: "./mtr --mysqld=--default-storage-engine=maria check"
showed warning in CHECK TABLE (maria_create() created a non-empty
data file with data_file_length==0).
storage/maria/ha_maria.cc:
in ha_maria::backup, need to flush the data file before copying it,
otherwise data is missing from the copy (bug 1)
storage/maria/ma_bitmap.c:
when allocating data at the end of the bitmap, best_data is at "end",
should not be left at 0 (bug 2)
storage/maria/ma_check.c:
_ma_scan_block_record() is used in QUICK repair. It relies on
data_file_length. RESTORE TABLE mixes the MAI of an empty table
(so, data_file_length==0) with a non-empty MAD, and does a
QUICK repair; that got fooled (thought it had hit EOF immediately,
so found no records) (bug 1)
storage/maria/ma_create.c:
At the end of maria_create() we have, in the index file,
data_file_length==0, while the data file has a bitmap page (8192).
This inconsistency makes CHECK TABLE rightly complain.
Fixed by not creating a first bitmap page during maria_create()
(also saves disk space) (bug 3) Question for Monty.
storage/maria/ma_extra.c:
A function to flush the data and index files before one can
use OS syscalls (reads, writes) on those files. For example,
ha_maria::backup() does a my_copy() of the data file and so
all cached pieces of this file must be sent to the OS (bug 1)
This function will have to be used elsewhere in Maria: several places
were not updated when we added pagecache-ing of the data file
(they still only flush the index file) and are probable bugs.
storage/maria/maria_def.h:
new function. Needs to be visible from ha_maria::backup.
2007-08-07 16:06:42 +02:00

/**
   @brief flushes the data and/or index file of a table

   This is useful when one wants to read a table using OS syscalls (like
   my_copy()) and first wants to be sure that MySQL-level caches go down to
   the OS so that OS syscalls can see all data. It can flush rec_cache,
   bitmap, pagecache of data file, pagecache of index file.

   @param info table
   @param flush_data_or_index one or two of these flags:
                              MARIA_FLUSH_DATA, MARIA_FLUSH_INDEX
   @param flush_type_for_data
   @param flush_type_for_index

   @note does not sync files (@see _ma_sync_table_files()).
   @note Progressively this function will be used in all places where we flush
   the index but not the data file (probable bugs).

   @return Operation status
     @retval 0 OK
     @retval 1 Error
*/

int _ma_flush_table_files(MARIA_HA *info, uint flush_data_or_index,
                          enum flush_type flush_type_for_data,
                          enum flush_type flush_type_for_index)
{
  MARIA_SHARE *share= info->s;
  /* flush data file first because it's more critical */
  if (flush_data_or_index & MARIA_FLUSH_DATA)
  {
    if (info->opt_flag & WRITE_CACHE_USED)
    {
      /* normally any code which creates a WRITE_CACHE destroys it later */
      DBUG_ASSERT(0);
      if (end_io_cache(&info->rec_cache))
        goto err;
      info->opt_flag&= ~WRITE_CACHE_USED;
    }
    if (share->data_file_type == BLOCK_RECORD)
    {
      if(_ma_flush_bitmap(share) ||
         flush_pagecache_blocks(share->pagecache, &info->dfile,
                                flush_type_for_data))
        goto err;
    }
  }
  if ((flush_data_or_index & MARIA_FLUSH_INDEX) &&
      flush_pagecache_blocks(share->pagecache, &share->kfile,
                             flush_type_for_index))
    goto err;
  return 0;
err:
  maria_print_error(info->s, HA_ERR_CRASHED);
  maria_mark_crashed(info);
  return 1;
}
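Why a flush must come before any OS-level read or copy of a table file can be shown with a small analogy outside Maria. The sketch below is an assumption-laden illustration, not engine code: a fully buffered stdio stream stands in for the rec_cache/pagecache, `stat()` stands in for the OS-level reader such as my_copy(), and `sizes_around_flush` and `os_file_size` are hypothetical names. It assumes a POSIX system for `<sys/stat.h>`.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns the size of `path` as the OS sees it, or -1 on error. */
static long os_file_size(const char *path)
{
  struct stat st;
  return stat(path, &st) ? -1L : (long) st.st_size;
}

/*
  Hypothetical demo (not Maria code): writes `text` through a fully
  buffered stdio stream, then records the file size seen by the OS before
  and after fflush(). Returns 0 on success, 1 on error.
*/
int sizes_around_flush(const char *path, const char *text,
                       long *before, long *after)
{
  static char buffer[4096];     /* plays the role of the user-space cache */
  FILE *f= fopen(path, "w");
  int error;
  assert(before != NULL && after != NULL);
  if (f == NULL)
    return 1;
  error= setvbuf(f, buffer, _IOFBF, sizeof(buffer)) != 0 ||
         fputs(text, f) == EOF;
  *before= os_file_size(path);  /* cached bytes are not visible yet */
  error|= fflush(f) != 0;       /* push the cache down to the OS */
  *after= os_file_size(path);
  error|= fclose(f) != 0;
  return error;
}
```

With a `text` shorter than the buffer, common C libraries are expected to report `*before` as 0 and `*after` as the length of `text`; a copy taken between those two points would miss the row, which is the situation bug 1 describes for ha_maria::backup().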