mirror of
https://github.com/MariaDB/server.git
synced 2025-01-22 06:44:16 +01:00
548d03d70e
git-svn-id: file:///svn/toku/tokudb@25303 c7de825b-a66e-492c-adef-691d508d4ae1
The essential idea of auto-upgrade from BRT_LAYOUT_VERSION 12 to 13 is to
take advantage of the similarities between the two versions, and not to
try to create an infrastructure for all future upgrades.

As future layouts are created, upgrade paths, if any, will be crafted to
each particular change.

On startup, the version number of the recovery log is checked.  If an
upgrade is needed, then the log is tested for a clean shutdown.  If
there is no clean shutdown, then an error is returned.  If the log does
end in a clean shutdown, then a new log file is created with the current
version number, starting with an LSN that is one greater than the LSN of
the clean shutdown.

Once the new log is in place, the persistent environment dictionary is
upgraded, and then normal operation begins.
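
The startup decision described above can be sketched in C as follows.
This is only an illustration under stated assumptions: every identifier
(plan_startup, startup_plan, and so on) is hypothetical and does not come
from the TokuDB source, and the current log version is passed in as a
parameter rather than assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* All identifiers here are hypothetical; none come from the TokuDB source. */
typedef enum {
    STARTUP_NO_UPGRADE,       /* log already has the current version */
    STARTUP_UPGRADE,          /* old version, clean shutdown: start a new log */
    STARTUP_ERROR_NOT_CLEAN   /* old version, no clean shutdown: refuse to start */
} startup_action;

typedef struct {
    startup_action action;
    uint64_t new_log_first_lsn;   /* meaningful only for STARTUP_UPGRADE */
} startup_plan;

/* Decide what to do at startup from the recovery log's version, whether the
 * log ends in a clean shutdown, and the LSN of that shutdown entry. */
startup_plan plan_startup(uint32_t log_version, uint32_t current_version,
                          bool ends_in_clean_shutdown, uint64_t shutdown_lsn) {
    startup_plan plan = { STARTUP_NO_UPGRADE, 0 };
    if (log_version >= current_version)
        return plan;                          /* nothing to upgrade */
    if (!ends_in_clean_shutdown) {
        plan.action = STARTUP_ERROR_NOT_CLEAN;
        return plan;                          /* caller returns an error */
    }
    plan.action = STARTUP_UPGRADE;
    plan.new_log_first_lsn = shutdown_lsn + 1; /* one past the clean shutdown */
    return plan;
}
```

Keeping the decision separate from the file operations means the error
case can be reported before anything on disk is touched.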

The startup of a new version of the storage engine might not be crash
safe.

Dictionaries, including the persistent environment and the fileops
directory, are upgraded as they are read into memory from disk.


The brt header is upgraded by:
 - removing an unused flag
 - setting the transaction id to the xid of the clean shutdown
 - marking the header as dirty
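
A minimal sketch of the header steps listed above, assuming a simplified
header struct.  The struct hdr_sketch, the flag bit HDR_FLAG_UNUSED_V12
and the function name are invented for illustration; only the field name
root_xid_that_created appears in these notes.  The version bump is
included because the header's version number also changes between
layouts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified header; the real struct has many more fields. */
#define HDR_FLAG_UNUSED_V12 0x1u   /* assumed bit position of the retired flag */

struct hdr_sketch {
    uint32_t layout_version;
    uint32_t flags;
    uint64_t root_xid_that_created;
    bool dirty;
};

/* Upgrade a version 12 header in memory.  Returns true if the header was
 * changed, false if it was not a version 12 header. */
bool upgrade_header_12_to_13(struct hdr_sketch *h, uint64_t clean_shutdown_xid) {
    if (h->layout_version != 12)
        return false;
    h->flags &= ~HDR_FLAG_UNUSED_V12;               /* remove the unused flag */
    h->root_xid_that_created = clean_shutdown_xid;  /* xid of the clean shutdown */
    h->layout_version = 13;
    h->dirty = true;                                /* force a rewrite to disk */
    return true;
}
```

Marking the header dirty is what causes the upgraded form to be written
back at the next checkpoint, so the upgrade itself needs no separate
write path.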

Each non-leaf node is upgraded by:
 - removing an unused flag
 - upgrading the version numbers in the node
 - marking the node as dirty
This works because all of the version 12 messages are unchanged
in version 13.  The version 12 messages will be applied to the
leafentries using version 13 code.

Each leaf node is upgraded by:
 - removing an unused flag
 - using modified version 12 code to unpack the version 12 packed
   leafentries into version 13 unpacked leafentries
 - repacking the leafentries into a new mempool
 - destroying the original mempool (that holds the version 12
   node read from disk)
The node is marked as dirty.

Once the brt is open, a BRT_OPTIMIZE broadcast message is inserted to
optimize the dictionary.


A schematic overview of how a brt node is deserialized:

toku_deserialize_brtnode_from() {  // accepts fd, fills in BRTNODE, brt_header

  deserialize_brtnode_from_rbuf_versioned() {
    deserialize_brtnode_from_rbuf()  // accepts rbuf, fills in BRTNODE

      if nonleaf deserialize_brtnode_nonleaf_from_rbuf()  // rbuf -> BRTNODE (no version sensitivity)
      if leaf    deserialize_brtnode_leaf_from_rbuf()     // calculates node size from leafentry sizes
                                                          // leafentry sizes vary with version
    if version 12 {
      if leaf {
        unpack each leafentry into a version 13 ule
        pack each version 13 ule into version 13 le
        allocate new mempool for version 13 les
        destroy old mempool
      }
      remove unused flag
      increment version number
      mark dirty
    }
  }
}


Open issues:
 - The brt layer makes some callbacks to the handlerton layer.  If
   any of the functions change from one version to another, then
   the result may not be correct.  A version number could be
   included in all the function signatures so the callback function
   could be aware of what version the caller is expecting.
   The callbacks are:
   - comparator
   - hot index generator
   - hot column mutator
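
One possible shape for a version-aware callback, shown for the
comparator only.  This is a sketch of the idea raised above, not an
existing interface: the typedef versioned_compare_fn, the return codes
and the example comparator memcmp_compare are all hypothetical, and the
accepted version range (12 to 13) is an assumption chosen to match this
upgrade.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: the caller states which layout version it expects. */
typedef int (*versioned_compare_fn)(uint32_t caller_version,
                                    const void *a, size_t alen,
                                    const void *b, size_t blen,
                                    int *result);

enum { CB_OK = 0, CB_UNSUPPORTED_VERSION = -1 };

/* Example comparator: bytewise order, with the shorter key first on a tie.
 * Stores -1, 0 or 1 in *result.  Refuses versions it does not know about
 * instead of silently returning a possibly wrong ordering. */
int memcmp_compare(uint32_t caller_version,
                   const void *a, size_t alen,
                   const void *b, size_t blen,
                   int *result) {
    if (caller_version < 12 || caller_version > 13)
        return CB_UNSUPPORTED_VERSION;
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c == 0)
        c = (alen > blen) - (alen < blen);
    *result = (c > 0) - (c < 0);
    return CB_OK;
}
```

Returning a status separately from the comparison result lets the caller
distinguish "keys are equal" from "this callback cannot serve the
requested version".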


Note: brt-internal.h defines struct subtree_estimates, which contains the
field nkeys.  This field is obsolete with the removal of dupsort databases
(since it will always be the same as ndata), but removing it is not worth
the trouble.


==========


The changes from version 12 to 13 include (may not be a complete list):
 - Persistent environment dictionary
   - version number
   - timestamp of environment creation (database installation)
   - history of previous versions
   - timestamps for upgrades
 - Recovery log
   - version number
   - new log entries (hotindex, maybe others)
 - brt header
   - version number
   - added field (root_xid_that_created), set to last checkpoint lsn
   - deleted flag (built-in comparison function for values)
 - brt internal node
   - version number
   - additional message(s) possible, no upgrade needed beyond changing
     version number
 - brt leafnode
   - version number
   - new leafentry format
   - version 12 leafentry unpack code is preserved
 - rollback log
   - version number is only change, no upgrade is needed because
     rollback logs are not preserved through clean shutdown

Because version 12 and version 13 leafentries are significantly
different, leafentries are handled as follows:
 - deserialize_brtnode_leaf_from_rbuf()
   - sets up array of pointers to leafentries (to be unpacked later),
     these pointers are put into an OMT
   - calculates checksum (x1764)
   - adjusts ndone byte counter to verify that entire rbuf is read
 - deserialize_brtnode_from_rbuf_versioned() calls
   deserialize_brtnode_leaf_from_rbuf()
   - loop through all leafentries, one at a time:
     - unpack version 12 le and repack as version 13 le, each in its
       own malloc'ed memory
     - calculate new fingerprint
   - create new block
     - allocate new mempool
     - copy individual les into new mempool
     - destroy individual les
   - destroy original mempool
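
The repack-then-copy sequence above can be sketched as follows.  The
entry formats here are invented toy formats (a 1-byte length prefix for
the "old" entry, a 4-byte length prefix for the "new" one) and stand in
for the real version 12 and version 13 leafentries; the names
repack_pool and repack_one are hypothetical, and the fingerprint
calculation is omitted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy formats, invented for this sketch only:
 *   old entry: 1-byte length, then payload
 *   new entry: 4-byte length (host byte order), then payload */
struct packed { size_t size; uint8_t *bytes; };

/* Convert one old entry into a freshly malloc'ed new entry. */
static int repack_one(const uint8_t *old, struct packed *out) {
    uint32_t len = old[0];
    out->size = sizeof len + len;
    out->bytes = malloc(out->size);
    if (!out->bytes) return -1;
    memcpy(out->bytes, &len, sizeof len);
    memcpy(out->bytes + sizeof len, old + 1, len);
    return 0;
}

/* old_entries[i] point into old_pool.  On success the old pool is freed,
 * *new_pool owns one allocation holding all converted entries, and
 * new_entries[i] point into it.  On failure nothing is freed or changed. */
int repack_pool(uint8_t *old_pool, uint8_t **old_entries, size_t n,
                uint8_t **new_pool, uint8_t **new_entries, size_t *new_pool_size) {
    uint8_t *pool = NULL;
    size_t total = 0, off = 0, i;
    struct packed *tmp = calloc(n ? n : 1, sizeof *tmp);
    if (!tmp) return -1;
    for (i = 0; i < n; i++) {                 /* convert one entry at a time */
        if (repack_one(old_entries[i], &tmp[i]) != 0) goto fail;
        total += tmp[i].size;
    }
    pool = malloc(total ? total : 1);         /* allocate the new pool */
    if (!pool) goto fail;
    for (i = 0; i < n; i++) {                 /* copy, then destroy each entry */
        memcpy(pool + off, tmp[i].bytes, tmp[i].size);
        new_entries[i] = pool + off;
        off += tmp[i].size;
        free(tmp[i].bytes);
    }
    free(tmp);
    free(old_pool);                           /* destroy the original pool last */
    *new_pool = pool;
    *new_pool_size = total;
    return 0;
fail:
    for (i = 0; i < n; i++) free(tmp[i].bytes);  /* unset slots are NULL */
    free(tmp);
    return -1;
}
```

Converting every entry before allocating the new pool is what makes the
total size known up front, which is why the entries pass through
individual allocations first.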


Open issues:

 - We need to verify clean shutdown before upgrade.
   If shutdown was not clean then we would run recovery, and the
   code does not support recovering from an old format version.
   - One way to do this is to increase the log version number (either
     increment or synchronize with BRT_LAYOUT_VERSION).
   - Can we just look at the log?  needs_recovery(env);
     If this mechanism is specific
     to the version 12 to 13 upgrade, then that is adequate.
     Once the recovery log format changes, then we need a
     different mechanism, similar to the 3.x->4.x upgrade
     logic in log_upgrade.c.

 - How to decide that an upgrade is necessary?
   Needed for logic that says:
   - If upgrade is necessary, then verify clean shutdown:
     If upgrade is necessary (recorded version is old)
     and clean shutdown was not done, then exit with
     error code.

 - tokudb_needs_recovery() is not separate from verification of
   clean shutdown.  This function indicates if a recovery is
   necessary, but it does not verify simple clean shutdown
   with just the shutdown log entry.  Instead, it looks for
   checkpoint begin/checkpoint end.  (Also, comment at end
   is permitted.)


Proposed solution:
 - Decision on whether to perform upgrade is done by examining log version.
 - If we need an upgrade:
   - If not clean shutdown, then exit with error message, change nothing
     on disk.
   - If clean shutdown, then create new log by simply creating new log file
     (empty, or perhaps with initial comment that says "start of new log").
 - Normal log-trimming code will delete old logs.  (None of the
   locking logic in log_upgrade.c is needed.)
 - Log-opening logic needs to be modified to do this.  See log file
   manager initialization function (and maybe functions it calls),
   maybe the log cursor:
   - logfilemgr.c: toku_logfilemgr_init()
 - Log-trimming logic loops over pairs of file names and LSNs,
   deleting old files based on LSN.

 - Question: would it help any if the "clean shutdown" log entry
   was required to be in a new log file of its own?  It would
   prevent the creation of an empty log file after "clean shutdown."
   It might, but it's probably not worth doing.
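
The log-trimming loop mentioned in the proposal could look roughly like
this.  It is a sketch under assumptions, not the actual trimming code:
struct log_file_info, trim_logs and the remove callback are hypothetical
names, the files are assumed to be ordered oldest first, and keeping the
newest file unconditionally is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one log file and the highest LSN stored in it. */
struct log_file_info {
    const char *name;
    uint64_t max_lsn;
};

/* Callback that disposes of one file; returns 0 on success. */
typedef int (*remove_fn)(const char *name, void *extra);

/* Dry-run callback: counts the files it is asked to remove. */
int count_only(const char *name, void *extra) {
    (void)name;
    ++*(size_t *)extra;
    return 0;
}

/* Walk the (name, LSN) pairs oldest first and remove every file whose
 * entries all precede oldest_needed_lsn.  The newest file is always kept
 * (an assumption of this sketch).  Returns the number of files removed. */
size_t trim_logs(const struct log_file_info *files, size_t n,
                 uint64_t oldest_needed_lsn,
                 remove_fn remove_file, void *extra) {
    size_t removed = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (files[i].max_lsn >= oldest_needed_lsn)
            break;                      /* this file is still needed */
        if (remove_file(files[i].name, extra) != 0)
            break;                      /* stop at the first failure */
        removed++;
    }
    return removed;
}
```

Because the decision depends only on LSNs, old-version log files left
behind by an upgrade are removed by the same loop once the new log has
moved past them, with no special case for the version change.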


Issue of optimize message (to be sent into each dictionary on upgrade):
 - BRT_COMMIT_BROADCAST_ALL (should be faster executing, always commits
   everything, was needed for an earlier upgrade attempt)
 - BRT_OPTIMIZE (better tested, has been used, tests to see if
   transactions are still live)
After upgrade (after clean shutdown, no running transactions, trees
fully flattened), there is no difference in what these two messages do.
Note, BRT_OPTIMIZE requires a clean shutdown if used on upgrade.  If used
before recovery (which an upgrade without clean shutdown would do), then
it would be wrong because it would appear that all transactions were
completed.


TODO:
 - update brt header fields
   - original layout version
   - version read from disk
 - add accountability counters
 - capture LSN of clean shutdown, use instead of checkpoint lsn