mariadb/man/texi/intro.texi

@node Introduction to TokuDB
@chapter Introduction to TokuDB

TokuDB is an embedded transactional database that provides very fast
insertions and range queries on dictionaries.

TokUDB provides transactions with full ACID properties.

TokuDB is a library that can be linked directly into applications,
making it an @dfn{embedded} database.

TokuDB provides an API that is very similar to the Berkeley@tie{}DB.

@node Dictionaries and Associative Memories
@section Dictionaries and Associative Memories

Here we describe what we mean by ``dictionary''.

A @dfn{dictionary} is an ordered set of key-data pairs.
@itemize @bullet
@item
An @dfn{insertion} stores a key-data pair into a dictionary.
(One of the API calls that inserts pairs is called @code{DB->put}.)

@item
A @dfn{point query} finds the data associated with a particular key.
(One of the API calls that executes a point query is called @code{DB->get}.)

@item
A @dfn{range query} iterates through all the key-value pairs in a
range.  For example, in a phone book, you could step through all the
people whose last names are between @samp{Jones} and @samp{Smith}.  In
the API, range queries are supported using cursors.
@end itemize

An @dfn{associative memory} is like a dictionary but it supports only
insertions and point queries, but not range queries.  Often hash
tables are used to implement associative memories.  (In some
languages, such as Perl, an ``dictionary'' is really an associative
memory.)

@node Asymptotic Performance of Fractal Trees
@section  Asymptotic Performance of Fractal Trees

For in-memory dictionaries, trees are often the data structure of
choice.  For example, red-black trees or AVL trees implement all of
the above operations in @math{O(\log N)} time for in-memory data
structures, where $math{N} is the number of entries in the dictionary
(assuming constant-size key-data pairs).

For disk-resident dictionaries, most systems (such as
Berkeley@tie{}DB) employ B-trees, which can implement insertions and
point queries with @math{O(\log_B N)} disk operations in the worst
case, where @math{B} is the block size of the B-tree (assuming
constant-size key-data pairs).

The @dfn{effective block size} of a disk is the size at which the
disk-head movement does not dominate the cost of storing or retrieving
data off the disk.  The effective block size is difficult to calculate
for actual disk systems, since it depends on how far the head must
move (the disk head moves short distances faster than long distances,
reducing the effective block size), whether the data is near the
center of the disk (where the bandwidth is small and hence the
effective block size tends to be small) or at the outer edge of the
disk (where the bandwidth is larger), and on how much prefetching the
operating system and disk firmware perform.

TokuDB, on the other hand, employs @dfn{fractal trees}, a new class of
data structures that provide the same API as B-trees, but are much
faster.  Fractal tree point-queries have the same asymptotic cost as
B-trees (@math{O(\log_B N)}), but insertions require only
@math{O((\log N)/B)} disk operations on average, where @math{B} is the
effective block size of the disk system.

How much faster is that in concrete terms?  In 2007, the effective
block size of a disk is about a megabyte.  If you use 100-byte
key-data pairs, then you can fit about 10,000 pairs into an effective
block.  If you have a 100 Terabyte database then @math{N=2^{40}}, so
on average a fractal tree requires about @math{(\log N)/B = 40/10000 =
1/250} of a disk transfer per insertion.  That is, on average, you can
perform 250 insertions per disk I/O.  In contrast, a traditional
B-tree equires @math{\log_B N = 2} disk I/O's per insertion.  In
practice, because of caching in main memory, traditional B-trees
require 1 disk read and 1 disk write per insertion.  Thus TokuDB's
insertions are orders of magnitude faster than dictionaries based on
traditional B-trees.

Range queries are also faster for TokuDB than in a typical B-tree.
Since B-trees usually use blocks sizes that relatively small compared
to the effective block size of the disk they require many disk-head
movements to traverse a range.  Fractal trees use block sizes that are
the effective block size of the disk.