mariadb/examples
Rich Prohaska fc962b1012 refs #6040 update the README
git-svn-id: file:///svn/toku/tokudb@53420 c7de825b-a66e-492c-adef-691d508d4ae1
2013-04-17 00:01:29 -04:00
..
CMakeLists.txt refs #5206 #5275 keep the examples c programs 2013-04-17 00:00:59 -04:00
db-insert-multiple.c refs #6040 update the README 2013-04-17 00:01:29 -04:00
db-insert.c refs #6040 get random insertions working in our examples by increasing the cache size to 1G and turning checkpointing on 2013-04-17 00:01:29 -04:00
db-scan.c refs #5206 #5275 keep the examples c programs 2013-04-17 00:00:59 -04:00
db-update.c refs #6040 update the README 2013-04-17 00:01:29 -04:00
Makefile #4536 build examples with shared or static tokudb libs refs[t:4536] 2013-04-17 00:00:52 -04:00
README.examples refs #6040 update the README 2013-04-17 00:01:29 -04:00

The examples includes a pair of programs that can be compiled to use either the Berkeley DB library or the Tokutek Fractal Tree index library.

Note: The file formats are different from TokuDB and Berkley DB.  Thus
you cannot access a database created by Berkeley DB using the Tokutek
DB, or vice-versa.

db-insert is a program that inserts random key-value pairs into a database.

db-scan is a program that scans through the key-value pairs, reading every row, from a database.

db-update is a program that upserts key-value pairs into a database.  If the key already exists it increment a count in the value.

db-insert-multiple is a program and inserts key-value pairs into multiple databases.  This is is now TokuDB maintains consistent
secondary databases.

To build it and run it (it's been tested on Fedora 10):
$ make                                           (Makes the binaries)
Run the insertion workload under TokuDB:
$ ./db-insert
Run the insertion workload under BDB:
$ ./db-insert-bdb

Here is what the output looks like (this on a Thinkpad X61s laptop
running Fedora 10).  BDB is a little faster for sequential insertions
(the first three columns), but much much slower for random insertions
(the next 3 columns), so that TokuDB is faster on combined workload.

$ ./db-insert
serial and random insertions of 1048576 per batch
serial  2.609965s   401759/s    random 10.983798s    95466/s    cumulative 13.593869s   154272/s
serial  3.053433s   343409/s    random 12.008670s    87318/s    cumulative 28.656115s   146367/s
serial  5.198312s   201715/s    random 15.087426s    69500/s    cumulative 48.954605s   128516/s
serial  6.096396s   171999/s    random 13.550688s    77382/s    cumulative 68.638321s   122215/s
Shutdown  4.025110s
Total time 72.677498s for 8388608 insertions =   115422/s
$ ./db-insert-bdb 
serial and random insertions of 1048576 per batch
serial  2.623888s   399627/s    random  8.770850s   119552/s    cumulative 11.394805s   184045/s
serial  3.081946s   340232/s    random 21.046589s    49822/s    cumulative 35.523434s   118071/s
serial 14.160498s    74049/s    random 497.117523s     2109/s    cumulative 546.804504s    11506/s
serial  1.534212s   683462/s    random 1128.525146s      929/s    cumulative 1676.863892s     5003/s
Shutdown 195.879242s
Total time 1872.746582s for 8388608 insertions =     4479/s

The files are smaller for TokuDB than BDB.

$ ls -lh bench.tokudb/
total 39M
-rwxrwxr-x 1 bradley bradley 39M 2009-07-28 15:36 bench.db
$ ls -lh bench.bdb/
total 322M
-rw-r--r-- 1 bradley bradley 322M 2009-07-28 16:14 bench.db

When scanning the table, one can run out of locks with BDB.  There are ways around it (increase the lock table size).

$ ./db-scan-bdb --nox
Lock table is out of available object entries
db-scan-bdb: db-scan.c:177: scanscan_hwc: Assertion `r==(-30988)' failed.
Aborted

TokuDB is fine on a big table scan.

$ ./db-scan --nox
Scan    33162304 bytes (2072644 rows) in  7.924463s at  4.184801MB/s
Scan    33162304 bytes (2072644 rows) in  3.062239s at 10.829431MB/s
0:3 1:53 2:56 
miss=3 hit=53 wait_reading=0 wait=0
VmPeak:	  244668 kB
VmHWM:	   68096 kB
VmRSS:	    1232 kB

The update-bdb program upserts 1B rows into a BDB database. When the database gets larger than memory, the throughput
should tank since every update needs to read a block from the storage system.  The storage system becomes the performance
bottleneck.  The program uses 1 1GB cache in front of the kernel's file system buffer cache.  The program should hit the wall
at about 300M rows on a machine with 16GB of memory since keys are 8 bytes and values are 8 bytes in size.

$ ./db-update-bdb

The update program upserts 1B rows into a TokuDB database.  Throughput should be not degrade significantly since the cost
of the storage system reads is amortized over 1000's of update operations.  One should expect TokuDB to be at least 50 times
faster than BDB.

$ ./db-update

There isn't much documentation for the Tokutek Fractal Tree index library, but most of the API is like Berkeley DB's.