mirror of
https://github.com/MariaDB/server.git
synced 2025-01-16 20:12:31 +01:00
161942e3ce
Docs/internals.texi: Auto merged include/my_global.h: Auto merged include/mysql_com.h: Auto merged innobase/row/row0mysql.c: Auto merged innobase/row/row0sel.c: Auto merged libmysql/Makefile.am: Auto merged libmysqld/Makefile.am: Auto merged libmysqld/lib_vio.c: Auto merged mysql-test/r/heap.result: Auto merged mysql-test/r/innodb.result: Auto merged mysql-test/t/heap.test: Auto merged sql/ha_innodb.h: Auto merged sql/ha_myisam.cc: Auto merged BitKeeper/deleted/.del-password.c~76f30876e68eddb4: Auto merged sql/handler.cc: Auto merged sql/handler.h: Auto merged sql/item_func.cc: Auto merged sql/key.cc: Auto merged sql/lex.h: Auto merged sql/log.cc: Auto merged sql/mysqld.cc: Auto merged sql/slave.cc: Auto merged sql/slave.h: Auto merged sql/sql_class.cc: Auto merged sql/sql_repl.cc: Auto merged sql/sql_show.cc: Auto merged sql/sql_yacc.yy: Auto merged sql/table.cc: Auto merged strings/strto.c: Auto merged
6416 lines
201 KiB
Text
6416 lines
201 KiB
Text
\input texinfo @c -*-texinfo-*-
|
|
@c Copyright 2002 MySQL AB
|
|
@c
|
|
@c %**start of header
|
|
@setfilename internals.info
|
|
|
|
@c We want the types in the same index
|
|
@synindex cp fn
|
|
|
|
@iftex
|
|
@afourpaper
|
|
@end iftex
|
|
|
|
@c Get version and other info
|
|
@include include.texi
|
|
|
|
@ifclear tex-debug
|
|
@c This removes the black squares in the right margin
|
|
@finalout
|
|
@end ifclear
|
|
|
|
@c Set background for HTML
|
|
@set _body_tags BGCOLOR=#FFFFFF TEXT=#000000 LINK=#101090 VLINK=#7030B0
|
|
@settitle @strong{MySQL} Internals Manual for version @value{mysql_version}.
|
|
@setchapternewpage odd
|
|
@paragraphindent 0
|
|
|
|
@c %**end of header
|
|
|
|
@ifinfo
|
|
@format
|
|
START-INFO-DIR-ENTRY
|
|
* mysql-internals: (mysql-internals). @strong{MySQL} internals.
|
|
END-INFO-DIR-ENTRY
|
|
@end format
|
|
@end ifinfo
|
|
|
|
@titlepage
|
|
@sp 10
|
|
@center @titlefont{@strong{MySQL} Internals Manual}
|
|
@sp 10
|
|
@center Copyright @copyright{} 1998-2002 MySQL AB
|
|
@page
|
|
@end titlepage
|
|
|
|
@node Top, caching, (dir), (dir)
|
|
|
|
@ifinfo
|
|
This is a manual about @strong{MySQL} internals.
|
|
@end ifinfo
|
|
|
|
@menu
|
|
* caching:: How MySQL Handles Caching
|
|
* join_buffer_size::
|
|
* flush tables:: How MySQL Handles @code{FLUSH TABLES}
|
|
* filesort:: How MySQL Does Sorting (@code{filesort})
|
|
* selects:: How MySQL performs different selects
|
|
* transformations:: How MySQL transforms subqueries
|
|
* coding guidelines:: Coding Guidelines
|
|
* mysys functions:: Functions In The @code{mysys} Library
|
|
* DBUG:: DBUG Tags To Use
|
|
* protocol:: MySQL Client/Server Protocol
|
|
* Fulltext Search:: Fulltext Search in MySQL
|
|
* MyISAM Record Structure:: MyISAM Record Structure
|
|
* InnoDB Record Structure:: InnoDB Record Structure
|
|
* InnoDB Page Structure:: InnoDB Page Structure
|
|
* Files in MySQL Sources:: Annotated List Of Files in the MySQL Source Code Distribution
|
|
* Files in InnoDB Sources:: Annotated List Of Files in the InnoDB Source Code Distribution
|
|
@end menu
|
|
|
|
|
|
@node caching, join_buffer_size, Top, Top
|
|
@chapter How MySQL Handles Caching
|
|
|
|
@strong{MySQL} has the following caches:
|
|
(Note that the some of the filename have a wrong spelling of cache. :)
|
|
|
|
@table @strong
|
|
|
|
@item Key Cache
|
|
A shared cache for all B-tree index blocks in the different NISAM
|
|
files. Uses hashing and reverse linked lists for quick caching of the
|
|
last used blocks and quick flushing of changed entries for a specific
|
|
table. (@file{mysys/mf_keycash.c})
|
|
|
|
@item Record Cache
|
|
This is used for quick scanning of all records in a table.
|
|
(@file{mysys/mf_iocash.c} and @file{isam/_cash.c})
|
|
|
|
@item Table Cache
|
|
This holds the last used tables. (@file{sql/sql_base.cc})
|
|
|
|
@item Hostname Cache
|
|
For quick lookup (with reverse name resolving). Is a must when one has a
|
|
slow DNS.
|
|
(@file{sql/hostname.cc})
|
|
|
|
@item Privilege Cache
|
|
To allow quick change between databases the last used privileges are
|
|
cached for each user/database combination.
|
|
(@file{sql/sql_acl.cc})
|
|
|
|
@item Heap Table Cache
|
|
Many use of @code{GROUP BY} or @code{DISTINCT} caches all found rows in
|
|
a @code{HEAP} table. (This is a very quick in-memory table with hash index.)
|
|
|
|
@item Join buffer Cache
|
|
For every full join in a @code{SELECT} statement (a full join here means
|
|
there were no keys that one could use to find the next table in a list),
|
|
the found rows are cached in a join cache. One @code{SELECT} query can
|
|
use many join caches in the worst case.
|
|
@end table
|
|
|
|
@node join_buffer_size, flush tables, caching, Top
|
|
@chapter How MySQL uses the join_buffer cache
|
|
|
|
Basic information about @code{join_buffer_size}:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
It's only used in the case when join type is of type @code{ALL} or
|
|
@code{index}; In other words: no possible keys can be used.
|
|
@item
|
|
A join buffer is never allocated for the first not-const table,
|
|
even it it would be of type @code{ALL}/@code{index}.
|
|
@item
|
|
The buffer is allocated when we need to do a each full join between two
|
|
tables and freed after the query is done.
|
|
@item
|
|
Accepted row combinations of tables before the @code{ALL}/@code{index}
|
|
able is stored in the cache and is used to compare against each read
|
|
row in the @code{ALL} table.
|
|
@item
|
|
We only store the used fields in the join_buffer cache, not the
|
|
whole rows.
|
|
@end itemize
|
|
|
|
Assume you have the following join:
|
|
|
|
@example
|
|
Table name Type
|
|
t1 range
|
|
t2 ref
|
|
t3 @code{ALL}
|
|
@end example
|
|
|
|
The join is then done as follows:
|
|
|
|
@example
|
|
- While rows in t1 matching range
|
|
- Read through all rows in t2 according to reference key
|
|
- Store used fields form t1,t2 in cache
|
|
- If cache is full
|
|
- Read through all rows in t3
|
|
- Compare t3 row against all t1,t2 combination in cache
|
|
- If rows satisfying join condition, send it to client
|
|
- Empty cache
|
|
|
|
- Read through all rows in t3
|
|
- Compare t3 row against all stored t1,t2 combinations in cache
|
|
- If rows satisfying join condition, send it to client
|
|
@end example
|
|
|
|
The above means that table t3 is scanned
|
|
|
|
@example
|
|
(size-of-stored-row(t1,t2) * accepted-row-cominations(t1,t2))/
|
|
join_buffer_size+1
|
|
@end example
|
|
times.
|
|
|
|
Some conclusions:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
The larger the join_buff_size, the fewer scans of t3.
|
|
If @code{join_buff_size} is already large enough to hold all previous row
|
|
combinations then there is no speed to gain by making it bigger.
|
|
@item
|
|
If there is several tables of @code{ALL}/@code{index} then the we
|
|
allocate one @code{join_buffer_size buffer} for each of them and use the
|
|
same algorithm described above to handle it. (In other words, we store
|
|
the same row combination several times into different buffers)
|
|
@end itemize
|
|
|
|
@node flush tables, filesort, join_buffer_size, Top
|
|
@chapter How MySQL Handles @code{FLUSH TABLES}
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
Flush tables is handled in @file{sql/sql_base.cc::close_cached_tables()}.
|
|
|
|
@item
|
|
The idea of flush tables is to force all tables to be closed. This
|
|
is mainly to ensure that if someone adds a new table outside of
|
|
@strong{MySQL} (for example with @code{cp}) all threads will start using
|
|
the new table. This will also ensure that all table changes are flushed
|
|
to disk (but of course not as optimally as simple calling a sync on
|
|
all tables)!
|
|
|
|
@item
|
|
When one does a @code{FLUSH TABLES}, the variable @code{refresh_version}
|
|
will be incremented. Every time a thread releases a table it checks if
|
|
the refresh version of the table (updated at open) is the same as
|
|
the current @code{refresh_version}. If not it will close it and broadcast
|
|
a signal on @code{COND_refresh} (to wait any thread that is waiting for
|
|
all instanses of a table to be closed).
|
|
|
|
@item
|
|
The current @code{refresh_version} is also compared to the open
|
|
@code{refresh_version} after a thread gets a lock on a table. If the
|
|
refresh version is different the thread will free all locks, reopen the
|
|
table and try to get the locks again; This is just to quickly get all
|
|
tables to use the newest version. This is handled by
|
|
@file{sql/lock.cc::mysql_lock_tables()} and
|
|
@file{sql/sql_base.cc::wait_for_tables()}.
|
|
|
|
@item
|
|
When all tables has been closed @code{FLUSH TABLES} will return an ok
|
|
to client.
|
|
|
|
@item
|
|
If the thread that is doing @code{FLUSH TABLES} has a lock on some tables,
|
|
it will first close the locked tables, then wait until all other threads
|
|
have also closed them, and then reopen them and get the locks.
|
|
After this it will give other threads a chance to open the same tables.
|
|
|
|
@end itemize
|
|
|
|
@node filesort, selects, flush tables, Top
|
|
@chapter How MySQL Does Sorting (@code{filesort})
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
Read all rows according to key or by table scanning.
|
|
|
|
@item
|
|
Store the sort-key in a buffer (@code{sort_buffer}).
|
|
|
|
@item
|
|
When the buffer gets full, run a @code{qsort} on it and store the result
|
|
in a temporary file. Save a pointer to the sorted block.
|
|
|
|
@item
|
|
Repeat the above until all rows have been read.
|
|
|
|
@item
|
|
Repeat the following until there is less than @code{MERGEBUFF2} (15)
|
|
blocks left.
|
|
|
|
@item
|
|
Do a multi-merge of up to @code{MERGEBUFF} (7) regions to one block in
|
|
another temporary file. Repeat until all blocks from the first file
|
|
are in the second file.
|
|
|
|
@item
|
|
On the last multi-merge, only the pointer to the row (last part of
|
|
the sort-key) is written to a result file.
|
|
|
|
@item
|
|
Now the code in @file{sql/records.cc} will be used to read through them
|
|
in sorted order by using the row pointers in the result file.
|
|
To optimize this, we read in a big block of row pointers, sort these
|
|
and then we read the rows in the sorted order into a row buffer
|
|
(@code{record_buffer}).
|
|
|
|
@end itemize
|
|
|
|
@node selects, transformations, flush tables, Top
|
|
@chapter How MySQL performs different selects
|
|
|
|
@node select steps,,,
|
|
@section Steps of select executing
|
|
|
|
Every select performed in such base steps:
|
|
@itemize
|
|
@item
|
|
JOIN::prepare
|
|
@itemize @bullet
|
|
@item
|
|
initialization and linking JOIN structure to st_select_lex
|
|
@item
|
|
fix_fields() for all items (after fix_fields we know everything
|
|
about item)
|
|
@item
|
|
moving HAVING to WHERE if possible
|
|
@item
|
|
initialization procedure if exists
|
|
@end itemize
|
|
@item
|
|
JOIN::optimize
|
|
@itemize @bullet
|
|
@item
|
|
single select optimization
|
|
@item
|
|
creation first temporary table if need
|
|
@end itemize
|
|
@item
|
|
JOIN::exec
|
|
@itemize @bullet
|
|
@item
|
|
performing select (may be created second temporary table)
|
|
@end itemize
|
|
@item
|
|
JOIN::cleanup
|
|
@itemize @bullet
|
|
@item
|
|
removing all temporary tables, other cleanup
|
|
@end itemize
|
|
@item
|
|
JOIN::reinit
|
|
@itemize @bullet
|
|
@item
|
|
prepare all structures to SELECT executing (with JOIN::exec)
|
|
@end itemize
|
|
@end itemize
|
|
|
|
@node select select_result
|
|
@section select_result CLASS
|
|
|
|
Very important role in SELECT performing have select_result class and
|
|
classes inherited from it (usually called with "select_" prefix). This
|
|
class provide interface for results transmitting.
|
|
|
|
Key methods in this class are following:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{send_fields} sends giving item list headers (type, name, etc..)
|
|
@item
|
|
@strong{send_data} sends giving item list values as row of table of result
|
|
@item
|
|
@strong{send_error} send error to used used mainly for error interception,
|
|
making some operation and then ::send_error will be called.
|
|
@end itemize
|
|
|
|
For example there are fillowing select_result classes:
|
|
@itemize
|
|
@item
|
|
@strong{select_send} used for sending results though network layer
|
|
@item
|
|
@strong{select_export} used for exporting data to file
|
|
@item
|
|
@strong{multi_delete} used for multi-delete
|
|
@item
|
|
@strong{select_insert} used for INSERT ... SELECT ...
|
|
@item
|
|
@strong{multi_update} used for multi-update
|
|
@end itemize
|
|
|
|
@node select simple
|
|
@section SIMPLE or PRIMARY SELECT.
|
|
|
|
For performing single primary select SELECT used function mysql_select,
|
|
which:
|
|
@itemize @bullet
|
|
@item
|
|
allocate JOIN;
|
|
@item
|
|
JOIN::prepare;
|
|
@item
|
|
JOIN::optimize;
|
|
@item
|
|
JOIN::exec;
|
|
@item
|
|
JOIN::cleanup.
|
|
@end itemize
|
|
In previous versions of mysql all SELECTs was performed with help of this
|
|
function and mysql_select() was not divided on parts.
|
|
|
|
@node select structure
|
|
@section Structure Of Complex Select
|
|
|
|
There 2 structures which describe SELECTS:
|
|
@itemize @bullet
|
|
@item
|
|
st_select_lex (SELECT_LEX) it represent SELECT itself
|
|
@item
|
|
st_select_lex_unit (SELECT_LEX_UNIT) group several selects in bunch
|
|
@end itemize
|
|
and represent UNION operation (absence of UNION is union
|
|
with 1 SELECT and this structure present in any case). In future this
|
|
structure will be used for EXCEPT and INTERSECT.
|
|
|
|
For example:
|
|
@example
|
|
(SELECT ... )UNION(SELECT ... (SELECT...)...(SELECT...UNION...SELECT))
|
|
1 2 3 4 5 6 7
|
|
@end example
|
|
|
|
will be represent as
|
|
@example
|
|
------------------------------------------------------------------------
|
|
level 1
|
|
SELECT_LEX_UNIT(2)
|
|
|
|
|
+---------------+
|
|
| |
|
|
SELECT_LEX(1) SELECT_LEX(3)
|
|
|
|
|
--------------- | ------------------------------------------------------
|
|
| level 2
|
|
+-------------------+
|
|
| |
|
|
SELECT_LEX_UNIT(4) SELECT_LEX_UNIT(6)
|
|
| |
|
|
| +--------------+
|
|
| | |
|
|
SELECT_LEX(4) SELECT_LEX(5) SELECT_LEX(7)
|
|
|
|
------------------------------------------------------------------------
|
|
@end example
|
|
Note: single subselect 4 have it's own SELECT_LEX_UNIT.
|
|
|
|
Most upper SELECT_LEX_UNIT (#2 in example) stored in LEX.
|
|
First and most upper SELECT_LEX (#1 in example) stored in LEX, too.
|
|
This two structures always exist.
|
|
|
|
In time of creating or performing any JOIN::* operation
|
|
LEX::current_select point on appropriate SELECT_LEX.
|
|
|
|
Only during parsing global (for whole UNION) ORDER_BY & LIMIT clauses
|
|
LEX::current_select points to SELECT_LEX_UNIT of this unit to store this
|
|
parameter in this SELECT_LEX_UNIT (SELECT_LEX and SELECT_LEX_UNIT are
|
|
inherited from st_select_lex_node).
|
|
|
|
@node select union
|
|
@section Non-Subselect UNIONs Executing
|
|
|
|
Non subselect unions performed with help of mysql_union(). for now it
|
|
divided on following steps:
|
|
@itemize
|
|
@item
|
|
st_select_lex_unit::prepare
|
|
@itemize @bullet
|
|
@item
|
|
create temporary table for union results storing (if UNION witout
|
|
ALL option, 'distinct' parameter will be passed to table creation
|
|
procedure). Types/lengths of table's fields will be determinated
|
|
by first SELECT item list.
|
|
@item
|
|
create select_union (inherited from select_result) which will
|
|
write selects results in this temporary table
|
|
@item
|
|
allocate JOIN and perform JOIN::prepare for all SELECTs belonged
|
|
to UNION
|
|
@end itemize
|
|
@item
|
|
st_select_lex_unit::exec
|
|
@itemize @bullet
|
|
@item
|
|
delete rows from temporary table if it is not first call
|
|
@item
|
|
if first call call JOIN::optimize else JOIN::reinit and then
|
|
JOIN::exec for all SELECTs (select_union will write result for
|
|
temporary table). If union is cacheable and this method called
|
|
second, (third, ...) time it will do nothing.
|
|
@item
|
|
call mysql_select on temporary table with global ORDER BY and
|
|
LIMIT parameters after collecting results from all SELECTs.
|
|
@end itemize
|
|
@end itemize
|
|
As far as mysql_select need SELECT_LEX structure SELECT_LEX of first
|
|
SELECT of this UNION will be passed to it, but also fake_select_lex
|
|
parameter will be passed to mysql_select() too, to prevent linking
|
|
this SELECT_LEX with JOIN on this mysql_select() session.
|
|
|
|
PROBLEM: this fake select need workaround in many places.
|
|
|
|
@node select derived
|
|
@section Derived Tables Executing
|
|
|
|
Derived tables processing is first operation on any query. It performed
|
|
before creation list of tables of whole query and opening/locking this
|
|
tables.
|
|
|
|
If lex->derived_tables flag present will be scanned all SELECT_LEX (there
|
|
are list of all SELECT_LEX in reverse order (first SELECT in query will
|
|
be last in this list) lex->all_selects_list).
|
|
|
|
Pointer on derived table SELECT_LEX_UNIT stored in TABLE_LIST structure
|
|
(TABLE_LIST::derived). And for any table which have this pointer will
|
|
be called mysql_derived().
|
|
|
|
mysql_derived():
|
|
@itemize @bullet
|
|
@item
|
|
Creates list of all tables used in this query, opens and locks it
|
|
@item
|
|
Creates temporary table for storing results
|
|
@item
|
|
Creates union_result for writing result in this table
|
|
@item
|
|
Calls mysql_select or mysql_union for execute query
|
|
@item
|
|
Removes all derived table subtree from SELECTs tree (if it is
|
|
not EXPLAIN)
|
|
@item
|
|
Stores pointer to this temporary table in TABLE_LIST structure, then
|
|
this table will be used by outer query. This table table will not be
|
|
skipped in checking grants, because tables from which this table was
|
|
received was checked in mysql_derived.
|
|
@item
|
|
Links this temporary table in thd->derived_tables for removing after
|
|
query executing. this table will be closed in close_thread_tables if
|
|
second parameter of it (bool skip_derived) will be true.
|
|
@end itemize
|
|
|
|
@node select subselect
|
|
@section Subselects
|
|
|
|
In expression subselect represented by Item inherited from Item_subselect.
|
|
|
|
To hide difference in performing single SELECTs and UNIONs
|
|
Item_subselect use two different engines, which provide uniformed
|
|
interface for access to underplaid SELECT or UNION
|
|
(subselect_single_select_engine and subselect_union_engine, both are
|
|
inherited from subselect_engine).
|
|
|
|
Engine will be created in time of Item_select constructing
|
|
(Item_subselect::init method).
|
|
|
|
On Item_subselect::fix_fields() will be called engine->prepare().
|
|
|
|
Before calling any value getting method (val, val_int, val_str,
|
|
bring_value (in case of row result)) will be called engine->exec(),
|
|
which execute query or just do nothing if subselect is cacheable and
|
|
already executed.
|
|
|
|
Items inherited from provide it's own select_result classes. There are
|
|
2 type of it:
|
|
@itemize @bullet
|
|
@item
|
|
select_singlerow_subselect it store values of giving row in
|
|
Item_singlerow_subselect cache on send_data() call and report error
|
|
if Item_subselect have 'assigned' attribute.
|
|
@item
|
|
select_exists_subselect just store 1 as value of
|
|
Item_exists_subselect on send_data() call. As far as
|
|
Item_in_subselect and Item_allany_subselect inherited from
|
|
Item_exists_subselect, they use same select_result class.
|
|
@end itemize
|
|
|
|
Item_select will never call cleanup() procedure for JOIN. Every
|
|
JOIN::cleanup will call cleanup() for inner JOINs. Most upper
|
|
JOIN::cleanup will be called by mysql_select() or mysql_union().
|
|
|
|
@node select select engine
|
|
@section Single Select Engine
|
|
|
|
subselect_single_select_engine:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{constructor} allocate JOIN and store pointers on SELECT_LEX and JOIN
|
|
@item
|
|
@strong{prepare()} call JOIN::prepare
|
|
@item
|
|
@strong{fix_length_and_dec()} prepare cache and receive type and
|
|
parameters of returning items (it called only by
|
|
Item_singlerow_subselect)
|
|
@item
|
|
@strong{exec()} drop 'assigned flag of Item_subselect. If called first time
|
|
JOIN::optimize and JOINexec(), else do nothing or JOIN::reinit()
|
|
JOIN::exec() depending of type of subquery.
|
|
@end itemize
|
|
|
|
@node select union engine
|
|
@section Union Engine
|
|
|
|
subselect_union_engine:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{constructor} just store pointer to st_select_lex_union
|
|
(SELECT_LEX_UNION)
|
|
@item
|
|
@strong{prepare()} call st_select_lex_unit::prepare
|
|
@item
|
|
@strong{fix_length_and_dec()} prepare cache and receive type and
|
|
parameters (maximum of length) of returning items (it called
|
|
only by Item_singlerow_subselect)
|
|
@item
|
|
@strong{exec()} call st_select_lex_unit::exec(). st_select_lex_unit::exec()
|
|
can drop 'assigned' flag of Item_subselect if
|
|
st_select_lex_unit::item is not 0.
|
|
@end itemize
|
|
|
|
@node selectexplain
|
|
@section Explain Execution
|
|
|
|
For EXPLAIN result showing for every SELECT will be called mysql_select
|
|
with option SELECT_DESCRIBE.
|
|
|
|
For main UNION will be called mysql_explain_union.
|
|
|
|
mysql_explain_union call mysql_explain_select for every SELECT in given
|
|
union.
|
|
|
|
mysql_explain_select call mysql_select with SELECT_DESCRIBE.
|
|
|
|
mysql_select create JOIN for select (if it not exists, because if it
|
|
called for subselect JOIN can be created in JOIN::optimize of outer
|
|
query when it decided to calculate value of subselect). Then it call
|
|
JOIN::prepare, JOIN::optimize, JOIN exec and JOIN::cleanup as usual.
|
|
|
|
JOIN::exec called for SELECT with SELECT_DESCRIBE option call
|
|
select_describe.
|
|
|
|
select_describe return to user description of SELECT and call
|
|
mysql_explain_union for every inner UNION
|
|
|
|
PROBLEM: how it will work with global query optimization?
|
|
|
|
@node transformations, coding guidelines, selects, Top
|
|
@chapter How MySQL transforms subqueries
|
|
|
|
Item_subselect virtual method select_transformer is used to rewrite
|
|
subqueries. It is called from Item_subselect::init (which called in
|
|
Item_subselect constructor)
|
|
|
|
@node transformation IN
|
|
@section Item_in_subselect::select_transformer
|
|
|
|
Item_in_subselect::select_transformer is divided on two parts for
|
|
scalar left part and row left part:
|
|
|
|
@node transformation scalar IN
|
|
@subsection Scalar IN Subselect
|
|
|
|
To rewrite scalar IN subselect used method
|
|
Item_in_subselect::single_value_transformer, Scalar IN subselect will
|
|
be replaced with Item_in_optimizer.
|
|
|
|
Item_in_optimizer item is special boolean function. On value request
|
|
(one of val, val_int or val_str methods) it evaluate left expression of
|
|
IN by storing it value in cache item (one of Item_cache* items), then it
|
|
test cache is it NULL. If left expression (cache) is NULL then
|
|
Item_in_optimizer return NULL, else it evaluate Item_in_subselect.
|
|
|
|
Example queries.
|
|
@example
|
|
a) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2);
|
|
b) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2 GROUP BY t2.a);
|
|
@end example
|
|
|
|
@itemize
|
|
@item
|
|
Item_in_subselect inherit mechanism of getting value from
|
|
Item_exists_subselect.
|
|
@item
|
|
Select_transformer stores reference to left expression in its
|
|
conditions: (in WHERE in case 'a' and in a HAVING in case 'b')
|
|
@item
|
|
Item from item list of this select (t2.a) can be referred with special
|
|
reference (Item_ref_null_helper or Item_asterisk_remover).
|
|
This reference informs Item_in_optimizer if item (t2.a) is NULL by
|
|
setting the 'was_null' flag.
|
|
@item
|
|
The return value from Item_in_subselect will be evaluated as following:
|
|
@itemize @bullet
|
|
@item
|
|
If TRUE return true
|
|
@item
|
|
If NULL return null
|
|
@item
|
|
If FALSE and 'was_null' is set, return null
|
|
@item
|
|
return FALSE
|
|
@end itemize
|
|
@end itemize
|
|
|
|
<left_expression> IN (SELECT <item> ...) will be represented like
|
|
following:
|
|
@example
|
|
+-----------------+
|
|
|Item_in_optimizer|
|
|
+-----------------+
|
|
|
|
|
+---------------------+------------+
|
|
| |
|
|
+-----------------------+ +-----------------+
|
|
| <left_expression> | |Item_in_subselect|
|
|
| | +-----------------+
|
|
+-----------------------+ |
|
|
|<left_expression cache>| +-----------+-----------+
|
|
| | | |
|
|
+-----------------------+ | |
|
|
^ +----------+ +--------------------+
|
|
+<<<<<<<<<<<<<<<<<| Item_ref | +<<<|Item_ref_null_helper|
|
|
+----------+ V +--------------------+
|
|
V +--------------------+
|
|
+>>>| <item> |
|
|
+--------------------+
|
|
@end example
|
|
where '<<<<<<<<<' is reference in meaning of Item_ref.
|
|
|
|
Item_ref used for point to <left_expression cache>, because in time of
|
|
transformation we know only address of variable where pointer on cache
|
|
will be stored.
|
|
|
|
If select have ORDER BY clause it will be wiped out, because no sense in
|
|
ORDER BY without LIMIT here.
|
|
|
|
If IN subselect union condition of every select in UNION will be changed
|
|
personally.
|
|
|
|
Following is examples of IN transformations:
|
|
@example
|
|
a) <left_expression> IN (SELECT <item> FROM t
|
|
WHERE <where_exp>)
|
|
|
|
will be represented as
|
|
|
|
(SELECT 1 FROM t
|
|
WHERE <where_exp> and
|
|
Item_ref(<cached_left_expression>)=<Item_asterisk_remover(<Item>)>)
|
|
|
|
|
|
b) <left_expression> IN (SELECT <item> FROM t
|
|
HAVING <having_expr>
|
|
ORDER BY 1)
|
|
|
|
will be represented as
|
|
|
|
(SELECT <item> as ref_null_helper FROM t
|
|
HAVING <having_exp> AND
|
|
Item_ref(<cached_left_expression>) = ref_null_helper)
|
|
|
|
|
|
c) <left_expression> IN (SELECT <item> UNION ...)
|
|
|
|
will be represented as
|
|
|
|
(SELECT 1
|
|
HAVING Item_ref(<cached_left_expression>)=
|
|
<Item_asterisk_remover(<Item>)>
|
|
UNION ...)
|
|
|
|
(having without FROM is syntax error, but having condition is checked
|
|
even for subselect without FROM)
|
|
|
|
d) <left_expression> IN (select <item>)
|
|
|
|
will be completely replaced with <left_expression> = <item>
|
|
@end example
|
|
|
|
Now conditions (WHERE (a) or HAVING (b)) will be changed depends of
|
|
select in following way:
|
|
|
|
If subselect have HAVING , sum function or GROUP BY (case a) then item
|
|
list will be unchanged and Item_ref_null_helper reference will be
|
|
created on item list element. Condition will be added to HAVING condition.
|
|
|
|
If subselect have not HAVING, sum function or GROUP BY (case b) then:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{item list} will be replaced with 1.
|
|
@item
|
|
@strong{<item>} from item list will be stored in Item_asterisk_remover, which
|
|
inherit from Item_ref_null_helper, but store item on which refer by
|
|
itself, and also it can resolve '*' item.
|
|
@item
|
|
@strong{<left_expression cache> = <Item_ref_null_helper>} will be added to
|
|
WHERE clause this item or to HAVING clause if this subselect have
|
|
no FROM clause and subselect is union (case c).
|
|
@end itemize
|
|
|
|
Single select without FROM will be reduced to just
|
|
<left_expression> = <item> without using Item_in_optimizer.
|
|
|
|
@node transformations row IN
|
|
@subsection Row IN Subselect
|
|
|
|
To rewrite row IN subselect used method
|
|
Item_in_subselect::row_value_transformer. It work in almost same way as
|
|
scalar analog, but work with Item_cache_row for caching left expression
|
|
and use references on elements of Item_cache_row.
|
|
To refer on item list it use Item_ref_on_list_position.
|
|
|
|
Item_ref_on_list_position::fix_fields will find item in item list of
|
|
subselect by number and create Item_ref_null_helper to refer on it. It
|
|
used to find reference when all '*' items will be translated in item
|
|
list. Subselect with have HAVING, sum functions or GROUP BY will
|
|
transformed in following way:
|
|
|
|
@example
|
|
ROW(l1, l2, ... lN) IN (SELECT i1, i2, ... iM FROM t HAVING <having_expr>)
|
|
|
|
will be following:
|
|
|
|
(SELECT i1, i2, ... iM FROM t
|
|
HAVING <having_expr> and
|
|
<cache_l1> = <ref_on_list_position(1)> AND
|
|
<cache_l2> = <ref_on_list_position(2)> AND
|
|
...
|
|
<cache_lN> = <ref_on_list_position(N)>)
|
|
@end example
|
|
|
|
In this way will be transformed select without FROM, too.
|
|
|
|
For other subselect it will be same but for WHERE clause.
|
|
|
|
@node transformations all any
|
|
@section Item_allany_subselect
|
|
|
|
Item_allany_subselect is inherited from Item_in_subselect.
|
|
ALL/ANY/SOME use same algorithm (and same method of Item_in_subselect)
|
|
as scalar IN, but use different function instead of '='.
|
|
|
|
ANY/SOME use same function that was listed after left expression.
|
|
|
|
ALL use inverted function, and all subselect passed as argument to
|
|
Item_func_not.
|
|
|
|
@node transformations singlerow
|
|
@section Item_singlerow_subselect
|
|
|
|
Item_singlerow_subselect will be rewritten only if it have not FROM
|
|
clause, it is not part of UNION and it is scalar subselect. For now will
|
|
not be converted subselects with field or reference on top of item list
|
|
(we can't change name of such items from one hand, but from other hand
|
|
we should assign to it name of whole subselect which will be reduced);
|
|
|
|
Following will not be reduced:
|
|
@example
|
|
SELECT a;
|
|
SELECT 1 UNION SELECT 2;
|
|
SELECT 1 FROM t1;
|
|
@end example
|
|
|
|
Following select will be reduced:
|
|
@example
|
|
SELECT 1;
|
|
SELECT a+2;
|
|
@end example
|
|
|
|
Such subselect will be completely replaced by its expression from item
|
|
list and its SELECT_LEX and SELECT_LEX_UNIT will be removed from
|
|
SELECT_LEX's tree.
|
|
|
|
But all Item_fields and Item_ref of that expression will be marked for
|
|
special fix_fields() procedure. fix_fields() for such Item will be
|
|
performed is same way as for items of inner subselect. Also if this
|
|
expression is Item_fields or Item_ref then name of this new item will
|
|
be same as name of this item (but not '(SELECT ...)'). It is done to
|
|
prevent broke references on such items from more inner subselects.
|
|
|
|
@node coding guidelines, mysys functions, transformations, Top
|
|
@chapter Coding Guidelines
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
We are using @uref{http://www.bitkeeper.com/, BitKeeper} for source management.
|
|
|
|
@item
|
|
You should use the @strong{MySQL} 4.0 source for all developments.
|
|
|
|
@item
|
|
If you have any questions about the @strong{MySQL} source, you can post these
|
|
to @email{dev-public@@mysql.com} and we will answer them. Please
|
|
remember to not use this internal email list in public!
|
|
|
|
@item
|
|
Try to write code in a lot of black boxes that can be reused or use at
|
|
least a clean, easy to change interface.
|
|
|
|
@item
|
|
Reuse code; There is already a lot of algorithms in MySQL for list handling,
|
|
queues, dynamic and hashed arrays, sorting, etc. that can be reused.
|
|
|
|
@item
|
|
Use the @code{my_*} functions like @code{my_read()}/@code{my_write()}/
|
|
@code{my_malloc()} that you can find in the @code{mysys} library instead
|
|
of the direct system calls; This will make your code easier to debug and
|
|
more portable.
|
|
|
|
@item
|
|
Try to always write optimized code, so that you don't have to
|
|
go back and rewrite it a couple of months later. It's better to
|
|
spend 3 times as much time designing and writing an optimal function than
|
|
having to do it all over again later on.
|
|
|
|
@item
|
|
Avoid CPU wasteful code, even where it does not matter, so that
|
|
you will not develop sloppy coding habits.
|
|
|
|
@item
|
|
If you can write it in fewer lines, do it (as long as the code will not
|
|
be slower or much harder to read).
|
|
|
|
@item
|
|
Don't use two commands on the same line.
|
|
|
|
@item
|
|
Do not check the same pointer for @code{NULL} more than once.
|
|
|
|
@item
|
|
Use long function and variable names in English. This makes your code
|
|
easier to read.
|
|
|
|
@item
|
|
Use @code{my_var} as opposed to @code{myVar} or @code{MyVar} (@samp{_}
|
|
rather than dancing SHIFT to seperate words in identifiers).
|
|
|
|
@item
|
|
Think assembly - make it easier for the compiler to optimize your code.
|
|
|
|
@item
|
|
Comment your code when you do something that someone else may think
|
|
is not ``trivial''.
|
|
|
|
@item
|
|
Use @code{libstring} functions (in the @file{strings} directory)
|
|
instead of standard @code{libc} string functions whenever possible.
|
|
|
|
@item
|
|
Avoid using @code{malloc()} (its REAL slow); For memory allocations
|
|
that only need to live for the lifetime of one thread, one should use
|
|
@code{sql_alloc()} instead.
|
|
|
|
@item
|
|
Before making big design decisions, please first post a summary of
|
|
what you want to do, why you want to do it, and how you plan to do
|
|
it. This way we can easily provide you with feedback and also
|
|
easily discuss it thoroughly if some other developer thinks there is better
|
|
way to do the same thing!
|
|
|
|
@item
|
|
Class names start with a capital letter.
|
|
|
|
@item
|
|
Structure types are @code{typedef}'ed to an all-caps identifier.
|
|
|
|
@item
|
|
Any @code{#define}'s are in all-caps.
|
|
|
|
@item
|
|
Matching @samp{@{} are in the same column.
|
|
|
|
@item
|
|
Put the @samp{@{} after a @code{switch} on the same line, as this gives
|
|
better overall indentation for the switch statement:
|
|
|
|
@example
|
|
switch (arg) @{
|
|
@end example
|
|
|
|
@item
|
|
In all other cases, @samp{@{} and @samp{@}} should be on their own line, except
|
|
if there is nothing inside @samp{@{} and @samp{@}}.
|
|
|
|
@item
|
|
Have a space after @code{if}
|
|
|
|
@item
|
|
Put a space after @samp{,} for function arguments
|
|
|
|
@item
|
|
Functions return @samp{0} on success, and non-zero on error, so you can do:
|
|
|
|
@example
|
|
if(a() || b() || c()) @{ error("something went wrong"); @}
|
|
@end example
|
|
|
|
@item
|
|
Using @code{goto} is okay if not abused.
|
|
|
|
@item
|
|
Avoid default variable initalizations, use @code{LINT_INIT()} if the
|
|
compiler complains after making sure that there is really no way
|
|
the variable can be used uninitialized.
|
|
|
|
@item
|
|
Do not instantiate a class if you do not have to.
|
|
|
|
@item
|
|
Use pointers rather than array indexing when operating on strings.
|
|
|
|
@end itemize
|
|
|
|
Suggested mode in emacs:
|
|
|
|
@example
|
|
(load "cc-mode")
|
|
(setq c-mode-common-hook '(lambda ()
|
|
(turn-on-font-lock)
|
|
(setq comment-column 48)))
|
|
(setq c-style-alist
|
|
(cons
|
|
'("MY"
|
|
(c-basic-offset . 2)
|
|
(c-comment-only-line-offset . 0)
|
|
(c-offsets-alist . ((statement-block-intro . +)
|
|
(knr-argdecl-intro . 0)
|
|
(substatement-open . 0)
|
|
(label . -)
|
|
(statement-cont . +)
|
|
(arglist-intro . c-lineup-arglist-intro-after-paren)
|
|
(arglist-close . c-lineup-arglist)
|
|
))
|
|
)
|
|
c-style-alist))
|
|
(c-set-style "MY")
|
|
(setq c-default-style "MY")
|
|
@end example
|
|
|
|
|
|
@node mysys functions, DBUG, coding guidelines, Top
|
|
@chapter Functions In The @code{mysys} Library
|
|
|
|
Functions in @code{mysys}: (For flags see @file{my_sys.h})
|
|
|
|
@table @code
|
|
@item int my_copy _A((const char *from, const char *to, myf MyFlags));
|
|
Copy file from @code{from} to @code{to}.
|
|
|
|
@item int my_delete _A((const char *name, myf MyFlags));
|
|
Delete file @code{name}.
|
|
|
|
@item int my_getwd _A((string buf, uint size, myf MyFlags));
|
|
@item int my_setwd _A((const char *dir, myf MyFlags));
|
|
Get and set working directory.
|
|
|
|
@item string my_tempnam _A((const char *pfx, myf MyFlags));
|
|
Make a unique temporary file name by using dir and adding something after
|
|
@code{pfx} to make name unique. The file name is made by adding a unique
|
|
six character string and @code{TMP_EXT} after @code{pfx}.
|
|
Returns pointer to @code{malloc()}'ed area for filename. Should be freed by
|
|
@code{free()}.
|
|
|
|
@item File my_open _A((const char *FileName,int Flags,myf MyFlags));
|
|
@item File my_create _A((const char *FileName, int CreateFlags, int AccsesFlags, myf MyFlags));
|
|
@item int my_close _A((File Filedes, myf MyFlags));
|
|
@item uint my_read _A((File Filedes, byte *Buffer, uint Count, myf MyFlags));
|
|
@item uint my_write _A((File Filedes, const byte *Buffer, uint Count, myf MyFlags));
|
|
@item ulong my_seek _A((File fd,ulong pos,int whence,myf MyFlags));
|
|
@item ulong my_tell _A((File fd,myf MyFlags));
|
|
Use instead of open, open-with-create-flag, close, read, and write
|
|
to get automatic error messages (flag @code{MYF_WME}) and only have
|
|
to test for != 0 if error (flag @code{MY_NABP}).
|
|
|
|
@item int my_rename _A((const char *from, const char *to, myf MyFlags));
|
|
Rename file from @code{from} to @code{to}.
|
|
|
|
@item FILE *my_fopen _A((const char *FileName,int Flags,myf MyFlags));
|
|
@item FILE *my_fdopen _A((File Filedes,int Flags,myf MyFlags));
|
|
@item int my_fclose _A((FILE *fd,myf MyFlags));
|
|
@item uint my_fread _A((FILE *stream,byte *Buffer,uint Count,myf MyFlags));
|
|
@item uint my_fwrite _A((FILE *stream,const byte *Buffer,uint Count, myf MyFlags));
|
|
@item ulong my_fseek _A((FILE *stream,ulong pos,int whence,myf MyFlags));
|
|
@item ulong my_ftell _A((FILE *stream,myf MyFlags));
|
|
Same read-interface for streams as for files.
|
|
|
|
@item gptr _mymalloc _A((uint uSize,const char *sFile,uint uLine, myf MyFlag));
|
|
@item gptr _myrealloc _A((string pPtr,uint uSize,const char *sFile,uint uLine, myf MyFlag));
|
|
@item void _myfree _A((gptr pPtr,const char *sFile,uint uLine));
|
|
@item int _sanity _A((const char *sFile,unsigned int uLine));
|
|
@item gptr _myget_copy_of_memory _A((const byte *from,uint length,const char *sFile, uint uLine,myf MyFlag));
|
|
@code{malloc(size,myflag)} is mapped to these functions if not compiled
|
|
with @code{-DSAFEMALLOC}.
|
|
|
|
@item void TERMINATE _A((void));
|
|
Writes @code{malloc()} info on @code{stdout} if compiled with
|
|
@code{-DSAFEMALLOC}.
|
|
|
|
@item int my_chsize _A((File fd, ulong newlength, myf MyFlags));
|
|
Change size of file @code{fd} to @code{newlength}.
|
|
|
|
@item void my_error _D((int nr, myf MyFlags, ...));
|
|
Writes message using error number (see @file{mysys/errors.h}) on @code{stdout},
|
|
or using curses, if @code{MYSYS_PROGRAM_USES_CURSES()} has been called.
|
|
|
|
@item void my_message _A((const char *str, myf MyFlags));
|
|
Writes @code{str} on @code{stdout}, or using curses, if
|
|
@code{MYSYS_PROGRAM_USES_CURSES()} has been called.
|
|
|
|
@item void my_init _A((void ));
|
|
Start each program (in @code{main()}) with this.
|
|
|
|
@item void my_end _A((int infoflag));
|
|
Gives info about program.
|
|
If @code{infoflag & MY_CHECK_ERROR}, prints if some files are left open.
|
|
If @code{infoflag & MY_GIVE_INFO}, prints timing info and malloc info
|
|
about program.
|
|
|
|
@item int my_redel _A((const char *from, const char *to, int MyFlags));
|
|
Delete @code{from} before rename of @code{to} to @code{from}. Copies state
|
|
from old file to new file. If @code{MY_COPY_TIME} is set, sets old time.
|
|
|
|
@item int my_copystat _A((const char *from, const char *to, int MyFlags));
|
|
Copy state from old file to new file. If @code{MY_COPY_TIME} is set,
|
|
sets old time.
|
|
|
|
@item string my_filename _A((File fd));
|
|
Returns filename of open file.
|
|
|
|
@item int dirname _A((string to, const char *name));
|
|
Copy name of directory from filename.
|
|
|
|
@item int test_if_hard_path _A((const char *dir_name));
|
|
Test if @code{dir_name} is a hard path (starts from root).
|
|
|
|
@item void convert_dirname _A((string name));
|
|
Convert dirname according to system.
|
|
In MSDOS, changes all characters to capitals and changes @samp{/} to @samp{\}.
|
|
|
|
@item string fn_ext _A((const char *name));
|
|
Returns pointer to extension in filename.
|
|
|
|
@item string fn_format _A((string to,const char *name,const char *dsk,const char *form,int flag));
|
|
format a filename with replace of library and extension and
|
|
converts between different systems.
|
|
params to and name may be identicall
|
|
function dosn't change name if name != to
|
|
Flag may be: 1 force replace filnames library with 'dsk'
|
|
2 force replace extension with 'form' */
|
|
4 force Unpack filename (replace ~ with home)
|
|
8 Pack filename as short as possibly for output to
|
|
user.
|
|
All open requests should allways use at least:
|
|
"open(fn_format(temp_buffe,name,"","",4),...)" to unpack home and
|
|
convert filename to system-form.
|
|
|
|
@item string fn_same _A((string toname, const char *name, int flag));
|
|
Copys directory and extension from @code{name} to @code{toname} if neaded.
|
|
Copying can be forced by same flags used in @code{fn_format()}.
|
|
|
|
@item int wild_compare _A((const char *str, const char *wildstr));
|
|
Compare if @code{str} matches @code{wildstr}. @code{wildstr} can contain
|
|
@samp{*} and @samp{?} as wildcard characters.
|
|
Returns 0 if @code{str} and @code{wildstr} match.
|
|
|
|
@item void get_date _A((string to, int timeflag));
|
|
Get current date in a form ready for printing.
|
|
|
|
@item void soundex _A((string out_pntr, string in_pntr))
|
|
Makes @code{in_pntr} to a 5 char long string. All words that sound
|
|
alike have the same string.
|
|
|
|
@item int init_key_cache _A((ulong use_mem, ulong leave_this_much_mem));
|
|
Use caching of keys in MISAM, PISAM, and ISAM.
|
|
@code{KEY_CACHE_SIZE} is a good size.
|
|
Remember to lock databases for optimal caching.
|
|
|
|
@item void end_key_cache _A((void));
|
|
End key caching.
|
|
@end table
|
|
|
|
|
|
|
|
@node DBUG, protocol, mysys functions, Top
|
|
@chapter DBUG Tags To Use
|
|
|
|
Here is some of the tags we now use:
|
|
(We should probably add a couple of new ones)
|
|
|
|
@table @code
|
|
@item enter
|
|
Arguments to the function.
|
|
|
|
@item exit
|
|
Results from the function.
|
|
|
|
@item info
|
|
Something that may be interesting.
|
|
|
|
@item warning
|
|
When something doesn't go the usual route or may be wrong.
|
|
|
|
@item error
|
|
When something went wrong.
|
|
|
|
@item loop
|
|
Write in a loop, that is probably only useful when debugging
|
|
the loop. These should normally be deleted when one is
|
|
satisfied with the code and it has been in real use for a while.
|
|
@end table
|
|
|
|
Some specific to mysqld, because we want to watch these carefully:
|
|
|
|
@table @code
|
|
@item trans
|
|
Starting/stopping transactions.
|
|
|
|
@item quit
|
|
@code{info} when mysqld is preparing to die.
|
|
|
|
@item query
|
|
Print query.
|
|
@end table
|
|
|
|
|
|
@node protocol, Fulltext Search, DBUG, Top
|
|
@chapter MySQL Client/Server Protocol
|
|
|
|
@menu
|
|
* raw packet without compression::
|
|
* raw packet with compression::
|
|
* basic packets::
|
|
* communication::
|
|
* fieldtype codes::
|
|
* protocol functions::
|
|
* protocol version 2::
|
|
@end menu
|
|
|
|
@node raw packet without compression, raw packet with compression, protocol, protocol
|
|
@section Raw Packet Without Compression
|
|
|
|
@example
|
|
+-----------------------------------------------+
|
|
| Packet Length | Packet no | Data |
|
|
| 3 Bytes | 1 Byte | n Bytes |
|
|
+-----------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item 3 Byte packet length
|
|
The length is calculated with int3store
|
|
See include/global.h for details.
|
|
The max packetsize can be 16 MB.
|
|
|
|
@item 1 Byte packet no
|
|
If no compression is used the first 4 bytes of each packet is the header
|
|
of the packet. The packet number is incremented for each sent packet.
|
|
The first packet starts with 0.
|
|
@item n Byte data
|
|
|
|
@end table
|
|
|
|
The packet length can be recalculated with:
|
|
|
|
@example
|
|
length = byte1 + (256 * byte2) + (256 * 256 * byte3)
|
|
@end example
|
|
|
|
|
|
@node raw packet with compression, basic packets, raw packet without compression, protocol
|
|
@section Raw Packet With Compression
|
|
|
|
@example
|
|
+---------------------------------------------------+
|
|
| Packet Length | Packet no | Uncomp. Packet Length |
|
|
| 3 Bytes | 1 Byte | 3 Bytes |
|
|
+---------------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item 3 Byte packet length
|
|
The length is calculated with int3store
|
|
See include/global.h for details.
|
|
The max packetsize can be 16 MB.
|
|
|
|
@item 1 Byte packet no
|
|
@item 3 Byte uncompressed packet length
|
|
@end table
|
|
|
|
If compression is used the first 7 bytes of each packet
|
|
is the header of the packet.
|
|
|
|
|
|
@node basic packets, communication, raw packet with compression, protocol
|
|
@section Basic Packets
|
|
|
|
@menu
|
|
* ok packet::
|
|
* error packet::
|
|
@end menu
|
|
|
|
|
|
@node ok packet, error packet, basic packets, basic packets
|
|
@subsection OK Packet
|
|
|
|
For details, see @file{sql/net_pkg.cc::send_ok()}.
|
|
|
|
@example
|
|
+-----------------------------------------------+
|
|
| Header | No of Rows | Affected Rows |
|
|
| | 1 Byte | 1-8 Byte |
|
|
|-----------------------------------------------|
|
|
| ID (last_insert_id) | Status | Length |
|
|
| 1-8 Byte | 2 Byte | 1-8 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext |
|
|
| n Byte |
|
|
+-----------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item Header
|
|
@item 1 byte number of rows ? (always 0 ?)
|
|
@item 1-8 bytes affected rows
|
|
@item 1-8 byte id (last_insert_id)
|
|
@item 2 byte Status (usually 0)
|
|
@item If the OK-packege includes a message:
|
|
@item 1-8 bytes length of message
|
|
@item n bytes messagetext
|
|
@end table
|
|
|
|
|
|
@node error packet, , ok packet, basic packets
|
|
@subsection Error Packet
|
|
|
|
@example
|
|
+-----------------------------------------------+
|
|
| Header | Status code | Error no |
|
|
| | 1 Byte | 2 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext | 0x00 |
|
|
| n Byte | 1 Byte |
|
|
+-----------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item Header
|
|
@item 1 byte status code (0xFF = ERROR)
|
|
@item 2 byte error number (is only sent to new 3.23 clients.
|
|
@item n byte errortext
|
|
@item 1 byte 0x00
|
|
@end table
|
|
|
|
|
|
@node communication, fieldtype codes, basic packets, protocol
|
|
@section Communication
|
|
|
|
> Packet from server to client
|
|
< Paket from client tor server
|
|
|
|
Login
|
|
------
|
|
> 1. packet
|
|
Header
|
|
1 byte protocolversion
|
|
n byte serverversion
|
|
1 byte 0x00
|
|
4 byte threadnumber
|
|
8 byte crypt seed
|
|
1 byte 0x00
|
|
2 byte CLIENT_xxx options (see include/mysql_com.h
|
|
that is supported by the server
|
|
1 byte number of current server charset
|
|
2 byte server status variables (SERVER_STATUS_xxx flags)
|
|
13 byte 0x00 (not used yet).
|
|
|
|
< 2. packet
|
|
Header
|
|
2 byte CLIENT_xxx options
|
|
3 byte max_allowed_packet for the client
|
|
n byte username
|
|
1 byte 0x00
|
|
8 byte crypted password
|
|
1 byte 0x00
|
|
n byte databasename
|
|
1 byte 0x00
|
|
|
|
> 3. packet
|
|
OK-packet
|
|
|
|
|
|
Command
|
|
--------
|
|
< 1. packet
|
|
Header
|
|
1 byte command type (e.g.0x03 = query)
|
|
n byte query
|
|
|
|
Result set (after command)
|
|
--------------------------
|
|
> 2. packet
|
|
Header
|
|
1-8 byte field_count (packed with net_store_length())
|
|
|
|
If field_count == 0 (command):
|
|
1-8 byte affected rows
|
|
1-8 byte insert id
|
|
2 bytes server_status (SERVER_STATUS_xx)
|
|
|
|
If field_count == NULL_LENGTH (251)
|
|
LOAD DATA LOCAL INFILE
|
|
|
|
If field_count > 0 Result Set:
|
|
|
|
> n packets
|
|
Header Info
|
|
Column description: 5 data object /column
|
|
(See code in unpack_fields())
|
|
|
|
Columninfo for each column:
|
|
1 data block table_name
|
|
1 byte length of block
|
|
n byte data
|
|
1 data block field_name
|
|
1 byte length of block...
|
|
n byte data
|
|
1 data block display length of field
|
|
1 byte length of block
|
|
3 bytes display length of filed
|
|
1 data block type field of type (enum_field_types)
|
|
1 byte length of block
|
|
1 bytexs field of type
|
|
1 data block flags
|
|
1 byte length of block
|
|
2 byte flags for the columns (NOT_NULL_FLAG, ZEROFILL_FLAG....)
|
|
1 byte decimals
|
|
|
|
if table definition:
|
|
1 data block default value
|
|
|
|
Actual result (one packet per row):
|
|
4 byte header
|
|
1-8 byte length of data
|
|
n data
|
|
|
|
|
|
@node fieldtype codes, protocol functions, communication, protocol
|
|
@section Fieldtype Codes
|
|
|
|
@example
|
|
display_length |enum_field_type |flags
|
|
----------------------------------------------------
|
|
Blob 03 FF FF 00 |01 FC |03 90 00 00
|
|
Mediumblob 03 FF FF FF |01 FC |03 90 00 00
|
|
Tinyblob 03 FF 00 00 |01 FC |03 90 00 00
|
|
Text 03 FF FF 00 |01 FC |03 10 00 00
|
|
Mediumtext 03 FF FF FF |01 FC |03 10 00 00
|
|
Tinytext 03 FF 00 00 |01 FC |03 10 00 00
|
|
Integer 03 0B 00 00 |01 03 |03 03 42 00
|
|
Mediumint 03 09 00 00 |01 09 |03 00 00 00
|
|
Smallint 03 06 00 00 |01 02 |03 00 00 00
|
|
Tinyint 03 04 00 00 |01 01 |03 00 00 00
|
|
Varchar 03 XX 00 00 |01 FD |03 00 00 00
|
|
Enum 03 05 00 00 |01 FE |03 00 01 00
|
|
Datetime 03 13 00 00 |01 0C |03 00 00 00
|
|
Timestamp 03 0E 00 00 |01 07 |03 61 04 00
|
|
Time 03 08 00 00 |01 0B |03 00 00 00
|
|
Date 03 0A 00 00 |01 0A |03 00 00 00
|
|
@end example
|
|
|
|
@node protocol functions, protocol version 2, fieldtype codes, protocol
|
|
@section Functions used to implement the protocol
|
|
|
|
This should be merged with the above one and changed to texi format
|
|
|
|
Raw packets
|
|
-----------
|
|
|
|
- The my_net_xxxx() functions handles the packaging of a stream of data
|
|
into a raw packet that contains a packet number, length and data.
|
|
|
|
- This is implemented for the server in sql/net_serv.cc.
|
|
The client file, libmysql/net.c, is symlinked to this file
|
|
|
|
The important functions are:
|
|
|
|
my_net_write() Store a packet (= # number of bytes) to be sent
|
|
net_flush() Send the packets stored in the buffer
|
|
net_write_command() Send a command (1 byte) + packet to the server.
|
|
my_net_read() Read a packet
|
|
|
|
|
|
Include files
|
|
-------------
|
|
|
|
- include/mysql.h is included by all MySQL clients. It includes the
|
|
MYSQL and MYSQL_RES structures.
|
|
- include/mysql_com.h is include by mysql.h and mysql_priv.h (the
|
|
server) and includes a lot of common functions and structures to
|
|
handle the client/server protocol.
|
|
|
|
|
|
Packets from server to client:
|
|
-----------------------------
|
|
|
|
sql/net_pkg.cc:
|
|
|
|
- Sending of error packets
|
|
- Sending of OK packets (= end of data)
|
|
- Storing of values in a packet
|
|
|
|
|
|
sql/sql_base.cc:
|
|
|
|
- Function send_fields() sends the field description to the client.
|
|
|
|
sql/sql_show.cc:
|
|
|
|
- Sends results for a lot of SHOW commands, including:
|
|
SHOW DATABASES [like 'wildcard']
|
|
SHOW TABLES [like 'wildcard']
|
|
|
|
|
|
Packets from client to server:
|
|
------------------------------
|
|
|
|
This is done in libmysql/libmysql.c
|
|
|
|
The important ones are:
|
|
|
|
- mysql_real_connect() Connects to a mysqld server
|
|
- mysql_real_query() Sends a query to the server and
|
|
reads the ok packet or columns header.
|
|
- mysql_store_result() Read a result set from the server to memory
|
|
- mysql_use_result() Read a result set row by row from the server.
|
|
|
|
- net_safe_read() Read a packet from the server with
|
|
error handling.
|
|
- net_field_length() Reads the length of a packet string.
|
|
- simple_command() Sends a command/query to the server.
|
|
|
|
|
|
|
|
Connecting to mysqld (the MySQL server)
|
|
---------------------------------------
|
|
|
|
- On the client side: libmysql/libmysql.c::mysql_real_connect().
|
|
- On the server side: sql/sql_parse.cc::check_connections()
|
|
|
|
The packets sent during a connection are as follows
|
|
|
|
Server: Send greeting package (includes server capabilites, server
|
|
version and a random string of bytes to be used to scramble
|
|
the password.
|
|
Client: Sends package with client capabilites, user name, scrambled
|
|
password, database name
|
|
|
|
Server: Sends ok package or error package.
|
|
|
|
Client: If init command specified, send it t the server and read
|
|
ok/error package.
|
|
|
|
|
|
Password functions
|
|
------------------
|
|
|
|
The passwords are scrambled to a random number and are stored in hex
|
|
format on the server.
|
|
|
|
The password handling is done in sql/password.c. The important
|
|
function is 'scramble()', which takes the a password in clear text
|
|
and uses this to 'encrypt' the random string sent by the server
|
|
to a new message.
|
|
|
|
The encrypted message is sent to the server which uses the stored
|
|
random number password to encrypt the random string sent to the
|
|
client. If this is equal to the new message the client sends to the
|
|
server then the password is accepted.
|
|
|
|
@node protocol version 2, , protocol functions, protocol
|
|
@section Another description of the protocol
|
|
|
|
This should be merged with the above one and changed to texi format.
|
|
|
|
*****************************
|
|
*
|
|
* PROTOCOL OVERVIEW
|
|
*
|
|
*****************************
|
|
|
|
The MySQL protocol is relatively simple, and is designed for high performance
|
|
through minimisation of overhead, and extensibility through versioning and
|
|
options flags. It is a request-response protocol, and does not allow
|
|
multitasking or multiplexing over a single connection. There are two packet
|
|
formats, 'raw' and 'compressed' (which is used when both client and
|
|
server support zlib compression, and the client requests that data be
|
|
compressed):
|
|
|
|
* RAW PACKET, shorter than 16 M *
|
|
|
|
+-----------------------------------------------+
|
|
| Packet Length | Packet no | Data |
|
|
| 3 Bytes | 1 Byte | n Bytes |
|
|
+-----------------------------------------------+
|
|
^ ^
|
|
| 'HEADER' |
|
|
+-------------------------------+
|
|
|
|
|
|
* Packet Length: Calculated with int3store. See include/global.h for
|
|
details. The basic computation is length = byte1 +
|
|
(256 * byte2) + (256 * 256 * byte3). The max packetsize
|
|
can be 16 MB.
|
|
|
|
* Packet no: The packet number is incremented for each sent packet.
|
|
The first packet for each query from the client
|
|
starts with 0.
|
|
|
|
* Data: Specific to the operation being performed. Most often
|
|
used to send string data, such as a SQL query.
|
|
|
|
* COMPRESSED PACKET *
|
|
|
|
+---------------------------------------------------+-----------------+
|
|
| Packet Length | Packet no | Uncomp. Packet Length | Compressed Data |
|
|
| 3 Bytes | 1 Byte | 3 Bytes | n bytes |
|
|
+---------------------------------------------------+-----------------+
|
|
^ ^
|
|
| 'HEADER' |
|
|
+---------------------------------------------------+
|
|
|
|
* Packet Length: Calculated with int3store. See include/my_global.h for
|
|
details. The basic computation is length = byte1 +
|
|
(256 * byte2) + (256 * 256 * byte3). The max packetsize
|
|
can be 16 MB.
|
|
|
|
* Packet no: The packet number is incremented for each sent packet.
|
|
The first packet starts with 0.
|
|
|
|
* Uncomp. Packet Length: The length of the original, uncompressed packet
|
|
If this is zero then the data is not compressed.
|
|
|
|
* Compressed Data: The original packet, compressed with zlib compression
|
|
|
|
|
|
When using the compressed protocol, the client/server will only compress
|
|
send packets where the new packet is smaller than the not compressed one.
|
|
In other words, some packets may be compressed while others will not.
|
|
|
|
The 'compressed data' is one or more packets in *RAW PACKET* format.
|
|
|
|
*****************************
|
|
*
|
|
* FLOW OF EVENTS
|
|
*
|
|
*****************************
|
|
|
|
To understand how a client communicates with a MySQL server, it is easiest
|
|
to start with a high-level flow of events. Each event section will then be
|
|
followed by details of the exact contents of each type of packet involved
|
|
in the event flow.
|
|
|
|
* *
|
|
* CONNECTION ESTABLISHMENT *
|
|
* *
|
|
|
|
Clients connect to the server via a TCP/IP socket (port 3306 by default), a
|
|
Unix Domain Socket, or named pipes (on Windows). Once connected, the
|
|
following connection establishment sequence is followed:
|
|
|
|
+--------+ +--------+
|
|
| Client | | Server |
|
|
+--------+ +--------+
|
|
| |
|
|
| Handshake initialisation, including MySQL server version, |
|
|
| protocol version and options supported, as well as the seed |
|
|
| for the password hash |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| Client options supported, max packet size for client |
|
|
| username, password crypted with seed from server, database |
|
|
| name. |
|
|
| |
|
|
| --------------------------------------------------------------> |
|
|
| |
|
|
| 'OK' packet if authentication succeeds, 'ERROR' packet if |
|
|
| authentication fails. |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
|
|
|
|
|
|
* HANDSHAKE INITIALISATION PACKET *
|
|
|
|
|
|
+--------------------------------------------------------------------+
|
|
| Header | Prot. Version | Server Version String | 0x00 |
|
|
| | 1 Byte | n bytes | 1 byte |
|
|
|--------------------------------------------------------------------|
|
|
| Thread Number | Crypt Seed | 0x00 | CLIENT_xxx options |
|
|
| | | | supported by server |
|
|
| 4 Bytes | 8 Bytes | 1 Byte | 2 Bytes |
|
|
|--------------------------------------------------------------------|
|
|
| Server charset no. | Server status variables | 0x00 padding |
|
|
| 1 Byte | 2 Bytes | 13 bytes |
|
|
+--------------------------------------------------------------------+
|
|
|
|
* Protocol version (currently '10')
|
|
* Server Version String (e.g. '4.0.5-beta-log'). Can be any length as
|
|
it's followed by a 0 byte.
|
|
* Thread Number - ID of server thread handling this connection
|
|
* Crypt seed - seed used to crypt password in auth packet from client
|
|
* CLIENT_xxx options - see include/mysql_com.h
|
|
* Server charset no. - Index of charset in use by server
|
|
* Server status variables - see include/mysql_com.h
|
|
* The padding bytes are reserverd for future extensions to the protocol
|
|
|
|
* CLIENT AUTH PACKET *
|
|
|
|
|
|
+--------------------------------------------------------------------+
|
|
| Header | CLIENT_xxx options supported | max_allowed_packet |
|
|
| | by client | for client |
|
|
| | 2 Bytes | 3 bytes |
|
|
|--------------------------------------------------------------------|
|
|
| User Name | 0x00 | Crypted Password | 0x00 | Database Name |
|
|
| n Bytes | 1 Byte | 8 Bytes | 1 Byte | n Bytes |
|
|
|--------------------------------------------------------------------|
|
|
| 0x00 |
|
|
| 1 Byte |
|
|
+--------------------------------------------------------------------+
|
|
|
|
* CLIENT_xxx options that this client supports:
|
|
|
|
#define CLIENT_LONG_PASSWORD 1 /* new more secure passwords */
|
|
#define CLIENT_FOUND_ROWS 2 /* Found instead of affected rows */
|
|
#define CLIENT_LONG_FLAG 4 /* Get all column flags */
|
|
#define CLIENT_CONNECT_WITH_DB 8 /* One can specify db on connect */
|
|
#define CLIENT_NO_SCHEMA 16 /* Don't allow database.table.column */
|
|
#define CLIENT_COMPRESS 32 /* Can use compression protocol */
|
|
#define CLIENT_ODBC 64 /* Odbc client */
|
|
#define CLIENT_LOCAL_FILES 128 /* Can use LOAD DATA LOCAL */
|
|
#define CLIENT_IGNORE_SPACE 256 /* Ignore spaces before '(' */
|
|
#define CLIENT_INTERACTIVE 1024 /* This is an interactive client */
|
|
#define CLIENT_SSL 2048 /* Switch to SSL after handshake */
|
|
#define CLIENT_IGNORE_SIGPIPE 4096 /* IGNORE sigpipes */
|
|
#define CLIENT_TRANSACTIONS 8192 /* Client knows about transactions */
|
|
|
|
* max_allowed_packet for the client (in 'int3store' form)
|
|
* User Name - user to authenticate as. Is followed by a null byte.
|
|
* Crypted Password - password crypted with seed given in packet from
|
|
server, see scramble() in sql/password.c
|
|
* Database name (optional) - initial database to use once connected
|
|
Is followed by a null byte
|
|
|
|
At the end of every client/server exchange there is either an 'OK' packet
|
|
or an 'ERROR' packet sent from the server. To determine whether a packet is
|
|
an 'OK' packet, or an 'ERROR' packet, check if the first byte (after the
|
|
header) is 0xFF. If it has the value of 0xFF, the packet is an 'ERROR'
|
|
packet.
|
|
|
|
|
|
* OK PACKET *
|
|
|
|
For details, see sql/net_pkg.cc::send_ok()
|
|
|
|
+-----------------------------------------------+
|
|
| Header | No of Rows | Affected Rows |
|
|
| | 1 Byte | 1-9 Byte |
|
|
|-----------------------------------------------|
|
|
| ID (last_insert_id) | Status | Length |
|
|
| 1-9 Byte | 2 Byte | 1-9 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext |
|
|
| n Byte |
|
|
+-----------------------------------------------+
|
|
|
|
* Number of rows, always 0
|
|
* Affected rows
|
|
* ID (last_insert_id) - value for auto_increment column (if any)
|
|
* Status (usually 0)
|
|
|
|
In general, in the MySQL protocol, fields in a packet that that
|
|
represent numeric data, such as lengths, that are labeled as '1-9'
|
|
bytes can be decoded by the following logic:
|
|
|
|
If the first byte is '251', the
|
|
corresponding column value is NULL (only appropriate in
|
|
'ROW DATA' packets).
|
|
|
|
If the first byte is '252', the value stored can be read
|
|
from the following 2 bytes as a 16-bit integer.
|
|
|
|
|
|
If the first byte is '253' the value stored can be read
|
|
from the following 4 bytes as a 32-bit long integer
|
|
|
|
|
|
If the first byte is '254', the value stored can be read
|
|
from the following 8 bytes as a 64-byte long
|
|
|
|
Otherwise (values 0-250), the value stored is the value of the
|
|
first byte itself.
|
|
|
|
|
|
If the OK-packet includes a message:
|
|
|
|
* Length of message
|
|
* Message Text
|
|
|
|
|
|
* ERROR PACKET *
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Status code | Error no |
|
|
| | 1 Byte | 2 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext | |
|
|
| n Byte | |
|
|
+-----------------------------------------------+
|
|
|
|
* Status code (0xFF = ERROR)
|
|
* Error number (is only sent to 3.23 and newer clients)
|
|
* Error message text (ends at end of packet)
|
|
|
|
Note that the error message is not null terminated.
|
|
The client code can however assume that the packet ends with a null
|
|
as my_net_read() will always add an end-null to all read packets to
|
|
make things easier for the client.
|
|
|
|
Example:
|
|
|
|
Packet dump of client connecting to server:
|
|
|
|
+------------------------- Protocol Version (10)
|
|
|
|
|
| +---------------------- Server Version String (0x00 terminated)
|
|
| |
|
|
| |
|
|
0a 34 2e 30 2e 35 2d 62 . 4 . 0 . 5 - b
|
|
65 74 61 2d 6c 6f 67 00 e t a - l o g .
|
|
15 00 00 00 2b 5a 65 6c . . . . + Z e l
|
|
| |
|
|
| +------------ First 4 bytes of crypt seed
|
|
|
|
|
+------------------------ Thread Number
|
|
|
|
+------------------------- Last 4 bytes of crypt seed
|
|
|
|
|
| +-------- CLIENT_XXX Options supported by server
|
|
| |
|
|
| +-+--+ +--- Server charset index
|
|
| | | |
|
|
6f 69 41 46 00 2c 28 08 o i A F . , ( .
|
|
02 00 00 00 00 00 00 00 . . . . . . . .
|
|
| |
|
|
| +---------------------- 0x00 padding begins
|
|
|
|
|
+------------------------- Server status (0x02 =
|
|
SERVER_STATUS_AUTOCOMMIT)
|
|
|
|
00 00 00 00 00 00 00 00 . . . . . . . .
|
|
|
|
* Client Authentication Response (Username 'test', no database
|
|
selected) *
|
|
|
|
+--------------------- Packet Length (0x13 = 19 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +----------- CLIENT_XXX Options supported by client
|
|
| |
|
|
+---+---+ | +-+-+
|
|
| | | | |
|
|
13 00 00 01 03 00 1e 00 . . . . . . . .
|
|
00 74 65 73 74 00 48 5e . t e s t . H ^
|
|
| | |
|
|
+----+-----+ +------- Scrambled password, 0x00 terminated
|
|
|
|
|
+----------------- Username, 0x00 terminated
|
|
|
|
57 4a 4e 41 4a 4e 00 00 W J N A J N . .
|
|
00 .
|
|
|
|
|
|
>From this point on, the server waits for 'commands' from the client
|
|
which include queries, database shutdown, quit, change user, etc (see
|
|
the COM_xxxx values in include/mysql_com.h for the latest
|
|
command codes).
|
|
|
|
* *
|
|
* COMMAND PROCESSING *
|
|
* *
|
|
|
|
+--------+ +--------+
|
|
| Client | | Server |
|
|
+--------+ +--------+
|
|
| |
|
|
| A command packet, with a command code, and string data |
|
|
| when appropriate (e.g. a query), (see the COM_xxxx values |
|
|
| in include/mysql_com.h for the command codes) |
|
|
| |
|
|
| --------------------------------------------------------------> |
|
|
| |
|
|
| A 'RESULT' packet if the command completed successfully, |
|
|
| an 'ERROR' packet if the command failed. 'RESULT' packets |
|
|
| take different forms (see the details following this chart) |
|
|
| depending on whether or not the command returns rows. |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| n 'FIELD PACKET's (if rows are returned) |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| 'LAST DATA' packet |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| n 'ROW PACKET's (if rows are returned) |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| 'LAST DATA' packet |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
|
|
|
|
* Command Packet *
|
|
|
|
+------------------------------------------------------+
|
|
| Header | Command type | Query (if applicable) |
|
|
| | 1 Byte | n Bytes |
|
|
+------------------------------------------------------+
|
|
|
|
* Command type: (e.g.0x03 = query, see the COM_xxxx values in
|
|
include/mysql_com.h)
|
|
* Query (if applicable)
|
|
|
|
Note that my_net_read() null-terminates all packets on the
|
|
receiving side of the channel to make it easier for the code
|
|
examining the packets.
|
|
|
|
The current command codes are:
|
|
|
|
0x00 COM_SLEEP
|
|
0x01 COM_QUIT
|
|
0x02 COM_INIT_DB
|
|
0x03 COM_QUERY
|
|
0x04 COM_FIELD_LIST
|
|
0x05 COM_CREATE_DB
|
|
0x06 COM_DROP_DB
|
|
0x07 COM_REFRESH
|
|
0x08 COM_SHUTDOWN
|
|
0x09 COM_STATISTICS
|
|
0x0a COM_PROCESS_INFO
|
|
0x0b COM_CONNECT
|
|
0x0c COM_PROCESS_KILL
|
|
0x0d COM_DEBUG
|
|
0x0e COM_PING
|
|
0x0f COM_TIME
|
|
0x10 COM_DELAYED_INSERT
|
|
0x11 COM_CHANGE_USER
|
|
0x12 COM_BINLOG_DUMP
|
|
0x13 COM_TABLE_DUMP
|
|
0x14 COM_CONNECT_OUT
|
|
0x15 COM_REGISTER_SLAVE
|
|
|
|
* Result Packet *
|
|
|
|
Result packet for a command returning _no_ rows:
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Field Count | Affected Rows |
|
|
| | 1-9 Bytes | 1-9 Bytes |
|
|
|-----------------------------------------------|
|
|
| ID (last_insert_id) | Server Status |
|
|
| 1-9 Bytes | 2 Bytes |
|
|
+-----------------------------------------------+
|
|
|
|
* Field Count: Has value of '0' for commands returning _no_ rows
|
|
* Affected rows: Count of rows affected by INSERT/UPDATE/DELETE, etc.
|
|
* ID: value of auto_increment column in row (if any). 0 if
|
|
* Server Status: Usually 0
|
|
|
|
Result packet for a command returning rows:
|
|
|
|
+-------------------------------+
|
|
| Header | Field Count |
|
|
| | 1-9 Bytes |
|
|
+-------------------------------+
|
|
|
|
* Field Count: number of columns/fields in result set,
|
|
(packed with net_store_length() in sql/net_pkg.cc)
|
|
|
|
This is followed by as many packets as the number of fields ('Field Count')
|
|
that contain the metadata for each column/field (see unpack_fields() in
|
|
libmysql/libmysql.c):
|
|
|
|
|
|
* FIELD PACKET *
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Table Name |
|
|
| | length-coded-string |
|
|
|-----------------------------------------------|
|
|
| Field Name |
|
|
| length-code-string |
|
|
|-----------------------------------------------|
|
|
| Display length of field
|
|
| length-coded-binary (4 bytes) |
|
|
|-----------------------------------------------|
|
|
| Field Type (enum_field_types in mysql_com.h) |
|
|
| length-coded-binary (2 bytes) |
|
|
|-----------------------------------------------|
|
|
| Field Flags | Decimal Places|
|
|
| length-coded-binary (3 bytes) | 1 Byte |
|
|
+--------------+-------------+------------------+
|
|
|
|
* A length coded string is a string where we first have a packet
|
|
length (1-9 bytes, packed_with net_store_length()) followed
|
|
by a string.
|
|
* A length coded binary is a length (1 byte) followed by an integer
|
|
value in low-byte-first order. For the moment this type is always
|
|
fixed length in this packet.
|
|
|
|
* Table Name - the name of the table the column comes from
|
|
* Field Name - the name of the column/field
|
|
* Display length of field - length of field
|
|
* Field Type - Type of field, see enum_field_types in
|
|
include/mysql_com.h
|
|
|
|
Current field types are:
|
|
|
|
0x00 FIELD_TYPE_DECIMAL
|
|
0x01 FIELD_TYPE_TINY
|
|
0x02 FIELD_TYPE_SHORT
|
|
0x03 FIELD_TYPE_LONG
|
|
0x04 FIELD_TYPE_FLOAT
|
|
0x05 FIELD_TYPE_DOUBLE
|
|
0x06 FIELD_TYPE_NULL
|
|
0x07 FIELD_TYPE_TIMESTAMP
|
|
0x08 FIELD_TYPE_LONGLONG
|
|
0x09 FIELD_TYPE_INT24
|
|
0x0a FIELD_TYPE_DATE
|
|
0x0b FIELD_TYPE_TIME
|
|
0x0c FIELD_TYPE_DATETIME
|
|
0x0d FIELD_TYPE_YEAR
|
|
0x0e FIELD_TYPE_NEWDATE
|
|
0xf7 FIELD_TYPE_ENUM
|
|
0xf8 FIELD_TYPE_SET
|
|
0xf9 FIELD_TYPE_TINY_BLOB
|
|
0xfa FIELD_TYPE_MEDIUM_BLOB
|
|
0xfb FIELD_TYPE_LONG_BLOB
|
|
0xfc FIELD_TYPE_BLOB
|
|
0xfd FIELD_TYPE_VAR_STRING
|
|
0xfe FIELD_TYPE_STRING
|
|
0xff FIELD_TYPE_GEOMETRY
|
|
|
|
* Field Flags - NOT_NULL_FLAG, PRI_KEY_FLAG, xxx_FLAG in
|
|
include/mysql_com.h
|
|
|
|
|
|
Note that the packet format in 4.1 has slightly changed to allow more values.
|
|
|
|
|
|
* ROW PACKET *
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Data Length | Column Data | ....for each column
|
|
| | 1-9 Bytes | n Bytes |
|
|
+-----------------------------------------------+
|
|
|
|
* Data Length: (packed with net_store_length() in sql/net_pkg.cc)
|
|
|
|
If 'Data Length' == 0, this is an 'ERROR PACKET'.
|
|
|
|
* Column Data: String representation of data. MySQL always sends result set
|
|
data as strings.
|
|
|
|
* LAST DATA PACKET *
|
|
|
|
Packet length is < 9 bytes, and first byte is 0xFE
|
|
|
|
+--------+
|
|
| 0xFE |
|
|
| 1 Byte |
|
|
+--------+
|
|
|
|
Examples:
|
|
|
|
***********
|
|
*
|
|
* INITDB Command
|
|
*
|
|
***********
|
|
|
|
A client issuing an 'INITDB' (select the database to use) command,
|
|
followed by an 'OK' packet with no rows and no affected rows from
|
|
the server:
|
|
|
|
* INITDB (select database to use) 'COMMAND' Packet *
|
|
|
|
+--------------------- Packet Length (5 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Command # (INITDB = 0x02)
|
|
| |
|
|
+---+---+ | | +---------- Beginning of query data
|
|
| | | | |
|
|
05 00 00 00 02 74 65 73 . . . . . t e s
|
|
74 t
|
|
|
|
* 'OK' Packet with no rows, and no rows affected *
|
|
|
|
+--------------------- Packet Length (3 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
+---+---+ |
|
|
| | |
|
|
03 00 00 01 00 00 00 . . . . . . .
|
|
|
|
|
|
***********
|
|
*
|
|
* SELECT query example
|
|
*
|
|
***********
|
|
|
|
Client issuing a 'SELECT *' query on the following table:
|
|
|
|
CREATE TABLE number_test (minBigInt bigint,
|
|
maxBigInt bigint,
|
|
testBigInt bigint)
|
|
|
|
* 'COMMAND' Packet with QUERY (select ...) *
|
|
|
|
+--------------------- Packet Length (26)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Command # (QUERY = 0x03)
|
|
| |
|
|
+---+---+ | | +---------- Beginning of query data
|
|
| | | | |
|
|
1a 00 00 00 03 53 45 4c . . . . . S E L
|
|
45 43 54 20 2a 20 66 72 E C T . * . f r
|
|
6f 6d 20 6e 75 6d 62 65 o m . n u m b e
|
|
72 5f 74 65 73 74 r _ t e s t
|
|
|
|
|
|
and receiving an 'OK' packet with a 'FIELD COUNT' of 3
|
|
|
|
|
|
* 'OK' Packet with 3 fields *
|
|
|
|
+--------------------- Packet Length (3 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
+---+---+ |
|
|
| | |
|
|
01 00 00 01 03 . . . . .
|
|
|
|
Followed immediately by 3 'FIELD' Packets. Note, the individual packets
|
|
are delimitted by =======, so that all fields can be annotated in the first
|
|
'FIELD' packet example:
|
|
|
|
=============================================================
|
|
|
|
+--------------------- Packet Length (0x1f = 31 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Block Length (0x0b = 11 bytes)
|
|
| | |
|
|
+---+---+ | | +--------- Table Name (11 bytes long)
|
|
| | | | |
|
|
1f 00 00 02 0b 6e 75 6d . . . . . n u m
|
|
62 65 72 5f 74 65 73 74 b e r _ t e s t
|
|
|
|
+------------------------ Block Length (9 bytes)
|
|
|
|
|
| +--------------------- Column Name (9 bytes long)
|
|
| |
|
|
09 6d 69 6e 42 69 67 49 . m i n B i g I
|
|
6e 74 03 14 00 00 01 08 n t . . . . . .
|
|
| | | | |
|
|
| +---+---+ | +--- Field Type (0x08 = FIELD_TYPE_LONGLONG)
|
|
| | |
|
|
| | +------ Block Length (1)
|
|
| |
|
|
| +--------------- Display Length (0x14 = 20 chars)
|
|
|
|
|
+------------------ Block Length (3)
|
|
|
|
+------------------------ Block Length (2)
|
|
|
|
|
| +-------------------- Field Flags (0 - no flags set)
|
|
| |
|
|
| +---+ +--------------- Decimal Places (0)
|
|
| | | |
|
|
02 00 00 00 . . . .
|
|
|
|
=============================================================
|
|
|
|
'FIELD' packet for the 'number_Test.maxBigInt' column
|
|
|
|
1f 00 00 03 0b 6e 75 6d . . . . . n u m
|
|
62 65 72 5f 74 65 73 74 b e r _ t e s t
|
|
09 6d 61 78 42 69 67 49 . m a x B i g I
|
|
6e 74 03 14 00 00 01 08 n t . . . . . .
|
|
02 00 00 00 . . . .
|
|
|
|
=============================================================
|
|
|
|
'FIELD' packet for the 'number_test.testBigInt' column
|
|
|
|
20 00 00 04 0b 6e 75 6d . . . . . n u m
|
|
62 65 72 5f 74 65 73 74 b e r _ t e s t
|
|
0a 74 65 73 74 42 69 67 . t e st B i g
|
|
49 6e 74 03 14 00 00 01 I n t . . . . .
|
|
08 02 00 00 00 . . . . .
|
|
=============================================================
|
|
|
|
Followed immediately by one 'LAST DATA' packet:
|
|
|
|
fe 00 . .
|
|
|
|
Followed immediately by 'n' row packets (in this case, only
|
|
one packet is sent from the server, for simplicity's sake):
|
|
|
|
|
|
+--------------------- Packet Length (0x52 = 82 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Data Length (0x14 = 20 bytes)
|
|
| | |
|
|
+---+---+ | | +--------- String Data '-9223372036854775808'
|
|
| | | | | (repeat Data Length/Data sequence)
|
|
|
|
52 00 00 06 14 2d 39 32 . . . . . - 9 2
|
|
32 33 33 37 32 30 33 36 2 3 3 7 2 0 3 6
|
|
38 35 34 37 37 35 38 30 8 5 4 7 7 5 8 0
|
|
38 13 39 32 32 33 33 37 8 . 9 2 2 3 3 7
|
|
32 30 33 36 38 35 34 37 2 0 3 6 8 5 4 7
|
|
37 35 38 30 37 0a 36 31 7 5 8 0 7 . 6 1
|
|
34 37 34 38 33 36 34 37 4 7 4 8 3 6 4 7
|
|
|
|
Followed immediately by one 'LAST DATA' packet:
|
|
|
|
fe 00 . .
|
|
|
|
|
|
|
|
@c The Index was empty, and ugly, so I removed it. (jcole, Sep 7, 2000)
|
|
|
|
@c @node Index
|
|
@c @unnumbered Index
|
|
|
|
@c @printindex fn
|
|
|
|
@c @node 4.1 protocol,,,
|
|
@c @chapter MySQL 4.1 protocol
|
|
|
|
@node 4.1 protocol changes,,,
|
|
@section Changes to 4.0 protocol in 4.1
|
|
|
|
All basic packet handling is identical to 4.0. When communication
|
|
with an old 4.0 or 3.x client we will use the old protocol.
|
|
|
|
The new things that we support with 4.1 are:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Warnings
|
|
@item
|
|
Prepared statements
|
|
@item
|
|
Binary protocol (will be faster than the current protocol that
|
|
converts everything to strings)
|
|
@end itemize
|
|
|
|
|
|
What has changed in 4.1 are:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
A lot of new field information (database, real table name etc)
|
|
@item
|
|
The 'ok' packet has more status fields
|
|
@item
|
|
The 'end' packet (send last for each result set) now contains some
|
|
extra information
|
|
@item
|
|
New protocol for prepared statements. In this case all parameters and
|
|
results will sent as binary (low-byte-first).
|
|
@end itemize
|
|
|
|
|
|
@node 4.1 field packet,,,
|
|
@section 4.1 field description packet
|
|
|
|
The field description packet is sent as a response to a query that
|
|
contains a result set. It can be distinguished from a ok packet by
|
|
the fact that the first byte can't be 0 for a field packet.
|
|
@xref{4.1 ok packet}.
|
|
|
|
The header packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1-9 @tab Number of columns in result set (never 0)
|
|
@item 1-9 @tab Extra information sent be some command (SHOW COLUMNS
|
|
uses this to send the number of rows in the table)
|
|
@end multitable
|
|
|
|
This packet is always followed by a field description set.
|
|
@xref{4.1 field desc}.
|
|
|
|
@node 4.1 field desc,,,
|
|
@section 4.1 field description result set
|
|
|
|
The field description result set contains the meta info for a result set.
|
|
|
|
@multitable @columnfractions .20 .80
|
|
@item Type @tab Comment
|
|
@item string @tab Database name
|
|
@item string @tab Table name alias (or table name if no alias)
|
|
@item string @tab Real table name
|
|
@item string @tab Alias for column name (or column name if not used)
|
|
|
|
@item 11 byte @tab Fixed length fields in one field part:
|
|
@itemize
|
|
@item 2 byte int @tab Character set number
|
|
@item 3 byte int @tab Length of column definition
|
|
@item 1 byte int @tab Enum value for field type
|
|
@item 3 byte int @tab 2 byte column flags (NOT_NULL_FLAG etc..) + 1 byte number of decimals.
|
|
@item 2 byte int @tab zero (reserved for future use)
|
|
@end itemize
|
|
|
|
@item string int @tab Default value, only set when using mysql_list_fields().
|
|
@end multitable
|
|
|
|
|
|
@node 4.1 ok packet,,,
|
|
@section 4.1 ok packet
|
|
|
|
The ok packet is the first that is sent as an response for a query
|
|
that didn't return a result set.
|
|
|
|
The ok packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1 @tab 0 ; Marker for ok packet
|
|
@item 1-9 @tab Affected rows
|
|
@item 1-9 @tab Last insert id (0 if one wasn't used)
|
|
@item 2 @tab Server status; Can be used by client to check if we are inside an transaction
|
|
@item 2 @tab Warning count
|
|
@item 1-9 @tab Message length (optional)
|
|
@item xxx @tab Message (optional)
|
|
@end multitable
|
|
|
|
Size 1-9 means that the parameter is packed in to 1-9 bytes depending on
|
|
the value. (See function sql/net_pkg.cc::net_store_length).
|
|
|
|
The message is optional. For example for multi line INSERT it
|
|
contains a string for how many rows was inserted / deleted.
|
|
|
|
|
|
@node 4.1 end packet,,,
|
|
@section 4.1 end packet
|
|
|
|
The end packet is sent as the last packet for
|
|
|
|
@itemize @bullet
|
|
@item
|
|
End of field information
|
|
@item
|
|
End of parameter type information
|
|
@item
|
|
End of result set
|
|
@end itemize
|
|
|
|
The end packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1 @tab 254 ; Marker for EOF packet
|
|
@item 2 @tab Warning count
|
|
@item 2 @tab Status flags (For flags like SERVER_STATUS_MORE_RESULTS)
|
|
@end multitable
|
|
|
|
Note that a normal packet may start with byte 254, which means
|
|
'length stored in 9 bytes'. One can different between these cases
|
|
by checking the packet length < 9 bytes (in which case it's and end
|
|
packet).
|
|
|
|
|
|
@node 4.1 error packet
|
|
@section 4.1 error packet.
|
|
|
|
The error packet is sent when something goes wrong.
|
|
The error packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1 @tab 255 Error packet marker
|
|
@item 2 @tab Error code
|
|
@item 1-255 @tab Null terminated error message
|
|
@end multitable
|
|
|
|
The client/server protocol is designed in such a way that a packet
|
|
can only start with 255 if it's an error packet.
|
|
|
|
|
|
@node 4.1 prep init,,,
|
|
@section 4.1 prepared statement init packet
|
|
|
|
This is the return packet when one sends a query with the COM_PREPARE
|
|
command.
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 4 @tab Statement handler id
|
|
@item 2 @tab Number of columns in result set
|
|
@item 2 @tab Number of parameters in query
|
|
@end multitable
|
|
|
|
After this, there is a packet that contains the following for each
|
|
parameter in the query:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 2 @tab Enum value for field type. (MYSQL_TYPE_UNKNOWN if not known)
|
|
@item 2 @tab 2 byte column flags (NOT_NULL_FLAG etc)
|
|
@item 1 @tab Number of decimals
|
|
@item 4 @tab Max column length.
|
|
@end multitable
|
|
|
|
Note that the above is not yet in 4.1 but will be added this month.
|
|
|
|
As MySQL can have a parameter 'anywhere' it will in many cases not be
|
|
able to provide the optimal information for all parameters.
|
|
|
|
If number of columns, in the header packet, is not 0 then the
|
|
prepared statement will contain a result set. In this case the packet
|
|
is followed by a field description result set. @xref{4.1 field desc}.
|
|
|
|
|
|
@node 4.1 long data,,,
|
|
@section 4.1 long data handling
|
|
|
|
This is used by mysql_send_long_data() to set any parameter to a string
|
|
value. One can call mysql_send_long_data() multiple times for the
|
|
same parameter; The server will concatenate the results to a one big
|
|
string.
|
|
|
|
The server will not require an end packet for the string.
|
|
mysql_send_long_data() is responsible updating a flag that all data
|
|
has been sent. (Ie; That the last call to mysql_send_long_data() has
|
|
the 'last_data' flag set).
|
|
|
|
This packet is sent from client -> server:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 4 @tab Statement handler
|
|
@item 2 @tab Parameter number
|
|
@item 2 @tab Type of parameter (not used at this point)
|
|
@item # @tab data (Rest of packet)
|
|
@end multitable
|
|
|
|
The server will NOT send an @code{ok} or @code{error} packet in
|
|
responce for this. If there is any errors (like to big string), one
|
|
will get the error when calling execute.
|
|
|
|
@node 4.1 execute,,,
|
|
@section 4.1 execute
|
|
|
|
On execute we send all parameters to the server in a COM_EXECUTE
|
|
packet.
|
|
|
|
The packet contains the following information:
|
|
|
|
@multitable @columnfractions .30 .70
|
|
@item Size @tab Comment
|
|
@item (param_count+9)/8 @tab Null bit map (2 bits reserved for protocol)
|
|
@item 1 @tab new_parameter_bound flag. Is set to 1 for first
|
|
execute or if one has rebound the parameters.
|
|
@item 2*param_count @tab Type of parameters (only given if new_parameter_bound flag is 1)
|
|
@item # @tab Parameter data, repeated for each parameter that are
|
|
NOT NULL and not used with mysql_send_long_data().
|
|
@end multitable
|
|
|
|
The null-bit-map is for all parameters (including parameters sent with
|
|
'mysql_send_long_data). If parameter 0 is NULL, then bit 0 in the
|
|
null-bit-map should be 1 (ie: first byte should be 1)
|
|
|
|
The parameters are stored the following ways:
|
|
|
|
@multitable @columnfractions .20 .10 .70
|
|
@item Type @tab Size @tab Comment
|
|
@item tinyint @tab 1 @tab One byte integer
|
|
@item short @tab 2 @tab
|
|
@item int @tab 4 @tab
|
|
@item longlong @tab 8 @tab
|
|
@item float @tab 4 @tab
|
|
@item double @tab 8 @tab
|
|
@item string @tab 1-9 + # @tab Packed string length + string
|
|
@end multitable
|
|
|
|
The result for this will be either an ok packet or a binary result
|
|
set.
|
|
|
|
@node 4.1 binary result,,,
|
|
@section 4.1 binary result set
|
|
|
|
A binary result are sent the following way.
|
|
|
|
For each result row:
|
|
|
|
@itemize
|
|
@item
|
|
null bit map with first two bits set to 01 (bit 0,1 value 1)
|
|
@item
|
|
parameter data, repeated for each not null result column.
|
|
@end itemize
|
|
|
|
The idea with the reserving two bits in the null map is that we can
|
|
use standard error (first byte 255) and ok packets (first byte 0)
|
|
to end a result sets.
|
|
|
|
Except that the null-bit-map is shifted two steps, the server is
|
|
sending the data to the client the same way that the server is sending
|
|
bound parameters to the client. The server is always sending the data
|
|
as type given for 'column type' for respective column. It's up to the
|
|
client to convert the parameter to the requested type.
|
|
|
|
DATETIME, DATE and TIME are sent to the server in a binary format as follows:
|
|
|
|
@multitable @columnfractions .20 .10 .70
|
|
@item Type @tab Size @tab Comment
|
|
@item date @tab 1 + 0-11 @tab Length + 2 byte year, 1 byte MMDDHHMMSS, 4 byte billionth of a second
|
|
@item datetime @tab 1 + 0-11 @tab Length + 2 byte year, 1 byte MMDDHHMMSS, 4 byte billionth of a second
|
|
@item time @tab 1 + 0-14 @tab Length + sign (0 = pos, 1= neg), 4 byte days, 1 byte HHMMDD, 4 byte billionth of a second
|
|
@end multitable
|
|
|
|
The first byte is a length byte and then comes all parameters that are
|
|
not 0. (Always counted from the beginning).
|
|
|
|
@node Fulltext Search, MyISAM Record Structure, protocol, Top
|
|
@chapter Fulltext Search in MySQL
|
|
|
|
Hopefully, sometime there will be complete description of
|
|
fulltext search algorithms.
|
|
Now it's just unsorted notes.
|
|
|
|
@menu
|
|
* Weighting in boolean mode::
|
|
@end menu
|
|
|
|
@node Weighting in boolean mode, , Fulltext Search, Fulltext Search
|
|
@section Weighting in boolean mode
|
|
|
|
The basic idea is as follows: in expression
|
|
@code{A or B or (C and D and E)}, either @code{A} or @code{B} alone
|
|
is enough to match the whole expression. While @code{C},
|
|
@code{D}, and @code{E} should @strong{all} match. So it's
|
|
reasonable to assign weight 1 to @code{A}, @code{B}, and
|
|
@code{(C and D and E)}. And @code{C}, @code{D}, and @code{E}
|
|
should get a weight of 1/3.
|
|
|
|
Things become more complicated when considering boolean
|
|
operators, as used in MySQL FTB. Obvioulsy, @code{+A +B}
|
|
should be treated as @code{A and B}, and @code{A B} -
|
|
as @code{A or B}. The problem is, that @code{+A B} can @strong{not}
|
|
be rewritten in and/or terms (that's the reason why this - extended -
|
|
set of operators was chosen). Still, aproximations can be used.
|
|
@code{+A B C} can be approximated as @code{A or (A and (B or C))}
|
|
or as @code{A or (A and B) or (A and C) or (A and B and C)}.
|
|
Applying the above logic (and omitting mathematical
|
|
transformations and normalization) one gets that for
|
|
@code{+A_1 +A_2 ... +A_N B_1 B_2 ... B_M} the weights
|
|
should be: @code{A_i = 1/N}, @code{B_j=1} if @code{N==0}, and,
|
|
otherwise, in the first rewritting approach @code{B_j = 1/3},
|
|
and in the second one - @code{B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1))}.
|
|
|
|
The second expression gives somewhat steeper increase in total
|
|
weight as number of matched B's increases, because it assigns
|
|
higher weights to individual B's. Also the first expression in
|
|
much simplier. So it is the first one, that is implemented in MySQL.
|
|
|
|
|
|
@node MyISAM Record Structure, InnoDB Record Structure, Fulltext Search, Top
|
|
@chapter MyISAM Record Structure
|
|
|
|
@section Introduction
|
|
|
|
When you say:
|
|
@*
|
|
|
|
@strong{CREATE TABLE Table1 ...}
|
|
@*
|
|
|
|
MySQL creates files named Table1.MYD ("MySQL Data"), Table1.MYI
|
|
("MySQL Index"), and Table1.FRM ("Format"). These files will be in the
|
|
directory: @*
|
|
/<datadir>/<database>/
|
|
@*
|
|
|
|
For example, if you use Linux, you might find the files here (assume
|
|
your database name is "test"): @*
|
|
/usr/local/var/test
|
|
@*
|
|
|
|
And if you use Windows, you might find the files in this directory: @*
|
|
\mysql\data\test\
|
|
@*@*
|
|
|
|
Let's look at the .MYD Data file (MyISAM SQL Data file) more closely.
|
|
|
|
@table @strong
|
|
@item Page Size
|
|
Unlike most DBMSs, MySQL doesn't store on disk using pages. Therefore
|
|
you will not see filler space between rows. (Reminder: This does not
|
|
refer to BDB and INNODB tables, which do use pages).
|
|
@*
|
|
|
|
@item Record Header
|
|
The minimal record header is a set of flags:
|
|
@itemize @bullet
|
|
@item
|
|
"X bit" = 0 if row is deleted, = 1 if row is not deleted
|
|
@item
|
|
"Null Bits" = 0 if column is not NULL, = 1 if column is NULL
|
|
@item
|
|
"Filler Bits" = 1
|
|
@end itemize
|
|
@end table
|
|
@*
|
|
|
|
Here's an example. Suppose you say:
|
|
@*
|
|
|
|
@strong{CREATE TABLE Table1 (column1 CHAR(1), column2 CHAR(1), column3 CHAR(1))}
|
|
@*
|
|
|
|
@strong{INSERT INTO Table1 VALUES ('a', 'b', 'c')}
|
|
@*
|
|
|
|
@strong{INSERT INTO Table1 VALUES ('d', NULL, 'e')}
|
|
@*
|
|
|
|
A CHAR(1) column takes precisely one byte (plus one bit of overhead
|
|
that is assigned to every column -- I'll describe the details of
|
|
column storage later). So the file Table1.MYD looks like this:
|
|
@*
|
|
|
|
@strong{Hexadecimal Display of Table1.MYD file}@*
|
|
@code{
|
|
F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e.
|
|
}
|
|
@*
|
|
|
|
Here's how to read this hexadecimal-dump display:@*
|
|
@itemize @bullet
|
|
@item
|
|
The hexadecimal numbers @code{F1 61 62 63 00 F5 64 20 66 00} are byte
|
|
values and the column on the right is an attempt to show the
|
|
same bytes in ASCII.
|
|
@item
|
|
The @code{F1} byte means that there are no null fields in the first row.
|
|
@item
|
|
The @code{F5} byte means that the second column of the second row is NULL.
|
|
@end itemize
|
|
|
|
(It's probably easier to understand the flag setting if you restate
|
|
@code{F5} as @code{11110101 binary}, and (a) notice that the third flag bit from the
|
|
right is @code{on}, and (b) remember that the first flag bit is the X bit.)
|
|
@*
|
|
|
|
There are complications -- the record header is more complex if there
|
|
are variable-length fields -- but the simple display shown in the
|
|
example is exactly what you'd see if you took a debugger and looked
|
|
at the MySQL Data file.
|
|
@*
|
|
|
|
@section Physical Attributes of Columns
|
|
|
|
Next I'll describe the physical attributes of each column in a row.
|
|
The format depends entirely on the data type and the size of the
|
|
column, so, for every data type, I'll give a description and an example.
|
|
@*
|
|
|
|
@table @strong
|
|
@item The character data types
|
|
|
|
@strong{CHAR}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length string with space padding on the right.
|
|
@item
|
|
Example: a CHAR(5) column containing the value 'A' looks like:@*
|
|
@code{hexadecimal 41 20 20 20 20} -- (length = 5, value = @code{'A '})
|
|
@end itemize
|
|
|
|
@strong{VARCHAR}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: variable-length string with a preceding length.
|
|
@item
|
|
Example: a VARCHAR(7) column containing 'A' looks like:@*
|
|
@code{hexadecimal 01 41} -- (length = 1, value = @code{'A'})
|
|
@end itemize
|
|
|
|
@item The numeric data types
|
|
|
|
Important: MySQL stores all multi-byte binary numbers with the
|
|
high byte first. This is called "little-endian" numeric storage;
|
|
it's normal on Intel x86 machines; MySQL uses it even for non-Intel
|
|
machines so that databases will be portable.
|
|
@*
|
|
|
|
@strong{TINYINT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always one byte.
|
|
@item
|
|
Example: a TINYINT column containing 65 looks like:@*
|
|
@code{hexadecimal 41} -- (length = 1, value = 65)
|
|
@end itemize
|
|
|
|
@strong{SMALLINT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always two bytes.
|
|
@item
|
|
Example: a SMALLINT column containing 65 looks like:@*
|
|
@code{hexadecimal 41 00} -- (length = 2, value = 65)
|
|
@end itemize
|
|
|
|
@strong{MEDIUMINT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always three bytes.
|
|
@item
|
|
Example: a MEDIUMINT column containing 65 looks like:@*
|
|
@code{hexadecimal 41 00 00} -- (length = 3, value = 65)
|
|
@end itemize
|
|
|
|
@strong{INT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always four bytes.
|
|
@item
|
|
Example: an INT column containing 65 looks like:@*
|
|
@code{hexadecimal 41 00 00 00} -- (length = 4, value = 65)
|
|
@end itemize
|
|
|
|
@strong{BIGINT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always eight bytes.
|
|
@item
|
|
Example: a BIGINT column containing 65 looks like:@*
|
|
@code{hexadecimal 41 00 00 00 00 00 00 00} -- (length = 8, value = 65)
|
|
@end itemize
|
|
|
|
@strong{FLOAT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always four bytes.
|
|
@item
|
|
Example: a FLOAT column containing approximately 65 looks like:@*
|
|
@code{hexadecimal 00 00 82 42} -- (length = 4, value = 65)
|
|
@end itemize
|
|
|
|
@strong{DOUBLE PRECISION}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length binary, always eight bytes.
|
|
@item
|
|
Example: a DOUBLE PRECISION column containing approximately 65 looks like:@*
|
|
@code{hexadecimal 00 00 00 00 00 40 50 40} -- (length = 8, value = 65)
|
|
@end itemize
|
|
|
|
@strong{REAL}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as FLOAT, or same as DOUBLE PRECISION, depending on setting of the --ansi switch.
|
|
@end itemize
|
|
|
|
@strong{DECIMAL}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: fixed-length string, with a leading byte for the sign, if any.
|
|
@item
|
|
Example: a DECIMAL(2) column containing 65 looks like:@*
|
|
@code{hexadecimal 20 36 35} -- (length = 3, value = @code{' 65'})
|
|
@item
|
|
Example: a DECIMAL(2) UNSIGNED column containing 65 looks like:@*
|
|
@code{hexadecimal 36 35} -- (length = 2, value = @code{'65'})
|
|
@item
|
|
Example: a DECIMAL(4,2) UNSIGNED column containing 65 looks like:@*
|
|
@code{hexadecimal 36 35 2E 30 30} -- (length = 5, value = @code{'65.00'})
|
|
@end itemize
|
|
|
|
@strong{NUMERIC}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as DECIMAL.
|
|
@end itemize
|
|
|
|
@strong{BOOL}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as TINYINT.
|
|
@end itemize
|
|
|
|
@item The temporal data types
|
|
|
|
@strong{DATE}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: 3 byte integer, low byte first.
|
|
Packed as: 'day + month*32 + year*16*32'
|
|
@item
|
|
Example: a DATE column containing '1962-01-02' looks like:@*
|
|
@code{hexadecimal 22 54 0F}
|
|
@end itemize
|
|
|
|
@strong{DATETIME}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: eight bytes.
|
|
@item
|
|
Part 1 is a 32-bit integer containing year*10000 + month*100 + day.
|
|
@item
|
|
Part 2 is a 32-bit integer containing hour*10000 + minute*100 + second.
|
|
@item
|
|
Example: a DATETIME column for '0001-01-01 01:01:01' looks like:@*
|
|
@code{hexadecimal B5 2E 11 5A 02 00 00 00}
|
|
@end itemize
|
|
|
|
@strong{TIME}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: 3 bytes, low byte first.
|
|
This is stored as seconds: days*24*3600+hours*3600+minutes*60+seconds
|
|
@item
|
|
Example: a TIME column containing '1 02:03:04' (1 day 2 hour 3 minutes and 4 seconds) looks like:@*
|
|
@code{hexadecimal 58 6E 01}
|
|
@end itemize
|
|
|
|
@strong{TIMESTAMP}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: 4 bytes, low byte first.
|
|
Stored as unix @code{time()}, which is seconds since the Epoch
|
|
(00:00:00 UTC, January 1, 1970).
|
|
@item
|
|
Example: a TIMESTAMP column containing '2003-01-01 01:01:01' looks like:@*
|
|
@code{hexadecimal 4D AE 12 23}
|
|
@end itemize
|
|
|
|
@strong{YEAR}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as unsigned TINYINT with a base value of 0 = 1901.
|
|
@end itemize
|
|
|
|
@item Others
|
|
|
|
@strong{SET}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: one byte for each eight members in the set.
|
|
@item
|
|
Maximum length: eight bytes (for maximum 64 members).
|
|
@item
|
|
This is a bit list. The least significant bit corresponds to the
|
|
first listed member of the set.
|
|
@item
|
|
Example: a SET('A','B','C') column containing 'A' looks like:@*
|
|
@code{01} -- (length = 1, value = 'A')
|
|
@end itemize
|
|
|
|
@strong{ENUM}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: one byte if less than 256 alternatives, else two bytes.
|
|
@item
|
|
This is an index. The value 1 corresponds to the first listed
|
|
alternative. (Note: ENUM always reserves 0 for a blank '' value. This
|
|
explains why 'A' is 1 instead of 0.)
|
|
@item
|
|
Example: an ENUM('A','B','C') column containing 'A' looks like:@*
|
|
@code{01} -- (length = 1, value = 'A')
|
|
@end itemize
|
|
|
|
@item The Large-Object data types
|
|
|
|
Warning: Because TINYBLOB's preceding length is one byte long (the
|
|
size of a TINYINT) and MEDIUMBLOB's preceding length is three bytes
|
|
long (the size of a MEDIUMINT), it's easy to think there's some sort
|
|
of correspondence between the BLOB and the INT types. There isn't -- a
|
|
BLOB's preceding length is not four bytes long (the size of an INT).
|
|
@*
|
|
|
|
(NOTE TO SELF: BLOB storage has not been fully addressed here.)
|
|
@*
|
|
|
|
@strong{TINYBLOB}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: variable-length string with a preceding one-byte length.
|
|
@item
|
|
Example: a TINYBLOB column containing 'A' looks like:@*
|
|
@code{hexadecimal 01 41} -- (length = 2, value = 'A')
|
|
@end itemize
|
|
|
|
@strong{TINYTEXT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as TINYBLOB.
|
|
@end itemize
|
|
|
|
@strong{BLOB}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: variable-length string with a preceding two-byte length.
|
|
@item
|
|
Example: a BLOB column containing 'A' looks like:@*
|
|
@code{hexadecimal 01 00 41} -- (length = 2, value = 'A')
|
|
@end itemize
|
|
|
|
@strong{TEXT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as BLOB.
|
|
@end itemize
|
|
|
|
@strong{MEDIUMBLOB}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: variable-length string with a preceding length.
|
|
@item
|
|
Example: a MEDIUMBLOB column containing 'A' looks like:@*
|
|
@code{hexadecimal 01 00 00 41} -- (length = 4, value = 'A')
|
|
@end itemize
|
|
|
|
@strong{MEDIUMTEXT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as MEDIUMBLOB.
|
|
@end itemize
|
|
|
|
@strong{LONGBLOB}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: variable-length string with a preceding four-byte length.
|
|
@item
|
|
Example: a LONGBLOB column containing 'A' looks like:@*
|
|
@code{hexadecimal 01 00 00 00 41} -- (length = 5, value = 'A')
|
|
@end itemize
|
|
|
|
@strong{LONGTEXT}
|
|
@itemize @bullet
|
|
@item
|
|
Storage: same as LONGBLOB.
|
|
@end itemize
|
|
|
|
@end table
|
|
|
|
@section Where to Look For More Information
|
|
|
|
@strong{References:} @*
|
|
Most of the formatting work for MyISAM columns is visible
|
|
in the program /sql/field.cc in the source code directory.
|
|
@*
|
|
|
|
@node InnoDB Record Structure,InnoDB Page Structure,MyISAM Record Structure,Top
|
|
@chapter InnoDB Record Structure
|
|
|
|
This page contains:
|
|
@itemize @bullet
|
|
@item
|
|
A high-altitude "summary" picture of the parts of a MySQL/InnoDB
|
|
record structure.
|
|
@item
|
|
A description of each part.
|
|
@item
|
|
An example.
|
|
@end itemize
|
|
|
|
After reading this page, you will know how MySQL/InnoDB stores a
|
|
physical record.
|
|
@*
|
|
|
|
@section High-Altitude Picture
|
|
|
|
The chart below shows the three parts of a physical record.
|
|
|
|
@multitable @columnfractions .10 .35
|
|
|
|
@item @strong{Name} @tab @strong{Size}
|
|
@item Field Start Offsets
|
|
@tab (F*1) or (F*2) bytes
|
|
@item Extra Bytes
|
|
@tab 6 bytes
|
|
@item Field Contents
|
|
@tab depends on content
|
|
|
|
@end multitable
|
|
|
|
Legend: The letter 'F' stands for 'Number Of Fields'.
|
|
|
|
The meaning of the parts is as follows:
|
|
@itemize @bullet
|
|
@item
|
|
The FIELD START OFFSETS is a list of numbers containing the
|
|
information "where a field starts".
|
|
@item
|
|
The EXTRA BYTES is a fixed-size header.
|
|
@item
|
|
The FIELD CONTENTS contains the actual data.
|
|
@end itemize
|
|
|
|
@strong{An Important Note About The Word "Origin"}@*
|
|
The "Origin" or "Zero Point" of a record is the first byte of the
|
|
Field Contents -- not the first byte of the Field Start Offsets. If
|
|
there is a pointer to a record, that pointer is pointing to the
|
|
Origin. Therefore the first two parts of the record are addressed by
|
|
subtracting from the pointer, and only the third part is addressed by
|
|
adding to the pointer.
|
|
|
|
@subsection FIELD START OFFSETS
|
|
|
|
The Field Start Offsets is a list in which each entry is the
|
|
position, relative to the Origin, of the start of the next field. The
|
|
entries are in reverse order, that is, the first field's offset is at
|
|
the end of the list.
|
|
@*
|
|
|
|
An example: suppose there are three columns. The first column's length
|
|
is 1, the second column's length is 2, and the third column's length is 4.
|
|
In this case, the offset values are, respectively, 1, 3 (1+2), and 7 (1+2+4).
|
|
Because values are reversed, a core dump of the Field Start Offsets
|
|
would look like this: @code{07,03,01}.
|
|
@*
|
|
|
|
There are two complications for special cases:
|
|
@itemize @bullet
|
|
@item
|
|
Complication #1: The size of each offset can be either one byte or
|
|
two bytes. One-byte offsets are only usable if the total record size
|
|
is less than 127. There is a flag in the "Extra Bytes" part which will
|
|
tell you whether the size is one byte or two bytes.
|
|
@item
|
|
Complication #2: The most significant bits of an offset may contain
|
|
flag values. The next two paragraphs explain what the contents are.
|
|
@end itemize
|
|
|
|
@strong{When The Size Of Each Offset Is One Byte}
|
|
@itemize @bullet
|
|
@item
|
|
1 bit = 0 if field is non-NULL, = 1 if field is NULL
|
|
@item
|
|
7 bits = the actual offset, a number between 0 and 127
|
|
@end itemize
|
|
|
|
@strong{When The Size Of Each Offset Is Two Bytes}
|
|
@itemize @bullet
|
|
@item
|
|
1 bit = 0 if field is non-NULL, = 1 if field is NULL
|
|
@item
|
|
1 bit = 0 if field is on same page as offset, = 1 if field and offset are on different pages
|
|
@item
|
|
14 bits = the actual offset, a number between 0 and 16383
|
|
@end itemize
|
|
|
|
It is unlikely that the "field and offset are on different pages"
|
|
unless the record contains a large BLOB.
|
|
|
|
@subsection EXTRA BYTES
|
|
|
|
The Extra Bytes are a fixed six-byte header.
|
|
|
|
@multitable @columnfractions .10 .25 .35
|
|
|
|
@item @strong{Name} @tab @strong{Size} @tab @strong{Description}
|
|
@item @strong{info_bits:}
|
|
@item ()
|
|
@tab 1 bit
|
|
@tab unused or unknown
|
|
@item ()
|
|
@tab 1 bit
|
|
@tab unused or unknown
|
|
@item deleted_flag
|
|
@tab 1 bit
|
|
@tab 1 if record is deleted
|
|
@item min_rec_flag
|
|
@tab 1 bit
|
|
@tab 1 if record is predefined minimum record
|
|
@item n_owned
|
|
@tab 4 bits
|
|
@tab number of records owned by this record
|
|
@item heap_no
|
|
@tab 13 bits
|
|
@tab record's order number in heap of index page
|
|
@item n_fields
|
|
@tab 10 bits
|
|
@tab number of fields in this record, 1 to 1023
|
|
@item 1byte_offs_flag
|
|
@tab 1 bit
|
|
@tab 1 if each Field Start Offsets is 1 byte long (this item is also called the "short" flag)
|
|
@item @strong{next 16 bits}
|
|
@tab 16 bits
|
|
@tab pointer to next record in page
|
|
@item @strong{TOTAL}
|
|
@tab 48 bits
|
|
|
|
@end multitable
|
|
|
|
Total size is 48 bits, which is six bytes.
|
|
@*
|
|
|
|
If you're just trying to read the record, the key bit in the Extra
|
|
Bytes is 1byte_offs_flag -- you need to know if 1byte_offs_flag is 1
|
|
(i.e.: "short 1-byteoffsets") or 0 (i.e.: "2-byte offsets").
|
|
@*
|
|
|
|
Given a pointer to the Origin, InnoDB finds the start of the record as follows:
|
|
@itemize @bullet
|
|
@item
|
|
Let X = n_fields (the number of fields is by definition equal to the
|
|
number of entries in the Field Start Offsets Table).
|
|
@item
|
|
If 1byte_offs_flag equals 0, then let X = X * 2 because there are
|
|
two bytes for each entry instead of just one.
|
|
@item
|
|
Let X = X + 6, because the fixed size of Extra Bytes is 6.
|
|
@item
|
|
The start of the record is at (pointer value minus X).
|
|
@end itemize
|
|
|
|
@subsection FIELD CONTENTS
|
|
|
|
The Field Contents part of the record has all the data. Fields are
|
|
stored in the order they were defined in.
|
|
@*
|
|
|
|
There are no markers between fields, and there is no marker or filler
|
|
at the end of a record.
|
|
@*
|
|
|
|
Here's an example.
|
|
@itemize @bullet
|
|
@item
|
|
I made a table with this definition:
|
|
@*@*
|
|
|
|
@strong{CREATE TABLE T
|
|
(FIELD1 VARCHAR(3), FIELD2 VARCHAR(3), FIELD3 VARCHAR(3))
|
|
Type=InnoDB;}
|
|
@*@*
|
|
|
|
To understand what follows, you must know that table T has six columns
|
|
-- not three -- because InnoDB automatically added three "system
|
|
columns" at the start for its own housekeeping. It happens that these
|
|
system columns are the row ID, the transaction ID, and the rollback
|
|
pointer, but their values don't matter now. Regard them as three black
|
|
boxes.
|
|
@*@*
|
|
|
|
@item
|
|
I put some rows in the table. My last three INSERTs were:
|
|
@*@*
|
|
|
|
@strong{INSERT INTO T VALUES ('PP', 'PP', 'PP')}
|
|
@*@*
|
|
|
|
@strong{INSERT INTO T VALUES ('Q', 'Q', 'Q')}
|
|
@*@*
|
|
|
|
@strong{INSERT INTO T VALUES ('R', NULL, NULL)}
|
|
@*@*
|
|
|
|
@item
|
|
I ran Borland's TDUMP to get a hexadecimal dump of
|
|
the contents of \mysql\data\ibdata1, which (in my case) is the
|
|
MySQL/InnoDB data file (on Windows).
|
|
@end itemize
|
|
|
|
Here is an extract of the dump:
|
|
|
|
@multitable @columnfractions .05 .95
|
|
|
|
@item @strong{Address Values In Hexadecimal} @tab @strong{Values In ASCII}
|
|
@item @code{0D4280: 00 00 2D 00 84 4F 4F 4F 4F 4F 4F 4F 4F 4F 19 17}
|
|
@tab @code{..-..OOOOOOOOO..}
|
|
@item @code{0D4290: 15 13 0C 06 00 00 78 0D 02 BF 00 00 00 00 04 21}
|
|
@tab @code{......x........!}
|
|
@item @code{0D42A0: 00 00 00 00 09 2A 80 00 00 00 2D 00 84 50 50 50}
|
|
@tab @code{.....*....-..PPP}
|
|
@item @code{0D42B0: 50 50 50 16 15 14 13 0C 06 00 00 80 0D 02 E1 00}
|
|
@tab @code{PPP.............}
|
|
@item @code{0D42C0: 00 00 00 04 22 00 00 00 00 09 2B 80 00 00 00 2D}
|
|
@tab @code{....".....+....-}
|
|
@item @code{0D42D0: 00 84 51 51 51 94 94 14 13 0C 06 00 00 88 0D 00}
|
|
@tab @code{..QQQ...........}
|
|
@item @code{0D42E0: 74 00 00 00 00 04 23 00 00 00 00 09 2C 80 00 00}
|
|
@tab @code{t.....#.....,...}
|
|
@item @code{0D42F0: 00 2D 00 84 52 00 00 00 00 00 00 00 00 00 00 00}
|
|
@tab @code{.-..R...........}
|
|
|
|
@end multitable
|
|
|
|
A reformatted version of the dump, showing only the relevant bytes,
|
|
looks like this (I've put a line break after each field and added labels):
|
|
|
|
@strong{Reformatted Hexadecimal Dump}@*
|
|
@code{
|
|
19 17 15 13 0C 06 Field Start Offsets /* First Row */@*
|
|
00 00 78 0D 02 BF Extra Bytes@*
|
|
00 00 00 00 04 21 System Column #1@*
|
|
00 00 00 00 09 2A System Column #2@*
|
|
80 00 00 00 2D 00 84 System Column #3@*
|
|
50 50 Field1 'PP'@*
|
|
50 50 Field2 'PP'@*
|
|
50 50 Field3 'PP'}@*
|
|
|
|
@code{
|
|
16 15 14 13 0C 06 Field Start Offsets /* Second Row */@*
|
|
00 00 80 0D 02 E1 Extra Bytes@*
|
|
00 00 00 00 04 22 System Column #1@*
|
|
00 00 00 00 09 2B 80 System Column #2@*
|
|
00 00 00 2D 00 84 System Column #3@*
|
|
51 Field1 'Q'@*
|
|
51 Field2 'Q'@*
|
|
51 Field3 'Q'}@*
|
|
|
|
@code{
|
|
94 94 14 13 0C 06 Field Start Offsets /* Third Row */@*
|
|
00 00 88 0D 00 74 Extra Bytes@*
|
|
00 00 00 00 04 23 System Column #1@*
|
|
00 00 00 00 09 2C System Column #2@*
|
|
80 00 00 00 2D 00 84 System Column #3@*
|
|
52 Field1 'R'}@*
|
|
@*
|
|
|
|
You won't need explanation if you followed everything I've said, but
|
|
I'll add helpful notes for the three trickiest details.
|
|
@itemize @bullet
|
|
@item
|
|
Helpful Notes About "Field Start Offsets": @*
|
|
Notice that the sizes of the record's fields, in forward order, are:
|
|
6, 6, 7, 2, 2, 2. Since each offset is for the start of the "next"
|
|
field, the hexadecimal offsets are 06, 0c (6+6), 13 (6+6+7), 15
|
|
(6+6+7+2), 17 (6+6+7+2+2), 19 (6+6+7+2+2+2). Reversing the order, the
|
|
Field Start Offsets of the first record are: @code{19,17,15,13,0c,06}.
|
|
@item
|
|
Helpful Notes About "Extra Bytes": @*
|
|
Look at the Extra Bytes of the first record: @code{00 00 78 0D 02 BF}. The
|
|
fourth byte is @code{0D hexadecimal}, which is @code{1101 binary} ... the 110 is the
|
|
last bits of n_fields (@code{110 binary} is 6 which is indeed the number of
|
|
fields in the record) and the final 1 bit is 1byte_offs_flag. The
|
|
fifth and sixth bytes, which contain @code{02 BF}, constitute the "next"
|
|
field. Looking at the original hexadecimal dump, at address
|
|
@code{0D42BF} (which is position @code{02BF} within the page), you'll see the beginning bytes of
|
|
System Column #1 of the second row. In other words, the "next" field
|
|
points to the "Origin" of the following row.
|
|
@item
|
|
Helpful Notes About NULLs:@*
|
|
For the third row, I inserted NULLs in FIELD2 and FIELD3. Therefore in
|
|
the Field Start Offsets the top bit is @code{on} for these fields (the
|
|
values are @code{94 hexadecimal}, @code{94 hexadecimal}, instead of
|
|
@code{14 hexadecimal}, @code{14 hexadecimal}). And the row is
|
|
shorter because the NULLs take no space.
|
|
@end itemize
|
|
|
|
@section Where to Look For More Information
|
|
|
|
@strong{References:} @*
|
|
The most relevant InnoDB source-code files are rem0rec.c, rem0rec.ic,
|
|
and rem0rec.h in the rem ("Record Manager") directory.
|
|
|
|
@node InnoDB Page Structure,Files in MySQL Sources,InnoDB Record Structure,Top
|
|
@chapter InnoDB Page Structure
|
|
|
|
InnoDB stores all records inside a fixed-size unit which is commonly called a
|
|
"page" (though InnoDB sometimes calls it a "block" instead).
|
|
Currently all pages are the same size, 16KB.
|
|
@*
|
|
|
|
A page contains records, but it also contains headers and trailers.
|
|
I'll start this description with a high-altitude view of a page's parts,
|
|
then I'll describe each part of a page. Finally, I'll show an example. This
|
|
discussion deals only with the most common format, for the leaf page of a data file.
|
|
@*
|
|
|
|
@section High-Altitude View
|
|
|
|
An InnoDB page has seven parts:
|
|
@itemize @bullet
|
|
@item
|
|
Fil Header
|
|
@item
|
|
Page Header
|
|
@item
|
|
Infimum + Supremum Records
|
|
@item
|
|
User Records
|
|
@item
|
|
Free Space
|
|
@item
|
|
Page Directory
|
|
@item
|
|
Fil Trailer
|
|
@end itemize
|
|
|
|
As you can see, a page has two header/trailer pairs. The inner pair, "Page Header" and
|
|
"Page Directory", are mostly the concern of the \page program group,
|
|
while the outer pair, "Fil Header" and "Fil Trailer", are mostly the
|
|
concern of the \fil program group. The "Fil" header also goes goes by
|
|
the name of "File Page Header".
|
|
@*
|
|
|
|
Sandwiched between the headers and trailers, are the records and
|
|
the free (unused) space. A page always begins with two unchanging
|
|
records called the Infimum and the Supremum. Then come the user
|
|
records. Between the user records (which grow downwards) and the page
|
|
directory (which grows upwards) there is space for new records.
|
|
@*
|
|
|
|
@subsection Fil Header
|
|
|
|
The Fil Header has eight parts, as follows:
|
|
|
|
@multitable @columnfractions .10 .30 .35
|
|
|
|
@item @strong{Name} @tab @strong{Size} @tab @strong{Remarks}
|
|
@item FIL_PAGE_SPACE
|
|
@tab 4
|
|
@tab 4 ID of the space the page is in
|
|
@item FIL_PAGE_OFFSET
|
|
@tab 4
|
|
@tab ordinal page number from start of space
|
|
@item FIL_PAGE_PREV
|
|
@tab 4
|
|
@tab offset of previous page in key order
|
|
@item FIL_PAGE_NEXT
|
|
@tab 4
|
|
@tab offset of next page in key order
|
|
@item FIL_PAGE_LSN
|
|
@tab 8
|
|
@tab log serial number of page's latest log record
|
|
@item FIL_PAGE_TYPE
|
|
@tab 2
|
|
@tab current defined types are: FIL_PAGE_INDEX, FIL_PAGE_UNDO_LOG, FIL_PAGE_INODE, FIL_PAGE_IBUF_FREE_LIST
|
|
@item FIL_PAGE_FILE_FLUSH_LSN
|
|
@tab 8
|
|
@tab "the file has been flushed to disk at least up to this lsn" (log serial number),
|
|
valid only on the first page of the file
|
|
@item FIL_PAGE_ARCH_LOG_NO
|
|
@tab 4
|
|
@tab the latest archived log file number at the time that FIL_PAGE_FILE_FLUSH_LSN was written (in the log)
|
|
@end multitable
|
|
|
|
@itemize
|
|
@item
|
|
FIL_PAGE_SPACE is a necessary identifier because different pages might belong to
|
|
different (table) spaces within the same file. The word
|
|
"space" is generic jargon for either "log" or "tablespace".
|
|
@*@*
|
|
|
|
@item
|
|
FIL_PAGE_PREV and FIL_PAGE_NEXT are the page's "backward" and
|
|
"forward" pointers. To show what they're about, I'll draw a two-level
|
|
B-tree.
|
|
@*@*
|
|
|
|
@example
|
|
--------
|
|
- root -
|
|
--------
|
|
|
|
|
----------------------
|
|
| |
|
|
| |
|
|
-------- --------
|
|
- leaf - <--> - leaf -
|
|
-------- --------
|
|
@end example
|
|
@*
|
|
|
|
Everyone has seen a B-tree and knows that the entries in the root page
|
|
point to the leaf pages. (I indicate those pointers with vertical '|'
|
|
bars in the drawing.) But sometimes people miss the detail that leaf
|
|
pages can also point to each other (I indicate those pointers with a horizontal
|
|
two-way pointer '<-->' in the drawing). This feature allows InnoDB to navigate from
|
|
leaf to leaf without having to back up to the root level. This is a
|
|
sophistication which you won't find in the classic B-tree, which is
|
|
why InnoDB should perhaps be called a B+-tree instead.
|
|
@*@*
|
|
|
|
@item
|
|
The fields FIL_PAGE_FILE_FLUSH_LSN, FIL_PAGE_PREV, and FIL_PAGE_NEXT
|
|
all have to do with logs, so I'll refer you to my article "How Logs
|
|
Work With MySQL And InnoDB" on devarticles.com.
|
|
@*@*
|
|
|
|
@item
|
|
FIL_PAGE_FILE_FLUSH_LSN and FIL_PAGE_ARCH_LOG_NO are only valid for
|
|
the first page of a data file.
|
|
@end itemize
|
|
|
|
@subsection Page Header
|
|
|
|
The Page Header has 14 parts, as follows:
|
|
@*@*
|
|
|
|
@multitable @columnfractions .10 .20 .30
|
|
|
|
@item @strong{Name} @tab @strong{Size} @tab @strong{Remarks}
|
|
@item PAGE_N_DIR_SLOTS
|
|
@tab 2
|
|
@tab number of directory slots in the Page Directory part; initial value = 2
|
|
@item PAGE_HEAP_TOP
|
|
@tab 2
|
|
@tab record pointer to first record in heap
|
|
@item PAGE_N_HEAP
|
|
@tab 2
|
|
@tab number of heap records; initial value = 2
|
|
@item PAGE_FREE
|
|
@tab 2
|
|
@tab record pointer to first free record
|
|
@item PAGE_GARBAGE
|
|
@tab 2
|
|
@tab "number of bytes in deleted records"
|
|
@item PAGE_LAST_INSERT
|
|
@tab 2
|
|
@tab record pointer to the last inserted record
|
|
@item PAGE_DIRECTION
|
|
@tab 2
|
|
@tab either PAGE_LEFT, PAGE_RIGHT, or PAGE_NO_DIRECTION
|
|
@item PAGE_N_DIRECTION
|
|
@tab 2
|
|
@tab number of consecutive inserts in the same direction, e.g. "last 5 were all to the left"
|
|
@item PAGE_N_RECS
|
|
@tab 2
|
|
@tab number of user records
|
|
@item PAGE_MAX_TRX_ID
|
|
@tab 8
|
|
@tab the highest ID of a transaction which might have changed a record on the page (only set for secondary indexes)
|
|
@item PAGE_LEVEL
|
|
@tab 2
|
|
@tab level within the index (0 for a leaf page)
|
|
@item PAGE_INDEX_ID
|
|
@tab 8
|
|
@tab identifier of the index the page belongs to
|
|
@item PAGE_BTR_SEG_LEAF
|
|
@tab 10
|
|
@tab "file segment header for the leaf pages in a B-tree" (this is irrelevant here)
|
|
@item PAGE_BTR_SEG_TOP
|
|
@tab 10
|
|
@tab "file segment header for the non-leaf pages in a B-tree" (this is irrelevant here)
|
|
|
|
@end multitable
|
|
@*
|
|
|
|
(Note: I'll clarify what a "heap" is when I discuss the User Records part of the page.)
|
|
@*@*
|
|
|
|
Some of the Page Header parts require further explanation:
|
|
@itemize @bullet
|
|
@item
|
|
PAGE_FREE: @*
|
|
Records which have been freed (due to deletion or migration) are in a
|
|
one-way linked list. The PAGE_FREE pointer in the page header points
|
|
to the first record in the list. The "next" pointer in the record
|
|
header (specifically, in the record's Extra Bytes) points to the next
|
|
record in the list.
|
|
@item
|
|
PAGE_DIRECTION and PAGE_N_DIRECTION: @*
|
|
It's useful to know whether inserts are coming in a constantly
|
|
ascending sequence. That can affect InnoDB's efficiency.
|
|
@item
|
|
PAGE_HEAP_TOP and PAGE_FREE and PAGE_LAST_INSERT: @*
|
|
Warning: Like all record pointers, these point not to the beginning of the
|
|
record but to its Origin (see the earlier discussion of Record
|
|
Structure).
|
|
@item
|
|
PAGE_BTR_SEG_LEAF and PAGE_BTR_SEG_TOP: @*
|
|
These variables contain information (space ID, page number, and byte offset) about
|
|
index node file segments. InnoDB uses the information for allocating new pages.
|
|
There are two different variables because InnoDB allocates separately for leaf
|
|
pages and upper-level pages.
|
|
@end itemize
|
|
|
|
@subsection The Infimum And Supremum Records
|
|
|
|
"Infimum" and "supremum" are real English words but they are found
|
|
only in arcane mathematical treatises, and in InnoDB comments. To
|
|
InnoDB, an infimum is lower than the the lowest possible real value
|
|
(negative infinity) and a supremum is greater than the greatest
|
|
possible real value (positive infinity). InnoDB sets up an infimum
|
|
record and a supremum record automatically at page-create time, and
|
|
never deletes them. They make a useful barrier to navigation so that
|
|
"get-prev" won't pass the beginning and "get-next" won't pass the end.
|
|
Also, the infimum record can be a dummy target for temporary record
|
|
locks.
|
|
@*@*
|
|
|
|
The InnoDB code comments distinguish between "the infimum and supremum
|
|
records" and the "user records" (all other kinds).
|
|
@*@*
|
|
|
|
It's sometimes unclear whether InnoDB considers the infimum and
|
|
supremum to be part of the header or not. Their size is fixed and
|
|
their position is fixed, so I guess so.
|
|
|
|
@subsection User Records
|
|
|
|
In the User Records part of a page, you'll find all the records that the user
|
|
inserted.
|
|
@*@*
|
|
|
|
There are two ways to navigate through the user records, depending
|
|
whether you want to think of their organization as an unordered or an
|
|
ordered list.
|
|
@*@*
|
|
|
|
An unordered list is often called a "heap". If you make a pile of
|
|
stones by saying "whichever one I happen to pick up next will go on
|
|
top" -- rather than organizing them according to size and colour --
|
|
then you end up with a heap. Similarly, InnoDB does not want to insert
|
|
new rows according to the B-tree's key order (that would involve
|
|
expensive shifting of large amounts of data), so it inserts new rows
|
|
right after the end of the existing rows (at the
|
|
top of the Free Space part) or wherever there's space left by a
|
|
deleted row.
|
|
@*@*
|
|
|
|
But by definition the records of a B-tree must be accessible in order
|
|
by key value, so there is a record pointer in each record (the "next"
|
|
field in the Extra Bytes) which points to the next record in key
|
|
order. In other words, the records are a one-way linked list. So
|
|
InnoDB can access rows in key order when searching.
|
|
|
|
@subsection Free Space
|
|
|
|
I think it's clear what the Free Space part of a page is, from the discussion of
|
|
other parts.
|
|
|
|
@subsection Page Directory
|
|
|
|
The Page Directory part of a page has a variable number of record pointers.
|
|
Sometimes the record pointers are called "slots" or "directory slots".
|
|
Unlike other DBMSs, InnoDB does not have a slot for every record in
|
|
the page. Instead it keeps a sparse directory. In a fullish page,
|
|
there will be one slot for every six records.
|
|
@*@*
|
|
|
|
The slots track the records' logical order (the order by key rather
|
|
than the order by placement on the heap). Therefore, if the records
|
|
are @code{'A' 'B' 'F' 'D'} the slots will be @code{(pointer to 'A') (pointer to
|
|
'B') (pointer to 'D') (pointer to 'F')}. Because the slots are in key
|
|
order, and each slot has a fixed size, it's easy to do a binary
|
|
search of the records on the page via the slots.
|
|
@*@*
|
|
|
|
(Since the Page Directory does not have a slot for every record,
|
|
binary search can only give a rough position and then InnoDB must
|
|
follow the "next" record pointers. InnoDB's "sparse slots" policy also
|
|
accounts for the n_owned field in the Extra Bytes part of a record:
|
|
n_owned indicates how many more records must be gone through because
|
|
they don't have their own slots.)
|
|
|
|
@subsection Fil Trailer
|
|
|
|
The Fil Trailer has one part, as follows:
|
|
@*@*
|
|
|
|
@multitable @columnfractions .10 .35 .40
|
|
|
|
@item @strong{Name} @tab @strong{Size} @tab @strong{Remarks}
|
|
@item FIL_PAGE_END_LSN
|
|
@tab 8
|
|
@tab low 4 bytes = checksum of page, last 4 bytes = same as FIL_PAGE_LSN
|
|
@end multitable
|
|
@*
|
|
|
|
The final part of a page, the fil trailer (or File Page Trailer),
|
|
exists because InnoDB's architect worried about integrity. It's
|
|
impossible for a page to be only half-written, or corrupted by
|
|
crashes, because the log-recovery mechanism restores to a consistent
|
|
state. But if something goes really wrong, then it's nice to have a
|
|
checksum, and to have a value at the very end of the page which must
|
|
be the same as a value at the very beginning of the page.
|
|
|
|
@section Example
|
|
|
|
For this example, I used Borland's TDUMP again, as I did for the earlier chapter on
|
|
Record Format. This is what a page looked like:
|
|
@*@*
|
|
|
|
@multitable @columnfractions .05 .95
|
|
|
|
@item @strong{Address Values In Hexadecimal} @tab @strong{Values In ASCII}
|
|
@item @code{0D4000: 00 00 00 00 00 00 00 35 FF FF FF FF FF FF FF FF}
|
|
@tab @code{.......5........}
|
|
@item @code{0D4010: 00 00 00 00 00 00 E2 64 45 BF 00 00 00 00 00 00}
|
|
@tab @code{.......dE.......}
|
|
@item @code{0D4020: 00 00 00 00 00 00 00 05 02 F5 00 12 00 00 00 00}
|
|
@tab @code{................}
|
|
@item @code{0D4030: 02 E1 00 02 00 0F 00 10 00 00 00 00 00 00 00 00}
|
|
@tab @code{................}
|
|
@item @code{0D4040: 00 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00}
|
|
@tab @code{................}
|
|
@item @code{0D4050: 00 02 16 B2 00 00 00 00 00 00 00 02 15 F2 08 01}
|
|
@tab @code{................}
|
|
@item @code{0D4060: 00 00 03 00 89 69 6E 66 69 6D 75 6D 00 09 05 00}
|
|
@tab @code{.....infimum....}
|
|
@item @code{0D4070: 08 03 00 00 73 75 70 72 65 6D 75 6D 00 22 1D 18}
|
|
@tab @code{....supremum."..}
|
|
@item @code{0D4080: 13 0C 06 00 00 10 0D 00 B7 00 00 00 00 04 14 00}
|
|
@tab @code{................}
|
|
@item @code{0D4090: 00 00 00 09 1D 80 00 00 00 2D 00 84 41 41 41 41}
|
|
@tab @code{.........-..AAAA}
|
|
@item @code{0D40A0: 41 41 41 41 41 41 41 41 41 41 41 1F 1B 17 13 0C}
|
|
@tab @code{AAAAAAAAAAA.....}
|
|
@item @code{ ... }
|
|
@item @code{ ... }
|
|
@item @code{0D7FE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74}
|
|
@tab @code{...............t}
|
|
@item @code{0D7FF0: 02 47 01 AA 01 0A 00 65 3A E0 AA 71 00 00 E2 64}
|
|
@tab @code{.G.....e:..q...d}
|
|
@end multitable
|
|
@*@*
|
|
|
|
Let's skip past the first 38 bytes, which are Fil Header. The bytes
|
|
of the Page Header start at location @code{0d4026 hexadecimal}:
|
|
@*@*
|
|
|
|
@multitable @columnfractions .10 .45 .60
|
|
|
|
@item @strong{Location} @tab @strong{Name} @tab @strong{Description}
|
|
@item @code{00 05}
|
|
@tab PAGE_N_DIR_SLOTS
|
|
@tab There are 5 directory slots.
|
|
@item @code{02 F5}
|
|
@tab PAGE_HEAP_TOP
|
|
@tab At location @code{0402F5}, not shown, is the beginning of free space.
|
|
Maybe a better name would have been PAGE_HEAP_END
|
|
@item @code{00 12}
|
|
@tab PAGE_N_HEAP
|
|
@tab There are 18 (hexadecimal 12) records in the page.
|
|
@item @code{00 00}
|
|
@tab PAGE_FREE
|
|
@tab There are zero free (deleted) records.
|
|
@item @code{00 00}
|
|
@tab PAGE_GARBAGE
|
|
@tab There are zero bytes in deleted records.
|
|
@item @code{02 E1}
|
|
@tab PAGE_LAST_INSERT
|
|
@tab The last record was inserted at location @code{02E1}, not shown, within the page.
|
|
@item @code{00 02}
|
|
@tab PAGE_DIRECTION
|
|
@tab A glance at page0page.h will tell you that 2 is the #defined value for PAGE_RIGHT.
|
|
@item @code{00 0F}
|
|
@tab PAGE_N_DIRECTION
|
|
@tab The last 15 (hexadecimal 0F) inserts were all done "to the right"
|
|
because I was inserting in ascending order.
|
|
@item @code{00 10}
|
|
@tab PAGE_N_RECS
|
|
@tab There are 16 (hexadecimal 10) user records. Notice that PAGE_N_RECS is
|
|
smaller than the earlier field, PAGE_N_HEAP.
|
|
@item @code{00 00 00 00 00 00 00}
|
|
@tab PAGE_MAX_TRX_ID
|
|
@item @code{00 00}
|
|
@tab PAGE_LEVEL
|
|
@tab Zero because this is a leaf page.
|
|
@item @code{00 00 00 00 00 00 00 14}
|
|
@tab PAGE_INDEX_ID
|
|
@tab This is index number 20.
|
|
@item @code{00 00 00 00 00 00 00 02 16 B2}
|
|
@tab PAGE_BTR_SEG_LEAF
|
|
@item @code{00 00 00 00 00 00 00 02 15 F2}
|
|
@tab PAGE_BTR_SEG_TOP
|
|
@end multitable
|
|
@*
|
|
|
|
Immediately after the page header are the infimum and supremum
|
|
records. Looking at the "Values In ASCII" column in the hexadecimal
|
|
dump, you will see that the contents are in fact the words "infimum"
|
|
and "supremum" respectively.
|
|
@*@*
|
|
|
|
Skipping past the User Records and the Free Space, many bytes later,
|
|
is the end of the 16KB page. The values shown there are the two trailers.
|
|
@itemize @bullet
|
|
@item
|
|
The first trailer (@code{00 74, 02 47, 01 AA, 01 0A, 00 65}) is the page
|
|
directory. It has 5 entries, because the header field PAGE_N_DIR_SLOTS
|
|
says there are 5.
|
|
@item
|
|
The next trailer (@code{3A E0 AA 71, 00 00 E2 64}) is the fil trailer. Notice
|
|
that the last four bytes, @code{00 00 E2 64}, appeared before in the fil
|
|
header.
|
|
@end itemize
|
|
|
|
@section Where to Look For More Information
|
|
|
|
@strong{References:} @*
|
|
The most relevant InnoDB source-code files are page0page.c,
|
|
page0page.ic, and page0page.h in \page directory.
|
|
|
|
@node Files in MySQL Sources,Files in InnoDB Sources,InnoDB Page Structure,Top
|
|
@chapter Annotated List Of Files in the MySQL Source Code Distribution
|
|
|
|
This is a description of the files that you get when you download the
|
|
source code of MySQL. This description begins with a list
|
|
of the 43 directories and a short comment about each one. Then, for
|
|
each directory, in alphabetical order, a longer description is
|
|
supplied. When a directory contains significant program files, a list of each C
|
|
program is given along with an explanation of its intended function.
|
|
|
|
@section Directory Listing
|
|
|
|
@strong{Directory -- Short Comment}
|
|
@itemize @bullet
|
|
@item
|
|
bdb -- The Berkeley Database table handler
|
|
@item
|
|
BitKeeper -- BitKeeper administration
|
|
@item
|
|
BUILD -- Build switches
|
|
@item
|
|
Build-tools -- Build tools
|
|
@item
|
|
client -- Client library
|
|
@item
|
|
cmd-line-utils -- Command-line utilities
|
|
@item
|
|
dbug -- Fred Fish's dbug library
|
|
@item
|
|
div -- Deadlock test
|
|
@item
|
|
Docs -- Preliminary documents about internals and new modules
|
|
@item
|
|
extra -- Eight minor standalone utility programs
|
|
@item
|
|
fs -- File System
|
|
@item
|
|
heap -- The HEAP table handler
|
|
@item
|
|
Images -- Empty directory
|
|
@item
|
|
include -- Include (*.h) files
|
|
@item
|
|
innobase -- The Innobase (InnoDB) table handler
|
|
@item
|
|
isam -- The ISAM (MySQL) table handler
|
|
@item
|
|
libmysql -- For producing MySQL as a library (e.g. a Windows DLL)
|
|
@item
|
|
libmysql_r -- Only one file, a makefile
|
|
@item
|
|
libmysqld -- The MySQL Library
|
|
@item
|
|
man -- Manual pages
|
|
@item
|
|
merge -- The MERGE table handler (see Reference Manual section 7.2)
|
|
@item
|
|
myisam -- The MyISAM table handler
|
|
@item
|
|
myisammrg -- The MyISAM Merge table handler
|
|
@item
|
|
mysql-test -- A test suite for mysqld
|
|
@item
|
|
mysys -- MySQL system library (Low level routines for file access
|
|
etc.)
|
|
@item
|
|
netware -- Files related to the Novell NetWare version of MySQL
|
|
@item
|
|
NEW-RPMS -- New "RPM Package Manager" files
|
|
@item
|
|
os2 -- Routines for working with the OS/2 operating system
|
|
@item
|
|
pstack -- Process stack display
|
|
@item
|
|
regex -- Regular Expression library for support of REGEXP function
|
|
@item
|
|
repl-tests -- Test cases for replication
|
|
@item
|
|
SCCS -- Source Code Control System
|
|
@item
|
|
scripts -- SQL batches, e.g. for converting msql to MySQL
|
|
@item
|
|
sql -- Programs for handling SQL commands. The "core" of MySQL
|
|
@item
|
|
sql-bench -- The MySQL benchmarks
|
|
@item
|
|
SSL -- Secure Sockets Layer
|
|
@item
|
|
strings -- Library for C string routines, e.g. atof, strchr
|
|
@item
|
|
support-files -- 15 files used for building, containing switches?
|
|
@item
|
|
tests -- Tests in Perl
|
|
@item
|
|
tools -- mysqlmanager.c
|
|
@item
|
|
VC++Files -- Includes this entire directory, repeated for VC++
|
|
(Windows) use
|
|
@item
|
|
vio -- Virtual I/O Library
|
|
@item
|
|
zlib -- data compression library
|
|
@end itemize
|
|
|
|
@subsection bdb
|
|
|
|
The Berkeley Database table handler.
|
|
@*@*
|
|
|
|
The Berkeley Database (BDB) is maintained by Sleepycat Software.
|
|
@*@*
|
|
|
|
The documentation for BDB is available at
|
|
http://www.sleepycat.com/docs/. Since it's reasonably thorough
|
|
documentation, a description of the BDB program files is not included
|
|
in this document.
|
|
@*@*
|
|
|
|
@subsection BitKeeper
|
|
|
|
BitKeeper administration.
|
|
@*@*
|
|
|
|
This directory may be present if you downloaded the MySQL source using
|
|
BitKeeper rather than via the mysql.com site. The files in the
|
|
BitKeeper directory are for maintenance purposes only -- they are not
|
|
part of the MySQL package.
|
|
@*@*
|
|
|
|
@subsection BUILD
|
|
|
|
Build switches.
|
|
@*@*
|
|
|
|
This directory contains the build switches for compilation on various
|
|
platforms. There is a subdirectory for each set of options. The main
|
|
ones are:
|
|
@itemize @bullet
|
|
@item
|
|
alpha
|
|
@item
|
|
ia64
|
|
@item
|
|
pentium (with and without debug or bdb, etc.)
|
|
@item
|
|
solaris
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection Build-tools
|
|
|
|
Build tools.
|
|
@*@*
|
|
|
|
This directory contains batch files for extracting, making
|
|
directories, and making programs from source files. There are several
|
|
subdirectories -- for building Linux executables, for compiling, for
|
|
performing all build steps, etc.
|
|
@*@*
|
|
|
|
@subsection client
|
|
|
|
Client library.
|
|
@*@*
|
|
|
|
The client library includes mysql.cc (the source of the 'mysql'
|
|
executable) and other utilities. Most of the utilities are mentioned
|
|
in the MySQL Reference Manual. Generally these are standalone C
|
|
programs which one runs in "client mode", that is, they call the
|
|
server.
|
|
@*@*
|
|
|
|
The C program files in the directory are:
|
|
@itemize @bullet
|
|
@item
|
|
connect_test.c -- test that a connect is possible
|
|
@item
|
|
get_password.c -- ask for a password from the console
|
|
@item
|
|
insert_test.c -- test that an insert is possible
|
|
@item
|
|
list_test.c -- test that a select is possible
|
|
@item
|
|
mysql.cc -- "The MySQL command tool"
|
|
@item
|
|
mysqladmin.c -- maintenance of MYSQL databases
|
|
@item
|
|
mysqlcheck.c -- check all databases, check connect, etc.
|
|
@item
|
|
mysqldump.c -- dump table's contents in ascii
|
|
@item
|
|
mysqlimport.c -- import file into a table
|
|
@item
|
|
mysqlmanager-pwgen.c -- pwgen seems to stand for "password
|
|
generation"
|
|
@item
|
|
mysqlmanagerc.c -- entry point for mysql manager
|
|
@item
|
|
mysqlshow.c -- show databases, tables or columns
|
|
@item
|
|
mysqltest.c -- test program
|
|
@item
|
|
password.c -- password checking routines
|
|
@item
|
|
select_test.c -- test that a select is possible
|
|
@item
|
|
showdb_test.c -- test that a show-databases is possible
|
|
@item
|
|
ssl_test.c -- test that SSL is possible
|
|
@item
|
|
thread_test.c -- test that threading is possible
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection cmd-line-utils
|
|
|
|
Command-line utilities.
|
|
@*@*
|
|
|
|
There are two subdirectories: \readline and \libedit. All the files
|
|
here are "non-MYSQL" files, in the sense that MySQL AB didn't produce
|
|
them, it just uses them. It should be unnecessary to study the
|
|
programs in these files unless
|
|
you are writing or debugging a tty-like client for MySQL, such as
|
|
mysql.exe.
|
|
@*@*
|
|
|
|
The \readline subdirectory contains the files of the GNU Readline
|
|
Library, "a library for reading lines of text with interactive input
|
|
and history editing". The programs are copyrighted by the Free
|
|
Software Foundation.
|
|
@*@*
|
|
|
|
The \libedit (library of edit functions) subdirectory has files
|
|
written by Christos Zoulas. They are for editing the line contents.
|
|
These are the program files in the \libedit subdirectory:
|
|
@itemize @bullet
|
|
@item
|
|
chared.c -- character editor
|
|
@item
|
|
common.c -- common editor functions
|
|
@item
|
|
el.c -- editline interface functions
|
|
@item
|
|
emacs.c -- emacs functions
|
|
@item
|
|
fgetln.c -- get line
|
|
@item
|
|
hist.c -- history access functions
|
|
@item
|
|
history.c -- more history access functions
|
|
@item
|
|
key.c -- procedures for maintaining the extended-key map
|
|
@item
|
|
map.c -- editor function definitions
|
|
@item
|
|
parse.c -- parse an editline extended command
|
|
@item
|
|
prompt.c -- prompt printing functions
|
|
@item
|
|
read.c -- terminal read functions
|
|
@item
|
|
readline.c -- read line
|
|
@item
|
|
refresh.c -- "lower level screen refreshing functions"
|
|
@item
|
|
search.c -- "history and character search functions"
|
|
@item
|
|
sig.c -- for signal handling
|
|
@item
|
|
strlcpy.c -- string copy
|
|
@item
|
|
term.c -- "editor/termcap-curses interface"
|
|
@item
|
|
tokenizer.c -- Bourne shell line tokenizer
|
|
@item
|
|
tty.c -- for a tty interface
|
|
@item
|
|
vi.c -- commands used when in the vi (editor) mode
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection dbug
|
|
|
|
Fred Fish's dbug library.
|
|
@*@*
|
|
|
|
This is not really part of the MySQL package. Rather, it's a set of
|
|
public-domain routines which are useful for debugging MySQL programs.
|
|
@*@*
|
|
|
|
How it works: One inserts a function call that begins with DBUG_* in
|
|
one of the regular MYSQL programs. For example, in get_password.c, you
|
|
will find this line: @*
|
|
DBUG_ENTER("get_tty_password"); @*
|
|
at the start of a routine, and this line: @*
|
|
DBUG_RETURN(my_strdup(to,MYF(MY_FAE))); @*
|
|
at the end of the routine. These lines don't affect production code.
|
|
Features of the dbug library include profiling and state pushing.
|
|
@*@*
|
|
|
|
The C programs in this directory are:
|
|
@itemize @bullet
|
|
@item
|
|
dbug.c -- The main module
|
|
@item
|
|
dbug_analyze.c -- Reads a file produced by trace functions
|
|
@item
|
|
example1.c -- A tiny example
|
|
@item
|
|
example2.c -- A tiny example
|
|
@item
|
|
example3.c -- A tiny example
|
|
@item
|
|
factorial.c -- A tiny example
|
|
@item
|
|
main.c -- A tiny example
|
|
@item
|
|
sanity.c -- Declaration of a variable
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection div
|
|
|
|
Deadlock test.
|
|
@*@*
|
|
|
|
This file contains only one program, deadlock_test.c.
|
|
@*@*
|
|
|
|
@subsection Docs
|
|
|
|
Preliminary documents about internals and new modules.
|
|
@*@*
|
|
|
|
This directory doesn't have much at present that's very useful to the
|
|
student, but the plan is that some documentation related to the source
|
|
files and the internal workings of MySQL, including perhaps some
|
|
documentation from developers themselves, will be placed here.
|
|
@*@*
|
|
|
|
These sub-directories are part of this directory:
|
|
@itemize @bullet
|
|
@item
|
|
books -- .gif images and empty .txt files; no real information
|
|
@item
|
|
flags -- images of flags of countries
|
|
@item
|
|
images -- flag backgrounds and the MySQL dolphin logo
|
|
@item
|
|
mysql-logos -- more MySQL-related logos, some of them moving
|
|
@item
|
|
raw-flags -- more country flags, all .gif files
|
|
@item
|
|
support -- various files for generating texinfo/docbook
|
|
documentation
|
|
@item
|
|
to-be-included... -- an empty subdirectory
|
|
@item
|
|
translations -- some Portuguese myodbc documentation
|
|
@end itemize
|
|
@*@*
|
|
|
|
In the main directory, you'll find some .txt files related to the
|
|
methods that MySQL uses to produce its printed and html documents, odd
|
|
bits in various languages, and the single file in the directory which
|
|
has any importance -- internals.texi -- The "MySQL Internals"
|
|
document.
|
|
@*@*
|
|
|
|
Despite the name, internals.texi is not really much of a description
|
|
of MySQL internals. However, there is some useful description of the
|
|
functions in the mysys directory (see below), and of the structure of
|
|
client/server messages (doubtless very useful for people who want to
|
|
make their own JDBC drivers, or just sniff).
|
|
@*@*
|
|
|
|
@subsection extra
|
|
|
|
Eight minor standalone utility programs.
|
|
@*@*
|
|
|
|
These eight programs are all standalone utilities, that is, they have
|
|
a main() function and their main role is to show information that the
|
|
MySQL server needs or produces. Most are unimportant. They are as
|
|
follows:
|
|
@itemize @bullet
|
|
@item
|
|
my_print_defaults.c -- print all parameters in a default file
|
|
@item
|
|
mysql_install.c -- startup: install MySQL server
|
|
@item
|
|
mysql_waitpid.c -- wait for a program to terminate
|
|
@item
|
|
perror.c -- "print error" -- given error number, display message
|
|
@item
|
|
replace.c -- replace strings in text files
|
|
@item
|
|
resolve_stack_dump.c -- show symbolic info from a stack dump
|
|
@item
|
|
resolveip.c -- convert an IP address to a hostname, or vice versa
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection fs
|
|
|
|
File System.
|
|
@*@*
|
|
|
|
Here the word "File System" does not refer to the mere idea of a
|
|
directory of files on a disk drive, but to object-based access. The
|
|
concept has been compared with Oracle's Internet File System (iFS).
|
|
@*@*
|
|
|
|
The original developer of the files on this directory is Tonu Samuel,
|
|
a former MySQL AB employee. Here is a quote (somewhat edited) from
|
|
Tonu Samuel's web page (http://no.spam.ee/~tonu/index.php):
|
|
"Question: What is it?
|
|
Answer: Actually this is not filesystem in common terms. MySQL FS
|
|
makes it possible to make SQL tables and some functions available over
|
|
a filesystem. MySQL does not require disk space, it uses an ordinary
|
|
MySQL daemon to store data."
|
|
The descriptions imply that this is a development project.
|
|
@*@*
|
|
|
|
There are four program files in the directory:
|
|
@itemize @bullet
|
|
@item
|
|
database.c -- "emulate filesystem behaviour on top of SQL database"
|
|
@item
|
|
libmysqlfs.c -- Search/replace, show-functions, and parse routines
|
|
@item
|
|
mysqlcorbafs.c -- Connection with the CORBA "Object Request Broker"
|
|
@item
|
|
mysqlcorbafs_test.c -- Utility to test the working of mysqlcorbafs.c
|
|
|
|
@*@*
|
|
|
|
@subsection heap
|
|
|
|
The HEAP table handler.
|
|
@*@*
|
|
|
|
All the MySQL table handlers (i.e. the handlers that MySQL itself
|
|
produces) have files with similar names and functions. Thus, this
|
|
(heap) directory contains a lot of duplication of the myisam directory
|
|
(for the MyISAM table handler). Such duplicates have been marked with
|
|
an "*" in the following list. For example, you will find that
|
|
\heap\hp_extra.c has a close equivalent in the myisam directory
|
|
(\myisam\mi_extra.c) with the same descriptive comment.
|
|
@*@*
|
|
|
|
@item
|
|
hp_block.c -- Read/write a block (i.e. a page)
|
|
@item
|
|
hp_clear.c -- Remove all records in the database
|
|
@item
|
|
hp_close.c -- * close database
|
|
@item
|
|
hp_create.c -- * create a table
|
|
@item
|
|
hp_delete.c -- * delete a row
|
|
@item
|
|
hp_extra.c -- * for setting options and buffer sizes when optimizing
|
|
@item
|
|
hp_hash.c -- Hash functions used for saving keys
|
|
@item
|
|
hp_info.c -- * Information about database status
|
|
@item
|
|
hp_open.c -- * open database
|
|
@item
|
|
hp_panic.c -- * the hp_panic routine, probably for sudden shutdowns
|
|
@item
|
|
hp_rename.c -- * rename a table
|
|
@item
|
|
hp_rfirst.c -- * read first row through a specific key (very short)
|
|
@item
|
|
hp_rkey.c -- * read record using a key
|
|
@item
|
|
hp_rlast.c -- * read last row with same key as previously-read row
|
|
@item
|
|
hp_rnext.c -- * read next row with same key as previously-read row
|
|
@item
|
|
hp_rprev.c -- * read previous row with same key as previously-read
|
|
row
|
|
@item
|
|
hp_rrnd.c -- * read a row based on position
|
|
@item
|
|
hp_rsame.c -- * find current row using positional read or key-based
|
|
read
|
|
@item
|
|
hp_scan.c -- * read all rows sequentially
|
|
@item
|
|
hp_static.c -- * static variables (very short)
|
|
@item
|
|
hp_test1.c -- * testing basic functions
|
|
@item
|
|
hp_test2.c -- * testing database and storing results
|
|
@item
|
|
hp_update.c -- * update an existing row
|
|
@item
|
|
hp_write.c -- * insert a new row
|
|
@end itemize
|
|
@*@*
|
|
|
|
There are fewer files in the heap directory than in the myisam
|
|
directory, because fewer are necessary. For example, there is no need
|
|
for a \myisam\mi_cache.c equivalent (to cache reads) or a
|
|
\myisam\log.c equivalent (to log statements).
|
|
@*@*
|
|
|
|
@subsection Images
|
|
|
|
Empty directory.
|
|
@*@*
|
|
|
|
There are no files in this directory.
|
|
@*@*
|
|
|
|
@subsection include
|
|
|
|
Include (*.h) files.
|
|
@*@*
|
|
|
|
These files may be included in C program files. Note that each
|
|
individual directory will also have its own *.h files, for including
|
|
in its own *.c programs. The *.h files in the include directory are
|
|
ones that might be included from more than one place.
|
|
@*@*
|
|
|
|
For example, the mysys directory contains a C file named rijndael.c,
|
|
but does not include rijndael.h. The include directory contains
|
|
rijndael.h. Looking further, you'll find that rijndael.h is also
|
|
included in other places: by my_aes.c and my_aes.h.
|
|
@*@*
|
|
|
|
The include directory contains 51 *.h (include) files.
|
|
@*@*
|
|
|
|
@subsection innobase
|
|
|
|
The Innobase (InnoDB) table handler.
|
|
@*@*
|
|
|
|
A full description of these files can be found elsewhere in this
|
|
document.
|
|
@*@*
|
|
|
|
@subsection isam
|
|
|
|
The ISAM table handler.
|
|
@*@*
|
|
|
|
The C files in this directory are:
|
|
@itemize @bullet
|
|
@item
|
|
_cache.c -- for reading records from a cache
|
|
@item
|
|
changed.c -- a single routine for setting a "changed" flag (very
|
|
short)
|
|
@item
|
|
close.c -- close database
|
|
@item
|
|
create.c -- create a table
|
|
@item
|
|
_dbug.c -- support routines for use with "dbug" (see the \dbug
|
|
description)
|
|
@item
|
|
delete.c -- delete a row
|
|
@item
|
|
_dynrec.c -- functions to handle space-packed records and blobs
|
|
@item
|
|
extra.c -- setting options and buffer sizes when optimizing table
|
|
handling
|
|
@item
|
|
info.c -- Information about database status
|
|
@item
|
|
_key.c -- for handling keys
|
|
@item
|
|
_locking.c -- lock database
|
|
@item
|
|
log.c -- save commands in log file which myisamlog program can read
|
|
@item
|
|
_packrec.c -- compress records
|
|
@item
|
|
_page.c -- read and write pages containing keys
|
|
@item
|
|
panic.c -- the mi_panic routine, probably for sudden shutdowns
|
|
@item
|
|
range.c -- approximate count of how many records lie between two
|
|
keys
|
|
@item
|
|
rfirst.c -- read first row through a specific key (very short)
|
|
@item
|
|
rkey.c -- read a record using a key
|
|
@item
|
|
rlast.c -- read last row with same key as previously-read row
|
|
@item
|
|
rnext.c -- read next row with same key as previously-read row
|
|
@item
|
|
rprev.c -- read previous row with same key as previously-read row
|
|
@item
|
|
rrnd.c -- read a row based on position
|
|
@item
|
|
rsame.c -- find current row using positional read or key-based read
|
|
@item
|
|
rsamepos.c -- positional read
|
|
@item
|
|
_search.c -- key-handling functions
|
|
@item
|
|
static.c -- static variables (very short)
|
|
@item
|
|
_statrec.c -- functions to handle fixed-length records
|
|
@item
|
|
test1.c -- testing basic functions
|
|
@item
|
|
test2.c -- testing database and storing results
|
|
@item
|
|
test3.c -- testing locking
|
|
@item
|
|
update.c -- update an existing row
|
|
@item
|
|
write.c -- insert a new row
|
|
@item
|
|
pack_isam.c -- pack isam file (NOTE TO SELF ?? equivalent to
|
|
\myisam\myisampack.c)
|
|
@end itemize
|
|
@*@*
|
|
|
|
Except for one minor C file (pack_isam.c) every program in the ISAM
|
|
directory has a counterpart in the MyISAM directory. For example
|
|
\isam\update.c corresponds to \myisam\mi_update.c. However, the
|
|
reverse is not true -- there are many files in the MyISAM directory
|
|
which have no counterpart in the ISAM directory.
|
|
@*@*
|
|
|
|
The reason is simple -- it's because the ISAM files are becoming
|
|
obsolete. When MySQL programmers add new features, they add them for
|
|
MyISAM only. The student can therefore ignore all files in this
|
|
directory and study the MyISAM programs instead.
|
|
@*@*
|
|
|
|
@subsection libmysql
|
|
|
|
The MySQL Library, Part 1.
|
|
@*@*
|
|
|
|
The files here are for producing MySQL as a library (e.g. a Windows
|
|
DLL). The idea is that, instead of producing separate mysql (client)
|
|
and mysqld (server) programs, one produces a library. Instead of
|
|
sending messages, the client part merely calls the server part.
|
|
@*@*
|
|
|
|
The libmysql files are split into three directories: libmysql (this
|
|
one), libmysql_r (the next one), and libmysqld (the next one after
|
|
that). It may be that the original intention was that the libmysql
|
|
directory would hold the "client part" files, and the libmysqld
|
|
directory would hold the "server part" files.
|
|
@*@*
|
|
|
|
The program files on this directory are:
|
|
@itemize @bullet
|
|
@item
|
|
conf_to_src.c -- has to do with charsets
|
|
@item
|
|
dll.c -- initialization of the dll library
|
|
@item
|
|
errmsg.c -- English error messages, compare \mysys\errors.c
|
|
@item
|
|
get_password.c -- get password
|
|
@item
|
|
libmysql.c -- the main "packet-sending emulation" program
|
|
@item
|
|
manager.c -- initialize/connect/fetch with MySQL manager
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection libmysql_r
|
|
|
|
The MySQL Library, Part 2.
|
|
@*@*
|
|
|
|
This is a continuation of the libmysql directory. There is only one
|
|
file here:
|
|
@itemize @bullet
|
|
@item
|
|
makefile.am
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection libmysqld
|
|
|
|
The MySQL library, Part 3.
|
|
@*@*
|
|
|
|
This is a continuation of the libmysql directory. The program files on
|
|
this directory are:
|
|
@itemize @bullet
|
|
@item
|
|
libmysqld.c -- The called side, compare the mysqld.exe source
|
|
@item
|
|
lib_vio.c -- Emulate the vio directory's communication buffer
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection man
|
|
|
|
Manual pages.
|
|
@*@*
|
|
|
|
These are not the actual "man" (manual) pages, they are switches for
|
|
the production.
|
|
@*@*
|
|
|
|
@subsection merge
|
|
|
|
The MERGE table handler.
|
|
@*@*
|
|
|
|
For a description of the MERGE table handler, see the MySQL Reference
|
|
Manual, section 7.2.
|
|
@*@*
|
|
|
|
You'll notice that there seem to be several directories with
|
|
similar-sounding names of C files in them. That's because the MySQL
|
|
table handlers are all quite similar.
|
|
@*@*
|
|
|
|
The related directories are:
|
|
@itemize @bullet
|
|
@item
|
|
\isam -- for ISAM
|
|
@item
|
|
\myisam -- for MyISAM
|
|
@item
|
|
\merge -- for ISAM MERGE (mostly call functions in \isam programs)
|
|
@item
|
|
\myisammrg -- for MyISAM MERGE (mostly call functions in \myisam
|
|
programs)
|
|
@end itemize
|
|
@*@*
|
|
|
|
To avoid duplication, only the \myisam program versions are discussed.
|
|
@*@*
|
|
|
|
The C programs in this (merge) directory are:
|
|
@itemize @bullet
|
|
@item
|
|
mrg_close.c -- compare \isam's close.c
|
|
@item
|
|
mrg_create.c -- "" create.c
|
|
@item
|
|
mrg_delete.c -- "" delete.c
|
|
@item
|
|
mrg_extra.c -- "" extra.c
|
|
@item
|
|
mrg_info.c -- "" info.c
|
|
@item
|
|
mrg_locking.c -- "" locking.c
|
|
@item
|
|
mrg_open.c -- "" open.c
|
|
@item
|
|
mrg_panic.c -- "" panic.c
|
|
@item
|
|
mrg_rrnd.c -- "" rrnd.c
|
|
@item
|
|
mrg_rsame.c -- "" rsame.c
|
|
@item
|
|
mrg_static.c -- "" static.c
|
|
@item
|
|
mrg_update.c -- "" update.c
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection myisam
|
|
|
|
The MyISAM table handler.
|
|
@*@*
|
|
|
|
The C files in this subdirectory come in six main groups:
|
|
@itemize @bullet
|
|
@item
|
|
ft*.c files -- ft stands for "Full Text", code contributed by Sergei
|
|
Golubchik
|
|
@item
|
|
mi*.c files -- mi stands for "My Isam", these are the main programs
|
|
for Myisam
|
|
@item
|
|
myisam*.c files -- for example, "myisamchk" utility routine
|
|
functions source
|
|
@item
|
|
rt*.c files -- rt stands for "rtree", some code was written by
|
|
Alexander Barkov
|
|
@item
|
|
sp*.c files -- sp stands for "spatial", some code was written by
|
|
Ramil Kalimullin
|
|
@item
|
|
sort.c -- this is a single file that sorts keys for index-create
|
|
purposes
|
|
@end itemize
|
|
@*@*
|
|
|
|
The "full text" and "rtree" and "spatial" program sets are for special
|
|
purposes, so this document focuses only on the mi*.c "myisam" C
|
|
programs. They are:
|
|
@itemize @bullet
|
|
@item
|
|
mi_cache.c -- for reading records from a cache
|
|
@item
|
|
mi_changed.c -- a single routine for setting a "changed" flag (very
|
|
short)
|
|
@item
|
|
mi_check.c -- doesn't just do checks, ?? for myisamchk program?
|
|
@item
|
|
mi_checksum.c -- calculates a checksum for a row
|
|
@item
|
|
mi_close.c -- close database
|
|
@item
|
|
mi_create.c -- create a table
|
|
@item
|
|
mi_dbug.c -- support routines for use with "dbug" (see \dbug
|
|
description)
|
|
@item
|
|
mi_delete.c -- delete a row
|
|
@item
|
|
mi_delete_all.c -- delete all rows
|
|
@item
|
|
mi_delete_table.c -- delete a table (very short)
|
|
@item
|
|
mi_dynrec.c -- functions to handle space-packed records and blobs
|
|
@item
|
|
mi_extra.c -- setting options and buffer sizes when optimizing
|
|
@item
|
|
mi_info.c -- "Ger tillbaka en struct med information om isam-filen"
|
|
@item
|
|
mi_key.c -- for handling keys
|
|
@item
|
|
mi_locking.c -- lock database
|
|
@item
|
|
mi_log.c -- save commands in log file which myisamlog program can
|
|
read
|
|
@item
|
|
mi_open.c -- open database
|
|
@item
|
|
mi_packrec.c -- compress records
|
|
@item
|
|
mi_page.c -- read and write pages containing keys
|
|
@item
|
|
mi_panic.c -- the mi_panic routine, probably for sudden shutdowns
|
|
@item
|
|
mi_range.c -- approximate count of how many records lie between two
|
|
keys
|
|
@item
|
|
mi_rename.c -- rename a table
|
|
@item
|
|
mi_rfirst.c -- read first row through a specific key (very short)
|
|
@item
|
|
mi_rkey.c -- read a record using a key
|
|
@item
|
|
mi_rlast.c -- read last row with same key as previously-read row
|
|
@item
|
|
mi_rnext.c -- read next row with same key as previously-read row
|
|
@item
|
|
mi_rnext_same.c -- same as mi_rnext.c, but abort if the key changes
|
|
@item
|
|
mi_rprev.c -- read previous row with same key as previously-read row
|
|
@item
|
|
mi_rrnd.c -- read a row based on position
|
|
@item
|
|
mi_rsame.c -- find current row using positional read or key-based
|
|
read
|
|
@item
|
|
mi_rsamepos.c -- positional read
|
|
@item
|
|
mi_scan.c -- read all rows sequentially
|
|
@item
|
|
mi_search.c -- key-handling functions
|
|
@item
|
|
mi_static.c -- static variables (very short)
|
|
@item
|
|
mi_statrec.c -- functions to handle fixed-length records
|
|
@item
|
|
mi_test1.c -- testing basic functions
|
|
@item
|
|
mi_test2.c -- testing database and storing results
|
|
@item
|
|
mi_test3.c -- testing locking
|
|
@item
|
|
mi_unique.c -- functions to check if a row is unique
|
|
@item
|
|
mi_update.c -- update an existing row
|
|
@item
|
|
mi_write.c -- insert a new row
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection myisammrg
|
|
|
|
MyISAM Merge table handler.
|
|
@*@*
|
|
|
|
As with other table handlers, you'll find that the *.c files in the
|
|
myissammrg directory have counterparts in the myisam directory. In
|
|
fact, this general description of a myisammrg program is almost always
|
|
true: The myisammrg
|
|
function checks an argument, the myisammrg function formulates an
|
|
expression for passing to a myisam function, the myisammrg calls a
|
|
myisam function, the myisammrg function returns.
|
|
@*@*
|
|
|
|
These are the 21 files in the myisammrg directory, with notes about
|
|
the myisam functions or programs they're connected with:
|
|
@itemize @bullet
|
|
@item
|
|
myrg_close.c -- mi_close.c
|
|
@item
|
|
myrg_create.c -- mi_create.c
|
|
@item
|
|
myrg_delete.c -- mi_delete.c / delete last-read record
|
|
@item
|
|
myrg_extra.c -- mi_extra.c / "extra functions we want to do ..."
|
|
@item
|
|
myrg_info.c -- mi_info.c / display information about a mymerge file
|
|
@item
|
|
myrg_locking.c -- mi_locking.c / lock databases
|
|
@item
|
|
myrg_open.c -- mi_open.c / open a MyISAM MERGE table
|
|
@item
|
|
myrg_panic.c -- mi_panic.c / close in a hurry
|
|
@item
|
|
myrg_queue.c -- read record based on a key
|
|
@item
|
|
myrg_range.c -- mi_range.c / find records in a range
|
|
@item
|
|
myrg_rfirst.c -- mi_rfirst.c / read first record according to
|
|
specific key
|
|
@item
|
|
myrg_rkey.c -- mi_rkey.c / read record based on a key
|
|
@item
|
|
myrg_rlast.c -- mi_rlast.c / read last row with same key as previous
|
|
read
|
|
@item
|
|
myrg_rnext.c -- mi_rnext.c / read next row with same key as previous
|
|
read
|
|
@item
|
|
myrg_rnext_same.c -- mi_rnext_same.c / read next row with same key
|
|
@item
|
|
myrg_rprev.c -- mi_rprev.c / read previous row with same key
|
|
@item
|
|
myrg_rrnd.c -- mi_rrnd.c / read record with random access
|
|
@item
|
|
myrg_rsame.c -- mi_rsame.c / call mi_rsame function, see
|
|
\myisam\mi_rsame.c
|
|
@item
|
|
myrg_static.c -- mi_static.c / static variable declaration
|
|
@item
|
|
myrg_update.c -- mi_update.c / call mi_update function, see
|
|
\myisam\mi_update.c
|
|
@item
|
|
myrg_write.c -- mi_write.c / call mi_write function, see
|
|
\myisam\mi_write.c
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection mysql-test
|
|
|
|
A test suite for mysqld.
|
|
@*@*
|
|
|
|
The directory has a README file which explains how to run the tests,
|
|
how to make new tests (in files with the filename extension "*.test"),
|
|
and how to report errors.
|
|
@*@*
|
|
|
|
There are four subdirectories:
|
|
@itemize @bullet
|
|
@item
|
|
\misc -- contains one minor Perl program
|
|
@item
|
|
\r -- contains *.result, i.e. "what happened" files and
|
|
*.required, i.e. "what should happen" file
|
|
@item
|
|
\std_data -- contains standard data for input to tests
|
|
@item
|
|
\t -- contains tests
|
|
@end itemize
|
|
@*@*
|
|
|
|
There are 186 *.test files in the \t subdirectory. Primarily these are
|
|
SQL scripts which try out a feature, output a result, and compare the
|
|
result with what's required. Some samples of what the test files check
|
|
are: latin1_de comparisons, date additions, the HAVING clause, outer
|
|
joins, openSSL, load data, logging, truncate, and UNION.
|
|
@*@*
|
|
|
|
There are other tests in these directories:
|
|
@itemize @bullet
|
|
@item
|
|
sql-bench
|
|
@item
|
|
repl-tests
|
|
@item
|
|
tests
|
|
@end itemize
|
|
|
|
@subsection mysys
|
|
|
|
MySQL system library (Low level routines for file access etc.).
|
|
@*@*
|
|
|
|
There are 115 *.c programs in this directory:
|
|
@itemize @bullet
|
|
@item
|
|
array.c -- Dynamic array handling
|
|
@item
|
|
charset.c -- Using dynamic character sets, set default character
|
|
set, ...
|
|
@item
|
|
charset2html.c -- Checking what character set a browser is using
|
|
@item
|
|
checksum.c -- Calculate checksum for a memory block, used for
|
|
pack_isam
|
|
@item
|
|
default.c -- Find defaults from *.cnf or *.ini files
|
|
@item
|
|
errors.c -- English text of global errors
|
|
@item
|
|
hash.c -- Hash search/compare/free functions "for saving keys"
|
|
@item
|
|
list.c -- Double-linked lists
|
|
@item
|
|
make-conf.c -- "Make a charset .conf file out of a ctype-charset.c
|
|
file"
|
|
@item
|
|
md5.c -- MD5 ("Message Digest 5") algorithm from RSA Data Security
|
|
@item
|
|
mf_brkhant.c -- Prevent user from doing a Break during critical
|
|
execution
|
|
@item
|
|
mf_cache.c -- "Open a temporary file and cache it with io_cache"
|
|
@item
|
|
mf_dirname.c -- Parse/convert directory names
|
|
@item
|
|
mf_fn_ext.c -- Get filename extension
|
|
@item
|
|
mf_format.c -- Format a filename
|
|
@item
|
|
mf_getdate.c -- Get date, return in yyyy-mm-dd hh:mm:ss format
|
|
@item
|
|
mf_iocache.c -- Cached read/write of files in fixed-size units
|
|
@item
|
|
mf_iocache2.c -- Continuation of mf_iocache.c
|
|
@item
|
|
mf_keycache.c -- Key block caching for certain file types
|
|
@item
|
|
mf_loadpath.c -- Return full path name (no ..\ stuff)
|
|
@item
|
|
mf_pack.c -- Packing/unpacking directory names for create purposes
|
|
@item
|
|
mf_path.c -- Determine where a program can find its files
|
|
@item
|
|
mf_qsort.c -- Quicksort
|
|
@item
|
|
mf_qsort2.c -- Quicksort, part 2
|
|
@item
|
|
mf_radix.c -- Radix sort
|
|
@item
|
|
mf_same.c -- Determine whether filenames are the same
|
|
@item
|
|
mf_sort.c -- Sort with choice of Quicksort or Radix sort
|
|
@item
|
|
mf_soundex.c -- Soundex algorithm derived from EDN Nov. 14, 1985
|
|
(pg. 36)
|
|
@item
|
|
mf_strip.c -- Strip trail spaces from a string
|
|
@item
|
|
mf_tempdir.c -- Initialize/find/free temporary directory
|
|
@item
|
|
mf_tempfile.c -- Create a temporary file
|
|
@item
|
|
mf_unixpath.c -- Convert filename to UNIX-style filename
|
|
@item
|
|
mf_util.c -- Routines, #ifdef'd, which may be missing on some
|
|
machines
|
|
@item
|
|
mf_wcomp.c -- Comparisons with wildcards
|
|
@item
|
|
mf_wfile.c -- Finding files with wildcards
|
|
@item
|
|
mulalloc.c -- Malloc many pointers at the same time
|
|
@item
|
|
my_aes.c -- AES encryption
|
|
@item
|
|
my_alarm.c -- Set a variable value when an alarm is received
|
|
@item
|
|
my_alloc.c -- malloc of results which will be freed simultaneously
|
|
@item
|
|
my_append.c -- one file to another
|
|
@item
|
|
my_bit.c -- smallest X where 2^X >= value, maybe useful for
|
|
divisions
|
|
@item
|
|
my_bitmap.c -- Handle uchar arrays as large bitmaps
|
|
@item
|
|
my_chsize.c -- Truncate file if shorter, else fill with a filler
|
|
character
|
|
@item
|
|
my_clock.c -- Time-of-day ("clock()") function, with OS-dependent
|
|
#ifdef's
|
|
@item
|
|
my_compress.c -- Compress packet (see also description of \zlib
|
|
directory)
|
|
@item
|
|
my_copy.c -- Copy files
|
|
@item
|
|
my_create.c -- Create file
|
|
@item
|
|
my_delete.c -- Delete file
|
|
@item
|
|
my_div.c -- Get file's name
|
|
@item
|
|
my_dup.c -- Open a duplicated file
|
|
@item
|
|
my_error.c -- Return formatted error to user
|
|
@item
|
|
my_fopen.c -- File open
|
|
@item
|
|
my_fstream.c -- Streaming file read/write
|
|
@item
|
|
my_getwd.c -- Get working directory
|
|
@item
|
|
my_gethostbyname.c -- Thread-safe version of standard net
|
|
gethostbyname() func
|
|
@item
|
|
my_getopt.c -- Find out what options are in effect
|
|
@item
|
|
my_handler.c -- Compare two keys in various possible formats
|
|
@item
|
|
my_init.c -- Initialize variables and functions in the mysys library
|
|
@item
|
|
my_lib.c -- Compare/convert directory names and file names
|
|
@item
|
|
my_lock.c -- Lock part of a file
|
|
@item
|
|
my_lockmem.c -- "Allocate a block of locked memory"
|
|
@item
|
|
my_lread.c -- Read a specified number of bytes from a file into
|
|
memory
|
|
@item
|
|
my_lwrite.c -- Write a specified number of bytes from memory into a
|
|
file
|
|
@item
|
|
my_malloc.c -- Malloc (memory allocate) and dup functions
|
|
@item
|
|
my_messnc.c -- Put out a message on stderr with "no curses"
|
|
@item
|
|
my_mkdir.c -- Make directory
|
|
@item
|
|
my_net.c -- Thread-safe version of net inet_ntoa function
|
|
@item
|
|
my_netware.c -- Functions used only with the Novell Netware version
|
|
of MySQL
|
|
@item
|
|
my_once.c -- Allocation / duplication for "things we don't need to
|
|
free"
|
|
@item
|
|
my_open.c -- Open a file
|
|
@item
|
|
my_os2cond.c -- OS2-specific: "A simple implementation of posix
|
|
conditions"
|
|
@item
|
|
my_os2dirsrch.c -- OS2-specific: Emulate a Win32 directory search
|
|
@item
|
|
my_os2dlfcn.c -- OS2-specific: Emulate UNIX dynamic loading
|
|
@item
|
|
my_os2file64.c -- OS2-specific: For File64bit setting
|
|
@item
|
|
my_os2mutex.c -- OS2-specific: For mutex handling
|
|
@item
|
|
my_os2thread.c -- OS2-specific: For thread handling
|
|
@item
|
|
my_os2tls.c -- OS2-specific: For thread-local storage
|
|
@item
|
|
my_port.c -- AIX-specific: my_ulonglong2double()
|
|
@item
|
|
my_pread.c -- Read a specified number of bytes from a file
|
|
@item
|
|
my_pthread.c -- A wrapper for thread-handling functions in different
|
|
OSs
|
|
@item
|
|
my_quick.c -- Read/write (labelled a "quicker" interface, perhaps
|
|
obsolete)
|
|
@item
|
|
my_read.c -- Read a specified number of bytes from a file, possibly
|
|
retry
|
|
@item
|
|
my_realloc.c -- Reallocate memory allocated with my_alloc.c
|
|
(probably)
|
|
@item
|
|
my_redel.c -- Rename and delete file
|
|
@item
|
|
my_rename.c -- Rename without delete
|
|
@item
|
|
my_seek.c -- Seek, i.e. point to a spot within a file
|
|
@item
|
|
my_semaphore.c -- Semaphore routines, for use on OS that doesn't
|
|
support them
|
|
@item
|
|
my_sleep.c -- Wait n microseconds
|
|
@item
|
|
my_static.c -- Static-variable definitions
|
|
@item
|
|
my_symlink.c -- Read a symbolic link (symlinks are a UNIX thing, I
|
|
guess)
|
|
@item
|
|
my_symlink2.c -- Part 2 of my_symlink.c
|
|
@item
|
|
my_tempnam.c -- Obsolete temporary-filename routine used by ISAM
|
|
table handler
|
|
@item
|
|
my_thr_init.c -- initialize/allocate "all mysys & debug thread
|
|
variables"
|
|
@item
|
|
my_wincond.c -- Windows-specific: emulate Posix conditions
|
|
@item
|
|
my_winsem.c -- Windows-specific: emulate Posix threads
|
|
@item
|
|
my_winthread.c -- Windows-specific: emulate Posix threads
|
|
@item
|
|
my_write.c -- Write a specified number of bytes to a file
|
|
@item
|
|
ptr_cmp.c -- Point to an optimal byte-comparison function
|
|
@item
|
|
queues.c -- Handle priority queues as in Robert Sedgewick's book
|
|
@item
|
|
raid2.c -- RAID support (the true implementation is in raid.cc)
|
|
@item
|
|
rijndael.c -- "Optimised ANSI C code for the Rijndael cipher (now
|
|
AES")
|
|
@item
|
|
safemalloc.c -- A version of the standard malloc() with safety
|
|
checking
|
|
@item
|
|
sha1.c -- Implementation of Secure Hashing Algorithm 1
|
|
@item
|
|
string.c -- Initialize/append/free dynamically-sized strings
|
|
@item
|
|
testhash.c -- Standalone program: test the hash library routines
|
|
@item
|
|
test_charset.c -- Standalone program: display character set
|
|
information
|
|
@item
|
|
test_dir.c -- Standalone program: placeholder for "test all
|
|
functions" idea
|
|
@item
|
|
test_fn.c -- Standalone program: apparently tests a function
|
|
@item
|
|
test_xml.c -- Standalone program: test XML routines
|
|
@item
|
|
thr_alarm.c -- Thread alarms and signal handling
|
|
@item
|
|
thr_lock.c -- "Read and write locks for Posix threads"
|
|
@item
|
|
thr_mutex.c -- A wrapper for mutex functions
|
|
@item
|
|
thr_rwlock.c -- Synchronizes the readers' thread locks with the
|
|
writer's lock
|
|
@item
|
|
tree.c -- Initialize/search/free binary trees
|
|
@item
|
|
typelib.c -- Determine what type a field has
|
|
@end itemize
|
|
@*@*
|
|
|
|
You can find documentation for the main functions in these files
|
|
elsewhere in this document.
|
|
For example, the main functions in my_getwd.c are described thus:
|
|
@*@*
|
|
|
|
@example
|
|
"int my_getwd _A((string buf, uint size, myf MyFlags)); @*
|
|
int my_setwd _A((const char *dir, myf MyFlags)); @*
|
|
Get and set working directory." @*
|
|
@end example
|
|
|
|
@subsection netware
|
|
|
|
Files related to the Novell NetWare version of MySQL.
|
|
@*@*
|
|
|
|
There are 39 files on this directory. Most have filename extensions of
|
|
*.def, *.sql, or *.c.
|
|
@*@*
|
|
|
|
The twenty-five *.def files are all from Novell Inc. They contain import or
|
|
export symbols. (".def" is a common filename extension for
|
|
"definition".)
|
|
@*@*
|
|
|
|
The two *.sql files are short scripts of SQL statements used in
|
|
testing.
|
|
@*@*
|
|
|
|
These are the five *.c files, all from Novell Inc.:
|
|
@itemize @bullet
|
|
@item
|
|
libmysqlmain.c -- Only one function: init_available_charsets()
|
|
@item
|
|
my_manage.c -- Standalone management utility
|
|
@item
|
|
mysql_install_db.c -- Compare \scripts\mysql_install_db.sh
|
|
@item
|
|
mysql_test_run.c -- Short test program
|
|
@item
|
|
mysqld_safe.c -- Compare \scripts\mysqld_safe.sh
|
|
@end itemize
|
|
|
|
Perhaps the most important file is:
|
|
@itemize @bullet
|
|
@item
|
|
netware.patch -- NetWare-specific build instructions and switches
|
|
(compare \mysql-4.1\ltmain.sh)
|
|
@end itemize
|
|
@*@*
|
|
|
|
For instructions about basic installation, see "Deployment Guide For
|
|
NetWare AMP" at:
|
|
@url{http://developer.novell.com/ndk/whitepapers/namp.htm}
|
|
@*
|
|
|
|
@subsection NEW-RPMS
|
|
|
|
New "RPM Package Manager" files.
|
|
@*@*
|
|
|
|
This directory is not part of the Windows distribution. Perhaps in
|
|
MYSQL's Linux distribution it has files for use with Red Hat
|
|
installations -- a point that needs checking someday.
|
|
@*@*
|
|
|
|
@subsection os2
|
|
|
|
Routines for working with the OS2 operating system.
|
|
@*@*
|
|
|
|
The files in this directory are the product of the efforts of three
|
|
people from outside MySQL: Yuri Dario, Timo Maier, and John M
|
|
Alfredsson. There are no .C program files in this directory.
|
|
@*@*
|
|
|
|
The contents of \os2 are:
|
|
@itemize @bullet
|
|
@item
|
|
A Readme.Txt file
|
|
@item
|
|
An \include subdirectory containing .h files which are for OS/2 only
|
|
@item
|
|
Files used in the build process (configuration, switches, and one
|
|
.obj)
|
|
@end itemize
|
|
@*@*
|
|
|
|
The README file refers to MySQL version 3.23, which suggests that
|
|
there have been no updates for MySQL 4.0 for this section.
|
|
@*@*
|
|
|
|
@subsection pstack
|
|
|
|
Process stack display.
|
|
@*@*
|
|
|
|
This is a set of publicly-available debugging aids which all do pretty
|
|
well the same thing: display the contents of the stack, along with
|
|
symbolic information, for a running process. There are versions for
|
|
various object file formats (such as ELF and IEEE-695). Most of the
|
|
programs are copyrighted by the Free Software Foundation and are
|
|
marked as "part of GNU Binutils".
|
|
@*@*
|
|
|
|
In other words, the pstack files are not really part of the MySQL
|
|
library. They are merely useful when you re-program some MYSQL code
|
|
and it crashes.
|
|
@*@*
|
|
|
|
@subsection regex
|
|
|
|
Regular Expression library for support of REGEXP function.
|
|
@*@*
|
|
|
|
This is the copyrighted product of Henry Spencer from the University
|
|
of Toronto. It's a fairly-well-known implementation of the
|
|
requirements of POSIX 1003.2 Section 2.8. The library is bundled with
|
|
Apache and is the default implementation for regular-expression
|
|
handling in BSD Unix. MySQL's Monty Widenius has made minor changes in
|
|
three programs (debug.c, engine.c, regexec.c) but this is not a MySQL
|
|
package. MySQL calls it only in order to support two MySQL functions:
|
|
REGEXP and RLIKE.
|
|
@*@*
|
|
|
|
Some of Mr Spencer's documentation for the regex library can be found
|
|
in the README and WHATSNEW files.
|
|
@*@*
|
|
|
|
One MySQL program which uses regex is \cmd-line-utils\libedit\search.c
|
|
@*@*
|
|
|
|
This program calls the 'regcomp' function, which is the entry point in
|
|
\regex\regexp.c.
|
|
@*@*
|
|
|
|
@subsection repl-tests
|
|
|
|
Test cases for replication.
|
|
@*@*
|
|
|
|
There are six short and trivial-looking tests in these subdirectories:
|
|
@itemize @bullet
|
|
@item
|
|
\test-auto-inc -- Do auto-Increment columns work?
|
|
@item
|
|
\test-bad-query -- Does insert in PK column work?
|
|
@item
|
|
\test-dump -- Do LOAD statements work?
|
|
@item
|
|
\test-repl -- Does replication work?
|
|
@item
|
|
\test-repl-alter -- Does ALTER TABLE work?
|
|
@item
|
|
\test-repl-ts -- Does TIMESTAMP column work?
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection SCCS
|
|
|
|
Source Code Control System.
|
|
@*@*
|
|
|
|
You will see this directory if and only if you used BitKeeper for
|
|
downloading the source. The files here are for BitKeeper
|
|
administration and are not of interest to application programmers.
|
|
@*@*
|
|
|
|
@subsection scripts
|
|
|
|
SQL batches, e.g. for converting msql to MySQL.
|
|
@*@*
|
|
|
|
The *.sh filename extension apparently stands for "shell script".
|
|
Linux programmers use it where Windows programmers would use a *.bat
|
|
(batch filename extension).
|
|
@*@*
|
|
|
|
The *.sh files on this directory are:
|
|
@itemize @bullet
|
|
@item
|
|
fill_help_tables.sh -- Create help-information tables and insert
|
|
@item
|
|
make_binary_distribution.sh -- Get configure information, make,
|
|
produce tar
|
|
@item
|
|
msql2mysql.sh -- Convert mSQL to MySQL
|
|
@item
|
|
mysqlbug.sh -- Create a bug report and mail it
|
|
@item
|
|
mysqld_multi.sh -- Start/stop any number of mysqld instances
|
|
@item
|
|
mysqld_safe-watch.sh -- Start/restart in safe mode
|
|
@item
|
|
mysqld_safe.sh -- Start/restart in safe mode
|
|
@item
|
|
mysqldumpslow.sh -- Parse and summarize the slow query log
|
|
@item
|
|
mysqlhotcopy.sh -- Hot backup
|
|
@item
|
|
mysql_config.sh -- Get configure information that client might need
|
|
@item
|
|
mysql_convert_table_format.sh -- Conversion, e.g. from ISAM to
|
|
MyISAM
|
|
@item
|
|
mysql_explain_log.sh -- Put a log (made with --log) into a MySQL
|
|
table
|
|
@item
|
|
mysql_find_rows.sh -- Search for queries containing <regexp>
|
|
@item
|
|
mysql_fix_extensions.sh -- Renames some file extensions, not
|
|
recommended
|
|
@item
|
|
mysql_fix_privilege_tables.sh -- Fix mysql.user etc. if upgrading to
|
|
MySQL 3.23.14+
|
|
@item
|
|
mysql_install_db.sh -- Create privilege tables and func table
|
|
@item
|
|
mysql_secure_installation.sh -- Disallow remote root login,
|
|
eliminate test, etc.
|
|
@item
|
|
mysql_setpermission.sh -- Aid to add users or databases, sets
|
|
privileges
|
|
@item
|
|
mysql_tableinfo.sh -- Puts info re MySQL tables into a MySQL table
|
|
@item
|
|
mysql_zap.sh -- Kill processes which match pattern
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection sql
|
|
|
|
Programs for handling SQL commands. The "core" of MySQL.
|
|
@*@*
|
|
|
|
These are the .c and .cc files in the sql directory:
|
|
@itemize @bullet
|
|
@item
|
|
cache_manager.cc -- manages a number of blocks
|
|
@item
|
|
convert.cc -- convert tables between different character sets
|
|
@item
|
|
derror.cc -- read language-dependent message file
|
|
@item
|
|
des_key_file.cc -- load DES keys from plaintext file
|
|
@item
|
|
field.cc -- "implement classes defined in field.h" (long)
|
|
@item
|
|
field_conv.cc -- functions to copy data to or from fields
|
|
@item
|
|
filesort.cc -- sort file
|
|
@item
|
|
frm_crypt.cc -- contains only one short function: get_crypt_for_frm
|
|
@item
|
|
gen_lex_hash.cc -- Knuth's algorithm from Vol 3 Sorting and
|
|
Searching, Chapter 6.3
|
|
@item
|
|
gstream.cc -- GTextReadStream
|
|
@item
|
|
handler.cc -- handler-calling functions
|
|
@item
|
|
hash_filo.cc -- static-sized hash tables
|
|
@item
|
|
ha_berkeley.cc -- Handler: BDB
|
|
@item
|
|
ha_heap.cc -- Handler: Heap
|
|
@item
|
|
ha_innodb.cc -- Handler: InnoDB
|
|
@item
|
|
ha_isam.cc -- Handler: ISAM
|
|
@item
|
|
ha_isammrg.cc -- Handler: (ISAM MERGE)
|
|
@item
|
|
ha_myisam.cc -- Handler: MyISAM
|
|
@item
|
|
ha_myisammrg.cc -- Handler: (MyISAM MERGE)
|
|
@item
|
|
hostname.cc -- Given IP, return hostname
|
|
@item
|
|
init.cc -- Init and dummy functions for interface with unireg
|
|
@item
|
|
item.cc -- Item functions
|
|
@item
|
|
item_buff.cc -- Buffers to save and compare item values
|
|
@item
|
|
item_cmpfunc.cc -- Definition of all compare functions
|
|
@item
|
|
item_create.cc -- Create an item. Used by lex.h.
|
|
@item
|
|
item_func.cc -- Numerical functions
|
|
@item
|
|
item_row.cc -- Row items for comparing rows and for IN on rows
|
|
@item
|
|
item_sum.cc -- Set functions (sum, avg, etc.)
|
|
@item
|
|
item_strfunc.cc -- String functions
|
|
@item
|
|
item_subselect.cc -- Item subselect
|
|
@item
|
|
item_timefunc.cc -- Date/time functions, e.g. week of year
|
|
@item
|
|
item_uniq.cc -- Empty file, here for compatibility reasons
|
|
@item
|
|
key.cc -- Functions to handle keys and fields in forms
|
|
@item
|
|
lock.cc -- Locks
|
|
@item
|
|
log.cc -- Logs
|
|
@item
|
|
log_event.cc -- Log event
|
|
@item
|
|
matherr.c -- Handling overflow, underflow, etc.
|
|
@item
|
|
mf_iocache.cc -- Caching of (sequential) reads
|
|
@item
|
|
mini_client.cc -- Client included in server for server-server
|
|
messaging
|
|
@item
|
|
mysqld.cc -- Source of mysqld.exe
|
|
@item
|
|
my_lock.c -- Lock part of a file
|
|
@item
|
|
net_serv.cc -- Read/write of packets on a network socket
|
|
@item
|
|
nt_servc.cc -- Initialize/register/remove an NT service
|
|
@item
|
|
opt_ft.cc -- Create a FT or QUICK RANGE based on a key (very short)
|
|
* opt_range.cc -- Range of keys
|
|
@item
|
|
opt_sum.cc -- Optimize functions in presence of (implied) GROUP BY
|
|
@item
|
|
password.c -- Password checking
|
|
@item
|
|
procedure.cc -- Procedure
|
|
@item
|
|
protocol.cc -- Low level functions for storing data to be sent to
|
|
client
|
|
@item
|
|
records.cc -- Functions to read, write, and lock records
|
|
@item
|
|
repl_failsafe.cc -- Replication fail-save
|
|
@item
|
|
set_var.cc -- MySQL variables
|
|
@item
|
|
slave.cc -- Procedures for a slave in a master/slave (replication?)
|
|
relation
|
|
@item
|
|
spatial.cc -- Geometry stuff (lines, points, etc.)
|
|
@item
|
|
sql_acl.cc -- Functions related to ACL security
|
|
@item
|
|
sql_analyse.cc -- Analyse an input string (?)
|
|
@item
|
|
sql_base.cc -- Basic functions needed by many modules
|
|
@item
|
|
sql_cache.cc -- SQL cache, with long comments about how caching
|
|
works
|
|
@item
|
|
sql_class.cc -- SQL class
|
|
@item
|
|
sql_crypt.cc -- Encode / decode, very short
|
|
@item
|
|
sql_db.cc -- Create / drop database
|
|
@item
|
|
sql_delete.cc -- The DELETE statement
|
|
@item
|
|
sql_derived.cc -- Derived tables, with long comments
|
|
@item
|
|
sql_do.cc -- The DO statement
|
|
@item
|
|
sql_error.cc -- Errors and warnings
|
|
@item
|
|
sql_handler.cc -- Direct access to ISAM
|
|
@item
|
|
sql_help.cc -- The HELP statement (if there is one?)
|
|
@item
|
|
sql_insert.cc -- The INSERT statement
|
|
@item
|
|
sql_lex.cc -- Related to lex or yacc
|
|
@item
|
|
sql_list.cc -- Only list_node_end_of_list, short
|
|
@item
|
|
sql_load.cc -- The LOAD DATA statement?
|
|
@item
|
|
sql_map.cc -- Memory-mapped files?
|
|
@item
|
|
sql_manager.cc -- Maintenance tasks, e.g. flushing the buffers
|
|
periodically
|
|
@item
|
|
sql_olap.cc -- ROLLUP
|
|
@item
|
|
sql_parse.cc -- Parse an SQL statement
|
|
@item
|
|
sql_prepare.cc -- Prepare an SQL statement
|
|
@item
|
|
sql_repl.cc -- Replication
|
|
@item
|
|
sql_rename.cc -- Rename table
|
|
@item
|
|
sql_select.cc -- Select and join optimisation
|
|
@item
|
|
sql_show.cc -- The SHOW statement
|
|
@item
|
|
sql_string.cc -- String functions: alloc, realloc, copy, convert,
|
|
etc.
|
|
@item
|
|
sql_table.cc -- The DROP TABLE and ALTER TABLE statements
|
|
@item
|
|
sql_test.cc -- Some debugging information
|
|
@item
|
|
sql_udf.cc -- User-defined functions
|
|
@item
|
|
sql_union.cc -- The UNION operator
|
|
@item
|
|
sql_update.cc -- The UPDATE statement
|
|
@item
|
|
stacktrace.c -- Display stack trace (Linux/Intel only?)
|
|
@item
|
|
table.cc -- Table metadata retrieval, mostly
|
|
@item
|
|
thr_malloc.cc -- Mallocs used in threads
|
|
@item
|
|
time.cc -- Date and time functions
|
|
@item
|
|
udf_example.cc -- Example file of user-defined functions
|
|
@item
|
|
uniques.cc -- Function to handle quick removal of duplicates
|
|
@item
|
|
unireg.cc -- Create a unireg form file from a FIELD and field-info struct
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection sql-bench
|
|
|
|
The MySQL Benchmarks.
|
|
@*@*
|
|
|
|
This directory has the programs and input files which MySQL uses for
|
|
its comparisons of MySQL, PostgreSQL, mSQL, Solid, etc. Since MySQL
|
|
publishes the comparative results, it's only right that it should make
|
|
available all the material necessary to reproduce all the tests.
|
|
@*@*
|
|
|
|
There are five subdirectories and sub-subdirectories:
|
|
@itemize @bullet
|
|
@item
|
|
\Comments -- Comments about results from tests of Access, Adabas,
|
|
etc.
|
|
@item
|
|
\Data\ATIS -- .txt files containing input data for the "ATIS" tests
|
|
@item
|
|
\Data\Wisconsin -- .txt files containing input data for the
|
|
"Wisconsin" tests
|
|
@item
|
|
\Results -- old test results
|
|
@item
|
|
\Results-win32 -- old test results from Windows 32-bit tests
|
|
@end itemize
|
|
@*@*
|
|
|
|
There are twenty-four *.sh (shell script) files, which involve Perl
|
|
programs.
|
|
@*@*
|
|
|
|
There are three *.bat (batch) files.
|
|
@*@*
|
|
|
|
There is one README file and one TODO file.
|
|
@*@*
|
|
|
|
@subsection SSL
|
|
|
|
Secure Sockets Layer.
|
|
@*@*
|
|
|
|
This isn't a code directory. It contains a short note from Tonu Samuel
|
|
(the NOTES file) and seven *.pem files. PEM stands for "Privacy
|
|
Enhanced Mail" and is an Internet standard for adding security to
|
|
electronic mail. Finally, there are two short scripts for running
|
|
clients and servers over SSL connections.
|
|
@*@*
|
|
|
|
@subsection strings
|
|
|
|
The string library.
|
|
@*@*
|
|
|
|
Many of the files in this subdirectory are equivalent to well-known
|
|
functions that appear in most C string libraries. For those, there is
|
|
documentation available in most compiler handbooks.
|
|
@*@*
|
|
|
|
On the other hand, some of the files are MySQL additions or
|
|
improvements. Often the MySQL changes are attempts to optimize the
|
|
standard libraries. It doesn't seem that anyone tried to optimize for
|
|
recent Pentium class processors, though.
|
|
@*@*
|
|
|
|
The .C files are:
|
|
@itemize @bullet
|
|
@item
|
|
atof.c -- ascii-to-float, MySQL version
|
|
@item
|
|
bchange.c -- short replacement routine written by Monty Widenius in
|
|
1987
|
|
@item
|
|
bcmp.c -- binary compare, rarely used
|
|
@item
|
|
bcopy-duff.c -- block copy: attempt to copy memory blocks faster
|
|
than cmemcpy
|
|
@item
|
|
bfill.c -- byte fill, to fill a buffer with (length) copies of a
|
|
byte
|
|
@item
|
|
bmove.c -- block move
|
|
@item
|
|
bmove512.c -- "should be the fastest way to move a multiple of 512
|
|
bytes"
|
|
@item
|
|
bmove_upp.c -- bmove.c variant, starting with last byte
|
|
@item
|
|
bzero.c -- something like bfill with an argument of 0
|
|
@item
|
|
conf_to_src.c -- reading a configuration file (NOTE TO SELF ? what's
|
|
this doing here?)
|
|
@item
|
|
ctype*.c -- string handling programs for each char type MySQL
|
|
handles
|
|
@item
|
|
do_ctype.c -- display case-conversion and sort-conversion tables
|
|
@item
|
|
int2str.c -- integer-to-string
|
|
@item
|
|
is_prefix.c -- checks whether string1 starts with string2
|
|
@item
|
|
llstr.c -- convert long long to temporary-buffer string, return
|
|
pointer
|
|
@item
|
|
longlong2str.c -- ditto, but to argument-buffer
|
|
@item
|
|
memcmp.c -- memory compare
|
|
@item
|
|
memset.c -- memory set
|
|
@item
|
|
my_vsnprintf.c -- variant of printf
|
|
@item
|
|
r_strinstr.c -- see if one string is within another
|
|
@item
|
|
str2int.c -- convert string to integer
|
|
@item
|
|
strappend.c -- append one string to another
|
|
@item
|
|
strcat.c -- concatenate strings
|
|
@item
|
|
strcend.c -- point to where a character C occurs within str, or NULL
|
|
@item
|
|
strchr.c -- point to first place in string where character occurs
|
|
@item
|
|
strcmp.c -- compare two strings
|
|
@item
|
|
strcont.c -- point to where any one of a set of characters appears
|
|
@item
|
|
strend.c -- point to the '\0' byte which terminates str
|
|
@item
|
|
strfill.c -- fill a string with n copies of a byte
|
|
@item
|
|
strinstr.c -- find string within string
|
|
@item
|
|
strlen.c -- return length of string in bytes
|
|
@item
|
|
strmake.c -- move n characters, or move till end
|
|
@item
|
|
strmov.c -- move source to dest and return pointer to end
|
|
@item
|
|
strnlen.c -- return length of string, or return n
|
|
@item
|
|
strnmov.c -- move source to dest for source size, or for n bytes
|
|
@item
|
|
strrchr.c -- find a character within string, searching from end
|
|
@item
|
|
strstr.c -- find an instance of pattern within source
|
|
@item
|
|
strto.c -- string to long, to long long, to unsigned long, etc.
|
|
@item
|
|
strtol.c -- string to long
|
|
@item
|
|
strtoll.c -- string to long long
|
|
@item
|
|
strtoul.c -- string to unsigned long
|
|
@item
|
|
strtoull.c -- string to unsigned long long
|
|
@item
|
|
strxmov.c -- move a series of concatenated source strings to dest
|
|
@item
|
|
strxnmov.c -- like strxmov.c but with a maximum length n
|
|
@item
|
|
str_test.c -- test of all the string functions encoded in assembler
|
|
@item
|
|
udiv.c -- unsigned long divide
|
|
@item
|
|
xml.c -- read and parse XML strings
|
|
@end itemize
|
|
@*@*
|
|
|
|
There are also four .ASM files -- macros.asm, ptr_cmp.asm,
|
|
strings.asm, and strxmov.asm -- which can replace some of the
|
|
C-program functions. But again, they look like optimizations for old
|
|
members of the Intel processor family.
|
|
@*@*
|
|
|
|
@subsection support-files
|
|
|
|
Support files.
|
|
@*@*
|
|
|
|
The files here are for building ("making") MySQL given a package
|
|
manager, compiler, linker, and other build tools. The support files
|
|
provide instructions and switches for the build processes.
|
|
@*@*
|
|
|
|
@subsection tests
|
|
|
|
Tests in Perl.
|
|
@*@*
|
|
|
|
These are tests that were run once to check for bugs in various
|
|
scenarios: forks, locks, big records, exporting, truncating, etc.
|
|
@*@*
|
|
|
|
@subsection tools
|
|
|
|
Tools -- well, actually, one tool.
|
|
@*@*
|
|
|
|
The only file is:
|
|
@itemize @bullet
|
|
@item
|
|
mysqlmanager.c -- A "server management daemon" by Sasha Pachev
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection VC++Files
|
|
|
|
Visual C++ Files.
|
|
@*@*
|
|
|
|
Includes this entire directory, repeated for VC++ (Windows) use.
|
|
@*@*
|
|
|
|
VC++Files has subdirectories which are copies of the main directories.
|
|
For example there is a subdirectory \VC++Files\heap, which has the
|
|
same files as \heap. So for a description of the files in
|
|
\VC++Files\heap, see the description of the files in \heap. The same
|
|
applies for almost all of VC++Files's subdirectories (bdb, client,
|
|
isam, libmysql, etc.). The difference is that the \VC++Files variants
|
|
are specifically for compilation with Microsoft Visual C++ in 32-bit
|
|
Windows environments.
|
|
@*@*
|
|
|
|
In addition to the "subdirectories which are duplicates of
|
|
directories", VC++Files contains these subdirectories, which are not
|
|
duplicates:
|
|
@itemize @bullet
|
|
@item
|
|
comp_err -- (nearly empty)
|
|
@item
|
|
contrib -- (nearly empty)
|
|
@item
|
|
InstallShield script files
|
|
@item
|
|
isamchk -- (nearly empty)
|
|
@item
|
|
libmysqltest -- one small non-MySQL test program: mytest.c
|
|
@item
|
|
myisamchk -- (nearly empty)
|
|
@item
|
|
myisamlog -- (nearly empty)
|
|
@item
|
|
myisammrg -- (nearly empty)
|
|
@item
|
|
mysqlbinlog -- (nearly empty)
|
|
@item
|
|
mysqlmanager -- MFC foundation class files created by AppWizard
|
|
@item
|
|
mysqlserver -- (nearly empty)
|
|
@item
|
|
mysqlshutdown -- one short program, mysqlshutdown.c
|
|
@item
|
|
mysqlwatch.c -- Windows service initialization and monitoring
|
|
@item
|
|
my_print_defaults -- (nearly empty)
|
|
@item
|
|
pack_isam -- (nearly empty)
|
|
@item
|
|
perror -- (nearly empty)
|
|
@item
|
|
prepare -- (nearly empty)
|
|
@item
|
|
replace -- (nearly empty)
|
|
@item
|
|
SCCS -- source code control system
|
|
@item
|
|
test1 -- tests connecting via X threads
|
|
@item
|
|
thr_insert_test -- (nearly empty)
|
|
@item
|
|
thr_test -- one short program used to test for memory-allocation bug
|
|
@item
|
|
winmysqladmin -- the winmysqladmin.exe source. machine-generated?
|
|
@end itemize
|
|
@*@*
|
|
|
|
@subsection vio
|
|
|
|
Virtual I/O Library.
|
|
@*@*
|
|
|
|
The VIO routines are wrappers for the various network I/O calls that
|
|
happen with different protocols. The idea is that in the main modules
|
|
one won't have to write separate bits of code for each protocol. Thus
|
|
vio's purpose is somewhat like the purpose of Microsoft's winsock
|
|
library.
|
|
@*@*
|
|
|
|
The underlying protocols at this moment are: TCP/IP, Named Pipes (for
|
|
WindowsNT), Shared Memory, and Secure Sockets (SSL).
|
|
@*@*
|
|
|
|
The C programs are:
|
|
@itemize @bullet
|
|
@item
|
|
test-ssl.c -- Short standalone test program: SSL
|
|
@item
|
|
test-sslclient.c -- Short standalone test program: clients
|
|
@item
|
|
test-sslserver.c -- Short standalone test program: server
|
|
@item
|
|
vio.c -- Declarations + open/close functions
|
|
@item
|
|
viosocket.c -- Send/retrieve functions
|
|
@item
|
|
viossl.c -- SSL variations for the above
|
|
@item
|
|
viosslfactories.c -- Certification / Verification
|
|
@item
|
|
viotest.cc -- Short standalone test program: general
|
|
@item
|
|
viotest-ssl.c -- Short standalone test program: SSL
|
|
@item
|
|
viotest-sslconnect.cc -- Short standalone test program: SSL connect
|
|
@end itemize
|
|
@*@*
|
|
|
|
The older functions -- raw_net_read, raw_net_write -- are now
|
|
obsolete.
|
|
@*@*
|
|
|
|
@subsection zlib
|
|
|
|
Data compression library.
|
|
@*@*
|
|
|
|
Zlib -- which presumably stands for "Zip Library" -- is not a MySQL
|
|
package. It was produced by the GNU Zip (gzip.org) people. Zlib is a
|
|
variation of the famous "Lempel-Ziv" method, which is also used by
|
|
"Zip". The method for reducing the size of any arbitrary string of
|
|
bytes is as follows:
|
|
@itemize @bullet
|
|
@item
|
|
Find a substring which occurs twice in the string.
|
|
@item
|
|
Replace the second occurrence of the substring with (a) a pointer to
|
|
the first occurrence, plus (b) an indication of the length of the
|
|
first occurrence.
|
|
@end itemize
|
|
|
|
There is a full description of the library's functions in the gzip
|
|
manual at: @*
|
|
@url{http://www.gzip.org/zlib/manual.html} @*
|
|
There is therefore no need to list the modules in this document.
|
|
@*@*
|
|
|
|
The MySQL program that uses zlib is \mysys\my_compress.c. The use is
|
|
for packet compression. The client sends messages to the server which
|
|
are compressed by zlib. See also: \sql\net_serv.cc.
|
|
|
|
@node Files in InnoDB Sources,,Files in MySQL Sources,Top
|
|
@chapter Annotated List Of Files in the InnoDB Source Code Distribution
|
|
|
|
ERRATUM BY HEIKKI TUURI (START)
|
|
@*@*
|
|
|
|
Errata about InnoDB row locks:@*@*
|
|
|
|
@example
|
|
#define LOCK_S 4 /* shared */
|
|
#define LOCK_X 5 /* exclusive */
|
|
...
|
|
@strong{/* Waiting lock flag */}
|
|
#define LOCK_WAIT 256
|
|
/* this wait bit should be so high that it can be ORed to the lock
|
|
mode and type; when this bit is set, it means that the lock has not
|
|
yet been granted, it is just waiting for its turn in the wait queue */
|
|
...
|
|
@strong{/* Precise modes */}
|
|
#define LOCK_ORDINARY 0
|
|
/* this flag denotes an ordinary next-key lock in contrast to LOCK_GAP
|
|
or LOCK_REC_NOT_GAP */
|
|
#define LOCK_GAP 512
|
|
/* this gap bit should be so high that it can be ORed to the other
|
|
flags; when this bit is set, it means that the lock holds only on the
|
|
gap before the record; for instance, an x-lock on the gap does not
|
|
give permission to modify the record on which the bit is set; locks of
|
|
this type are created when records are removed from the index chain of
|
|
records */
|
|
#define LOCK_REC_NOT_GAP 1024
|
|
/* this bit means that the lock is only on the index record and does
|
|
NOT block inserts to the gap before the index record; this is used in
|
|
the case when we retrieve a record with a unique key, and is also used
|
|
in locking plain SELECTs (not part of UPDATE or DELETE) when the user
|
|
has set the READ COMMITTED isolation level */
|
|
#define LOCK_INSERT_INTENTION 2048
|
|
/* this bit is set when we place a waiting gap type record lock
|
|
request in order to let an insert of an index record to wait until
|
|
there are no conflicting locks by other transactions on the gap; note
|
|
that this flag remains set when the waiting lock is granted, or if the
|
|
lock is inherited to a neighboring record */
|
|
@end example
|
|
@*
|
|
|
|
ERRATUM BY HEIKKI TUURI (END)
|
|
@*@*
|
|
|
|
The InnoDB source files are the best place to look for information
|
|
about internals of the file structure that MySQLites can optionally
|
|
use for transaction support. But when you first look at all the
|
|
subdirectories and file names you'll wonder: Where Do I Start? It can
|
|
be daunting.
|
|
@*@*
|
|
|
|
Well, I've been through that phase, so I'll pass on what I had to
|
|
learn on the first day that I looked at InnoDB source files. I am very
|
|
sure that this will help you grasp, in overview, the organization of
|
|
InnoDB modules. I'm also going to add comments about what is going on
|
|
-- which you should mistrust! These comments are reasonable working
|
|
hypotheses; nevertheless, they have not been subjected to expert peer
|
|
review.
|
|
@*@*
|
|
|
|
Here's how I'm going to organize the discussion. I'll take each of the
|
|
32 InnoDB subdirectories that come with the MySQL 4.0 source code in
|
|
\mysql\innobase (on my Windows directory). The format of each section
|
|
will be like this every time:
|
|
@*@*
|
|
|
|
@strong{\subdirectory-name (LONGER EXPLANATORY NAME)}@*
|
|
@multitable @columnfractions .10 .20 .40 .50
|
|
@item @strong{File Name} @tab @strong{What Name Stands For} @tab @strong{Size} @tab @strong{Comment Inside File}
|
|
@item file-name
|
|
@tab my-own-guess
|
|
@tab in-bytes
|
|
@tab from-the-file-itself
|
|
@end multitable
|
|
...@*
|
|
My-Comments@*
|
|
@*
|
|
|
|
For example: @*
|
|
@example
|
|
"
|
|
@strong{\ha (HASHING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
ha0ha.c Hashing/Hashing 7,452 Hash table with external chains
|
|
|
|
Comments about hashing will be here.
|
|
"
|
|
@end example
|
|
@*
|
|
|
|
The "Comment Inside File" column is a direct copy from the first /*
|
|
comment */ line inside the file. All other comments are mine. After
|
|
I've discussed each directory, I'll finish with some notes about
|
|
naming conventions and a short list of URLs that you can use for
|
|
further reference.
|
|
@*@*
|
|
|
|
Now let's begin.
|
|
@*@*
|
|
|
|
@example
|
|
|
|
@strong{\ha (HASHING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
ha0ha.c Hashing / Hashing 7,452 Hash table with external chains
|
|
|
|
I'll hold my comments until the next section, \hash (HASHING).
|
|
|
|
@strong{\hash (HASHING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
hash0hash.c Hashing / Hashing 3,257 Simple hash table utility
|
|
|
|
The two C programs in the \ha and \hashing directories -- ha0ha.c and
|
|
hash0hash.c -- both refer to a "hash table" but hash0hash.c is
|
|
specialized, it is mostly about accessing points in the table under
|
|
mutex control.
|
|
|
|
When a "database" is so small that InnoDB can load it all into memory
|
|
at once, it's more efficient to access it via a hash table. After all,
|
|
no disk i/o can be saved by using an index lookup, if there's no disk.
|
|
|
|
@strong{\os (OPERATING SYSTEM)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
os0shm.c OS / Shared Memory 3,150 To shared memory primitives
|
|
os0file.c OS / File 64,412 To i/o primitives
|
|
os0thread.c OS / Thread 6,827 To thread control primitives
|
|
os0proc.c OS / Process 3,700 To process control primitives
|
|
os0sync.c OS / Synchronization 10,208 To synchronization primitives
|
|
|
|
This is a group of utilities that other modules may call whenever they
|
|
want to use an operating-system resource. For example, in os0file.c
|
|
there is a public InnoDB function named os_file_create_simple(), which
|
|
simply calls the Windows-API function CreateFile. Naturally the
|
|
contents of this group are somewhat different for other operating systems.
|
|
|
|
The "Shared Memory" functions in os0shm.c are only called from the
|
|
communications program com0shm.c (see \com COMMUNICATIONS). The i/o
|
|
and thread-control primitives are called extensively. The word
|
|
"synchronization" in this context refers to the mutex-create and
|
|
mutex-wait functionality.
|
|
|
|
@strong{\ut (UTILITIES)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
ut0ut.c Utilities / Utilities 7,041 Various utilities
|
|
ut0byte.c Utilities / Debug 1,856 Byte utilities
|
|
ut0rnd.c Utilities / Random 1,475 Random numbers and hashing
|
|
ut0mem.c Utilities / Memory 5,530 Memory primitives
|
|
ut0dbg.c Utilities / Debug 642 Debug utilities
|
|
|
|
The two functions in ut0byte.c are just for lower/upper case
|
|
conversion and comparison. The single function in ut0rnd.c is for
|
|
finding a prime slightly greater than the given argument, which is
|
|
useful for hash functions, but unrelated to randomness. The functions
|
|
in ut0mem.c are wrappers for "malloc" and "free" calls -- for the
|
|
real "memory" module see section \mem (MEMORY). Finally, the
|
|
functions in ut0ut.c are a miscellany that didn't fit better elsewhere:
|
|
get_high_bytes, clock, time, difftime, get_year_month_day, and "sprintf"
|
|
for various diagnostic purposes.
|
|
|
|
In short: the \ut group is trivial.
|
|
|
|
@strong{\buf (BUFFERING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
buf0buf.c Buffering / Buffering 53,246 The database buffer buf_pool
|
|
buf0flu.c Buffering / Flush 23,711 ... flush algorithm
|
|
buf0lru.c / least-recently-used 20,245 ... replacement algorithm
|
|
buf0rea.c Buffering / read 17,399 ... read
|
|
|
|
There is a separate file group (\mem MEMORY) which handles memory
|
|
requests in general.A "buffer" usually has a more specific
|
|
definition, as a memory area which contains copies of pages that
|
|
ordinarily are in the main data file. The "buffer pool" is the set
|
|
of all buffers (there are lots of them because InnoDB doesn't
|
|
depend on the OS's caching to make things faster).
|
|
|
|
The pool size is fixed (at the time of this writing) but the rest of
|
|
the buffering architecture is sophisticated, involving a host of
|
|
control structures. In general: when InnoDB needs to access a new page
|
|
it looks first in the buffer pool; InnoDB reads from disk to a new
|
|
buffer when the page isn't there; InnoDB chucks old buffers (basing
|
|
its decision on a conventional Least-Recently-Used algorithm) when it
|
|
has to make space for a new buffer.
|
|
|
|
There are routines for checking a page's validity, and for read-ahead.
|
|
An example of "read-ahead" use: if a sequential scan is going on, then
|
|
a DBMS can read more than one page at a time, which is efficient
|
|
because reading 32,768 bytes (two pages) takes less than twice as long
|
|
as reading 16,384 bytes (one page).
|
|
|
|
@strong{\btr (B-TREE)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
btr0btr.c B-tree / B-tree 74,255 B-tree
|
|
btr0cur.c B-tree / Cursor 94,950 index tree cursor
|
|
btr0sea.c B-tree / Search 36,580 index tree adaptive search
|
|
btr0pcur.c B-tree / persistent cursor 14,548 index tree persistent cursor
|
|
|
|
If you total up the sizes of the C files, you'll see that \btr is the
|
|
second-largest file group in InnoDB. This is understandable because
|
|
maintaining a B-tree is a relatively complex task. Luckily, there has
|
|
been a lot of work done to describe efficient management of B-tree and
|
|
B+-tree structures, much of it open-source or public-domain, since
|
|
their original invention over thirty years ago.
|
|
|
|
InnoDB likes to put everything in B-trees. This is what I'd call a
|
|
"distinguishing characteristic" because in all the major DBMSs (like
|
|
IBM DB2, Microsoft SQL Server, and Oracle), the main or default or
|
|
classic structure is the heap-and-index. In InnoDB the main structure
|
|
is just the index. To put it another way: InnoDB keeps the rows in the
|
|
leaf node of the index, rather than in a separate file. Compare
|
|
Oracle's Index Organized Tables, and Microsoft SQL Server's Clustered
|
|
Indexes.
|
|
|
|
This, by the way, has some consequences. For example, you may as well
|
|
have a primary key since otherwise InnoDB will make one anyway. And
|
|
that primary key should be the shortest of the candidate keys, since
|
|
InnoDB
|
|
will use it as a pointer if there are secondary indexes.
|
|
|
|
Most importantly, it means that rows have no fixed address. Therefore
|
|
the routines for managing file pages should be good. We'll see about
|
|
that when we look at the \row (ROW) program group later.
|
|
|
|
@strong{\com (COMMUNCATION)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
com0com.c Communication 6,913 Communication primitives
|
|
com0shm.c Communication / 24,633 ... through shared memory
|
|
Shared Memory
|
|
|
|
The communication primitives in com0com.c are said to be modelled
|
|
after the ones in Microsoft's winsock library (the Windows Sockets
|
|
interface). The communication primitives in com0shm.c are at a
|
|
slightly lower level, and are called from the routines in com0com.c.
|
|
|
|
I was interested in seeing how InnoDB would handle inter-process
|
|
communication, since there are many options -- named pipes, TCP/IP,
|
|
Windows messaging, and Shared Memory being the main ones that come to
|
|
mind. It appears that InnoDB prefers Shared Memory. The main idea is:
|
|
there is an area of memory which two different processes (or threads,
|
|
of course) can both access. To communicate, a thread gets an
|
|
appropriate mutex, puts in a request, and waits for a response. Thread
|
|
interaction is also a subject for the os0thread.c program in another
|
|
program group, \os (OPERATING SYSTEM).
|
|
|
|
@strong{\dyn (DYNAMICALLY ALLOCATED ARRAY)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
dyn0dyn.c Dynamic / Dynamic 994 dynamically allocated array
|
|
|
|
There is a single function in the dyn0dyn.c program, for adding a
|
|
block to the dynamically allocated array. InnoDB might use the array
|
|
for managing concurrency between threads.
|
|
|
|
At the moment, the \dyn program group is trivial.
|
|
|
|
@strong{\fil (FILE)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
fil0fil.c File / File 39,725 The low-level file system
|
|
|
|
The reads and writes to the database files happen here, in
|
|
co-ordination with the low-level file i/o routines (see os0file.h in
|
|
the \os program group).
|
|
|
|
Briefly: a table's contents are in pages, which are in files, which
|
|
are in tablespaces. Files do not grow; instead one can add new files
|
|
to the tablespace. As we saw earlier (discussing the \btr program group)
|
|
the pages are nodes of B-trees. Since that's the case, new additions can
|
|
happen at various places in the logical file structure, not
|
|
necessarily at the end. Reads and writes are asynchronous, and go into
|
|
buffers, which are set up by routines in the \buf program group.
|
|
|
|
@strong{\fsp (FILE SPACE)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
fsp0fsp.c File Space Management 100,271 File space management
|
|
|
|
I would have thought that the \fil (FILE) and \fsp (FILE SPACE)
|
|
MANAGEMENT programs would fit together in the same program group;
|
|
however, I guess the InnoDB folk are splitters rather than lumpers.
|
|
|
|
It's in fsp0fsp.c that one finds some of the descriptions and comments
|
|
of extents, segments, and headers. For example, the "descriptor bitmap
|
|
of the pages in the extent" is in here, and you can find as well how
|
|
the free-page list is maintained, what's in the bitmaps, and what
|
|
various header fields' contents are.
|
|
|
|
@strong{\fut (FILE UTILITY)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
fut0fut.c File Utility / Utility 293 File-based utilities
|
|
fut0lst.c File Utility / List 14,129 File-based list utilities
|
|
|
|
Mainly these small programs affect only file-based lists, so maybe
|
|
saying "File Utility" is too generic. The real work with data files
|
|
goes on in the \fsp program group.
|
|
|
|
@strong{\log (LOGGING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
log0log.c Logging / Logging 77,834 Database log
|
|
log0recv.c Logging / Recovery 80,701 Recovery
|
|
|
|
I've already written about the \log program group, so here's a link to
|
|
my previous article: "How Logs work with MySQL and InnoDB":
|
|
@url{http://www.devarticles.com/art/1/181/2}
|
|
|
|
@strong{\mem (MEMORY)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
mem0mem.c Memory / Memory 9,971 The memory management
|
|
mem0dbg.c Memory / Debug 21,297 ... the debug code
|
|
mem0pool.c Memory / Pool 16,293 ... the lowest level
|
|
|
|
There is a long comment at the start of the mem0pool.c program, which
|
|
explains what the memory-consumers are, and how InnoDB tries to
|
|
satisfy them. The main thing to know is that there are really three
|
|
pools: the buffer pool (see the \buf program group), the log pool (see the \log
|
|
program group), and the common pool, which is where everything that's
|
|
not in the buffer or log pools goes (for example the parsed SQL
|
|
statements and the data dictionary cache).
|
|
|
|
@strong{\mtr (MINI-TRANSACTION)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
mtr0mtr.c Mini-transaction / 12,433 Mini-transaction buffer
|
|
mtr0log.c Mini-transaction / Log 8,180 ... log routines
|
|
|
|
The mini-transaction routines are called from most of the other
|
|
program groups. I'd describe this as a low-level utility set.
|
|
|
|
@strong{\que (QUERY GRAPH)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
que0que.c Query Graph / Query 35,964 Query graph
|
|
|
|
The program que0que.c ostensibly is about the execution of stored
|
|
procedures which contain commit/rollback statements. I took it that
|
|
this has little importance for the average MySQL user.
|
|
|
|
@strong{\rem (RECORD MANAGER)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
rem0rec.c Record Manager 14,961 Record Manager
|
|
rem0cmp.c Record Manager / 25,263 Comparison services for records
|
|
Comparison
|
|
|
|
There's an extensive comment near the start of rem0rec.c title
|
|
"Physical Record" and it's recommended reading. At some point you'll
|
|
ask what are all those bits that surround the data in the rows on a page,
|
|
and this is where you'll find the answer.
|
|
|
|
@strong{\row (ROW)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
row0row.c Row / Row 16,764 General row routines
|
|
row0uins.c Row / Undo Insert 7,199 Fresh insert undo
|
|
row0umod.c Row / Undo Modify 17,147 Undo modify of a row
|
|
row0undo.c Row / Undo 10,254 Row undo
|
|
row0vers.c Row / Version 12,288 Row versions
|
|
row0mysql.c Row / MySQL 63,556 Interface [to MySQL]
|
|
row0ins.c Row / Insert 42,829 Insert into a table
|
|
row0sel.c Row / Select 85,923 Select
|
|
row0upd.c Row / Update 44,456 Update of a row
|
|
row0purge.c Row / Purge 14,961 Purge obsolete records
|
|
|
|
Rows can be selected, inserted, updated/deleted, or purged (a
|
|
maintenance activity). These actions have ancillary actions, for
|
|
example after insert there can be an index-update test, but it seems
|
|
to me that sometimes the ancillary action has no MySQL equivalent (yet)
|
|
and so is inoperative.
|
|
|
|
Speaking of MySQL, notice that one of the larger programs in the \row
|
|
program group is the "interface between Innobase row operations and
|
|
MySQL" (row0mysql.c) -- information interchange happens at this level
|
|
because rows in InnoDB and in MySQL are analogous, something which
|
|
can't be said for pages and other levels.
|
|
|
|
@strong{\srv (Server)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
srv0srv.c Server / Server 79,058 Server main program
|
|
srv0que.c Server / Query 2,361 Server query execution
|
|
srv0start.c Server / Start 34,586 Starts the server
|
|
|
|
This is where the server reads the initial configuration files, splits
|
|
up the threads, and gets going. There is a long comment deep in the
|
|
program (you might miss it at first glance) titled "IMPLEMENTATION OF
|
|
THE SERVER MAIN PROGRAM" in which you'll find explanations about
|
|
thread priority, and about what the responsibiities are for various
|
|
thread types.
|
|
|
|
InnoDB has many threads, for example "user threads" (which wait for
|
|
client requests and reply to them), "parallel communication threads"
|
|
(which take part of a user thread's job if a query process can be
|
|
split), "utility threads" (background priority), and a "master thread"
|
|
(high priority, usually asleep).
|
|
|
|
@strong{\thr (Thread Local Storage)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
thr0loc.c Thread / Local 5,261 The thread local storage
|
|
|
|
InnoDB doesn't use the Windows-API thread-local-storage functions,
|
|
perhaps because they're not portable enough.
|
|
|
|
@strong{\trx (Transaction)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
trx0trx.c Transaction / 37,447 The transaction
|
|
trx0purge.c Transaction / Purge 26,782 ... Purge old versions
|
|
trx0rec.c Transaction / Record 36,525 ... Undo log record
|
|
trx0sys.c Transaction / System 20,671 ... System
|
|
trx0rseg.c / Rollback segment 6,214 ... Rollback segment
|
|
trx0undo.c Transaction / Undo 46,595 ... Undo log
|
|
|
|
InnoDB's transaction management is supposedly "in the style of Oracle"
|
|
and that's close to true but can mislead you.
|
|
@itemize
|
|
@item
|
|
First: InnoDB uses rollback segments like Oracle8i does -- but
|
|
Oracle9i uses a different name
|
|
@item
|
|
Second: InnoDB uses multi-versioning like Oracle does -- but I see
|
|
nothing that looks like an Oracle ITL being stored in the InnoDB data
|
|
pages.
|
|
@item
|
|
Third: InnoDB and Oracle both have short (back-to-statement-start)
|
|
versioning for the READ COMMITTED isolation level and long
|
|
(back-to-transaction-start) versioning for higher levels -- but InnoDB
|
|
and Oracle have different "default" isolation levels.
|
|
@item
|
|
Finally: InnoDB's documentation says it has to lock "the gaps before
|
|
index keys" to prevent phantoms -- but any Oracle user will tell you that
|
|
phantoms are impossible anyway at the SERIALIZABLE isolation level, so
|
|
key-locks are unnecessary.
|
|
@end itemize
|
|
|
|
The main idea, though, is that InnoDB has multi-versioning. So does
|
|
Oracle. This is very different from the way that DB2 and SQL Server do
|
|
things.
|
|
|
|
@strong{\usr (USER)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
usr0sess.c User / Session 27,415 Sessions
|
|
|
|
One user can have multiple sessions (the session being all the things
|
|
that happen betweeen a connect and disconnect). This is where InnoDB
|
|
tracks session IDs, and server/client messaging. It's another of those
|
|
items which is usually MySQL's job, though.
|
|
|
|
@strong{\data (DATA)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
data0data.c Data / Data 26,002 SQL data field and tuple
|
|
data0type.c Data / Type 2,122 Data types
|
|
|
|
This is a collection of minor utility routines affecting rows.
|
|
|
|
@strong{\dict (DICTIONARY)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
dict0dict.c Dictionary / Dictionary 84,667 Data dictionary system
|
|
dict0boot.c Dictionary / boot 12,134 ... creation and booting
|
|
dict0load.c Dictionary / load 26,546 ... load to memory cache
|
|
dict0mem.c Dictionary / memory 8,221 ... memory object creation
|
|
|
|
The data dictionary (known in some circles as the catalog) has the
|
|
metadata information about objects in the database -- column sizes,
|
|
table names, and the like.
|
|
|
|
@strong{\eval (EVALUATING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
eval0eval.c Evaluating/Evaluating 15,682 SQL evaluator
|
|
eval0proc.c Evaluating/Procedures 5,000 Executes SQL procedures
|
|
|
|
The evaluating step is a late part of the process of interpreting an
|
|
SQL statement -- parsing has already occurred during \pars (PARSING).
|
|
|
|
The ability to execute SQL stored procedures is an InnoDB feature, but
|
|
not a MySQL feature, so the eval0proc.c program is unimportant.
|
|
|
|
@strong{\ibuf (INSERT BUFFER)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
ibuf0ibuf.c Insert Buffer / 69,884 Insert buffer
|
|
|
|
The words "Insert Buffer" mean not "buffer used for INSERT" but
|
|
"insertion of a buffer into the buffer pool" (see the \buf BUFFER
|
|
program group description). The matter is complex due to possibilities
|
|
for deadlocks, a problem to which the comments in the ibuf0ibuf.c
|
|
program devote considerable attention.
|
|
|
|
@strong{\mach (MACHINE FORMAT)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
mach0data.c Machine/Data 2,319 Utilities for converting
|
|
|
|
The mach0data.c program has two small routines for reading compressed
|
|
ulints (unsigned long integers).
|
|
|
|
@strong{\lock (LOCKING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
lock0lock.c Lock / Lock 127,646 The transaction lock system
|
|
|
|
If you've used DB2 or SQL Server, you might think that locks have their
|
|
own in-memory table, that row locks might need occasional escalation to
|
|
table locks, and that there are three lock types: Shared, Update, Exclusive.
|
|
|
|
All those things are untrue with InnoDB! Locks are kept in the database
|
|
pages. A bunch of row locks can't be rolled together into a single table
|
|
lock. And most importantly there's only one lock type. I call this type
|
|
"Update" because it has the characteristics of DB2 / SQL Server Update
|
|
locks, that is, it blocks other updates but doesn't block reads.
|
|
Unfortunately, InnoDB comments refer to them as "x-locks" etc.
|
|
|
|
To sum it up: if your background is Oracle you won't find too much
|
|
surprising, but if your background is DB2 or SQL Server the locking
|
|
concepts and terminology will probably confuse you at first.
|
|
|
|
You can find an online article about the differences between
|
|
Oracle-style and DB2/SQL-Server-style locks at:
|
|
@url{http://dbazine.com/gulutzan6.html}
|
|
|
|
@strong{\odbc (ODBC)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
odbc0odbc.c ODBC / ODBC 16,865 ODBC client library
|
|
|
|
The odbc0odbc.c program has a small selection of old ODBC-API
|
|
functions: SQLAllocEnv, SQLAllocConnect, SQLAllocStmt, SQLConnect,
|
|
SQLError, SQLPrepare, SQLBindParameter, SQLExecute.
|
|
|
|
@strong{\page (PAGE)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
page0page.c Page / Page 44,309 Index page routines
|
|
page0cur.c Page / Cursor 30,305 The page cursor
|
|
|
|
It's in the page0page.c program that you'll learn as follows: index
|
|
pages start with a header, entries in the page are in order, at the
|
|
end of the page is a sparse "page directory" (what I would have called
|
|
a slot table) which makes binary searches easier.
|
|
|
|
Incidentally, the program comments refer to "a page size of 8 kB"
|
|
which seems obsolete. In univ.i (a file containing universal
|
|
constants) the page size is now #defined as 16KB.
|
|
|
|
@strong{\pars (PARSING)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
pars0pars.c Parsing/Parsing 49,947 SQL parser
|
|
pars0grm.c Parsing/Grammar 62,685 A Bison parser
|
|
pars0opt.c Parsing/Optimizer 30,809 Simple SQL Optimizer
|
|
pars0sym.c Parsing/Symbol Table 5,541 SQL parser symbol table
|
|
lexyy.c ?/Lexer 59,948 Lexical scanner
|
|
|
|
The job is to input a string containing an SQL statement and output an
|
|
in-memory parse tree. The EVALUATING (subdirectory \eval) programs
|
|
will use the tree.
|
|
|
|
As is common practice, the Bison and Flex tools were used -- pars0grm.c
|
|
is what the Bison parser produced from an original file named pars0grm.y
|
|
(not supplied), and lexyy.c is what Flex produced.
|
|
|
|
Since InnoDB is a DBMS by itself, it's natural to find SQL parsing in
|
|
it. But in the MySQL/InnoDB combination, MySQL handles most of the
|
|
parsing. These files are unimportant.
|
|
|
|
@strong{\read (READ)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
read0read.c Read / Read 6,244 Cursor read
|
|
|
|
The read0read.c program opens a "read view" of a query result, using
|
|
some functions in the \trx program group.
|
|
|
|
@strong{\sync (SYNCHRONIZATION)}
|
|
File Name What Name Stands For Size Comment Inside File
|
|
--------- -------------------- ------ -------------------
|
|
sync0sync.c Synchronization / 35,918 Mutex, the basic sync primitive
|
|
sync0arr.c ... / array 26,461 Wait array used in primitives
|
|
sync0ipm.c ... / interprocess 4,027 for interprocess sync
|
|
sync0rw.c ... / read-write 22,220 read-write lock for thread sync
|
|
|
|
A mutex (Mutual Exclusion) is an object which only one thread/process
|
|
can hold at a time. Any modern operating system API has some functions
|
|
for mutexes; however, as the comments in the sync0sync.c code indicate, it
|
|
can be faster to write one's own low-level mechanism. In fact the old
|
|
assembly-language XCHG trick is in here -- this is the only program
|
|
that contains any assembly code.
|
|
@end example
|
|
@*
|
|
@*
|
|
|
|
This is the end of the section-by-section account of InnoDB
|
|
subdirectories.
|
|
@*@*
|
|
|
|
@strong{A Note About File Naming} @*@*
|
|
|
|
There appears to be a naming convention. The first letters of the file
|
|
name are the same as the subdirectory name, then there is a '0'
|
|
separator, then there is an individual name. For the main program in a
|
|
subdirectory, the individual name may be a repeat of the subdirectory
|
|
name. For example, there is a file named ha0ha.c (the first two
|
|
letters ha mean "it's in in subdirectory ..\ha", the next letter 0
|
|
means "0 separator", the next two letters mean "this is the main ha
|
|
program"). This naming convention is not strict, though: for example
|
|
the file lexyy.c is in the \pars subdirectory.
|
|
@*@*
|
|
|
|
@strong{A Note About Copyrights} @*@*
|
|
|
|
Most of the files begin with a copyright notice or a creation date,
|
|
for example "Created 10/25/1995 Heikki Tuuri". I don't know a great
|
|
deal about the history of InnoDB, but found it interesting that most
|
|
creation dates were between 1994 and 1998.
|
|
@*@*
|
|
|
|
@strong{References} @*@*
|
|
|
|
Ryan Bannon, Alvin Chin, Faryaaz Kassam and Andrew Roszko @*
|
|
"InnoDB Concrete Architecture" @*
|
|
@url{http://www.swen.uwaterloo.ca/~mrbannon/cs798/assignment_02/innodb.pdf}
|
|
|
|
A student paper. It's an interesting attempt to figure out InnoDB's
|
|
architecture using tools, but I didn't end up using it for the specific
|
|
purposes of this article.
|
|
@*@*
|
|
|
|
Peter Gulutzan @*
|
|
"How Logs Work With MySQL And InnoDB" @*
|
|
@url{http://www.devarticles.com/art/1/181/2}
|
|
@*@*
|
|
|
|
Heikki Tuuri @*
|
|
"InnoDB Engine in MySQL-Max-3.23.54 / MySQL-4.0.9: The Up-to-Date
|
|
Reference Manual of InnoDB" @*
|
|
@url{http://www.innodb.com/ibman.html}
|
|
|
|
This is the natural starting point for all InnoDB information. Mr
|
|
Tuuri also appears frequently on MySQL forums.
|
|
@*@*
|
|
|
|
@summarycontents
|
|
@contents
|
|
|
|
@bye
|