mirror of
https://github.com/MariaDB/server.git
synced 2025-01-26 00:34:18 +01:00
2566 lines
84 KiB
Text
2566 lines
84 KiB
Text
\input texinfo @c -*-texinfo-*-
|
|
@c Copyright 2002 MySQL AB
|
|
@c
|
|
@c %**start of header
|
|
@setfilename internals.info
|
|
|
|
@c We want the types in the same index
|
|
@synindex cp fn
|
|
|
|
@iftex
|
|
@afourpaper
|
|
@end iftex
|
|
|
|
@c Get version and other info
|
|
@include include.texi
|
|
|
|
@ifclear tex-debug
|
|
@c This removes the black squares in the right margin
|
|
@finalout
|
|
@end ifclear
|
|
|
|
@c Set background for HTML
|
|
@set _body_tags BGCOLOR=#FFFFFF TEXT=#000000 LINK=#101090 VLINK=#7030B0
|
|
@settitle @strong{MySQL} Internals Manual for version @value{mysql_version}.
|
|
@setchapternewpage odd
|
|
@paragraphindent 0
|
|
|
|
@c %**end of header
|
|
|
|
@ifinfo
|
|
@format
|
|
START-INFO-DIR-ENTRY
|
|
* mysql-internals: (mysql-internals). @strong{MySQL} internals.
|
|
END-INFO-DIR-ENTRY
|
|
@end format
|
|
@end ifinfo
|
|
|
|
@titlepage
|
|
@sp 10
|
|
@center @titlefont{@strong{MySQL} Internals Manual}
|
|
@sp 10
|
|
@center Copyright @copyright{} 1998-2002 MySQL AB
|
|
@page
|
|
@end titlepage
|
|
|
|
@node Top, caching, (dir), (dir)
|
|
|
|
@ifinfo
|
|
This is a manual about @strong{MySQL} internals.
|
|
@end ifinfo
|
|
|
|
@menu
|
|
* caching:: How MySQL Handles Caching
|
|
* flush tables:: How MySQL Handles @code{FLUSH TABLES}
|
|
* filesort:: How MySQL Does Sorting (@code{filesort})
|
|
* selects:: How MySQL performs different selects
|
|
* transformations:: How MySQL transforms subqueries
|
|
* coding guidelines:: Coding Guidelines
|
|
* mysys functions:: Functions In The @code{mysys} Library
|
|
* DBUG:: DBUG Tags To Use
|
|
* protocol:: MySQL Client/Server Protocol
|
|
* Fulltext Search:: Fulltext Search in MySQL
|
|
@end menu
|
|
|
|
|
|
@node caching, flush tables, Top, Top
|
|
@chapter How MySQL Handles Caching
|
|
|
|
@strong{MySQL} has the following caches:
|
|
(Note that the some of the filename have a wrong spelling of cache. :)
|
|
|
|
@table @strong
|
|
|
|
@item Key Cache
|
|
A shared cache for all B-tree index blocks in the different NISAM
|
|
files. Uses hashing and reverse linked lists for quick caching of the
|
|
last used blocks and quick flushing of changed entries for a specific
|
|
table. (@file{mysys/mf_keycash.c})
|
|
|
|
@item Record Cache
|
|
This is used for quick scanning of all records in a table.
|
|
(@file{mysys/mf_iocash.c} and @file{isam/_cash.c})
|
|
|
|
@item Table Cache
|
|
This holds the last used tables. (@file{sql/sql_base.cc})
|
|
|
|
@item Hostname Cache
|
|
For quick lookup (with reverse name resolving). Is a must when one has a
|
|
slow DNS.
|
|
(@file{sql/hostname.cc})
|
|
|
|
@item Privilege Cache
|
|
To allow quick change between databases the last used privileges are
|
|
cached for each user/database combination.
|
|
(@file{sql/sql_acl.cc})
|
|
|
|
@item Heap Table Cache
|
|
Many use of @code{GROUP BY} or @code{DISTINCT} caches all found rows in
|
|
a @code{HEAP} table. (This is a very quick in-memory table with hash index.)
|
|
|
|
@item Join buffer Cache
|
|
For every full join in a @code{SELECT} statement (a full join here means
|
|
there were no keys that one could use to find the next table in a list),
|
|
the found rows are cached in a join cache. One @code{SELECT} query can
|
|
use many join caches in the worst case.
|
|
@end table
|
|
|
|
@node join_buffer_size, flush tables, caching, Top
|
|
@subchapter How MySQL uses the join_buffer cache
|
|
|
|
Basic information about @code{join_buffer_size}:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
It's only used in the case when join type is of type @code{ALL} or
|
|
@code{index}; In other words: no possible keys can be used.
|
|
@item
|
|
A join buffer is never allocated for the first not-const table,
|
|
even it it would be of type @code{ALL}/@code{index}.
|
|
@item
|
|
The buffer is allocated when we need to do a each full join between two
|
|
tables and freed after the query is done.
|
|
@item
|
|
Accepted row combinations of tables before the @code{ALL}/@code{index}
|
|
able is stored in the cache and is used to compare against each read
|
|
row in the @code{ALL} table.
|
|
@item
|
|
We only store the used fields in the join_buffer cache, not the
|
|
whole rows.
|
|
@end itemize
|
|
|
|
Assume you have the following join:
|
|
|
|
@example
|
|
Table name Type
|
|
t1 range
|
|
t2 ref
|
|
t3 @code{ALL}
|
|
@end example
|
|
|
|
The join is then done as follows:
|
|
|
|
@example
|
|
- While rows in t1 matching range
|
|
- Read through all rows in t2 according to reference key
|
|
- Store used fields form t1,t2 in cache
|
|
- If cache is full
|
|
- Read through all rows in t3
|
|
- Compare t3 row against all t1,t2 combination in cache
|
|
- If rows satisfying join condition, send it to client
|
|
- Empty cache
|
|
|
|
- Read through all rows in t3
|
|
- Compare t3 row against all stored t1,t2 combinations in cache
|
|
- If rows satisfying join condition, send it to client
|
|
@end example
|
|
|
|
The above means that table t3 is scanned
|
|
|
|
@example
|
|
(size-of-stored-row(t1,t2) * accepted-row-cominations(t1,t2))/
|
|
join_buffer_size+1
|
|
@end example
|
|
times.
|
|
|
|
Some conclusions:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
The larger the join_buff_size, the fewer scans of t3.
|
|
If @code{join_buff_size} is already large enough to hold all previous row
|
|
combinations then there is no speed to gain by making it bigger.
|
|
@item
|
|
If there is several tables of @code{ALL}/@code{index} then the we
|
|
allocate one @code{join_buffer_size buffer} for each of them and use the
|
|
same algorithm described above to handle it. (In other words, we store
|
|
the same row combination several times into different buffers)
|
|
@end itemize
|
|
|
|
@node flush tables, filesort, caching, Top
|
|
@chapter How MySQL Handles @code{FLUSH TABLES}
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
Flush tables is handled in @file{sql/sql_base.cc::close_cached_tables()}.
|
|
|
|
@item
|
|
The idea of flush tables is to force all tables to be closed. This
|
|
is mainly to ensure that if someone adds a new table outside of
|
|
@strong{MySQL} (for example with @code{cp}) all threads will start using
|
|
the new table. This will also ensure that all table changes are flushed
|
|
to disk (but of course not as optimally as simple calling a sync on
|
|
all tables)!
|
|
|
|
@item
|
|
When one does a @code{FLUSH TABLES}, the variable @code{refresh_version}
|
|
will be incremented. Every time a thread releases a table it checks if
|
|
the refresh version of the table (updated at open) is the same as
|
|
the current @code{refresh_version}. If not it will close it and broadcast
|
|
a signal on @code{COND_refresh} (to wait any thread that is waiting for
|
|
all instanses of a table to be closed).
|
|
|
|
@item
|
|
The current @code{refresh_version} is also compared to the open
|
|
@code{refresh_version} after a thread gets a lock on a table. If the
|
|
refresh version is different the thread will free all locks, reopen the
|
|
table and try to get the locks again; This is just to quickly get all
|
|
tables to use the newest version. This is handled by
|
|
@file{sql/lock.cc::mysql_lock_tables()} and
|
|
@file{sql/sql_base.cc::wait_for_tables()}.
|
|
|
|
@item
|
|
When all tables has been closed @code{FLUSH TABLES} will return an ok
|
|
to client.
|
|
|
|
@item
|
|
If the thread that is doing @code{FLUSH TABLES} has a lock on some tables,
|
|
it will first close the locked tables, then wait until all other threads
|
|
have also closed them, and then reopen them and get the locks.
|
|
After this it will give other threads a chance to open the same tables.
|
|
|
|
@end itemize
|
|
|
|
@node filesort, selects, flush tables, Top
|
|
@chapter How MySQL Does Sorting (@code{filesort})
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
Read all rows according to key or by table scanning.
|
|
|
|
@item
|
|
Store the sort-key in a buffer (@code{sort_buffer}).
|
|
|
|
@item
|
|
When the buffer gets full, run a @code{qsort} on it and store the result
|
|
in a temporary file. Save a pointer to the sorted block.
|
|
|
|
@item
|
|
Repeat the above until all rows have been read.
|
|
|
|
@item
|
|
Repeat the following until there is less than @code{MERGEBUFF2} (15)
|
|
blocks left.
|
|
|
|
@item
|
|
Do a multi-merge of up to @code{MERGEBUFF} (7) regions to one block in
|
|
another temporary file. Repeat until all blocks from the first file
|
|
are in the second file.
|
|
|
|
@item
|
|
On the last multi-merge, only the pointer to the row (last part of
|
|
the sort-key) is written to a result file.
|
|
|
|
@item
|
|
Now the code in @file{sql/records.cc} will be used to read through them
|
|
in sorted order by using the row pointers in the result file.
|
|
To optimize this, we read in a big block of row pointers, sort these
|
|
and then we read the rows in the sorted order into a row buffer
|
|
(@code{record_buffer}).
|
|
|
|
@end itemize
|
|
|
|
@node selects, transformations, flush tables, Top
|
|
@chapter How MySQL performs different selects
|
|
|
|
@node select steps,,,
|
|
@section Steps of select executing
|
|
|
|
Every select performed in such base steps:
|
|
@itemize
|
|
@item
|
|
JOIN::prepare
|
|
@itemize @bullet
|
|
@item
|
|
initialization and linking JOIN structure to st_select_lex
|
|
@item
|
|
fix_fields() for all items (after fix_fields we know everything
|
|
about item)
|
|
@item
|
|
moving HAVING to WHERE if possible
|
|
@item
|
|
initialization procedure if exists
|
|
@end itemize
|
|
@item
|
|
JOIN::optimize
|
|
@itemize @bullet
|
|
@item
|
|
single select optimization
|
|
@item
|
|
creation first temporary table if need
|
|
@end itemize
|
|
@item
|
|
JOIN::exec
|
|
@itemize @bullet
|
|
@item
|
|
performing select (may be created second temporary table)
|
|
@end itemize
|
|
@item
|
|
JOIN::cleanup
|
|
@itemize @bullet
|
|
@item
|
|
removing all temporary tables, other cleanup
|
|
@end itemize
|
|
@item
|
|
JOIN::reinit
|
|
@itemize @bullet
|
|
@item
|
|
prepare all structures to SELECT executing (with JOIN::exec)
|
|
@end itemize
|
|
@end itemize
|
|
|
|
@node select select_result
|
|
@section select_result CLASS
|
|
|
|
Very important role in SELECT performing have select_result class and
|
|
classes inherited from it (usually called with "select_" prefix). This
|
|
class provide interface for results transmitting.
|
|
|
|
Key methods in this class are following:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{send_fields} sends giving item list headers (type, name, etc..)
|
|
@item
|
|
@strong{send_data} sends giving item list values as row of table of result
|
|
@item
|
|
@strong{send_error} send error to used used mainly for error interception,
|
|
making some operation and then ::send_error will be called.
|
|
@end itemize
|
|
|
|
For example there are fillowing select_result classes:
|
|
@itemize
|
|
@item
|
|
@strong{select_send} used for sending results though network layer
|
|
@item
|
|
@strong{select_export} used for exporting data to file
|
|
@item
|
|
@strong{multi_delete} used for multi-delete
|
|
@item
|
|
@strong{select_insert} used for INSERT ... SELECT ...
|
|
@item
|
|
@strong{multi_update} used for multi-update
|
|
@end itemize
|
|
|
|
@node select simple
|
|
@section SIMPLE or PRIMARY SELECT.
|
|
|
|
For performing single primary select SELECT used function mysql_select,
|
|
which:
|
|
@itemize @bullet
|
|
@item
|
|
allocate JOIN;
|
|
@item
|
|
JOIN::prepare;
|
|
@item
|
|
JOIN::optimize;
|
|
@item
|
|
JOIN::exec;
|
|
@item
|
|
JOIN::cleanup.
|
|
@end itemize
|
|
In previous versions of mysql all SELECTs was performed with help of this
|
|
function and mysql_select() was not divided on parts.
|
|
|
|
@node select structure
|
|
@section Structure Of Complex Select
|
|
|
|
There 2 structures which describe SELECTS:
|
|
@itemize @bullet
|
|
@item
|
|
st_select_lex (SELECT_LEX) it represent SELECT itself
|
|
@item
|
|
st_select_lex_unit (SELECT_LEX_UNIT) group several selects in bunch
|
|
@end itemize
|
|
and represent UNION operation (absence of UNION is union
|
|
with 1 SELECT and this structure present in any case). In future this
|
|
structure will be used for EXCEPT and INTERSECT.
|
|
|
|
For example:
|
|
@example
|
|
(SELECT ... )UNION(SELECT ... (SELECT...)...(SELECT...UNION...SELECT))
|
|
1 2 3 4 5 6 7
|
|
@end example
|
|
|
|
will be represent as
|
|
@example
|
|
------------------------------------------------------------------------
|
|
level 1
|
|
SELECT_LEX_UNIT(2)
|
|
|
|
|
+---------------+
|
|
| |
|
|
SELECT_LEX(1) SELECT_LEX(3)
|
|
|
|
|
--------------- | ------------------------------------------------------
|
|
| level 2
|
|
+-------------------+
|
|
| |
|
|
SELECT_LEX_UNIT(4) SELECT_LEX_UNIT(6)
|
|
| |
|
|
| +--------------+
|
|
| | |
|
|
SELECT_LEX(4) SELECT_LEX(5) SELECT_LEX(7)
|
|
|
|
------------------------------------------------------------------------
|
|
@end example
|
|
Note: single subselect 4 have it's own SELECT_LEX_UNIT.
|
|
|
|
Most upper SELECT_LEX_UNIT (#2 in example) stored in LEX.
|
|
First and most upper SELECT_LEX (#1 in example) stored in LEX, too.
|
|
This two structures always exist.
|
|
|
|
In time of creating or performing any JOIN::* operation
|
|
LEX::current_select point on appropriate SELECT_LEX.
|
|
|
|
Only during parsing global (for whole UNION) ORDER_BY & LIMIT clauses
|
|
LEX::current_select points to SELECT_LEX_UNIT of this unit to store this
|
|
parameter in this SELECT_LEX_UNIT (SELECT_LEX and SELECT_LEX_UNIT are
|
|
inherited from st_select_lex_node).
|
|
|
|
@node select union
|
|
@section Non-Subselect UNIONs Executing
|
|
|
|
Non subselect unions performed with help of mysql_union(). for now it
|
|
divided on following steps:
|
|
@itemize
|
|
@item
|
|
st_select_lex_unit::prepare
|
|
@itemize @bullet
|
|
@item
|
|
create temporary table for union results storing (if UNION witout
|
|
ALL option, 'distinct' parameter will be passed to table creation
|
|
procedure). Types/lengths of table's fields will be determinated
|
|
by first SELECT item list.
|
|
@item
|
|
create select_union (inherited from select_result) which will
|
|
write selects results in this temporary table
|
|
@item
|
|
allocate JOIN and perform JOIN::prepare for all SELECTs belonged
|
|
to UNION
|
|
@end itemize
|
|
@item
|
|
st_select_lex_unit::exec
|
|
@itemize @bullet
|
|
@item
|
|
delete rows from temporary table if it is not first call
|
|
@item
|
|
if first call call JOIN::optimize else JOIN::reinit and then
|
|
JOIN::exec for all SELECTs (select_union will write result for
|
|
temporary table). If union is cacheable and this method called
|
|
second, (third, ...) time it will do nothing.
|
|
@item
|
|
call mysql_select on temporary table with global ORDER BY and
|
|
LIMIT parameters after collecting results from all SELECTs.
|
|
@end itemize
|
|
@end itemize
|
|
As far as mysql_select need SELECT_LEX structure SELECT_LEX of first
|
|
SELECT of this UNION will be passed to it, but also fake_select_lex
|
|
parameter will be passed to mysql_select() too, to prevent linking
|
|
this SELECT_LEX with JOIN on this mysql_select() session.
|
|
|
|
PROBLEM: this fake select need workaround in many places.
|
|
|
|
@node select derived
|
|
@section Derived Tables Executing
|
|
|
|
Derived tables processing is first operation on any query. It performed
|
|
before creation list of tables of whole query and opening/locking this
|
|
tables.
|
|
|
|
If lex->derived_tables flag present will be scanned all SELECT_LEX (there
|
|
are list of all SELECT_LEX in reverse order (first SELECT in query will
|
|
be last in this list) lex->all_selects_list).
|
|
|
|
Pointer on derived table SELECT_LEX_UNIT stored in TABLE_LIST structure
|
|
(TABLE_LIST::derived). And for any table which have this pointer will
|
|
be called mysql_derived().
|
|
|
|
mysql_derived():
|
|
@itemize @bullet
|
|
@item
|
|
Creates list of all tables used in this query, opens and locks it
|
|
@item
|
|
Creates temporary table for storing results
|
|
@item
|
|
Creates union_result for writing result in this table
|
|
@item
|
|
Calls mysql_select or mysql_union for execute query
|
|
@item
|
|
Removes all derived table subtree from SELECTs tree (if it is
|
|
not EXPLAIN)
|
|
@item
|
|
Stores pointer to this temporary table in TABLE_LIST structure, then
|
|
this table will be used by outer query. This table table will not be
|
|
skipped in checking grants, because tables from which this table was
|
|
received was checked in mysql_derived.
|
|
@item
|
|
Links this temporary table in thd->derived_tables for removing after
|
|
query executing. this table will be closed in close_thread_tables if
|
|
second parameter of it (bool skip_derived) will be true.
|
|
@end itemize
|
|
|
|
@node select subselect
|
|
@section Subselects
|
|
|
|
In expression subselect represented by Item inherited from Item_subselect.
|
|
|
|
To hide difference in performing single SELECTs and UNIONs
|
|
Item_subselect use two different engines, which provide uniformed
|
|
interface for access to underplaid SELECT or UNION
|
|
(subselect_single_select_engine and subselect_union_engine, both are
|
|
inherited from subselect_engine).
|
|
|
|
Engine will be created in time of Item_select constructing
|
|
(Item_subselect::init method).
|
|
|
|
On Item_subselect::fix_fields() will be called engine->prepare().
|
|
|
|
Before calling any value getting method (val, val_int, val_str,
|
|
bring_value (in case of row result)) will be called engine->exec(),
|
|
which execute query or just do nothing if subselect is cacheable and
|
|
already executed.
|
|
|
|
Items inherited from provide it's own select_result classes. There are
|
|
2 type of it:
|
|
@itemize @bullet
|
|
@item
|
|
select_singlerow_subselect it store values of giving row in
|
|
Item_singlerow_subselect cache on send_data() call and report error
|
|
if Item_subselect have 'assigned' attribute.
|
|
@item
|
|
select_exists_subselect just store 1 as value of
|
|
Item_exists_subselect on send_data() call. As far as
|
|
Item_in_subselect and Item_allany_subselect inherited from
|
|
Item_exists_subselect, they use same select_result class.
|
|
@end itemize
|
|
|
|
Item_select will never call cleanup() procedure for JOIN. Every
|
|
JOIN::cleanup will call cleanup() for inner JOINs. Most upper
|
|
JOIN::cleanup will be called by mysql_select() or mysql_union().
|
|
|
|
@node select select engine
|
|
@section Single Select Engine
|
|
|
|
subselect_single_select_engine:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{constructor} allocate JOIN and store pointers on SELECT_LEX and JOIN
|
|
@item
|
|
@strong{prepare()} call JOIN::prepare
|
|
@item
|
|
@strong{fix_length_and_dec()} prepare cache and receive type and
|
|
parameters of returning items (it called only by
|
|
Item_singlerow_subselect)
|
|
@item
|
|
@strong{exec()} drop 'assigned flag of Item_subselect. If called first time
|
|
JOIN::optimize and JOINexec(), else do nothing or JOIN::reinit()
|
|
JOIN::exec() depending of type of subquery.
|
|
@end itemize
|
|
|
|
@node select union engine
|
|
@section Union Engine
|
|
|
|
subselect_union_engine:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{constructor} just store pointer to st_select_lex_union
|
|
(SELECT_LEX_UNION)
|
|
@item
|
|
@strong{prepare()} call st_select_lex_unit::prepare
|
|
@item
|
|
@strong{fix_length_and_dec()} prepare cache and receive type and
|
|
parameters (maximum of length) of returning items (it called
|
|
only by Item_singlerow_subselect)
|
|
@item
|
|
@strong{exec()} call st_select_lex_unit::exec(). st_select_lex_unit::exec()
|
|
can drop 'assigned' flag of Item_subselect if
|
|
st_select_lex_unit::item is not 0.
|
|
@end itemize
|
|
|
|
@node selectexplain
|
|
@section Explain Execution
|
|
|
|
For EXPLAIN result showing for every SELECT will be called mysql_select
|
|
with option SELECT_DESCRIBE.
|
|
|
|
For main UNION will be called mysql_explain_union.
|
|
|
|
mysql_explain_union call mysql_explain_select for every SELECT in given
|
|
union.
|
|
|
|
mysql_explain_select call mysql_select with SELECT_DESCRIBE.
|
|
|
|
mysql_select create JOIN for select (if it not exists, because if it
|
|
called for subselect JOIN can be created in JOIN::optimize of outer
|
|
query when it decided to calculate value of subselect). Then it call
|
|
JOIN::prepare, JOIN::optimize, JOIN exec and JOIN::cleanup as usual.
|
|
|
|
JOIN::exec called for SELECT with SELECT_DESCRIBE option call
|
|
select_describe.
|
|
|
|
select_describe return to user description of SELECT and call
|
|
mysql_explain_union for every inner UNION
|
|
|
|
PROBLEM: how it will work with global query optimization?
|
|
|
|
@node transformations, coding guidelines, selects, Top
|
|
@chapter How MySQL transforms subqueries
|
|
|
|
Item_subselect virtual method select_transformer is used to rewrite
|
|
subqueries. It is called from Item_subselect::init (which called in
|
|
Item_subselect constructor)
|
|
|
|
@node transformation IN
|
|
@section Item_in_subselect::select_transformer
|
|
|
|
Item_in_subselect::select_transformer is divided on two parts for
|
|
scalar left part and row left part:
|
|
|
|
@node transformation scalar IN
|
|
@subsection Scalar IN Subselect
|
|
|
|
To rewrite scalar IN subselect used method
|
|
Item_in_subselect::single_value_transformer, Scalar IN subselect will
|
|
be replaced with Item_in_optimizer.
|
|
|
|
Item_in_optimizer item is special boolean function. On value request
|
|
(one of val, val_int or val_str methods) it evaluate left expression of
|
|
IN by storing it value in cache item (one of Item_cache* items), then it
|
|
test cache is it NULL. If left expression (cache) is NULL then
|
|
Item_in_optimizer return NULL, else it evaluate Item_in_subselect.
|
|
|
|
Example queries.
|
|
@example
|
|
a) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2);
|
|
b) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2 GROUP BY t2.a);
|
|
@end example
|
|
|
|
@itemize
|
|
@item
|
|
Item_in_subselect inherit mechanism of getting value from
|
|
Item_exists_subselect.
|
|
@item
|
|
Select_transformer stores reference to left expression in its
|
|
conditions: (in WHERE in case 'a' and in a HAVING in case 'b')
|
|
@item
|
|
Item from item list of this select (t2.a) can be referred with special
|
|
reference (Item_ref_null_helper or Item_asterisk_remover).
|
|
This reference informs Item_in_optimizer if item (t2.a) is NULL by
|
|
setting the 'was_null' flag.
|
|
@item
|
|
The return value from Item_in_subselect will be evaluated as following:
|
|
@itemize @bullet
|
|
@item
|
|
If TRUE return true
|
|
@item
|
|
If NULL return null
|
|
@item
|
|
If FALSE and 'was_null' is set, return null
|
|
@item
|
|
return FALSE
|
|
@end itemize
|
|
@end itemize
|
|
|
|
<left_expression> IN (SELECT <item> ...) will be represented like
|
|
following:
|
|
@example
|
|
+-----------------+
|
|
|Item_in_optimizer|
|
|
+-----------------+
|
|
|
|
|
+---------------------+------------+
|
|
| |
|
|
+-----------------------+ +-----------------+
|
|
| <left_expression> | |Item_in_subselect|
|
|
| | +-----------------+
|
|
+-----------------------+ |
|
|
|<left_expression cache>| +-----------+-----------+
|
|
| | | |
|
|
+-----------------------+ | |
|
|
^ +----------+ +--------------------+
|
|
+<<<<<<<<<<<<<<<<<| Item_ref | +<<<|Item_ref_null_helper|
|
|
+----------+ V +--------------------+
|
|
V +--------------------+
|
|
+>>>| <item> |
|
|
+--------------------+
|
|
@end example
|
|
where '<<<<<<<<<' is reference in meaning of Item_ref.
|
|
|
|
Item_ref used for point to <left_expression cache>, because in time of
|
|
transformation we know only address of variable where pointer on cache
|
|
will be stored.
|
|
|
|
If select have ORDER BY clause it will be wiped out, because no sense in
|
|
ORDER BY without LIMIT here.
|
|
|
|
If IN subselect union condition of every select in UNION will be changed
|
|
personally.
|
|
|
|
Following is examples of IN transformations:
|
|
@example
|
|
a) <left_expression> IN (SELECT <item> FROM t
|
|
WHERE <where_exp>)
|
|
|
|
will be represented as
|
|
|
|
(SELECT 1 FROM t
|
|
WHERE <where_exp> and
|
|
Item_ref(<cached_left_expression>)=<Item_asterisk_remover(<Item>)>)
|
|
|
|
|
|
b) <left_expression> IN (SELECT <item> FROM t
|
|
HAVING <having_expr>
|
|
ORDER BY 1)
|
|
|
|
will be represented as
|
|
|
|
(SELECT <item> as ref_null_helper FROM t
|
|
HAVING <having_exp> AND
|
|
Item_ref(<cached_left_expression>) = ref_null_helper)
|
|
|
|
|
|
c) <left_expression> IN (SELECT <item> UNION ...)
|
|
|
|
will be represented as
|
|
|
|
(SELECT 1
|
|
HAVING Item_ref(<cached_left_expression>)=
|
|
<Item_asterisk_remover(<Item>)>
|
|
UNION ...)
|
|
|
|
(having without FROM is syntax error, but having condition is checked
|
|
even for subselect without FROM)
|
|
|
|
d) <left_expression> IN (select <item>)
|
|
|
|
will be completely replaced with <left_expression> = <item>
|
|
@end example
|
|
|
|
Now conditions (WHERE (a) or HAVING (b)) will be changed depends of
|
|
select in following way:
|
|
|
|
If subselect have HAVING , sum function or GROUP BY (case a) then item
|
|
list will be unchanged and Item_ref_null_helper reference will be
|
|
created on item list element. Condition will be added to HAVING condition.
|
|
|
|
If subselect have not HAVING, sum function or GROUP BY (case b) then:
|
|
@itemize @bullet
|
|
@item
|
|
@strong{item list} will be replaced with 1.
|
|
@item
|
|
@strong{<item>} from item list will be stored in Item_asterisk_remover, which
|
|
inherit from Item_ref_null_helper, but store item on which refer by
|
|
itself, and also it can resolve '*' item.
|
|
@item
|
|
@strong{<left_expression cache> = <Item_ref_null_helper>} will be added to
|
|
WHERE clause this item or to HAVING clause if this subselect have
|
|
no FROM clause and subselect is union (case c).
|
|
@end itemize
|
|
|
|
Single select without FROM will be reduced to just
|
|
<left_expression> = <item> without using Item_in_optimizer.
|
|
|
|
@node transformations row IN
|
|
@subsection Row IN Subselect
|
|
|
|
To rewrite row IN subselect used method
|
|
Item_in_subselect::row_value_transformer. It work in almost same way as
|
|
scalar analog, but work with Item_cache_row for caching left expression
|
|
and use references on elements of Item_cache_row.
|
|
To refer on item list it use Item_ref_on_list_position.
|
|
|
|
Item_ref_on_list_position::fix_fields will find item in item list of
|
|
subselect by number and create Item_ref_null_helper to refer on it. It
|
|
used to find reference when all '*' items will be translated in item
|
|
list. Subselect with have HAVING, sum functions or GROUP BY will
|
|
transformed in following way:
|
|
|
|
@example
|
|
ROW(l1, l2, ... lN) IN (SELECT i1, i2, ... iM FROM t HAVING <having_expr>)
|
|
|
|
will be following:
|
|
|
|
(SELECT i1, i2, ... iM FROM t
|
|
HAVING <having_expr> and
|
|
<cache_l1> = <ref_on_list_position(1)> AND
|
|
<cache_l2> = <ref_on_list_position(2)> AND
|
|
...
|
|
<cache_lN> = <ref_on_list_position(N)>)
|
|
@end example
|
|
|
|
In this way will be transformed select without FROM, too.
|
|
|
|
For other subselect it will be same but for WHERE clause.
|
|
|
|
@node transformations all any
|
|
@section Item_allany_subselect
|
|
|
|
Item_allany_subselect is inherited from Item_in_subselect.
|
|
ALL/ANY/SOME use same algorithm (and same method of Item_in_subselect)
|
|
as scalar IN, but use different function instead of '='.
|
|
|
|
ANY/SOME use same function that was listed after left expression.
|
|
|
|
ALL use inverted function, and all subselect passed as argument to
|
|
Item_func_not.
|
|
|
|
@node transformations singlerow
|
|
@section Item_singlerow_subselect
|
|
|
|
Item_singlerow_subselect will be rewritten only if it have not FROM
|
|
clause, it is not part of UNION and it is scalar subselect. For now will
|
|
not be converted subselects with field or reference on top of item list
|
|
(we can't change name of such items from one hand, but from other hand
|
|
we should assign to it name of whole subselect which will be reduced);
|
|
|
|
Following will not be reduced:
|
|
@example
|
|
SELECT a;
|
|
SELECT 1 UNION SELECT 2;
|
|
SELECT 1 FROM t1;
|
|
@end example
|
|
|
|
Following select will be reduced:
|
|
@example
|
|
SELECT 1;
|
|
SELECT a+2;
|
|
@end example
|
|
|
|
Such subselect will be completely replaced by its expression from item
|
|
list and its SELECT_LEX and SELECT_LEX_UNIT will be removed from
|
|
SELECT_LEX's tree.
|
|
|
|
But all Item_fields and Item_ref of that expression will be marked for
|
|
special fix_fields() procedure. fix_fields() for such Item will be
|
|
performed is same way as for items of inner subselect. Also if this
|
|
expression is Item_fields or Item_ref then name of this new item will
|
|
be same as name of this item (but not '(SELECT ...)'). It is done to
|
|
prevent broke references on such items from more inner subselects.
|
|
|
|
@node coding guidelines, mysys functions, transformations, Top
|
|
@chapter Coding Guidelines
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
We are using @uref{http://www.bitkeeper.com/, BitKeeper} for source management.
|
|
|
|
@item
|
|
You should use the @strong{MySQL} 4.0 source for all developments.
|
|
|
|
@item
|
|
If you have any questions about the @strong{MySQL} source, you can post these
|
|
to @email{dev-public@@mysql.com} and we will answer them. Please
|
|
remember to not use this internal email list in public!
|
|
|
|
@item
|
|
Try to write code in a lot of black boxes that can be reused or use at
|
|
least a clean, easy to change interface.
|
|
|
|
@item
|
|
Reuse code; There is already a lot of algorithms in MySQL for list handling,
|
|
queues, dynamic and hashed arrays, sorting, etc. that can be reused.
|
|
|
|
@item
|
|
Use the @code{my_*} functions like @code{my_read()}/@code{my_write()}/
|
|
@code{my_malloc()} that you can find in the @code{mysys} library instead
|
|
of the direct system calls; This will make your code easier to debug and
|
|
more portable.
|
|
|
|
@item
|
|
Try to always write optimized code, so that you don't have to
|
|
go back and rewrite it a couple of months later. It's better to
|
|
spend 3 times as much time designing and writing an optimal function than
|
|
having to do it all over again later on.
|
|
|
|
@item
|
|
Avoid CPU wasteful code, even where it does not matter, so that
|
|
you will not develop sloppy coding habits.
|
|
|
|
@item
|
|
If you can write it in fewer lines, do it (as long as the code will not
|
|
be slower or much harder to read).
|
|
|
|
@item
|
|
Don't use two commands on the same line.
|
|
|
|
@item
|
|
Do not check the same pointer for @code{NULL} more than once.
|
|
|
|
@item
|
|
Use long function and variable names in English. This makes your code
|
|
easier to read.
|
|
|
|
@item
|
|
Use @code{my_var} as opposed to @code{myVar} or @code{MyVar} (@samp{_}
|
|
rather than dancing SHIFT to seperate words in identifiers).
|
|
|
|
@item
|
|
Think assembly - make it easier for the compiler to optimize your code.
|
|
|
|
@item
|
|
Comment your code when you do something that someone else may think
|
|
is not ``trivial''.
|
|
|
|
@item
|
|
Use @code{libstring} functions (in the @file{strings} directory)
|
|
instead of standard @code{libc} string functions whenever possible.
|
|
|
|
@item
|
|
Avoid using @code{malloc()} (its REAL slow); For memory allocations
|
|
that only need to live for the lifetime of one thread, one should use
|
|
@code{sql_alloc()} instead.
|
|
|
|
@item
|
|
Before making big design decisions, please first post a summary of
|
|
what you want to do, why you want to do it, and how you plan to do
|
|
it. This way we can easily provide you with feedback and also
|
|
easily discuss it thoroughly if some other developer thinks there is better
|
|
way to do the same thing!
|
|
|
|
@item
|
|
Class names start with a capital letter.
|
|
|
|
@item
|
|
Structure types are @code{typedef}'ed to an all-caps identifier.
|
|
|
|
@item
|
|
Any @code{#define}'s are in all-caps.
|
|
|
|
@item
|
|
Matching @samp{@{} are in the same column.
|
|
|
|
@item
|
|
Put the @samp{@{} after a @code{switch} on the same line, as this gives
|
|
better overall indentation for the switch statement:
|
|
|
|
@example
|
|
switch (arg) @{
|
|
@end example
|
|
|
|
@item
|
|
In all other cases, @samp{@{} and @samp{@}} should be on their own line, except
|
|
if there is nothing inside @samp{@{} and @samp{@}}.
|
|
|
|
@item
|
|
Have a space after @code{if}
|
|
|
|
@item
|
|
Put a space after @samp{,} for function arguments
|
|
|
|
@item
|
|
Functions return @samp{0} on success, and non-zero on error, so you can do:
|
|
|
|
@example
|
|
if(a() || b() || c()) @{ error("something went wrong"); @}
|
|
@end example
|
|
|
|
@item
|
|
Using @code{goto} is okay if not abused.
|
|
|
|
@item
|
|
Avoid default variable initalizations, use @code{LINT_INIT()} if the
|
|
compiler complains after making sure that there is really no way
|
|
the variable can be used uninitialized.
|
|
|
|
@item
|
|
Do not instantiate a class if you do not have to.
|
|
|
|
@item
|
|
Use pointers rather than array indexing when operating on strings.
|
|
|
|
@end itemize
|
|
|
|
Suggested mode in emacs:
|
|
|
|
@example
|
|
(load "cc-mode")
|
|
(setq c-mode-common-hook '(lambda ()
|
|
(turn-on-font-lock)
|
|
(setq comment-column 48)))
|
|
(setq c-style-alist
|
|
(cons
|
|
'("MY"
|
|
(c-basic-offset . 2)
|
|
(c-comment-only-line-offset . 0)
|
|
(c-offsets-alist . ((statement-block-intro . +)
|
|
(knr-argdecl-intro . 0)
|
|
(substatement-open . 0)
|
|
(label . -)
|
|
(statement-cont . +)
|
|
(arglist-intro . c-lineup-arglist-intro-after-paren)
|
|
(arglist-close . c-lineup-arglist)
|
|
))
|
|
)
|
|
c-style-alist))
|
|
(c-set-style "MY")
|
|
(setq c-default-style "MY")
|
|
@end example
|
|
|
|
|
|
@node mysys functions, DBUG, coding guidelines, Top
|
|
@chapter Functions In The @code{mysys} Library
|
|
|
|
Functions in @code{mysys}: (For flags see @file{my_sys.h})
|
|
|
|
@table @code
|
|
@item int my_copy _A((const char *from, const char *to, myf MyFlags));
|
|
Copy file from @code{from} to @code{to}.
|
|
|
|
@item int my_delete _A((const char *name, myf MyFlags));
|
|
Delete file @code{name}.
|
|
|
|
@item int my_getwd _A((string buf, uint size, myf MyFlags));
|
|
@item int my_setwd _A((const char *dir, myf MyFlags));
|
|
Get and set working directory.
|
|
|
|
@item string my_tempnam _A((const char *pfx, myf MyFlags));
|
|
Make a unique temporary file name by using dir and adding something after
|
|
@code{pfx} to make name unique. The file name is made by adding a unique
|
|
six character string and @code{TMP_EXT} after @code{pfx}.
|
|
Returns pointer to @code{malloc()}'ed area for filename. Should be freed by
|
|
@code{free()}.
|
|
|
|
@item File my_open _A((const char *FileName,int Flags,myf MyFlags));
|
|
@item File my_create _A((const char *FileName, int CreateFlags, int AccsesFlags, myf MyFlags));
|
|
@item int my_close _A((File Filedes, myf MyFlags));
|
|
@item uint my_read _A((File Filedes, byte *Buffer, uint Count, myf MyFlags));
|
|
@item uint my_write _A((File Filedes, const byte *Buffer, uint Count, myf MyFlags));
|
|
@item ulong my_seek _A((File fd,ulong pos,int whence,myf MyFlags));
|
|
@item ulong my_tell _A((File fd,myf MyFlags));
|
|
Use instead of open, open-with-create-flag, close, read, and write
|
|
to get automatic error messages (flag @code{MYF_WME}) and only have
|
|
to test for != 0 if error (flag @code{MY_NABP}).
|
|
|
|
@item int my_rename _A((const char *from, const char *to, myf MyFlags));
|
|
Rename file from @code{from} to @code{to}.
|
|
|
|
@item FILE *my_fopen _A((const char *FileName,int Flags,myf MyFlags));
|
|
@item FILE *my_fdopen _A((File Filedes,int Flags,myf MyFlags));
|
|
@item int my_fclose _A((FILE *fd,myf MyFlags));
|
|
@item uint my_fread _A((FILE *stream,byte *Buffer,uint Count,myf MyFlags));
|
|
@item uint my_fwrite _A((FILE *stream,const byte *Buffer,uint Count, myf MyFlags));
|
|
@item ulong my_fseek _A((FILE *stream,ulong pos,int whence,myf MyFlags));
|
|
@item ulong my_ftell _A((FILE *stream,myf MyFlags));
|
|
Same read-interface for streams as for files.
|
|
|
|
@item gptr _mymalloc _A((uint uSize,const char *sFile,uint uLine, myf MyFlag));
|
|
@item gptr _myrealloc _A((string pPtr,uint uSize,const char *sFile,uint uLine, myf MyFlag));
|
|
@item void _myfree _A((gptr pPtr,const char *sFile,uint uLine));
|
|
@item int _sanity _A((const char *sFile,unsigned int uLine));
|
|
@item gptr _myget_copy_of_memory _A((const byte *from,uint length,const char *sFile, uint uLine,myf MyFlag));
|
|
@code{malloc(size,myflag)} is mapped to these functions if not compiled
|
|
with @code{-DSAFEMALLOC}.
|
|
|
|
@item void TERMINATE _A((void));
|
|
Writes @code{malloc()} info on @code{stdout} if compiled with
|
|
@code{-DSAFEMALLOC}.
|
|
|
|
@item int my_chsize _A((File fd, ulong newlength, myf MyFlags));
|
|
Change size of file @code{fd} to @code{newlength}.
|
|
|
|
@item void my_error _D((int nr, myf MyFlags, ...));
|
|
Writes message using error number (see @file{mysys/errors.h}) on @code{stdout},
|
|
or using curses, if @code{MYSYS_PROGRAM_USES_CURSES()} has been called.
|
|
|
|
@item void my_message _A((const char *str, myf MyFlags));
|
|
Writes @code{str} on @code{stdout}, or using curses, if
|
|
@code{MYSYS_PROGRAM_USES_CURSES()} has been called.
|
|
|
|
@item void my_init _A((void ));
|
|
Start each program (in @code{main()}) with this.
|
|
|
|
@item void my_end _A((int infoflag));
|
|
Gives info about program.
|
|
If @code{infoflag & MY_CHECK_ERROR}, prints if some files are left open.
|
|
If @code{infoflag & MY_GIVE_INFO}, prints timing info and malloc info
|
|
about program.
|
|
|
|
@item int my_redel _A((const char *from, const char *to, int MyFlags));
|
|
Delete @code{from} before rename of @code{to} to @code{from}. Copies state
|
|
from old file to new file. If @code{MY_COPY_TIME} is set, sets old time.
|
|
|
|
@item int my_copystat _A((const char *from, const char *to, int MyFlags));
|
|
Copy state from old file to new file. If @code{MY_COPY_TIME} is set,
|
|
sets old time.
|
|
|
|
@item string my_filename _A((File fd));
|
|
Returns filename of open file.
|
|
|
|
@item int dirname _A((string to, const char *name));
|
|
Copy name of directory from filename.
|
|
|
|
@item int test_if_hard_path _A((const char *dir_name));
|
|
Test if @code{dir_name} is a hard path (starts from root).
|
|
|
|
@item void convert_dirname _A((string name));
|
|
Convert dirname according to system.
|
|
In MSDOS, changes all characters to capitals and changes @samp{/} to @samp{\}.
|
|
|
|
@item string fn_ext _A((const char *name));
|
|
Returns pointer to extension in filename.
|
|
|
|
@item string fn_format _A((string to,const char *name,const char *dsk,const char *form,int flag));
|
|
format a filename with replace of library and extension and
|
|
converts between different systems.
|
|
params to and name may be identicall
|
|
function dosn't change name if name != to
|
|
Flag may be: 1 force replace filnames library with 'dsk'
|
|
2 force replace extension with 'form' */
|
|
4 force Unpack filename (replace ~ with home)
|
|
8 Pack filename as short as possibly for output to
|
|
user.
|
|
All open requests should allways use at least:
|
|
"open(fn_format(temp_buffe,name,"","",4),...)" to unpack home and
|
|
convert filename to system-form.
|
|
|
|
@item string fn_same _A((string toname, const char *name, int flag));
|
|
Copys directory and extension from @code{name} to @code{toname} if neaded.
|
|
Copying can be forced by same flags used in @code{fn_format()}.
|
|
|
|
@item int wild_compare _A((const char *str, const char *wildstr));
|
|
Compare if @code{str} matches @code{wildstr}. @code{wildstr} can contain
|
|
@samp{*} and @samp{?} as wildcard characters.
|
|
Returns 0 if @code{str} and @code{wildstr} match.
|
|
|
|
@item void get_date _A((string to, int timeflag));
|
|
Get current date in a form ready for printing.
|
|
|
|
@item void soundex _A((string out_pntr, string in_pntr))
|
|
Makes @code{in_pntr} to a 5 char long string. All words that sound
|
|
alike have the same string.
|
|
|
|
@item int init_key_cache _A((ulong use_mem, ulong leave_this_much_mem));
|
|
Use caching of keys in MISAM, PISAM, and ISAM.
|
|
@code{KEY_CACHE_SIZE} is a good size.
|
|
Remember to lock databases for optimal caching.
|
|
|
|
@item void end_key_cache _A((void));
|
|
End key caching.
|
|
@end table
|
|
|
|
|
|
|
|
@node DBUG, protocol, mysys functions, Top
|
|
@chapter DBUG Tags To Use
|
|
|
|
Here is some of the tags we now use:
|
|
(We should probably add a couple of new ones)
|
|
|
|
@table @code
|
|
@item enter
|
|
Arguments to the function.
|
|
|
|
@item exit
|
|
Results from the function.
|
|
|
|
@item info
|
|
Something that may be interesting.
|
|
|
|
@item warning
|
|
When something doesn't go the usual route or may be wrong.
|
|
|
|
@item error
|
|
When something went wrong.
|
|
|
|
@item loop
|
|
Write in a loop, that is probably only useful when debugging
|
|
the loop. These should normally be deleted when one is
|
|
satisfied with the code and it has been in real use for a while.
|
|
@end table
|
|
|
|
Some specific to mysqld, because we want to watch these carefully:
|
|
|
|
@table @code
|
|
@item trans
|
|
Starting/stopping transactions.
|
|
|
|
@item quit
|
|
@code{info} when mysqld is preparing to die.
|
|
|
|
@item query
|
|
Print query.
|
|
@end table
|
|
|
|
|
|
@node protocol, Fulltext Search, DBUG, Top
|
|
@chapter MySQL Client/Server Protocol
|
|
|
|
@menu
|
|
* raw packet without compression::
|
|
* raw packet with compression::
|
|
* basic packets::
|
|
* communication::
|
|
* fieldtype codes::
|
|
* protocol functions::
|
|
* protocol version 2::
|
|
@end menu
|
|
|
|
@node raw packet without compression, raw packet with compression, protocol, protocol
|
|
@section Raw Packet Without Compression
|
|
|
|
@example
|
|
+-----------------------------------------------+
|
|
| Packet Length | Packet no | Data |
|
|
| 3 Bytes | 1 Byte | n Bytes |
|
|
+-----------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item 3 Byte packet length
|
|
The length is calculated with int3store
|
|
See include/global.h for details.
|
|
The max packetsize can be 16 MB.
|
|
|
|
@item 1 Byte packet no
|
|
If no compression is used the first 4 bytes of each packet is the header
|
|
of the packet. The packet number is incremented for each sent packet.
|
|
The first packet starts with 0.
|
|
@item n Byte data
|
|
|
|
@end table
|
|
|
|
The packet length can be recalculated with:
|
|
|
|
@example
|
|
length = byte1 + (256 * byte2) + (256 * 256 * byte3)
|
|
@end example
|
|
|
|
|
|
@node raw packet with compression, basic packets, raw packet without compression, protocol
|
|
@section Raw Packet With Compression
|
|
|
|
@example
|
|
+---------------------------------------------------+
|
|
| Packet Length | Packet no | Uncomp. Packet Length |
|
|
| 3 Bytes | 1 Byte | 3 Bytes |
|
|
+---------------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item 3 Byte packet length
|
|
The length is calculated with int3store
|
|
See include/global.h for details.
|
|
The max packetsize can be 16 MB.
|
|
|
|
@item 1 Byte packet no
|
|
@item 3 Byte uncompressed packet length
|
|
@end table
|
|
|
|
If compression is used the first 7 bytes of each packet
|
|
is the header of the packet.
|
|
|
|
|
|
@node basic packets, communication, raw packet with compression, protocol
|
|
@section Basic Packets
|
|
|
|
@menu
|
|
* ok packet::
|
|
* error packet::
|
|
@end menu
|
|
|
|
|
|
@node ok packet, error packet, basic packets, basic packets
|
|
@subsection OK Packet
|
|
|
|
For details, see @file{sql/net_pkg.cc::send_ok()}.
|
|
|
|
@example
|
|
+-----------------------------------------------+
|
|
| Header | No of Rows | Affected Rows |
|
|
| | 1 Byte | 1-8 Byte |
|
|
|-----------------------------------------------|
|
|
| ID (last_insert_id) | Status | Length |
|
|
| 1-8 Byte | 2 Byte | 1-8 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext |
|
|
| n Byte |
|
|
+-----------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item Header
|
|
@item 1 byte number of rows ? (always 0 ?)
|
|
@item 1-8 bytes affected rows
|
|
@item 1-8 byte id (last_insert_id)
|
|
@item 2 byte Status (usually 0)
|
|
@item If the OK-packege includes a message:
|
|
@item 1-8 bytes length of message
|
|
@item n bytes messagetext
|
|
@end table
|
|
|
|
|
|
@node error packet, , ok packet, basic packets
|
|
@subsection Error Packet
|
|
|
|
@example
|
|
+-----------------------------------------------+
|
|
| Header | Status code | Error no |
|
|
| | 1 Byte | 2 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext | 0x00 |
|
|
| n Byte | 1 Byte |
|
|
+-----------------------------------------------+
|
|
@end example
|
|
|
|
@table @asis
|
|
@item Header
|
|
@item 1 byte status code (0xFF = ERROR)
|
|
@item 2 byte error number (is only sent to new 3.23 clients.
|
|
@item n byte errortext
|
|
@item 1 byte 0x00
|
|
@end table
|
|
|
|
|
|
@node communication, fieldtype codes, basic packets, protocol
|
|
@section Communication
|
|
|
|
> Packet from server to client
|
|
< Paket from client tor server
|
|
|
|
Login
|
|
------
|
|
> 1. packet
|
|
Header
|
|
1 byte protocolversion
|
|
n byte serverversion
|
|
1 byte 0x00
|
|
4 byte threadnumber
|
|
8 byte crypt seed
|
|
1 byte 0x00
|
|
2 byte CLIENT_xxx options (see include/mysql_com.h
|
|
that is supported by the server
|
|
1 byte number of current server charset
|
|
2 byte server status variables (SERVER_STATUS_xxx flags)
|
|
13 byte 0x00 (not used yet).
|
|
|
|
< 2. packet
|
|
Header
|
|
2 byte CLIENT_xxx options
|
|
3 byte max_allowed_packet for the client
|
|
n byte username
|
|
1 byte 0x00
|
|
8 byte crypted password
|
|
1 byte 0x00
|
|
n byte databasename
|
|
1 byte 0x00
|
|
|
|
> 3. packet
|
|
OK-packet
|
|
|
|
|
|
Command
|
|
--------
|
|
< 1. packet
|
|
Header
|
|
1 byte command type (e.g.0x03 = query)
|
|
n byte query
|
|
|
|
Result set (after command)
|
|
--------------------------
|
|
> 2. packet
|
|
Header
|
|
1-8 byte field_count (packed with net_store_length())
|
|
|
|
If field_count == 0 (command):
|
|
1-8 byte affected rows
|
|
1-8 byte insert id
|
|
2 bytes server_status (SERVER_STATUS_xx)
|
|
|
|
If field_count == NULL_LENGTH (251)
|
|
LOAD DATA LOCAL INFILE
|
|
|
|
If field_count > 0 Result Set:
|
|
|
|
> n packets
|
|
Header Info
|
|
Column description: 5 data object /column
|
|
(See code in unpack_fields())
|
|
|
|
Columninfo for each column:
|
|
1 data block table_name
|
|
1 byte length of block
|
|
n byte data
|
|
1 data block field_name
|
|
1 byte length of block...
|
|
n byte data
|
|
1 data block display length of field
|
|
1 byte length of block
|
|
3 bytes display length of filed
|
|
1 data block type field of type (enum_field_types)
|
|
1 byte length of block
|
|
1 bytexs field of type
|
|
1 data block flags
|
|
1 byte length of block
|
|
2 byte flags for the columns (NOT_NULL_FLAG, ZEROFILL_FLAG....)
|
|
1 byte decimals
|
|
|
|
if table definition:
|
|
1 data block default value
|
|
|
|
Actual result (one packet per row):
|
|
4 byte header
|
|
1-8 byte length of data
|
|
n data
|
|
|
|
|
|
@node fieldtype codes, protocol functions, communication, protocol
|
|
@section Fieldtype Codes
|
|
|
|
@example
|
|
display_length |enum_field_type |flags
|
|
----------------------------------------------------
|
|
Blob 03 FF FF 00 |01 FC |03 90 00 00
|
|
Mediumblob 03 FF FF FF |01 FC |03 90 00 00
|
|
Tinyblob 03 FF 00 00 |01 FC |03 90 00 00
|
|
Text 03 FF FF 00 |01 FC |03 10 00 00
|
|
Mediumtext 03 FF FF FF |01 FC |03 10 00 00
|
|
Tinytext 03 FF 00 00 |01 FC |03 10 00 00
|
|
Integer 03 0B 00 00 |01 03 |03 03 42 00
|
|
Mediumint 03 09 00 00 |01 09 |03 00 00 00
|
|
Smallint 03 06 00 00 |01 02 |03 00 00 00
|
|
Tinyint 03 04 00 00 |01 01 |03 00 00 00
|
|
Varchar 03 XX 00 00 |01 FD |03 00 00 00
|
|
Enum 03 05 00 00 |01 FE |03 00 01 00
|
|
Datetime 03 13 00 00 |01 0C |03 00 00 00
|
|
Timestamp 03 0E 00 00 |01 07 |03 61 04 00
|
|
Time 03 08 00 00 |01 0B |03 00 00 00
|
|
Date 03 0A 00 00 |01 0A |03 00 00 00
|
|
@end example
|
|
|
|
@node protocol functions, protocol version 2, fieldtype codes, protocol
|
|
@section Functions used to implement the protocol
|
|
|
|
This should be merged with the above one and changed to texi format
|
|
|
|
Raw packets
|
|
-----------
|
|
|
|
- The my_net_xxxx() functions handles the packaging of a stream of data
|
|
into a raw packet that contains a packet number, length and data.
|
|
|
|
- This is implemented for the server in sql/net_serv.cc.
|
|
The client file, libmysql/net.c, is symlinked to this file
|
|
|
|
The important functions are:
|
|
|
|
my_net_write() Store a packet (= # number of bytes) to be sent
|
|
net_flush() Send the packets stored in the buffer
|
|
net_write_command() Send a command (1 byte) + packet to the server.
|
|
my_net_read() Read a packet
|
|
|
|
|
|
Include files
|
|
-------------
|
|
|
|
- include/mysql.h is included by all MySQL clients. It includes the
|
|
MYSQL and MYSQL_RES structures.
|
|
- include/mysql_com.h is include by mysql.h and mysql_priv.h (the
|
|
server) and includes a lot of common functions and structures to
|
|
handle the client/server protocol.
|
|
|
|
|
|
Packets from server to client:
|
|
-----------------------------
|
|
|
|
sql/net_pkg.cc:
|
|
|
|
- Sending of error packets
|
|
- Sending of OK packets (= end of data)
|
|
- Storing of values in a packet
|
|
|
|
|
|
sql/sql_base.cc:
|
|
|
|
- Function send_fields() sends the field description to the client.
|
|
|
|
sql/sql_show.cc:
|
|
|
|
- Sends results for a lot of SHOW commands, including:
|
|
SHOW DATABASES [like 'wildcard']
|
|
SHOW TABLES [like 'wildcard']
|
|
|
|
|
|
Packets from client to server:
|
|
------------------------------
|
|
|
|
This is done in libmysql/libmysql.c
|
|
|
|
The important ones are:
|
|
|
|
- mysql_real_connect() Connects to a mysqld server
|
|
- mysql_real_query() Sends a query to the server and
|
|
reads the ok packet or columns header.
|
|
- mysql_store_result() Read a result set from the server to memory
|
|
- mysql_use_result() Read a result set row by row from the server.
|
|
|
|
- net_safe_read() Read a packet from the server with
|
|
error handling.
|
|
- net_field_length() Reads the length of a packet string.
|
|
- simple_command() Sends a command/query to the server.
|
|
|
|
|
|
|
|
Connecting to mysqld (the MySQL server)
|
|
---------------------------------------
|
|
|
|
- On the client side: libmysql/libmysql.c::mysql_real_connect().
|
|
- On the server side: sql/sql_parse.cc::check_connections()
|
|
|
|
The packets sent during a connection are as follows
|
|
|
|
Server: Send greeting package (includes server capabilites, server
|
|
version and a random string of bytes to be used to scramble
|
|
the password.
|
|
Client: Sends package with client capabilites, user name, scrambled
|
|
password, database name
|
|
|
|
Server: Sends ok package or error package.
|
|
|
|
Client: If init command specified, send it t the server and read
|
|
ok/error package.
|
|
|
|
|
|
Password functions
|
|
------------------
|
|
|
|
The passwords are scrambled to a random number and are stored in hex
|
|
format on the server.
|
|
|
|
The password handling is done in sql/password.c. The important
|
|
function is 'scramble()', which takes the a password in clear text
|
|
and uses this to 'encrypt' the random string sent by the server
|
|
to a new message.
|
|
|
|
The encrypted message is sent to the server which uses the stored
|
|
random number password to encrypt the random string sent to the
|
|
client. If this is equal to the new message the client sends to the
|
|
server then the password is accepted.
|
|
|
|
@node protocol version 2, , protocol functions, protocol
|
|
@section Another description of the protocol
|
|
|
|
This should be merged with the above one and changed to texi format.
|
|
|
|
*****************************
|
|
*
|
|
* PROTOCOL OVERVIEW
|
|
*
|
|
*****************************
|
|
|
|
The MySQL protocol is relatively simple, and is designed for high performance
|
|
through minimisation of overhead, and extensibility through versioning and
|
|
options flags. It is a request-response protocol, and does not allow
|
|
multitasking or multiplexing over a single connection. There are two packet
|
|
formats, 'raw' and 'compressed' (which is used when both client and
|
|
server support zlib compression, and the client requests that data be
|
|
compressed):
|
|
|
|
* RAW PACKET, shorter than 16 M *
|
|
|
|
+-----------------------------------------------+
|
|
| Packet Length | Packet no | Data |
|
|
| 3 Bytes | 1 Byte | n Bytes |
|
|
+-----------------------------------------------+
|
|
^ ^
|
|
| 'HEADER' |
|
|
+-------------------------------+
|
|
|
|
|
|
* Packet Length: Calculated with int3store. See include/global.h for
|
|
details. The basic computation is length = byte1 +
|
|
(256 * byte2) + (256 * 256 * byte3). The max packetsize
|
|
can be 16 MB.
|
|
|
|
* Packet no: The packet number is incremented for each sent packet.
|
|
The first packet for each query from the client
|
|
starts with 0.
|
|
|
|
* Data: Specific to the operation being performed. Most often
|
|
used to send string data, such as a SQL query.
|
|
|
|
* COMPRESSED PACKET *
|
|
|
|
+---------------------------------------------------+-----------------+
|
|
| Packet Length | Packet no | Uncomp. Packet Length | Compressed Data |
|
|
| 3 Bytes | 1 Byte | 3 Bytes | n bytes |
|
|
+---------------------------------------------------+-----------------+
|
|
^ ^
|
|
| 'HEADER' |
|
|
+---------------------------------------------------+
|
|
|
|
* Packet Length: Calculated with int3store. See include/my_global.h for
|
|
details. The basic computation is length = byte1 +
|
|
(256 * byte2) + (256 * 256 * byte3). The max packetsize
|
|
can be 16 MB.
|
|
|
|
* Packet no: The packet number is incremented for each sent packet.
|
|
The first packet starts with 0.
|
|
|
|
* Uncomp. Packet Length: The length of the original, uncompressed packet
|
|
If this is zero then the data is not compressed.
|
|
|
|
* Compressed Data: The original packet, compressed with zlib compression
|
|
|
|
|
|
When using the compressed protocol, the client/server will only compress
|
|
send packets where the new packet is smaller than the not compressed one.
|
|
In other words, some packets may be compressed while others will not.
|
|
|
|
The 'compressed data' is one or more packets in *RAW PACKET* format.
|
|
|
|
*****************************
|
|
*
|
|
* FLOW OF EVENTS
|
|
*
|
|
*****************************
|
|
|
|
To understand how a client communicates with a MySQL server, it is easiest
|
|
to start with a high-level flow of events. Each event section will then be
|
|
followed by details of the exact contents of each type of packet involved
|
|
in the event flow.
|
|
|
|
* *
|
|
* CONNECTION ESTABLISHMENT *
|
|
* *
|
|
|
|
Clients connect to the server via a TCP/IP socket (port 3306 by default), a
|
|
Unix Domain Socket, or named pipes (on Windows). Once connected, the
|
|
following connection establishment sequence is followed:
|
|
|
|
+--------+ +--------+
|
|
| Client | | Server |
|
|
+--------+ +--------+
|
|
| |
|
|
| Handshake initialisation, including MySQL server version, |
|
|
| protocol version and options supported, as well as the seed |
|
|
| for the password hash |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| Client options supported, max packet size for client |
|
|
| username, password crypted with seed from server, database |
|
|
| name. |
|
|
| |
|
|
| --------------------------------------------------------------> |
|
|
| |
|
|
| 'OK' packet if authentication succeeds, 'ERROR' packet if |
|
|
| authentication fails. |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
|
|
|
|
|
|
* HANDSHAKE INITIALISATION PACKET *
|
|
|
|
|
|
+--------------------------------------------------------------------+
|
|
| Header | Prot. Version | Server Version String | 0x00 |
|
|
| | 1 Byte | n bytes | 1 byte |
|
|
|--------------------------------------------------------------------|
|
|
| Thread Number | Crypt Seed | 0x00 | CLIENT_xxx options |
|
|
| | | | supported by server |
|
|
| 4 Bytes | 8 Bytes | 1 Byte | 2 Bytes |
|
|
|--------------------------------------------------------------------|
|
|
| Server charset no. | Server status variables | 0x00 padding |
|
|
| 1 Byte | 2 Bytes | 13 bytes |
|
|
+--------------------------------------------------------------------+
|
|
|
|
* Protocol version (currently '10')
|
|
* Server Version String (e.g. '4.0.5-beta-log'). Can be any length as
|
|
it's followed by a 0 byte.
|
|
* Thread Number - ID of server thread handling this connection
|
|
* Crypt seed - seed used to crypt password in auth packet from client
|
|
* CLIENT_xxx options - see include/mysql_com.h
|
|
* Server charset no. - Index of charset in use by server
|
|
* Server status variables - see include/mysql_com.h
|
|
* The padding bytes are reserverd for future extensions to the protocol
|
|
|
|
* CLIENT AUTH PACKET *
|
|
|
|
|
|
+--------------------------------------------------------------------+
|
|
| Header | CLIENT_xxx options supported | max_allowed_packet |
|
|
| | by client | for client |
|
|
| | 2 Bytes | 3 bytes |
|
|
|--------------------------------------------------------------------|
|
|
| User Name | 0x00 | Crypted Password | 0x00 | Database Name |
|
|
| n Bytes | 1 Byte | 8 Bytes | 1 Byte | n Bytes |
|
|
|--------------------------------------------------------------------|
|
|
| 0x00 |
|
|
| 1 Byte |
|
|
+--------------------------------------------------------------------+
|
|
|
|
* CLIENT_xxx options that this client supports:
|
|
|
|
#define CLIENT_LONG_PASSWORD 1 /* new more secure passwords */
|
|
#define CLIENT_FOUND_ROWS 2 /* Found instead of affected rows */
|
|
#define CLIENT_LONG_FLAG 4 /* Get all column flags */
|
|
#define CLIENT_CONNECT_WITH_DB 8 /* One can specify db on connect */
|
|
#define CLIENT_NO_SCHEMA 16 /* Don't allow database.table.column */
|
|
#define CLIENT_COMPRESS 32 /* Can use compression protocol */
|
|
#define CLIENT_ODBC 64 /* Odbc client */
|
|
#define CLIENT_LOCAL_FILES 128 /* Can use LOAD DATA LOCAL */
|
|
#define CLIENT_IGNORE_SPACE 256 /* Ignore spaces before '(' */
|
|
#define CLIENT_INTERACTIVE 1024 /* This is an interactive client */
|
|
#define CLIENT_SSL 2048 /* Switch to SSL after handshake */
|
|
#define CLIENT_IGNORE_SIGPIPE 4096 /* IGNORE sigpipes */
|
|
#define CLIENT_TRANSACTIONS 8192 /* Client knows about transactions */
|
|
|
|
* max_allowed_packet for the client (in 'int3store' form)
|
|
* User Name - user to authenticate as. Is followed by a null byte.
|
|
* Crypted Password - password crypted with seed given in packet from
|
|
server, see scramble() in sql/password.c
|
|
* Database name (optional) - initial database to use once connected
|
|
Is followed by a null byte
|
|
|
|
At the end of every client/server exchange there is either an 'OK' packet
|
|
or an 'ERROR' packet sent from the server. To determine whether a packet is
|
|
an 'OK' packet, or an 'ERROR' packet, check if the first byte (after the
|
|
header) is 0xFF. If it has the value of 0xFF, the packet is an 'ERROR'
|
|
packet.
|
|
|
|
|
|
* OK PACKET *
|
|
|
|
For details, see sql/net_pkg.cc::send_ok()
|
|
|
|
+-----------------------------------------------+
|
|
| Header | No of Rows | Affected Rows |
|
|
| | 1 Byte | 1-9 Byte |
|
|
|-----------------------------------------------|
|
|
| ID (last_insert_id) | Status | Length |
|
|
| 1-9 Byte | 2 Byte | 1-9 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext |
|
|
| n Byte |
|
|
+-----------------------------------------------+
|
|
|
|
* Number of rows, always 0
|
|
* Affected rows
|
|
* ID (last_insert_id) - value for auto_increment column (if any)
|
|
* Status (usually 0)
|
|
|
|
In general, in the MySQL protocol, fields in a packet that that
|
|
represent numeric data, such as lengths, that are labeled as '1-9'
|
|
bytes can be decoded by the following logic:
|
|
|
|
If the first byte is '251', the
|
|
corresponding column value is NULL (only appropriate in
|
|
'ROW DATA' packets).
|
|
|
|
If the first byte is '252', the value stored can be read
|
|
from the following 2 bytes as a 16-bit integer.
|
|
|
|
|
|
If the first byte is '253' the value stored can be read
|
|
from the following 4 bytes as a 32-bit long integer
|
|
|
|
|
|
If the first byte is '254', the value stored can be read
|
|
from the following 8 bytes as a 64-byte long
|
|
|
|
Otherwise (values 0-250), the value stored is the value of the
|
|
first byte itself.
|
|
|
|
|
|
If the OK-packet includes a message:
|
|
|
|
* Length of message
|
|
* Message Text
|
|
|
|
|
|
* ERROR PACKET *
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Status code | Error no |
|
|
| | 1 Byte | 2 Byte |
|
|
|-----------------------------------------------|
|
|
| Messagetext | |
|
|
| n Byte | |
|
|
+-----------------------------------------------+
|
|
|
|
* Status code (0xFF = ERROR)
|
|
* Error number (is only sent to 3.23 and newer clients)
|
|
* Error message text (ends at end of packet)
|
|
|
|
Note that the error message is not null terminated.
|
|
The client code can however assume that the packet ends with a null
|
|
as my_net_read() will always add an end-null to all read packets to
|
|
make things easier for the client.
|
|
|
|
Example:
|
|
|
|
Packet dump of client connecting to server:
|
|
|
|
+------------------------- Protocol Version (10)
|
|
|
|
|
| +---------------------- Server Version String (0x00 terminated)
|
|
| |
|
|
| |
|
|
0a 34 2e 30 2e 35 2d 62 . 4 . 0 . 5 - b
|
|
65 74 61 2d 6c 6f 67 00 e t a - l o g .
|
|
15 00 00 00 2b 5a 65 6c . . . . + Z e l
|
|
| |
|
|
| +------------ First 4 bytes of crypt seed
|
|
|
|
|
+------------------------ Thread Number
|
|
|
|
+------------------------- Last 4 bytes of crypt seed
|
|
|
|
|
| +-------- CLIENT_XXX Options supported by server
|
|
| |
|
|
| +-+--+ +--- Server charset index
|
|
| | | |
|
|
6f 69 41 46 00 2c 28 08 o i A F . , ( .
|
|
02 00 00 00 00 00 00 00 . . . . . . . .
|
|
| |
|
|
| +---------------------- 0x00 padding begins
|
|
|
|
|
+------------------------- Server status (0x02 =
|
|
SERVER_STATUS_AUTOCOMMIT)
|
|
|
|
00 00 00 00 00 00 00 00 . . . . . . . .
|
|
|
|
* Client Authentication Response (Username 'test', no database
|
|
selected) *
|
|
|
|
+--------------------- Packet Length (0x13 = 19 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +----------- CLIENT_XXX Options supported by client
|
|
| |
|
|
+---+---+ | +-+-+
|
|
| | | | |
|
|
13 00 00 01 03 00 1e 00 . . . . . . . .
|
|
00 74 65 73 74 00 48 5e . t e s t . H ^
|
|
| | |
|
|
+----+-----+ +------- Scrambled password, 0x00 terminated
|
|
|
|
|
+----------------- Username, 0x00 terminated
|
|
|
|
57 4a 4e 41 4a 4e 00 00 W J N A J N . .
|
|
00 .
|
|
|
|
|
|
>From this point on, the server waits for 'commands' from the client
|
|
which include queries, database shutdown, quit, change user, etc (see
|
|
the COM_xxxx values in include/mysql_com.h for the latest
|
|
command codes).
|
|
|
|
* *
|
|
* COMMAND PROCESSING *
|
|
* *
|
|
|
|
+--------+ +--------+
|
|
| Client | | Server |
|
|
+--------+ +--------+
|
|
| |
|
|
| A command packet, with a command code, and string data |
|
|
| when appropriate (e.g. a query), (see the COM_xxxx values |
|
|
| in include/mysql_com.h for the command codes) |
|
|
| |
|
|
| --------------------------------------------------------------> |
|
|
| |
|
|
| A 'RESULT' packet if the command completed successfully, |
|
|
| an 'ERROR' packet if the command failed. 'RESULT' packets |
|
|
| take different forms (see the details following this chart) |
|
|
| depending on whether or not the command returns rows. |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| n 'FIELD PACKET's (if rows are returned) |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| 'LAST DATA' packet |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| n 'ROW PACKET's (if rows are returned) |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
| 'LAST DATA' packet |
|
|
| |
|
|
| <-------------------------------------------------------------- |
|
|
| |
|
|
|
|
|
|
* Command Packet *
|
|
|
|
+------------------------------------------------------+
|
|
| Header | Command type | Query (if applicable) |
|
|
| | 1 Byte | n Bytes |
|
|
+------------------------------------------------------+
|
|
|
|
* Command type: (e.g.0x03 = query, see the COM_xxxx values in
|
|
include/mysql_com.h)
|
|
* Query (if applicable)
|
|
|
|
Note that my_net_read() null-terminates all packets on the
|
|
receiving side of the channel to make it easier for the code
|
|
examining the packets.
|
|
|
|
The current command codes are:
|
|
|
|
0x00 COM_SLEEP
|
|
0x01 COM_QUIT
|
|
0x02 COM_INIT_DB
|
|
0x03 COM_QUERY
|
|
0x04 COM_FIELD_LIST
|
|
0x05 COM_CREATE_DB
|
|
0x06 COM_DROP_DB
|
|
0x07 COM_REFRESH
|
|
0x08 COM_SHUTDOWN
|
|
0x09 COM_STATISTICS
|
|
0x0a COM_PROCESS_INFO
|
|
0x0b COM_CONNECT
|
|
0x0c COM_PROCESS_KILL
|
|
0x0d COM_DEBUG
|
|
0x0e COM_PING
|
|
0x0f COM_TIME
|
|
0x10 COM_DELAYED_INSERT
|
|
0x11 COM_CHANGE_USER
|
|
0x12 COM_BINLOG_DUMP
|
|
0x13 COM_TABLE_DUMP
|
|
0x14 COM_CONNECT_OUT
|
|
0x15 COM_REGISTER_SLAVE
|
|
|
|
* Result Packet *
|
|
|
|
Result packet for a command returning _no_ rows:
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Field Count | Affected Rows |
|
|
| | 1-9 Bytes | 1-9 Bytes |
|
|
|-----------------------------------------------|
|
|
| ID (last_insert_id) | Server Status |
|
|
| 1-9 Bytes | 2 Bytes |
|
|
+-----------------------------------------------+
|
|
|
|
* Field Count: Has value of '0' for commands returning _no_ rows
|
|
* Affected rows: Count of rows affected by INSERT/UPDATE/DELETE, etc.
|
|
* ID: value of auto_increment column in row (if any). 0 if
|
|
* Server Status: Usually 0
|
|
|
|
Result packet for a command returning rows:
|
|
|
|
+-------------------------------+
|
|
| Header | Field Count |
|
|
| | 1-9 Bytes |
|
|
+-------------------------------+
|
|
|
|
* Field Count: number of columns/fields in result set,
|
|
(packed with net_store_length() in sql/net_pkg.cc)
|
|
|
|
This is followed by as many packets as the number of fields ('Field Count')
|
|
that contain the metadata for each column/field (see unpack_fields() in
|
|
libmysql/libmysql.c):
|
|
|
|
|
|
* FIELD PACKET *
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Table Name |
|
|
| | length-coded-string |
|
|
|-----------------------------------------------|
|
|
| Field Name |
|
|
| length-code-string |
|
|
|-----------------------------------------------|
|
|
| Display length of field
|
|
| length-coded-binary (4 bytes) |
|
|
|-----------------------------------------------|
|
|
| Field Type (enum_field_types in mysql_com.h) |
|
|
| length-coded-binary (2 bytes) |
|
|
|-----------------------------------------------|
|
|
| Field Flags | Decimal Places|
|
|
| length-coded-binary (3 bytes) | 1 Byte |
|
|
+--------------+-------------+------------------+
|
|
|
|
* A length coded string is a string where we first have a packet
|
|
length (1-9 bytes, packed_with net_store_length()) followed
|
|
by a string.
|
|
* A length coded binary is a length (1 byte) followed by an integer
|
|
value in low-byte-first order. For the moment this type is always
|
|
fixed length in this packet.
|
|
|
|
* Table Name - the name of the table the column comes from
|
|
* Field Name - the name of the column/field
|
|
* Display length of field - length of field
|
|
* Field Type - Type of field, see enum_field_types in
|
|
include/mysql_com.h
|
|
|
|
Current field types are:
|
|
|
|
0x00 FIELD_TYPE_DECIMAL
|
|
0x01 FIELD_TYPE_TINY
|
|
0x02 FIELD_TYPE_SHORT
|
|
0x03 FIELD_TYPE_LONG
|
|
0x04 FIELD_TYPE_FLOAT
|
|
0x05 FIELD_TYPE_DOUBLE
|
|
0x06 FIELD_TYPE_NULL
|
|
0x07 FIELD_TYPE_TIMESTAMP
|
|
0x08 FIELD_TYPE_LONGLONG
|
|
0x09 FIELD_TYPE_INT24
|
|
0x0a FIELD_TYPE_DATE
|
|
0x0b FIELD_TYPE_TIME
|
|
0x0c FIELD_TYPE_DATETIME
|
|
0x0d FIELD_TYPE_YEAR
|
|
0x0e FIELD_TYPE_NEWDATE
|
|
0xf7 FIELD_TYPE_ENUM
|
|
0xf8 FIELD_TYPE_SET
|
|
0xf9 FIELD_TYPE_TINY_BLOB
|
|
0xfa FIELD_TYPE_MEDIUM_BLOB
|
|
0xfb FIELD_TYPE_LONG_BLOB
|
|
0xfc FIELD_TYPE_BLOB
|
|
0xfd FIELD_TYPE_VAR_STRING
|
|
0xfe FIELD_TYPE_STRING
|
|
0xff FIELD_TYPE_GEOMETRY
|
|
|
|
* Field Flags - NOT_NULL_FLAG, PRI_KEY_FLAG, xxx_FLAG in
|
|
include/mysql_com.h
|
|
|
|
|
|
Note that the packet format in 4.1 has slightly changed to allow more values.
|
|
|
|
|
|
* ROW PACKET *
|
|
|
|
+-----------------------------------------------+
|
|
| Header | Data Length | Column Data | ....for each column
|
|
| | 1-9 Bytes | n Bytes |
|
|
+-----------------------------------------------+
|
|
|
|
* Data Length: (packed with net_store_length() in sql/net_pkg.cc)
|
|
|
|
If 'Data Length' == 0, this is an 'ERROR PACKET'.
|
|
|
|
* Column Data: String representation of data. MySQL always sends result set
|
|
data as strings.
|
|
|
|
* LAST DATA PACKET *
|
|
|
|
Packet length is < 9 bytes, and first byte is 0xFE
|
|
|
|
+--------+
|
|
| 0xFE |
|
|
| 1 Byte |
|
|
+--------+
|
|
|
|
Examples:
|
|
|
|
***********
|
|
*
|
|
* INITDB Command
|
|
*
|
|
***********
|
|
|
|
A client issuing an 'INITDB' (select the database to use) command,
|
|
followed by an 'OK' packet with no rows and no affected rows from
|
|
the server:
|
|
|
|
* INITDB (select database to use) 'COMMAND' Packet *
|
|
|
|
+--------------------- Packet Length (5 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Command # (INITDB = 0x02)
|
|
| |
|
|
+---+---+ | | +---------- Beginning of query data
|
|
| | | | |
|
|
05 00 00 00 02 74 65 73 . . . . . t e s
|
|
74 t
|
|
|
|
* 'OK' Packet with no rows, and no rows affected *
|
|
|
|
+--------------------- Packet Length (3 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
+---+---+ |
|
|
| | |
|
|
03 00 00 01 00 00 00 . . . . . . .
|
|
|
|
|
|
***********
|
|
*
|
|
* SELECT query example
|
|
*
|
|
***********
|
|
|
|
Client issuing a 'SELECT *' query on the following table:
|
|
|
|
CREATE TABLE number_test (minBigInt bigint,
|
|
maxBigInt bigint,
|
|
testBigInt bigint)
|
|
|
|
* 'COMMAND' Packet with QUERY (select ...) *
|
|
|
|
+--------------------- Packet Length (26)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Command # (QUERY = 0x03)
|
|
| |
|
|
+---+---+ | | +---------- Beginning of query data
|
|
| | | | |
|
|
1a 00 00 00 03 53 45 4c . . . . . S E L
|
|
45 43 54 20 2a 20 66 72 E C T . * . f r
|
|
6f 6d 20 6e 75 6d 62 65 o m . n u m b e
|
|
72 5f 74 65 73 74 r _ t e s t
|
|
|
|
|
|
and receiving an 'OK' packet with a 'FIELD COUNT' of 3
|
|
|
|
|
|
* 'OK' Packet with 3 fields *
|
|
|
|
+--------------------- Packet Length (3 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
+---+---+ |
|
|
| | |
|
|
01 00 00 01 03 . . . . .
|
|
|
|
Followed immediately by 3 'FIELD' Packets. Note, the individual packets
|
|
are delimitted by =======, so that all fields can be annotated in the first
|
|
'FIELD' packet example:
|
|
|
|
=============================================================
|
|
|
|
+--------------------- Packet Length (0x1f = 31 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Block Length (0x0b = 11 bytes)
|
|
| | |
|
|
+---+---+ | | +--------- Table Name (11 bytes long)
|
|
| | | | |
|
|
1f 00 00 02 0b 6e 75 6d . . . . . n u m
|
|
62 65 72 5f 74 65 73 74 b e r _ t e s t
|
|
|
|
+------------------------ Block Length (9 bytes)
|
|
|
|
|
| +--------------------- Column Name (9 bytes long)
|
|
| |
|
|
09 6d 69 6e 42 69 67 49 . m i n B i g I
|
|
6e 74 03 14 00 00 01 08 n t . . . . . .
|
|
| | | | |
|
|
| +---+---+ | +--- Field Type (0x08 = FIELD_TYPE_LONGLONG)
|
|
| | |
|
|
| | +------ Block Length (1)
|
|
| |
|
|
| +--------------- Display Length (0x14 = 20 chars)
|
|
|
|
|
+------------------ Block Length (3)
|
|
|
|
+------------------------ Block Length (2)
|
|
|
|
|
| +-------------------- Field Flags (0 - no flags set)
|
|
| |
|
|
| +---+ +--------------- Decimal Places (0)
|
|
| | | |
|
|
02 00 00 00 . . . .
|
|
|
|
=============================================================
|
|
|
|
'FIELD' packet for the 'number_Test.maxBigInt' column
|
|
|
|
1f 00 00 03 0b 6e 75 6d . . . . . n u m
|
|
62 65 72 5f 74 65 73 74 b e r _ t e s t
|
|
09 6d 61 78 42 69 67 49 . m a x B i g I
|
|
6e 74 03 14 00 00 01 08 n t . . . . . .
|
|
02 00 00 00 . . . .
|
|
|
|
=============================================================
|
|
|
|
'FIELD' packet for the 'number_test.testBigInt' column
|
|
|
|
20 00 00 04 0b 6e 75 6d . . . . . n u m
|
|
62 65 72 5f 74 65 73 74 b e r _ t e s t
|
|
0a 74 65 73 74 42 69 67 . t e st B i g
|
|
49 6e 74 03 14 00 00 01 I n t . . . . .
|
|
08 02 00 00 00 . . . . .
|
|
=============================================================
|
|
|
|
Followed immediately by one 'LAST DATA' packet:
|
|
|
|
fe 00 . .
|
|
|
|
Followed immediately by 'n' row packets (in this case, only
|
|
one packet is sent from the server, for simplicity's sake):
|
|
|
|
|
|
+--------------------- Packet Length (0x52 = 82 bytes)
|
|
|
|
|
| +--------------- Packet Sequence #
|
|
| |
|
|
| | +------------ Data Length (0x14 = 20 bytes)
|
|
| | |
|
|
+---+---+ | | +--------- String Data '-9223372036854775808'
|
|
| | | | | (repeat Data Length/Data sequence)
|
|
|
|
52 00 00 06 14 2d 39 32 . . . . . - 9 2
|
|
32 33 33 37 32 30 33 36 2 3 3 7 2 0 3 6
|
|
38 35 34 37 37 35 38 30 8 5 4 7 7 5 8 0
|
|
38 13 39 32 32 33 33 37 8 . 9 2 2 3 3 7
|
|
32 30 33 36 38 35 34 37 2 0 3 6 8 5 4 7
|
|
37 35 38 30 37 0a 36 31 7 5 8 0 7 . 6 1
|
|
34 37 34 38 33 36 34 37 4 7 4 8 3 6 4 7
|
|
|
|
Followed immediately by one 'LAST DATA' packet:
|
|
|
|
fe 00 . .
|
|
|
|
|
|
|
|
@c The Index was empty, and ugly, so I removed it. (jcole, Sep 7, 2000)
|
|
|
|
@c @node Index
|
|
@c @unnumbered Index
|
|
|
|
@c @printindex fn
|
|
|
|
@node 4.1 protocol,,,
|
|
@subchapter MySQL 4.1 protocol
|
|
|
|
@node 4.1 protocol changes,,,
|
|
@section Changes to 4.0 protocol in 4.1
|
|
|
|
All basic packet handling is identical to 4.0. When communication
|
|
with an old 4.0 or 3.x client we will use the old protocol.
|
|
|
|
The new things that we support with 4.1 are:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Warnings
|
|
@item
|
|
Prepared statements
|
|
@item
|
|
Binary protocol (will be faster than the current protocol that
|
|
converts everything to strings)
|
|
@end itemize
|
|
|
|
|
|
What has changed in 4.1 are:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
A lot of new field information (database, real table name etc)
|
|
@item
|
|
The 'ok' packet has more status fields
|
|
@item
|
|
The 'end' packet (send last for each result set) now contains some
|
|
extra information
|
|
@item
|
|
New protocol for prepared statements. In this case all parameters and
|
|
results will sent as binary (low-byte-first).
|
|
@end itemize
|
|
|
|
|
|
@node 4.1 field packet,,,
|
|
@section 4.1 field description packet
|
|
|
|
The field description packet is sent as a response to a query that
|
|
contains a result set. It can be distinguished from a ok packet by
|
|
the fact that the first byte can't be 0 for a field packet.
|
|
@xref {4.1 ok packet}.
|
|
|
|
The header packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1-9 @tab Number of columns in result set (never 0)
|
|
@item 1-9 @tab Extra information sent be some command (SHOW COLUMNS
|
|
uses this to send the number of rows in the table)
|
|
@end multitable
|
|
|
|
This packet is always followed by a field description set.
|
|
@xref{4.1 field desc}.
|
|
|
|
@node 4.1 field desc,,,
|
|
@section 4.1 field description result set
|
|
|
|
The field description result set contains the meta info for a result set.
|
|
|
|
@multitable @columnfractions .20 .80
|
|
@item Type @tab Comment
|
|
@item string @tab Database name
|
|
@item string @tab Table name alias (or table name if no alias)
|
|
@item string @tab Real table name
|
|
@item string @tab Alias for column name (or column name if not used)
|
|
|
|
@item 11 byte @tab Fixed length fields in one field part:
|
|
@itemize
|
|
@item 2 byte int @tab Character set number
|
|
@item 3 byte int @tab Length of column definition
|
|
@item 1 byte int @tab Enum value for field type
|
|
@item 3 byte int @tab 2 byte column flags (NOT_NULL_FLAG etc..) + 1 byte number of decimals.
|
|
@item 2 byte int @tab zero (reserved for future use)
|
|
@end itemize
|
|
|
|
@item string int @tab Default value, only set when using mysql_list_fields().
|
|
@end multitable
|
|
|
|
|
|
@node 4.1 ok packet,,,
|
|
@section 4.1 ok packet
|
|
|
|
The ok packet is the first that is sent as an response for a query
|
|
that didn't return a result set.
|
|
|
|
The ok packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1 @tab 0 ; Marker for ok packet
|
|
@item 1-9 @tab Affected rows
|
|
@item 1-9 @tab Last insert id (0 if one wasn't used)
|
|
@item 2 @tab Server status; Can be used by client to check if we are inside an transaction
|
|
@item 2 @tab Warning count
|
|
@item 1-9 @tab Message length (optional)
|
|
@item xxx @tab Message (optional)
|
|
@end multitable
|
|
|
|
Size 1-9 means that the parameter is packed in to 1-9 bytes depending on
|
|
the value. (See function sql/net_pkg.cc::net_store_length).
|
|
|
|
The message is optional. For example for multi line INSERT it
|
|
contains a string for how many rows was inserted / deleted.
|
|
|
|
|
|
@node 4.1 end packet,,,
|
|
@section 4.1 end packet
|
|
|
|
The end packet is sent as the last packet for
|
|
|
|
@itemize @bullet
|
|
@item
|
|
End of field information
|
|
@item
|
|
End of parameter type information
|
|
@item
|
|
End of result set
|
|
@end itemize
|
|
|
|
The end packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1 @tab 254 ; Marker for EOF packet
|
|
@item 2 @tab Warning count
|
|
@item 2 @tab Status flags (For flags like SERVER_STATUS_MORE_RESULTS)
|
|
@end multitable
|
|
|
|
Note that a normal packet may start with byte 254, which means
|
|
'length stored in 9 bytes'. One can different between these cases
|
|
by checking the packet length < 9 bytes (in which case it's and end
|
|
packet).
|
|
|
|
|
|
@node 4.1 error packet
|
|
@section 4.1 error packet.
|
|
|
|
The error packet is sent when something goes wrong.
|
|
The error packet has the following structure:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 1 @tab 255 Error packet marker
|
|
@item 2 @tab Error code
|
|
@item 1-255 @tab Null terminated error message
|
|
@end multitable
|
|
|
|
The client/server protocol is designed in such a way that a packet
|
|
can only start with 255 if it's an error packet.
|
|
|
|
|
|
@node 4.1 prep init,,,
|
|
@section 4.1 prepared statement init packet
|
|
|
|
This is the return packet when one sends a query with the COM_PREPARE
|
|
command.
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 4 @tab Statement handler id
|
|
@item 2 @tab Number of columns in result set
|
|
@item 2 @tab Number of parameters in query
|
|
@end multitable
|
|
|
|
After this, there is a packet that contains the following for each
|
|
parameter in the query:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 2 @tab Enum value for field type. (MYSQL_TYPE_UNKNOWN if not known)
|
|
@item 2 @tab 2 byte column flags (NOT_NULL_FLAG etc)
|
|
@item 1 @tab Number of decimals
|
|
@item 4 @tab Max column length.
|
|
@end itemize
|
|
|
|
Note that the above is not yet in 4.1 but will be added this month.
|
|
|
|
As MySQL can have a parameter 'anywhere' it will in many cases not be
|
|
able to provide the optimal information for all parameters.
|
|
|
|
If number of columns, in the header packet, is not 0 then the
|
|
prepared statement will contain a result set. In this case the packet
|
|
is followed by a field description result set. @xref{4.1 field desc}.
|
|
|
|
|
|
@node 4.1 long data,,,
|
|
@section 4.1 long data handling
|
|
|
|
This is used by mysql_send_long_data() to set any parameter to a string
|
|
value. One can call mysql_send_long_data() multiple times for the
|
|
same parameter; The server will concatenate the results to a one big
|
|
string.
|
|
|
|
The server will not require an end packet for the string.
|
|
mysql_send_long_data() is responsible updating a flag that all data
|
|
has been sent. (Ie; That the last call to mysql_send_long_data() has
|
|
the 'last_data' flag set).
|
|
|
|
This packet is sent from client -> server:
|
|
|
|
@multitable @columnfractions .10 .90
|
|
@item Size @tab Comment
|
|
@item 4 @tab Statement handler
|
|
@item 2 @tab Parameter number
|
|
@item 2 @tab Type of parameter (not used at this point)
|
|
@item # @tab data (Rest of packet)
|
|
@end itemize
|
|
|
|
The server will NOT send an @code{ok} or @code{error} packet in
|
|
responce for this. If there is any errors (like to big string), one
|
|
will get the error when calling execute.
|
|
|
|
@node 4.1 execute,,,
|
|
@section 4.1 execute
|
|
|
|
On execute we send all parameters to the server in a COM_EXECUTE
|
|
packet.
|
|
|
|
The packet contains the following information:
|
|
|
|
@multitable @columnfractions .30 .70
|
|
@item Size @tab Comment
|
|
@item (param_count+9)/8 @tab Null bit map (2 bits reserved for protocol)
|
|
@item 1 @tab new_parameter_bound flag. Is set to 1 for first
|
|
execute or if one has rebound the parameters.
|
|
@item 2*param_count @tab Type of parameters (only given if new_parameter_bound flag is 1)
|
|
@item # @tab Parameter data, repeated for each parameter that are
|
|
NOT NULL and not used with mysql_send_long_data().
|
|
@end itemize
|
|
|
|
The null-bit-map is for all parameters (including parameters sent with
|
|
'mysql_send_long_data). If parameter 0 is NULL, then bit 0 in the
|
|
null-bit-map should be 1 (ie: first byte should be 1)
|
|
|
|
The parameters are stored the following ways:
|
|
|
|
@multitable @columnfractions .20 .10 .70
|
|
@item Type @tab Size @tab Comment
|
|
@item tinyint @tab 1 @tab One byte integer
|
|
@item short @tab 2 @tab
|
|
@item int @tab 4 @tab
|
|
@item longlong @tab 8 @tab
|
|
@item float @tab 4 @tab
|
|
@item double @tab 8 @tab
|
|
@item string @tab 1-9 + # @tab Packed string length + string
|
|
@end multitable
|
|
|
|
The result for this will be either an ok packet or a binary result
|
|
set.
|
|
|
|
@node 4.1 binary result,,,
|
|
@section 4.1 binary result set
|
|
|
|
A binary result are sent the following way.
|
|
|
|
For each result row:
|
|
|
|
@itemize
|
|
@item
|
|
null bit map with first two bits set to 01 (bit 0,1 value 1)
|
|
@item
|
|
parameter data, repeated for each not null result column.
|
|
@end itemize
|
|
|
|
The idea with the reserving two bits in the null map is that we can
|
|
use standard error (first byte 255) and ok packets (first byte 0)
|
|
to end a result sets.
|
|
|
|
Except that the null-bit-map is shifted two steps, the server is
|
|
sending the data to the client the same way that the server is sending
|
|
bound parameters to the client. The server is always sending the data
|
|
as type given for 'column type' for respective column. It's up to the
|
|
client to convert the parameter to the requested type.
|
|
|
|
DATETIME, DATE and TIME are sent to the server in a binary format as follows:
|
|
|
|
@multitable @columnfractions .20 .10 .70
|
|
@item Type @tab Size @tab Comment
|
|
@item date @tab 1 + 0-11 @tab Length + 2 byte year, 1 byte MMDDHHMMSS, 4 byte billionth of a second
|
|
@item datetime @tab 1 + 0-11 @tab Length + 2 byte year, 1 byte MMDDHHMMSS, 4 byte billionth of a second
|
|
@item time @tab 1 + 0-14 @tab Length + sign (0 = pos, 1= neg), 4 byte days, 1 byte HHMMDD, 4 byte billionth of a second
|
|
@end multitable
|
|
|
|
The first byte is a length byte and then comes all parameters that are
|
|
not 0. (Always counted from the beginning).
|
|
|
|
@node Fulltext Search, , protocol, Top
|
|
@chapter Fulltext Search in MySQL
|
|
|
|
Hopefully, sometime there will be complete description of
|
|
fulltext search algorithms.
|
|
Now it's just unsorted notes.
|
|
|
|
@menu
|
|
* Weighting in boolean mode::
|
|
@end menu
|
|
|
|
@node Weighting in boolean mode, , Fulltext Search, Fulltext Search
|
|
@section Weighting in boolean mode
|
|
|
|
The basic idea is as follows: in expression
|
|
@code{A or B or (C and D and E)}, either @code{A} or @code{B} alone
|
|
is enough to match the whole expression. While @code{C},
|
|
@code{D}, and @code{E} should @strong{all} match. So it's
|
|
reasonable to assign weight 1 to @code{A}, @code{B}, and
|
|
@code{(C and D and E)}. And @code{C}, @code{D}, and @code{E}
|
|
should get a weight of 1/3.
|
|
|
|
Things become more complicated when considering boolean
|
|
operators, as used in MySQL FTB. Obvioulsy, @code{+A +B}
|
|
should be treated as @code{A and B}, and @code{A B} -
|
|
as @code{A or B}. The problem is, that @code{+A B} can @strong{not}
|
|
be rewritten in and/or terms (that's the reason why this - extended -
|
|
set of operators was chosen). Still, aproximations can be used.
|
|
@code{+A B C} can be approximated as @code{A or (A and (B or C))}
|
|
or as @code{A or (A and B) or (A and C) or (A and B and C)}.
|
|
Applying the above logic (and omitting mathematical
|
|
transformations and normalization) one gets that for
|
|
@code{+A_1 +A_2 ... +A_N B_1 B_2 ... B_M} the weights
|
|
should be: @code{A_i = 1/N}, @code{B_j=1} if @code{N==0}, and,
|
|
otherwise, in the first rewritting approach @code{B_j = 1/3},
|
|
and in the second one - @code{B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1))}.
|
|
|
|
The second expression gives somewhat steeper increase in total
|
|
weight as number of matched B's increases, because it assigns
|
|
higher weights to individual B's. Also the first expression in
|
|
much simplier. So it is the first one, that is implemented in MySQL.
|
|
|
|
@summarycontents
|
|
@contents
|
|
|
|
@bye
|