mirror of
https://github.com/MariaDB/server.git
synced 2025-01-31 11:01:52 +01:00
Merge paul@work.mysql.com:/home/bk/mysql-4.0
into teton.kitebird.com:/home/paul/mysql-4.0
commit
1ecb9e48bc
4 changed files with 51 additions and 3 deletions
@@ -463,3 +463,4 @@ mysql-test/r/rpl000001.eval
Docs/safe-mysql.xml
mysys/test_vsnprintf
Docs/manual.de.log
Docs/internals.info
@@ -57,6 +57,7 @@ This is a manual about @strong{MySQL} internals.
* mysys functions:: Functions In The @code{mysys} Library
* DBUG:: DBUG Tags To Use
* protocol:: MySQL Client/Server Protocol
* Fulltext Search:: Fulltext Search in MySQL
@end menu

@@ -535,7 +536,7 @@ Print query.
 @end table
 
 
-@node protocol, , DBUG, Top
+@node protocol, Fulltext Search, DBUG, Top
 @chapter MySQL Client/Server Protocol
 
 @menu
@@ -785,6 +786,48 @@ Date 03 0A 00 00 |01 0A |03 00 00 00
|
|||
|
||||
@c @printindex fn
|
||||
|
||||
@node Fulltext Search, , protocol, Top
|
||||
@chapter Fulltext Search in MySQL
|
||||
|
||||
Eventually, this chapter should contain a complete description of the
fulltext search algorithms.
For now, it is just a collection of unsorted notes.
|
||||
|
||||
@menu
|
||||
* Weighting in boolean mode::
|
||||
@end menu
|
||||
|
||||
@node Weighting in boolean mode, , , Fulltext Search
|
||||
@section Weighting in boolean mode
|
||||
|
||||
The basic idea is as follows: in the expression
@code{A or B or (C and D and E)}, either @code{A} or @code{B} alone
is enough to match the whole expression, whereas @code{C},
@code{D}, and @code{E} must @strong{all} match. So it is
reasonable to assign a weight of 1 to each of @code{A}, @code{B}, and
@code{(C and D and E)}, and to give @code{C}, @code{D}, and @code{E}
a weight of 1/3 each.
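
The rule in the paragraph above can be sketched in C as follows. This is only an illustration, not code from the MySQL sources: the @code{struct node} type, the @code{node_kind} enum, and @code{assign_weights()} are hypothetical names. The sketch assumes that an "or" node passes its weight unchanged to every child, while an "and" node splits its weight evenly among its children.

```c
#include <stddef.h>

/* Hypothetical expression-tree types, invented for this illustration. */
enum node_kind { NODE_WORD, NODE_AND, NODE_OR };

struct node {
  enum node_kind kind;
  double weight;          /* filled in by assign_weights() */
  struct node **child;    /* sub-expressions; unused for NODE_WORD */
  size_t nchild;
};

/*
  Give node n the weight `weight` and distribute it to its children:
  an OR node hands the full weight to each child (any one child is
  enough to match), an AND node divides it by the number of children
  (all of them are needed).
*/
static void assign_weights(struct node *n, double weight)
{
  size_t i;

  n->weight = weight;
  if (n->kind == NODE_WORD)
    return;
  for (i = 0; i < n->nchild; i++)
  {
    double child_weight = weight;
    if (n->kind == NODE_AND)
      child_weight = weight / (double) n->nchild;
    assign_weights(n->child[i], child_weight);
  }
}
```

Calling @code{assign_weights()} with weight 1 on the tree for @code{A or B or (C and D and E)} leaves @code{A}, @code{B}, and the "and" node with weight 1, and @code{C}, @code{D}, and @code{E} with weight 1/3 each.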
|
||||
|
||||
Things become more complicated when considering the boolean
operators used in MySQL FTB. Obviously, @code{+A +B}
should be treated as @code{A and B}, and @code{A B}
as @code{A or B}. The problem is that @code{+A B} can @strong{not}
be rewritten in and/or terms (which is why this extended
set of operators was chosen). Still, approximations can be used:
@code{+A B C} can be approximated as @code{A or (A and (B or C))}
or as @code{A or (A and B) or (A and C) or (A and B and C)}.
Applying the above logic (and omitting mathematical
transformations and normalization), one gets that for
@code{+A_1 +A_2 ... +A_N B_1 B_2 ... B_M} the weights
should be @code{A_i = 1/N}, and @code{B_j = 1} if @code{N==0};
otherwise, in the first rewriting approach, @code{B_j = 1/3},
and in the second one, @code{B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1))}.
|
||||
|
||||
The second expression gives a somewhat steeper increase in total
weight as the number of matched B's increases, because it assigns
higher weights to individual B's. Also, the first expression is
much simpler, so it is the first one that is implemented in MySQL.
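
To make the comparison concrete, the weights stated above can be evaluated directly. The following C sketch only transcribes the formulas from the text; the function names are hypothetical and do not appear in the MySQL sources, and both optional-term functions assume @code{M >= 1}.

```c
/* Weight of each required term A_i in "+A_1 ... +A_N"; assumes n > 0. */
static double required_weight(unsigned n)
{
  return 1.0 / (double) n;
}

/* Weight of each optional term B_j under the first rewriting approach. */
static double optional_weight_first(unsigned n)
{
  return n == 0 ? 1.0 : 1.0 / 3.0;
}

/*
  Weight of each optional term B_j under the second rewriting approach:
  (1 + (M-1)*2^M) / (M * (2^(M+1) - 1)); assumes m >= 1.
*/
static double optional_weight_second(unsigned n, unsigned m)
{
  double pow2m = 1.0;   /* becomes 2^M */
  unsigned i;

  if (n == 0)
    return 1.0;
  for (i = 0; i < m; i++)
    pow2m *= 2.0;
  return (1.0 + ((double) m - 1.0) * pow2m) /
         ((double) m * (2.0 * pow2m - 1.0));
}
```

For @code{M = 1} both approaches give 1/3. For @code{M = 2} the second approach gives 5/14 (about 0.357) and for @code{M = 3} it gives 17/45 (about 0.378), so its per-term weight grows with @code{M} while the first approach stays at 1/3.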
|
||||
|
||||
@summarycontents
|
||||
@contents
|
||||
|
||||
|
|
|
@@ -48951,6 +48951,8 @@ Our TODO section contains what we plan to have in 4.0. @xref{TODO MySQL 4.0}.
|
|||
|
||||
@itemize @bullet
|
||||
@item
|
||||
Boolean fulltext search weighting scheme changed to something more reasonable.
|
||||
@item
|
||||
Fixed a bug in boolean fulltext search that caused MySQL to ignore queries of
@code{ft_min_word_len} characters.
|
||||
@item
|
||||
|
|
|
@@ -322,7 +322,8 @@ void _ftb_climb_the_tree(FTB *ftb, FTB_WORD *ftbw, FT_SEG_ITERATOR *ftsi_orig)
 break;
 if (yn & FTB_FLAG_YES)
 {
-ftbe->cur_weight+=weight;
+weight /= ftbe->ythresh;
+ftbe->cur_weight += weight;
 if (++ftbe->yesses == ythresh)
 {
 yn=ftbe->flags;
|
||||
|
@@ -360,7 +361,8 @@ void _ftb_climb_the_tree(FTB *ftb, FTB_WORD *ftbw, FT_SEG_ITERATOR *ftsi_orig)
 }
 else
 {
-ftbe->cur_weight+=weight;
+if (ftbe->ythresh) weight/=3;
+ftbe->cur_weight += weight;
 if (ftbe->yesses < ythresh)
 break;
 yn= (ftbe->yesses++ == ythresh) ? ftbe->flags : 0 ;
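
Taken together, the two hunks in _ftb_climb_the_tree() replace the unscaled `ftbe->cur_weight+=weight;` with a scaled addition: in the FTB_FLAG_YES branch the weight is first divided by `ythresh`, and in the final `else` branch it is divided by 3 when `ythresh` is non-zero. The sketch below models that arithmetic for a single expression level only. It is a simplified stand-in, not the real function: `struct expr_node`, `add_required()`, and `add_optional()` are hypothetical names, and it assumes that `ythresh` counts the `+` terms of the expression and that the `else` branch handles terms without an operator.

```c
/* Hypothetical, simplified stand-in for one FTB expression node. */
struct expr_node {
  unsigned ythresh;    /* assumed: number of "+" (required) sub-terms */
  double cur_weight;   /* weight accumulated for the current row */
};

/*
  A matched "+" term: its weight is shared among all required terms.
  Only meaningful when e->ythresh >= 1, which holds whenever the
  expression contains at least one "+" term.
*/
static void add_required(struct expr_node *e, double weight)
{
  weight /= e->ythresh;
  e->cur_weight += weight;
}

/*
  A matched term without an operator: it counts in full when there are
  no required terms, and for one third otherwise.
*/
static void add_optional(struct expr_node *e, double weight)
{
  if (e->ythresh)
    weight /= 3;
  e->cur_weight += weight;
}
```

Under these assumptions, a row matching both required terms of `+A +B C` accumulates 1/2 + 1/2 = 1, and also matching `C` adds another 1/3, which agrees with the weights `A_i = 1/N` and `B_j = 1/3` described in the added Texinfo section.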