MDEV-33407 Parser support for vector indexes
The syntax is
create table t1 (... vector index (v) ...);
limitation:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported
MDEV-33404 Engine-independent indexes: subtable method
added support for so-called "high level indexes", they are not visible
to the storage engine, implemented on the sql level. For every such
an index in a table, say, t1, the server implicitly creates a second
table named, like, t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, no frm, not accessible directly,
doesn't go into the table cache, needs no MDLs.
MDEV-33406 basic optimizer support for k-NN searches
for a query like SELECT ... ORDER BY func() optimizer will use
item_func->part_of_sortkey() to decide what keys can be used
to resolve ORDER BY.
let the caller tell init_tmp_table_share() whether the table
should be thread_specific or not.
In particular, internal tmp tables created in the slave thread
are perfectly thread specific
create templates
thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)
and the same for thd->calloc(). By the default the type is char,
so old usage of thd->alloc(size) works too.
The code in best_access_path() uses PREV_BITS(uint, N) to
compute a bitmap of all keyparts: {keypart0, ... keypart{N-1}).
The problem is that PREV_BITS($type, N) macro code can't handle the case
when N=<number of bits in $type).
Also, why use PREV_BITS(uint, ...) for key part map computations when
we could have used PREV_BITS(key_part_map) ?
Fixed both:
- Change PREV_BITS(type, N) to handle any N in [0; n_bits(type)].
- Change PREV_BITS() to use key_part_map when computing key_part_map bitmaps.
Single-table UPDATE/DELETE didn't provide outer_lookup_keys value for
subqueries. This didn't allow to make a meaningful choice between
IN->EXISTS and Materialization strategies for subqueries.
Fix this:
* Make UPDATE/DELETE save Sql_cmd_dml::scanned_rows,
* Then, subquery's JOIN::choose_subquery_plan() can fetch it from
there for outer_lookup_keys
Details:
UPDATE/DELETE now calls select_lex->optimize_unflattened_subqueries()
twice, like SELECT does (first call optimize_constant_subquries() in
JOIN::optimize_inner(), then call optimize_unflattened_subqueries() in
JOIN::optimize_stage2()):
1. Call with const_only=true before any optimizations. This allows
range optimizer and others to use the values of cheap const
subqueries.
2. Call it with const_only=false after range optimizer, partition
pruning, etc. outer_lookup_keys value is provided, so it's possible to
pick a good subquery strategy.
Note: PROTECT_STATEMENT_MEMROOT requires that first SP execution
performs subquery optimization for all subqueries, even for degenerate
query plans like "Impossible WHERE". Due to that, we ensure that the
call to optimize_unflattened_subqueries (with const_only=false) even
for degenerate query plans still happens, as was the case before this
change.
Improve performance of queries like
SELECT * FROM t1 WHERE field = NAME_CONST('a', 4);
by, in this example, replacing the WHERE clause with field = 4
in the case of ref access.
The rewrite is done during fix_fields and we disambiguate this
case from other cases of NAME_CONST by inspecting where we are
in parsing. We rely on THD::where to accomplish this. To
improve performance there, we change the type of THD::where to
be an enumeration, so we can avoid string comparisons during
Item_name_const::fix_fields. Consequently, this patch also
changes all usages of THD::where to conform likewise.
This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolarable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
to
ident1.streq(ident2)
- Taking advantage of the c++11 user defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Use example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
The code inside Item_subselect::fix_fields() could fail to check
that left expression had an Item_row, like this:
(('x', 1.0) ,1) IN (SELECT 'x', 1.23 FROM ... UNION ...)
In order to hit the failure, the first SELECT of the subquery had
to be a degenerate no-tables select. In this case, execution will
not enter into Item_in_subselect::create_row_in_to_exists_cond()
and will not check if left_expr is composed of scalars.
But the subquery is a UNION so as a whole it is not degenerate.
We try to create an expression cache for the subquery.
We create a temp.table from left_expr columns. No field is created
for the Item_row. Then, we crash when trying to add an index over a
non-existent field.
Fixed by moving the left_expr cardinality check to a point in
check_and_do_in_subquery_rewrites() which gets executed for all
cases.
It's better to make the check early so we don't have to care about
subquery rewrite code hitting Item_row in left_expr.
This patch also fixes
MDEV-31391 Assertion `((best.records_out) == 0.0 ... failed
Cost changes caused by this change:
- range queries with join buffer now have a notable smaller cost.
- range ranges are bit more expensive as the MULTI_RANGE_COST is now
properly applied to it in all cases (this extra cost is equal to a
key lookup).
- table scan cost is slight smaller as we now assume data is cached in
the engine after the first scan pass. (We did this before for range
scans and other access methods).
- partition tables had wrong values for max_row_blocks and
max_index_blocks. Correcting this, causes range access on
partitioned tables to have slightly higher cost because of the
increased estimated IO.
- Using first match + join buffer caused 'filtered' to be calcualted
wrong. (Only affected EXPLAIN, not query costs).
- Added cost_without_join_buffer to optimizer_trace.
- check_quick_select() adjusted the number of rows according to persistent
statistics, but did not adjust cost. Now fixed.
The big change in the patch are:
- In best_access_path(), where we now are using storing the cost in
'ALL_READ_COST cost' and only converting it to a double at the end.
This allows us to more exactly calculate the effect of the join_cache.
- In JOIN_TAB::estimate_scan_time(), store the cost also in a
ALL_READ_COST object.
One of effect if this change is that when joining very small tables:
t1 some_access_method
t2 range
t3 ALL Use join buffer
This is swiched to
t1 some_access_method
t3 ALL
t2 range use join buffer
Both plans has the same cost, but as table scan in this case has less
cost than rang, the table scan will be considered first and thus have
precidence.
Test case changes:
- optimizer_trace - Addition of cost_without_join_buffer
- subselect_mat_cost_bugs - Small tables and scan versus range
- range & range_mrr_icp - Range + join_cache is faster than ref
- optimizer_trace - cost_without_join_buffer, smaller scan cost,
range setup cost.
- mrr - range+join_buffer used as smaller cost
The code in create_internal_tmp_table() didn't take into account that
now temporary (derived) tables may have multiple indexes:
- one index due to duplicate removal
= In this example created by conversion of big-IN(...) into subquery
= this index might be converted into a "unique constraint" if the key
length is too large.
- one index added by derived_with_keys optimization.
Make create_internal_tmp_table() handle multiple indexes.
Before this patch, use of a unique constraint was indicated in
TABLE_SHARE::uniques. This was ok as unique constraint was the only index
in the table. Now it's no longer the case so TABLE_SHARE::uniques is removed
and replaced with an in-memory-only flag HA_UNIQUE_HASH.
This patch is based on Monty's patch.
Co-Author: Monty <monty@mariadb.org>
This bug affected EXPLAIN EXTENDED command for single-table DELETE that
used an IN subquery in its WHERE clause. A crash happened if the optimizer
chose to employ index_subquery or unique_subquery access when processing
such command.
The crash happened when the command tried to print the transformed query.
In the current code of 10.4 for single-table DELETE statements the output
of any explain command is produced after the join structures of all used
subqueries have been destroyed. JOIN::destroy() sets the field tab of the
JOIN_TAB structures created for subquery tables to NULL. As a result
subselect_indexsubquery_engine::print(), subselect_indexsubquery_engine()
cannot use this field to get the alias name of the joined table.
This patch suggests to use the field TABLE_LIST::TAB that can be accessed
from JOIN_TAB::tab_list to get the alias name of the joined table.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch allows to use semi-join optimization at the top level of
single-table update and delete statements.
The problem of supporting such optimization became easy to resolve after
processing a single-table update/delete statement started using JOIN
structure. This allowed to use JOIN::prepare() not only for multi-table
updates/deletes but for single-table ones as well. This was done in the
patch for mdev-28883:
Re-design the upper level of handling UPDATE and DELETE statements.
Note that JOIN::prepare() detects all subqueries that can be considered
as candidates for semi-join optimization. The code added by this patch
looks for such candidates at the top level and if such candidates are found
in the processed single-table update/delete the statement is handled in
the same way as a multi-table update/delete.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch fixes not only the assertion failure in the function
Field_iterator_table_ref::set_field_iterator() but also:
- fixes the problem of forced materialization of derived tables used
in subqueries contained in WHERE clauses of single-table and multi-table
UPDATE and DELETE statements
- fixes the problem of MDEV-17954 that prevented execution of multi-table
DELETE statements if they use in their WHERE clauses references to
the tables that are updated.
The patch must be considered a complement to the patch for MDEV-28883.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch introduces a new way of handling UPDATE and DELETE commands at
the top level after the parsing phase. This new way of processing update
and delete statements can be seen in the implementation of the prepare()
and execute() methods from the new Sql_cmd_dml class. This class derived
from the Sql_cmd class can be considered as an interface class for processing
such commands as SELECT, INSERT, UPDATE, DELETE and other comands
manipulating data in tables.
With this patch processing of update and delete statements after parsing
proceeds by the following schema:
- precheck of the access rights is performed for the used tables
- the used tables are opened
- context analysis phase is performed for the statement
- the used tables are locked
- the statement is optimized and executed
- clean-up is performed for the statement
The implementation of the method Sql_cmd_dml::execute() adheres this schema.
The virtual functions of the class Sql_cmd_dml used for precheck of the
access rights, context analysis, optimization and execution allow to adjust
this schema for processing data manipulation statements of any types.
This schema of processing data manipulation statements is taken from the
current MySQL code. Moreover the definition the class Sql_cmd_dml introduced
in this patch is almost a full replica of such class in the existing MySQL.
However the implementation of the derived classes for update and delete
statements is quite different. This implementation employs the JOIN class
for all kinds of update and delete statements. It allows to perform main
bulk of context analysis actions by the function JOIN::prepare(). This
guarantees that characteristics and properties of the statement tree
discovered for optimization phase when doing context analysis are the same
for single-table and multi-table updates and deletes.
With this patch the following functions are gone:
mysql_prepare_update(), mysql_multi_update_prepare(),
mysql_update(), mysql_multi_update(),
mysql_prepare_delete(), mysql_multi_delete_prepare(), mysql_delete().
The code within these functions have been used as much as possible though.
The functions mysql_test_update() and mysql_test_delete() are also not
needed anymore. The method Sql_cmd_dml::prepare() serves processing
- update/delete statement
- PREPARE stmt FROM "<update/delete statement>"
- EXECUTE stmt when stmt is prepared from update/delete statement.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Firstmatch_picker::check_qep() has an optimization that allows firstmatch
to be used together with join buffer under some conditions. In this
case the cost was assumed to be same as what best_access_path()
had calculated.
However if HASH+join_buffer was used, then
fix_semijoin_strategies_for_picked_join_order() would remove the
join_buffer (which would cause a full join to be used) and the cost
assumption by Firstmatch_picker::check_qep() would be wrong.
Later check_join_cache_usage() sees that it's a full scan and decides
it can use join buffering, (But not the hash join).
Fixed by also allowing HASH joins with firstmatch.
This removes the need to change disable and re-enable join buffer.
Test case changes:
- HASH join used with firstmatch (Using join buffer (flat, BNLH join))
- Filtered could change with firstmatch as the conversion with and without
join_buffered lost the filtering information.
- The not "re-enabling join buffer" is shown in main.optimizer_trace
Original code by Sergei, optimized by Monty.
Author: Sergei Petrunia <sergey@mariadb.com>, monty@mariadb.org
This error was discovered while working on
MDEV-30540 Wrong result with IN list length reaching
IN_PREDICATE_CONVERSION_THRESHOLD
If there is read error from handler::ha_rnd_next() during a recursive
query, st_select_lex_unit::exec_recursive() will crash as it will try to
get the error code from a structure that was deleted by the callee.
The code was using the construct:
sl->join->exec();
saved_error=sl->join->error;
This does not work as sl->join was freed by the exec() and sl->join would
be set to 0.
Fixed by having JOIN::exec() return the error code.
The included test case simulates the error in ha_rnd_next(), which causes
a crash without the patch.
scovered whle working on
MDEV-30540 Wrong result with IN list length reaching
IN_PREDICATE_CONVERSION_THRESHOLD
If there is read error from handler::ha_rnd_next() during a recursive
query, st_select_lex_unit::exec_recursive() will crash as it will try to
get the error code from a structure that was deleted by the callee.
The code was using the construct:
sl->join->exec();
saved_error=sl->join->error;
This does not work as sl->join was freed by the exec() and sl->join was
set to 0.
Fixed by having JOIN::exec() return the error code.
The included test case simulates the error in ha_rnd_next(), which causes
a crash without the patch.