This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolarable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
to
ident1.streq(ident2)
- Taking advantage of the c++11 user defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Use example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
If a query contained a CTE whose name coincided with the name of one of
the base tables used in the specification of the CTE and the query had at
least two references to this CTE in the specifications of other CTEs then
processing of the query led to unlimited recursion that ultimately caused
a crash of the server.
Any secondary non-recursive reference to a CTE requires creation of a copy
of the CTE specification. All the references to CTEs in this copy must be
resolved. If the specification contains a reference to a base table whose
name coincides with the name of then CTE then it should be ensured that
this reference in no way can be resolved against the name of the CTE.
This patch fixes the patch for bug MDEV-30248 that unsatisfactorily
resolved the problem of resolution of references to CTE. In some cases
when such a reference has the same table name as the name of one of
CTEs containing this reference the reference could be resolved incorrectly
that led to an invalid select tree where units could be mutually dependent.
This in its turn could lead to an infinite sequence of recursive calls or
to falls into infinite loops.
The patch also removes LEX::resolve_references_to_cte_in_hanging_cte() as
with the new code for resolution of CTE references the call of this
function is not needed anymore.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
This patch resolves the problem of improper name resolution of table
references to embedded CTEs for some queries. This improper binding could
lead to
- infinite sequence of calls of recursive functions
- crashes due to resolution of null pointers
- wrong result sets returned by queries
- bogus error messages
If the definition of a CTE contains with clauses then such CTE is called
embedding CTE while CTEs from the with clauses are called embedded CTEs.
If a table reference used in the definition of an embedded CTE cannot be
resolved within the unit that contains this reference it still may be
resolved against a CTE definition from the with clause with one of the
embedding CTEs.
A table reference can be resolved against a CTE definition if it used in
the the scope of this definition and it refers to the name of the CTE.
Table reference t is in the scope of the CTE definition of CTE cte if
- the definition of cte is an element of a with clause declared as
RECURSIVE and the reference t belongs either to the unit to which
this with clause is attached or to one of the elements of this clause
- the definition of cte is an element of a with clause without RECURSIVE
specifier and the reference t belongs either to the unit to which this
with clause is attached or to one of the elements from this clause that
are placed before the definition of cte.
If a table reference can be resolved against several CTE definitions then
it is bound to the most embedded.
The code before this patch not always resolved table references used in
embedded CTE according to the above rules.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
SQL processor failed to catch references to unknown columns and other
errors of the phase of semantic analysis in the specification of a
hanging recursive CTE. This happened because the function
With_clause::prepare_unreferenced_elements() failed to detect a CTE as
a hanging CTE if the CTE was recursive.
Fixing this problem in the code of the mentioned function opened another
problem: EXPLAIN started including the lines for the specifications of
hanging recursive CTEs in its output. This problem also was fixed in this
patch.
Approved by Dmitry Shulga <dmitry.shulga@mariadb.com>
In the code existed just before this patch binding of a table reference to
the specification of the corresponding CTE happens in the function
open_and_process_table(). If the table reference is not the first in the
query the specification is cloned in the same way as the specification of
a view is cloned for any reference of the view. This works fine for
standalone queries, but does not work for stored procedures / functions
for the following reason.
When the first call of a stored procedure/ function SP is processed the
body of SP is parsed. When a query of SP is parsed the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished the basic info on the table references from this chain except
table references to derived tables and information schema tables is put
in one hash table associated with SP. When parsing of the body of SP is
finished this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and link them into the list of such
objects passed to a pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered the view is opened and its
specification is parsed. For any table reference occurred in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking have been
looked through the tables mentioned in the list are locked. Note that the
objects referenced CTEs are just skipped here as it is impossible to
resolve these references without any info on the context where they occur.
Now the statements from the body of SP are executed one by one that.
At the very beginning of the execution of a query the tables used in the
query are opened and open_and_process_table() now is called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
For each table reference first the reference is checked against CTEs
definitions in whose scope it occurred. If such definition is found the
reference is considered resolved and if this is not the first reference
to the found CTE the the specification of the CTE is re-parsed and the
result of the parsing is added to the parsing tree of the query as a
sub-tree. If this sub-tree contains table references to other tables they
are added to the list of TABLE_LIST objects associated with the query in
order the referenced tables to be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE it discovers that the referenced table instance is not
locked and reports an error.
Thus processing non-first table references to a CTE similar to how
references to view are processed does not work for queries used in stored
procedures / functions. And the main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow to save the context in which a CTE reference occur. It's not trivial
to save the info about the context where a CTE reference occurs while the
resolution of the table reference cannot be done without this context and
consequentially the specification for the table reference cannot be
determined.
This patch solves the above problem by moving resolution of all CTE
references at the parsing stage. More exactly references to CTEs occurred in
a query are resolved right after parsing of the query has finished. After
resolution any CTE reference it is marked as a reference to to derived
table. So it is excluded from the hash table created for pre-locking used
base tables and view when the first call of a stored procedure / function
is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
# Conflicts:
# sql/sql_cte.cc
# sql/sql_cte.h
# sql/sql_lex.cc
# sql/sql_lex.h
# sql/sql_view.cc
# sql/sql_yacc.yy
# sql/sql_yacc_ora.yy
In the code existed just before this patch binding of a table reference to
the specification of the corresponding CTE happens in the function
open_and_process_table(). If the table reference is not the first in the
query the specification is cloned in the same way as the specification of
a view is cloned for any reference of the view. This works fine for
standalone queries, but does not work for stored procedures / functions
for the following reason.
When the first call of a stored procedure/ function SP is processed the
body of SP is parsed. When a query of SP is parsed the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished the basic info on the table references from this chain except
table references to derived tables and information schema tables is put
in one hash table associated with SP. When parsing of the body of SP is
finished this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and link them into the list of such
objects passed to a pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered the view is opened and its
specification is parsed. For any table reference occurred in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking have been
looked through the tables mentioned in the list are locked. Note that the
objects referenced CTEs are just skipped here as it is impossible to
resolve these references without any info on the context where they occur.
Now the statements from the body of SP are executed one by one that.
At the very beginning of the execution of a query the tables used in the
query are opened and open_and_process_table() now is called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
For each table reference first the reference is checked against CTEs
definitions in whose scope it occurred. If such definition is found the
reference is considered resolved and if this is not the first reference
to the found CTE the the specification of the CTE is re-parsed and the
result of the parsing is added to the parsing tree of the query as a
sub-tree. If this sub-tree contains table references to other tables they
are added to the list of TABLE_LIST objects associated with the query in
order the referenced tables to be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE it discovers that the referenced table instance is not
locked and reports an error.
Thus processing non-first table references to a CTE similar to how
references to view are processed does not work for queries used in stored
procedures / functions. And the main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow to save the context in which a CTE reference occur. It's not trivial
to save the info about the context where a CTE reference occurs while the
resolution of the table reference cannot be done without this context and
consequentially the specification for the table reference cannot be
determined.
This patch solves the above problem by moving resolution of all CTE
references at the parsing stage. More exactly references to CTEs occurred in
a query are resolved right after parsing of the query has finished. After
resolution any CTE reference it is marked as a reference to to derived
table. So it is excluded from the hash table created for pre-locking used
base tables and view when the first call of a stored procedure / function
is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
In the code existed just before this patch binding of a table reference to
the specification of the corresponding CTE happens in the function
open_and_process_table(). If the table reference is not the first in the
query the specification is cloned in the same way as the specification of
a view is cloned for any reference of the view. This works fine for
standalone queries, but does not work for stored procedures / functions
for the following reason.
When the first call of a stored procedure/ function SP is processed the
body of SP is parsed. When a query of SP is parsed the info on each
encountered table reference is put into a TABLE_LIST object linked into
a global chain associated with the query. When parsing of the query is
finished the basic info on the table references from this chain except
table references to derived tables and information schema tables is put
in one hash table associated with SP. When parsing of the body of SP is
finished this hash table is used to construct TABLE_LIST objects for all
table references mentioned in SP and link them into the list of such
objects passed to a pre-locking process that calls open_and_process_table()
for each table from the list.
When a TABLE_LIST for a view is encountered the view is opened and its
specification is parsed. For any table reference occurred in
the specification a new TABLE_LIST object is created to be included into
the list for pre-locking. After all objects in the pre-locking have been
looked through the tables mentioned in the list are locked. Note that the
objects referenced CTEs are just skipped here as it is impossible to
resolve these references without any info on the context where they occur.
Now the statements from the body of SP are executed one by one that.
At the very beginning of the execution of a query the tables used in the
query are opened and open_and_process_table() now is called for each table
reference mentioned in the list of TABLE_LIST objects associated with the
query that was built when the query was parsed.
For each table reference first the reference is checked against CTEs
definitions in whose scope it occurred. If such definition is found the
reference is considered resolved and if this is not the first reference
to the found CTE the the specification of the CTE is re-parsed and the
result of the parsing is added to the parsing tree of the query as a
sub-tree. If this sub-tree contains table references to other tables they
are added to the list of TABLE_LIST objects associated with the query in
order the referenced tables to be opened. When the procedure that opens
the tables comes to the TABLE_LIST object created for a non-first
reference to a CTE it discovers that the referenced table instance is not
locked and reports an error.
Thus processing non-first table references to a CTE similar to how
references to view are processed does not work for queries used in stored
procedures / functions. And the main problem is that the current
pre-locking mechanism employed for stored procedures / functions does not
allow to save the context in which a CTE reference occur. It's not trivial
to save the info about the context where a CTE reference occurs while the
resolution of the table reference cannot be done without this context and
consequentially the specification for the table reference cannot be
determined.
This patch solves the above problem by moving resolution of all CTE
references at the parsing stage. More exactly references to CTEs occurred in
a query are resolved right after parsing of the query has finished. After
resolution any CTE reference it is marked as a reference to to derived
table. So it is excluded from the hash table created for pre-locking used
base tables and view when the first call of a stored procedure / function
is processed.
This solution required recursive calls of the parser. The function
THD::sql_parser() has been added specifically for recursive invocations of
the parser.
Rewriting GRANT/REVOKE grammar to use more bison stack and use Sql_cmd_ style
1. Removing a few members from LEX:
- uint grant, grant_to_col, which_columns
- List<LEX_COLUMN> columns
- bool all_privileges
2. Adding classes Grand_object_name, Lex_grant_object_name
3. Adding classes Grand_privilege, Lex_grand_privilege
4. Adding struct Lex_column_list_privilege_st, class Lex_column_list_privilege
5. Rewriting the GRANT/REVOKE grammar to use new classes and pass them through
bison stack (rather than directly access LEX members)
6. Adding classes Sql_cmd_grant* and Sql_cmd_revoke*,
changing GRANT/REVOKE to use LEX::m_sql_cmd.
7. Adding the "sp_handler" grammar rule and removing some duplicate grammar
for GRANT/REVOKE for different kinds of SP objects.
8. Adding a new rule comma_separated_ident_list, reusing it in:
- with_column_list
- colum_list_privilege
This patch fills a serious flaw in the implementation of common table
expressions. Before this patch an attempt to prepare a statement from
a query with a parameter marker in a CTE that was used more than once
in the query ended up with a bogus error message. Similarly if a statement
in a stored procedure contained a CTE whose specification used a
local variables and this CTE was referred to more than once in the
statement then the server failed to execute the stored procedure returning
a bogus error message on a non-existing field.
The problems appeared due to incorrect handling of parameter markers /
local variables in CTEs that were referred more than once.
This patch fixes the problems by differentiating between the original
occurrences of a parameter marker / local variable used in the
specification of a CTE and the corresponding occurrences used
in copies of this specification. These copies are substituted
instead of non-first references to the CTE.
The idea of the fix and even some code were taken from the MySQL
implementation of the common table expressions.
This problem manifested itself when a join query used two or more
materialized CTE such that each of them employed the same recursive CTE.
The bug caused a crash. The crash happened because the cleanup()
function was performed premature for recursive CTE. This clean up was
induced by the cleanup of the first CTE referenced the recusrsive CTE.
This cleanup destroyed the structures that would allow to read from the
temporary table containing the rows of the recursive CTE and an attempt to read
these rows for the second CTE referencing the recursive CTE triggered a
crash.
The clean up for a recursive CTE R should be performed after the cleanup
of the last materialized CTE that uses R.
is not supported
Allowed to use recursive references in derived tables.
As a result usage of recursive references in operands of
INTERSECT / EXCEPT is now supported.
The bug happened when the specification of a recursive CTE had
no recursive references at the top level of the specification.
In this case the regular processing of derived table references
of the select containing a non-recursive reference to this
recursive CTE misses handling the specification unit.
At the preparation stage any non-recursive reference to a
recursive CTE must be handled after the preparation of the
specification unit for this CTE. So we have to force this
preparation when regular handling of derived tables does not
do it.
Benefits of this patch:
- Removed a lot of calls to strlen(), especially for field_string
- Strings generated by parser are now const strings, less chance of
accidently changing a string
- Removed a lot of calls with LEX_STRING as parameter (changed to pointer)
- More uniform code
- Item::name_length was not kept up to date. Now fixed
- Several bugs found and fixed (Access to null pointers,
access of freed memory, wrong arguments to printf like functions)
- Removed a lot of casts from (const char*) to (char*)
Changes:
- This caused some ABI changes
- lex_string_set now uses LEX_CSTRING
- Some fucntions are now taking const char* instead of char*
- Create_field::change and after changed to LEX_CSTRING
- handler::connect_string, comment and engine_name() changed to LEX_CSTRING
- Checked printf() related calls to find bugs. Found and fixed several
errors in old code.
- A lot of changes from LEX_STRING to LEX_CSTRING, especially related to
parsing and events.
- Some changes from LEX_STRING and LEX_STRING & to LEX_CSTRING*
- Some changes for char* to const char*
- Added printf argument checking for my_snprintf()
- Introduced null_clex_str, star_clex_string, temp_lex_str to simplify
code
- Added item_empty_name and item_used_name to be able to distingush between
items that was given an empty name and items that was not given a name
This is used in sql_yacc.yy to know when to give an item a name.
- select table_name."*' is not anymore same as table_name.*
- removed not used function Item::rename()
- Added comparision of item->name_length before some calls to
my_strcasecmp() to speed up comparison
- Moved Item_sp_variable::make_field() from item.h to item.cc
- Some minimal code changes to avoid copying to const char *
- Fixed wrong error message in wsrep_mysql_parse()
- Fixed wrong code in find_field_in_natural_join() where real_item() was
set when it shouldn't
- ER_ERROR_ON_RENAME was used with extra arguments.
- Removed some (wrong) ER_OUTOFMEMORY, as alloc_root will already
give the error.
TODO:
- Check possible unsafe casts in plugin/auth_examples/qa_auth_interface.c
- Change code to not modify LEX_CSTRING for database name
(as part of lower_case_table_names)
This patch fixed some problems that occurred with subqueries that
contained directly or indirectly recursive references to recursive CTEs.
1. A [NOT] IN predicate with a constant left operand and a non-correlated
subquery as the right operand used in the specification of a recursive CTE
was considered as a constant predicate and was evaluated only once.
Now such a predicate is re-evaluated after every iteration of the process
that produces the records of the recursive CTE.
2. The Exists-To-IN transformation could be applied to [NOT] IN predicates
with recursive references. This opened a possibility of materialization
for the subqueries used as right operands. Yet, materialization
is prohibited for the subqueries if they contain a recursive reference.
Now the Exists-To-IN transformation cannot be applied for subquery
predicates with recursive references.
The function st_select_lex::check_subqueries_with_recursive_references()
is called now only for the first execution of the SELECT.
When a prepared statement uses a CTE definition with a column list
renaming of columns of the CTE expression must be performed
for every execution of the prepared statement.
Added comments.
Added reaction for exceeding maximum number of elements in with clause.
Added a test case to check this reaction.
Added a test case where the specification of a recursive table
uses two non-recursive with tables.
Moved checking whether the limit set for the number of iterations
when executing a recursive query has been reached from
st_select_lex_unit::exec_recursive to TABLE_LIST::fill_recursive.
Changed the name of the system variable max_recursion_level for
max_recursive_iterations.
Adjusted test cases.
Temporary tables created for recursive CTE
were instantiated at the prepare phase. As
a result these temporary tables missed
indexes for look-ups and optimizer could not
use them.
Actually mutually recursive CTE were not functional. Now the code
for mutually recursive CTE looks like functional, but still needs
re-writing.
Added many new test cases for mutually recursive CTE.
Added test cases to check the fix.
Fixed the problem of wrong types of recursive tables when the type of anchor part does not coincide with the
type of recursive part.
Prevented usage of marerialization and subquery cache for subqueries with recursive references.
Introduced system variables 'max_recursion_level'.
Added a test case to test usage of this variable.
Added the check whether there are set functions in the specifications of recursive CTE.
Added the check whether there are recursive references in subqueries.
Introduced boolean system variable 'standards_compliant_cte'. By default it's set to 'on'.
When it's set to 'off' non-standard compliant CTE can be executed.