create templates
thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)
and the same for thd->calloc(). By the default the type is char,
so old usage of thd->alloc(size) works too.
the information about index algorithm was stored in two
places inconsistently split between both.
BTREE index could have key->algorithm == HA_KEY_ALG_BTREE, if the user
explicitly specified USING BTREE or HA_KEY_ALG_UNDEF, if not.
RTREE index had key->algorithm == HA_KEY_ALG_RTREE
and always had key->flags & HA_SPATIAL
FULLTEXT index had key->algorithm == HA_KEY_ALG_FULLTEXT
and always had key->flags & HA_FULLTEXT
HASH index had key->algorithm == HA_KEY_ALG_HASH or HA_KEY_ALG_UNDEF
long unique index always had key->algorithm == HA_KEY_ALG_LONG_HASH
In this commit:
All indexes except BTREE and HASH always have key->algorithm
set, HA_SPATIAL and HA_FULLTEXT flags are not used anymore (except
for storage to keep frms backward compatible).
As a side effect ALTER TABLE now detects FULLTEXT index renames correctly
Based on the current logic, objects of classes Item_func_charset and
Item_func_coercibility (responsible for CHARSET() and COERCIBILITY()
functions) are always considered constant.
However, SQL syntax allows their use in a non-constant manner, such as
CHARSET(t1.a), COERCIBILITY(t1.a).
In these cases, the `used_tables()` parameter corresponds to table names
in the function parameters, creating an inconsistency: the item is marked
as constant but accesses tables. This leads to crashes when
conditions with CHARSET()/COERCIBILITY() are pushed into derived tables.
This commit addresses the issue by setting `used_tables()` to 0 for
`Item_func_charset` and `Item_func_coercibility`. Additionally, the items
now store the return values during the preparation phase and return
them during the execution phase. This ensures that the items do not call
its arguments methods during the execution and are truly constant.
Reviewer: Alexander Barkov <bar@mariadb.com>
Step#1 - Changing collation derivation for string user variables
from IMPLICIT to COERCIBLE.
Retionale:
Without this preparatory change, switching the default collation for
Unicode character sets from xxx_general_ci to uca1400_ai_ci would cause
"Illegal mix of collations" errors in scenarios comparing a column with
a non-default collation to a string user variable
This is especially important for queries to INFORMATION_SCHEMA tables,
whose columns use utf8mb3_general_ci.
See the description of MDEV-25829 for more details and SQL script examples.
The negation in this line:
ulonglong abs_dec= dec_negative ? -dec : dec;
did not take into account that 'dec' can be the smallest possible
signed negative value -9223372036854775808. Its negation is
an operation with an undefined behavior.
Fixing the code to use Longlong_hybrid, which implements a safe
method to get an absolute value.
Fixing the problem that an operation involving a mix of
two or more GEOMETRY operands did not preserve their SRIDs.
Now SRIDs are preserved by hybrid functions, subqueries, TVCs, UNIONs, VIEWs.
Incompatible change:
An attempt to mix two different SRIDs now raises an error.
Details:
- Adding a new class Type_extra_attributes. It's a generic
container which can store very specific data type attributes.
For now it can store one uint32 and one const pointer attribute
(for GEOMETRY's SRID and for ENUM/SET TYPELIB respectively).
In the future it can grow as needed.
Type_extra_attributes will also be reused soon to store "const Type_zone*"
pointers for the TIMESTAMP's "WITH TIME ZONE 'tz'" attribute
(a timestamp data type with a fixed time zone independent from @@time_zone).
The time zone attribute will be stored in exactly the same way like
a TYPELIB pointer is stored by ENUM/SET.
- Removing Column_definition_attributes members "interval" and "srid".
Deriving Column_definition_attributes from the generic attribute container
Type_extra_attributes instead.
- Adding a new class Type_typelib_attributes, to store
the TYPELIB of the ENUM and SET data types. Deriving Field_enum from it.
Removing the member Field_enum::typelib.
- Adding a new class Type_geom_attributes, to store
the GEOMETRY related attributes. Deriving Field_geom from it.
Removing the member Field_geom::srid.
- Removing virtual methods:
Field::get_typelib()
Type_all_attributes::get_typelib() and
Type_all_attributes::set_typelib()
They were very specific to TYPELIB.
Adding more generic virtual methods instead:
* Field::type_extra_attributes() - to get extra attributes
* Type_all_attributes::type_extra_attributes() - to get extra attributes
* Type_all_attributes::type_extra_attributes_addr() - to set extra attributes
- Removing Item_type_holder::enum_set_typelib. Deriving Item_type_holder
from the generic attribute container Type_extra_attributes instead.
This makes it possible for UNION to preserve SRID
(in addition to preserving TYPELIB).
- Deriving Item_hybrid_func from Type_extra_attributes.
This makes it possible for hybrid functions (e.g. CASE, COALESCE,
LEAST, GREATEST etc) to preserve SRID.
- Deriving Item_singlerow_subselect from Type_extra_attributes and
overriding methods:
* Item_cache::type_extra_attributes()
* subselect_single_select_engine::fix_length_and_dec()
* Item_singlerow_subselect::type_extra_attributes()
* Item_singlerow_subselect::type_extra_attributes_addr()
This is needed to preserve SRID in subqueries and TVCs
- Cleanup: fixing the data type of members
* Binlog_type_info::m_enum_typelib
* Binlog_type_info::m_set_typelib
from "TYPELIB *" to "const TYPELIB *"
There is a convention that Item::val_int() and Item::val_real() return
SQL NULL doing effectively what this code does:
null_value= true;
return 0; // Always return 0 for SQL NULL
This is done to optimize boolean value evaluation:
if Item::val_int() or Item::val_real() returned 1 -
that always means TRUE and never can means SQL NULL.
This convention helps to avoid unnecessary testing
Item::null_value after getting a non-zero return value.
Item_func_min_max did not follow this convention.
It could return a non-zero value together with null_value==true.
This made evaluate_join_record() erroneously misinterpret
SQL NULL as TRUE in this call:
select_cond_result= MY_TEST(select_cond->val_int());
Fixing Item_func_min_max to follow the convention.
This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolarable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
to
ident1.streq(ident2)
- Taking advantage of the c++11 user defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Use example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
Json test about max statement time fails with freebsd because on some
architectures the test might execute faster and the statement may not fail.
To simulate failure regardless of architecture, introduce a wait of seconds
longer than the max_statement_time.
Some fixes related to commit f838b2d799 and
Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row()
for system-versioned tables were provided by Nikita Malyavin.
This was required by test versioning.rpl,trx_id,row.
Functions extracting non-negative datetime components:
- YEAR(dt), EXTRACT(YEAR FROM dt)
- QUARTER(td), EXTRACT(QUARTER FROM dt)
- MONTH(dt), EXTRACT(MONTH FROM dt)
- WEEK(dt), EXTRACT(WEEK FROM dt)
- HOUR(dt),
- MINUTE(dt),
- SECOND(dt),
- MICROSECOND(dt),
- DAYOFYEAR(dt)
- EXTRACT(YEAR_MONTH FROM dt)
did not set their max_length properly, so in the DECIMAL
context they created a too small DECIMAL column, which
led to the 'Out of range value' error.
The problem is that most of these functions historically
returned the signed INT data type.
There were two simple ways to fix these functions:
1. Add +1 to max_length.
But this would also change their size in the string context
and create too long VARCHAR columns, with +1 excessive size.
2. Preserve max_length, but change the data type from INT to INT UNSIGNED.
But this would break backward compatibility.
Also, using UNSIGNED is generally not desirable,
it's better to stay with signed when possible.
This fix implements another solution, which it makes all these functions
work well in all contexts: int, decimal, string.
Fix details:
- Adding a new special class Type_handler_long_ge0 - the data type
handler for expressions which:
* should look like normal signed INT
* but which known not to return negative values
Expressions handled by Type_handler_long_ge0 store in Item::max_length
only the number of digits, without adding +1 for the sign.
- Fixing Item_extract to use Type_handler_long_ge0
for non-negative datetime components:
YEAR, YEAR_MONTH, QUARTER, MONTH, WEEK
- Adding a new abstract class Item_long_ge0_func, for functions
returning non-negative datetime components.
Item_long_ge0_func uses Type_handler_long_ge0 as the type handler.
The class hierarchy now looks as follows:
Item_long_ge0_func
Item_long_func_date_field
Item_func_to_days
Item_func_dayofmonth
Item_func_dayofyear
Item_func_quarter
Item_func_year
Item_long_func_time_field
Item_func_hour
Item_func_minute
Item_func_second
Item_func_microsecond
- Cleanup: EXTRACT(QUARTER FROM dt) created an excessive VARCHAR column
in string context. Changing its length from 2 to 1.
- Add `as <int_type>` to sequence creation options
- int_type can be signed or unsigned integer types, including
tinyint, smallint, mediumint, int and bigint
- Limitation: when alter sequence as <new_int_type>, cannot have any
other alter options in the same statement
- Limitation: increment remains signed longlong, and the hidden
constraint (cache_size x abs(increment) < longlong_max) stays for
unsigned types. This means for bigint unsigned, neither
abs(increment) nor (cache_size x abs(increment)) can be between
longlong_max and ulonglong_max
- Truncating maxvalue and minvalue from user input to the nearest max
or min value of the type, plus or minus 1. When the truncation
happens, a warning is emitted
- Information schema table for sequences
These changes are submitted under the BSD 3-clause License.
The original ticket describes a server crash when using a UDF in the WHERE clause of a view. The crash also happens when using a UDF in the WHERE clause of a SELECT that uses a sub-query in the FROM clause.
When the UDF does not have a _deinit function the server crashes in udf_handler::cleanup (sql/item_func.cc:3467).
When the UDF has both an _init and a _deinit function but _init does not allocate memory for initid->ptr the server crashes in udf_handler::cleanup (sql/item_func.cc:3467).
When the UDF has both an _init and a _deinit function and allocates/deallocates memory for initid->ptr the server crashes in the memory deallocation of the _deinit function.
The sequence of events seen are:
1. A UDF, U, is created for the query.
2. The UDF _init function is called using U->initid.
3. U is cloned for the sub-query using the [default|implicit] copy constructor, resulting in V.
4. The UDF _init function is called using V->initid. U->initid and V->initid are the same value.
5. The UDF function is called.
6. The UDF _deinit function is called using U->initid. If any memory was allocated for initid->ptr it is deallocated here.
7. udf_handler::cleanup deletes the U->buffers String array.
8. The UDF _deinit function is called using V->initid. If any memory was allocated for initid->ptr it was previously deallocated and _deinit crashes the server.
9. udf_handler::cleanup deletes the V->buffers String array. V->buffers was the same values as U->buffers which was already deallocated. The server crashes.
The solution is to create a[n explicit] copy constructor for udf_handler which sets not_original to true. Later, not_original is set back to false (0) after udf_handler::fix_fields has set up a new value for initid->ptr.
This is the follow-up patch that removes explicit use of thd->stmt_arena
for memory allocation and replaces it with call of the method
THD::active_stmt_arena_to_use()
Additionally, this patch adds extra DBUG_ASSERT to check that right
query arena is in use.