There's currently no way of knowing the determinicity of an UDF.
And the optimizer and the sequence() UDFs were making wrong
assumptions about what the is_const member means.
Plus there was no implementation of update_system_tables()
causing the optimizer to overwrite the information returned by
the <udf>_init function.
Fixed by equating the assumptions about the semantics of
is_const and providing a implementation of update_used_tables().
Added a TODO item for the UDF API change needed to make a better
implementation.
Previously, UDF *_init functions were passed constant strings with erroneous lengths. The length came from the containing variable's size, not the length of the value itself.
Now the *_init functions get the constant as a null terminated string with the correct length supplied too.
The root cause of the issue was that the CREATE FUNCTION grammar,
for User Defined Functions, was using the sp_name rule.
The sp_name rule is intended for fully qualified stored procedure names,
like either ident.ident, or just ident but with a default database
implicitly selected.
A UDF does not have a fully qualified name, only a name (ident), and should
not use the sp_name grammar fragment during parsing.
The fix is to re-organize the CREATE FUNCTION grammar, to better separate:
- creating UDF (no definer, can have AGGREGATE, simple ident)
- creating Stored Functions (definer, no AGGREGATE, fully qualified name)
With the test case provided, another issue was exposed which is also fixed:
the DROP FUNCTION statement was using sp_name and also failing when no database
is implicitly selected, when droping UDF functions.
The fix is also to change the grammar so that DROP FUNCTION works with
both the ident.ident syntax (to drop a stored function), or just the ident
syntax (to drop either a UDF or a Stored Function, in the current database)
Bug#29816 Syntactically wrong query fails with misleading error message
The core problem is that an SQL-invoked function name can be a <schema
qualified routine name> that contains no <schema name>, but the mysql
parser insists that all stored procedures (function, procedures and
triggers) must have a <schema name>, which is not true for functions.
This problem is especially visible when trying to create a function
or when a query contains a syntax error after a function call (in the
same query), both will fail with a "No database selected" message if
the session is not attached to a particular schema, but the first
one should succeed and the second fail with a "syntax error" message.
Part of the fix is to revamp the sp name handling so that a schema
name may be omitted for functions -- this means that the internal
function name representation may not have a dot, which represents
that the function doesn't have a schema name. The other part is
to place schema checks after the type (function, trigger or procedure)
of the routine is known.
- mysqldump executes a SHOW CREATE VIEW statement to generate the text
that it outputs. When the function name is retrieved it's database
name is unconditionally prepended. This change causes the function's
database name to be prepended only when it was used to define the
function.
crashes server
Check for null value is reliable only after calling some of the
val_xxx() methods. If the val_xxx() method is not called
the null_value flag will be set only for certain types of NULL
values (like SQL constant NULLs for example).
This caused a crash while trying to dereference a NULL pointer
that is returned by val_str() for NULL values.
Fixed by swapping the order of val_xxx() and null_value check.
the UDF
When deleting a user defined function MySQL must remove it from both the
in-memory hash table and the mysql.proc system table.
Finding (and removal therefore) from the internal hash table is case
insensitive (or whatever the default charset is), whereas finding and
removal from the system table is case sensitive.
As a result if you supply a function name that is not in the same character
case to DROP FUNCTION the server will remove the function only from the
in-memory hash table and will keep the row in mysql.proc system table.
This will cause inconsistency between the two structures (that is fixed
only by restarting the server).
Fixed by using the name in the precise case (from the in-memory hash table)
to delete the row in the mysql.proc system table.
The code that set up data to be passed to user-defined functions was very
old and analyzed the "Type" of the data that was passed into the UDF, when
it really should analyze the "return_type", which is hard-coded for simple
Items and works correctly for complex ones like functions.
---
Added test at Sergei's behest.
select OK.
The SQL parser was using Item::name to transfer user defined function attributes
to the user defined function (udf). It was not distinguishing between user defined
function call arguments and stored procedure call arguments. Setting Item::name
was causing Item_ref::print() method to print the argument as quoted identifiers
and caused views that reference aggregate functions as udf call arguments (and
rely on Item::print() for the text of the view to store) to throw an undefined
identifier error.
Overloaded Item_ref::print to print aggregate functions as such when printing
the references to aggregate functions taken out of context by split_sum_func2()
Fixed the parser to properly detect using AS clause in stored procedure arguments
as an error.
Fixed printing the arguments of udf call to print properly the udf attribute.
The problem was that the grammar allows to create a function with an optional
definer clause, and define it as a UDF with the SONAME keyword.
Such combination should be reported as an error.
The solution is to not change the grammar itself, and to introduce a
specific check in the yacc actions in 'create_function_tail' for UDF,
that now reports ER_WRONG_USAGE when using both DEFINER and SONAME.
When there is no index defined filesort is used to sort the result of a
query. If there is a function in the select list and the result set should be
ordered by it's value then this function will be evaluated twice. First time to
get the value of the sort key and second time to send its value to a user.
This happens because filesort when sorts a table remembers only values of its
fields but not values of functions.
All functions are affected. But taking into account that SP and UDF functions
can be both expensive and non-deterministic a temporary table should be used
to store their results and then sort it to avoid twice SP evaluation and to
get a correct result.
If an expression referenced in an ORDER clause contains a SP or UDF
function, force the use of a temporary table.
A new Item_processor function called func_type_checker_processor is added
to check whether the expression contains a function of a particular type.
The is_null value was initialized once and thereafter only set to indicate
NULL, and never unset to indicate not-NULL.
Now set is_null to false, in addition to only setting it to true when the value
in question is null.
- Pass "buffers[i]" to val_str() in udf_handler::fix_fields insteead of NULL.
- Add testcase for UDF that will load and run the udf_example functions
if available