limitation)
Note to the reviewer
====================
Warning: reviewing this patch is somewhat involved.
Due to the nature of several issues all affecting the same area,
fixing separately each issue is not practical, since each fix can not be
implemented and tested independently.
In particular, the issues with
- rule recursion
- nested case statements
- forward jump resolution (backpatch list)
are tightly coupled (see below).
Definitions
===========
The expression
CASE expr
WHEN expr THEN expr
WHEN expr THEN expr
...
END
is a "Simple Case Expression".
The expression
CASE
WHEN expr THEN expr
WHEN expr THEN expr
...
END
is a "Searched Case Expression".
The statement
CASE expr
WHEN expr THEN stmts
WHEN expr THEN stmts
...
END CASE
is a "Simple Case Statement".
The statement
CASE
WHEN expr THEN stmts
WHEN expr THEN stmts
...
END CASE
is a "Searched Case Statement".
A "Left Recursive" rule is like
list:
element
| list element
;
A "Right Recursive" rule is like
list:
element
| element list
;
Left and right recursion produces the same language, the difference only
affects the *order* in which the text is parsed.
In a descendant parser (usually written manually), right recursion works
very well, and is typically implemented with a while loop.
In an ascendant parser (yacc/bison) left recursion works very well,
and is implemented naturally by the parser stack.
In both cases, using the wrong type or recursion is very bad and should be
avoided, as it causes technical issues with the parser implementation.
Before this change
==================
The "Simple Case Expression" and "Searched Case Expression" were both
implemented by the "when_list" and "when_list2" rules, which are left
recursive (ok).
These rules, however, used lex->when_list instead of using the parser stack,
which is more complex that necessary, and potentially dangerous because
of other rules using THD::reset_lex.
The "Simple Case Statement" and "Searched Case Statements" were implemented
by the "sp_case", "sp_whens" and in part by "sp_proc_stmt" rules.
Both cases were right recursive (bad).
The grammar involved was convoluted, and is assumed to be the results of
tweaks to get the code generation to work, but is not what someone would
naturally write.
In addition, using a common rule for both "Simple" and "Searched" case
statements was implemented with sp_head::m_flags |= IN_SIMPLE_CASE,
which is a flag and not a stack, and therefore does not take into account
*nested* case statements. This leads to incorrect generated code, and either
a server crash or an incorrect result.
With regards to the backpatch mechanism, a *different* backpatch list was
created for each jump from "WHEN expr THEN stmt" to "END CASE", which
relied on the grammar to be right recursive.
This is a mis-use of the backpatch list, since this list can resolve
multiple references to the same target at once.
The optimizer algorithm used to detect dead code in the "assembly" SQL
instructions, implemented by sp_head::opt_mark(uint ip), was recursive
in some cases (a conditional jump pointing forward to another conditional
jump).
In case of specially crafted code, like
- a long list of "IF expr THEN stmt END IF"
- a long CASE statement
this would actually cause a server crash with a stack overflow.
In general, having a stack that grows proportionally with user data (the
SQL code given by the client in a CREATE PROCEDURE) is to be avoided.
In debug builds only, creating a SP / SF / Trigger which had a significant
amount of code would spend --literally-- several minutes in sp_head::create,
because of the debug code involved with DBUG_PRINT("info", ("Code %s ...
There are several issues with this code:
- in a CASE with 5 000 WHEN, there are 15 000 instructions generated,
which create a sting representation of the code which is 500 000 bytes
long,
- using a String instead of an io stream causes performances to degrade
to a total server freeze, as time is spent doing realloc of a buffer
always too short,
- Printing a 500 000 long string in the debug log is too verbose,
- Generating this string even when DBUG_PRINT is off is useless,
- Having code that potentially can affect the server behavior, used with
#ifdef / #endif is useful in some cases, but is also a bad practice.
After this change
=================
"Case Expressions" (both simple and searched) have been simplified to
not use LEX::when_list, which has been removed.
Considering all the issues affecting case statements, the grammar for these
has been totally re written.
The existing actions, used to generate "assembly" sp_inst* code, have been
preserved but moved in the new grammar, with the following changes:
a) Bison rules are no longer shared between "Simple" and "Searched" case
statements, because a stack instead of a flag is required to handle them.
Nested statements are handled naturally by the parser stack, which by
definition uses the correct rule in the correct context.
Nested statements of the opposite type (simple vs searched) works correctly.
The flag sp_head::IN_SIMPLE_CASE is no longer used.
This is a step towards resolution of WL#2999, which correctly identified
that temporary parsing flags do not belong to sp_head.
The code in the action is shared by mean of the case_stmt_action_xxx()
helpers.
b) The backpatch mechanism, used to resolve forward jumps in the generated
code, has been changed to:
- create a label for the instruction following 'END CASE',
- register each jump at the end of a "WHEN expr THEN stmt" in a *unique*
backpatch list associated with the 'END CASE' label
- resolve all the forward jumps for this label at once.
In addition, the code involving backpatch has been commented, so that a
reader can now understand by reading matching "Registering" and "Resolving"
comments how the forward jumps are resolved and what target they resolve to,
as this is far from evident when reading the code alone.
The implementation of sp_head::opt_mark() has been revised to avoid
recursive calls from jump instructions, and instead add the jump location
to the list of paths to explore during the flow analysis of the instruction
graph, with a call to sp_head::add_mark_lead().
In addition, the flow analysis will stop if an instruction has already
been marked as reachable, which the previous code failed to do in the
recursive case.
sp_head::opt_mark() is now private, to prevent new calls to this method from
being introduced.
The debug code present in sp_head::create() has been removed.
Considering that SHOW PROCEDURE CODE is also available in debug builds,
and can be used anytime regardless of the trace level, as opposed to
"CREATE PROCEDURE" time and only if the trace was on,
removing the code actually makes debugging easier (usable trace).
Tests have been written to cover the parser overflow (big CASE),
and to cover nested CASE statements.
erroneous check
Problem: Actually there were two problems in the server code. The check
for SQLCOM_FLUSH in SF/Triggers were not according to the existing
architecture which uses sp_get_flags_for_command() from sp_head.cc .
This function was also missing a check for SQLCOM_FLUSH which has a
problem combined with prelocking. This changeset fixes both of these
deficiencies as well as the erroneous check in
sp_head::is_not_allowed_in_function() which was a copy&paste error.
Fix for BUG#16676: Database CHARSET not used for stored procedures
The problem in BUG#16211 is that CHARSET-clause of the return type for
stored functions is just ignored.
The problem in BUG#16676 is that if character set is not explicitly
specified for sp-variable, the server character set is used instead
of the database one.
The fix has two parts:
- always store CHARSET-clause of the return type along with the
type definition in mysql.proc.returns column. "Always" means that
CHARSET-clause is appended even if it has not been explicitly
specified in CREATE FUNCTION statement (this affects BUG#16211 only).
Storing CHARSET-clause if it is not specified is essential to avoid
changing character set if the database character set is altered in
the future.
NOTE: this change is not backward compatible with the previous releases.
- use database default character set if CHARSET-clause is not explicitly
specified (this affects both BUG#16211 and BUG#16676).
NOTE: this also breaks backward compatibility.
context.
Routine arguments were evaluated in the security context of the routine
itself, not in the caller's context.
The bug is fixed the following way:
- Item_func_sp::find_and_check_access() has been split into two
functions: Item_func_sp::find_and_check_access() itself only
finds the function and check that the caller have EXECUTE privilege
on it. New function set_routine_security_ctx() changes security
context for SUID routines and checks that definer have EXECUTE
privilege too.
- new function sp_head::execute_trigger() is called from
Table_triggers_list::process_triggers() instead of
sp_head::execute_function(), and is effectively just as the
sp_head::execute_function() is, with all non-trigger related code
removed, and added trigger-specific security context switch.
- call to Item_func_sp::find_and_check_access() stays outside
of sp_head::execute_function(), and there is a code in
sql_parse.cc before the call to sp_head::execute_procedure() that
checks that the caller have EXECUTE privilege, but both
sp_head::execute_function() and sp_head::execute_procedure() call
set_routine_security_ctx() after evaluating their parameters,
and restore the context after the body is executed.
Bug#19022 "Memory bug when switching db during trigger execution"
Bug#17199 "Problem when view calls function from another database."
Bug#18444 "Fully qualified stored function names don't work correctly in
SELECT statements"
Documentation note: this patch introduces a change in behaviour of prepared
statements.
This patch adds a few new invariants with regard to how THD::db should
be used. These invariants should be preserved in future:
- one should never refer to THD::db by pointer and always make a deep copy
(strmake, strdup)
- one should never compare two databases by pointer, but use strncmp or
my_strncasecmp
- TABLE_LIST object table->db should be always initialized in the parser or
by creator of the object.
For prepared statements it means that if the current database is changed
after a statement is prepared, the database that was current at prepare
remains active. This also means that you can not prepare a statement that
implicitly refers to the current database if the latter is not set.
This is not documented, and therefore needs documentation. This is NOT a
change in behavior for almost all SQL statements except:
- ALTER TABLE t1 RENAME t2
- OPTIMIZE TABLE t1
- ANALYZE TABLE t1
- TRUNCATE TABLE t1 --
until this patch t1 or t2 could be evaluated at the first execution of
prepared statement.
CURRENT_DATABASE() still works OK and is evaluated at every execution
of prepared statement.
Note, that in stored routines this is not an issue as the default
database is the database of the stored procedure and "use" statement
is prohibited in stored routines.
This patch makes obsolete the use of check_db_used (it was never used in the
old code too) and all other places that check for table->db and assign it
from THD::db if it's NULL, except the parser.
How this patch was created: THD::{db,db_length} were replaced with a
LEX_STRING, THD::db. All the places that refer to THD::{db,db_length} were
manually checked and:
- if the place uses thd->db by pointer, it was fixed to make a deep copy
- if a place compared two db pointers, it was fixed to compare them by value
(via strcmp/my_strcasecmp, whatever was approproate)
Then this intermediate patch was used to write a smaller patch that does the
same thing but without a rename.
TODO in 5.1:
- remove check_db_used
- deploy THD::set_db in mysql_change_db
See also comments to individual files.
Stored procedure execution sometimes placed the address of auto variables
in the list of Item changes to undo in THD::rollback_item_tree_changes().
This could cause stack corruption.
Removed sp-goto.test, sp-goto.result and all (disabled) GOTO code.
Also removed some related code that's not needed any more (no possible
unresolved label references any more, so no need to check for them).
NB: Keeping the ER_SP_GOTO_IN_HNDLR in errmsg.txt; it might become useful
in the future, and removing it (and thus re-enumerating error codes)
might upset things. (Anything referring to explicit error codes.)
Also added comments, and fixing some coding style (mostly in comments too).
There are no functional changes, so no tests or documentation needed.
(This was originally part of a bugfix, but it was decided to not include this
in that patch; instead it's done separately.)
The idea is to add DEFINER-clause in CREATE PROCEDURE and CREATE FUNCTION
statements. Almost all support of definer in stored routines had been already
done before this patch.
NOTE: this patch changes behaviour of dumping stored routines in mysqldump.
Before this patch, mysqldump did not dump DEFINER-clause for stored routines
and this was documented behaviour. In order to get full information about stored
routines, one should have dumped mysql.proc table. This patch changes this
behaviour, so that DEFINER-clause is dumped.
Since DEFINER-clause is not supported in CREATE PROCEDURE | FUNCTION statements
before this patch, the clause is covered by additional version-specific comments.
After trying multiple inheritance (to messy and hard make it work) and
sublassing jump_if_not (worked, but ugly), decided to on this solution
instead:
Inserting an abstract sp_instr_opt_meta class as parent for all instructions
with destinations makes it possible to handle a continuation pointer for
sp_instr_set_case_expr too.
Note: No special test case; the fix is captured by the changed behaviour of
bug14643_2, and bug14498_4 (formerly disabled), in sp.test.
Second version.
The problem was that the optimizer didn't work correctly with forwards jumps
to "no-op" hpop and cpop instructions.
Don't generate "no-op" instructions (hpop 0 and cpop 0), it isn't actually
necessary.
- Fixed tests
- Optimized new code
- Fixed some unlikely core dumps
- Better bug fixes for:
- #14397 - OPTIMIZE TABLE with an open HANDLER causes a crash
- #14850 (ERROR 1062 when a quering a view using a Group By on a column that can be null
according to the standard.
The idea is to use Field-classes to implement stored routines
variables. Also, we should provide facade to Item-hierarchy
by Item_field class (it is necessary, since SRVs take part
in expressions).
The patch fixes the following bugs:
- BUG#8702: Stored Procedures: No Error/Warning shown for inappropriate data
type matching;
- BUG#8768: Functions: For any unsigned data type, -ve values can be passed
and returned;
- BUG#8769: Functions: For Int datatypes, out of range values can be passed
and returned;
- BUG#9078: STORED PROCDURE: Decimal digits are not displayed when we use
DECIMAL datatype;
- BUG#9572: Stored procedures: variable type declarations ignored;
- BUG#12903: upper function does not work inside a function;
- BUG#13705: parameters to stored procedures are not verified;
- BUG#13808: ENUM type stored procedure parameter accepts non-enumerated
data;
- BUG#13909: Varchar Stored Procedure Parameter always BINARY string (ignores
CHARACTER SET);
- BUG#14161: Stored procedure cannot retrieve bigint unsigned;
- BUG#14188: BINARY variables have no 0x00 padding;
- BUG#15148: Stored procedure variables accept non-scalar values;
impossible view security".
We should not expose names of tables which are explicitly or implicitly (via
routine or trigger) used by view even if we find that they are missing.
So during building of list of prelocked tables for statement we track which
routines (and therefore tables for these routines) are used from views. We
mark elements of LEX::routines set which correspond to routines used in views
by setting Sroutine_hash_entry::belong_to_view member to point to TABLE_LIST
object for topmost view which uses routine. We propagate this mark to all
routines which are used by this routine and which we add to this set. We also
mark tables used by such routine which we add to the list of tables for
prelocking as belonging to this view.
Since long, the compiled code of stored routines has been printed in the trace file
when starting mysqld with the "--debug" flag. (At creation time only, and only in
debug builds of course.) This has been helpful when debugging stored procedure
execution, but it's a bit awkward to use. Also, the printing of some of the
instructions is a bit terse, in particular for sp_instr_stmt where only the command
code was printed.
This improves the printout of several of the instructions, and adds the debugging-
only commands "show procedure code <name>" and "show function code <name>".
(In non-debug builds they are not available.)
The crash mentioned in original bug report is already prevented by one
of previous patches (fix for bug #13343 "CREATE|etc TRIGGER|VIEW|USER
don't commit the transaction (inconsistency)"), this patch only improve
error returning.
The problem was to continue at the right place in the code after the
test expression in a flow control statement fails with an exception
(internally, the test in sp_instr_jump_if_not), and the exception is
caught by a continue handler. Execution must then be resumed after the
the entire flow control statement (END IF, END WHILE, etc).
cursor is interpreted latin1 character and Bug#9819 "Cursors: Mysql Server
Crash while fetching from table with 5 million records."
A fix for a possible memory leak when fetching into an SP cursor
in a long loop.
The patch uses a common implementation of cursors in the binary protocol and
in stored procedures and implements materialized cursors.
For implementation details, see comments in sql_cursor.cc
Second version after review. Allow 'set autocommit' in procedures, but not
functions or triggers. Can return error in run-time (when a function calls
a procedure).
* Allocate thd->user_var_events elements on appropriate mem_root
* If several SP statements are binlogged as a single statement, collect all user var
accesses they make (grep for StoredRoutinesBinlogging for details)
The idea of the patch is to separate statement processing logic,
such as parsing, validation of the parsed tree, execution and cleanup,
from global query processing logic, such as logging, resetting
priorities of a thread, resetting stored procedure cache, resetting
thread count of errors and warnings.
This makes PREPARE and EXECUTE behave similarly to the rest of SQL
statements and allows their use in stored procedures.
This patch contains a change in behaviour:
until recently for each SQL prepared statement command, 2 queries
were written to the general log, e.g.
[Query] prepare stmt from @stmt_text;
[Prepare] select * from t1 <-- contents of @stmt_text
The chagne was necessary to prevent [Prepare] commands from being written
to the general log when executing a stored procedure with Dynamic SQL.
We should consider whether the old behavior is preferrable and probably
restore it.
This patch refixes Bug#7115, Bug#10975 (partially), Bug#10605 (various bugs
in Dynamic SQL reported before it was disabled).
"Interleaved SPs execution is now binlogged properly, "SELECT spfunc()" is binlogged too.
The known remaining issue is binlogging/replication of "a routine is deleted while it is executed" scenario.
NOT FOUND ...' in conditional handled incorrectly".
Whenever we remove an instruction during optimization, we need to
adjust instruction numbers (ip - instruction pointer) stored in all
instructions. In addition to that, sp_instr_hpush_jump, which
corresponds to DECLARE CONTINUE HANDLER needs adjustment for m_handler,
which holds the number of instruction with the continue handler.
In the bug report, a wrong ip stored in m_handler was pointing at
FETCH, which resulted in an error message and abnormal SP termination.
The fix is to just remove m_handler member from sp_instr_hpush_jump,
as it's always points to the instruction next to the DECLARE
statement itself (m_ip+1).
its body, but lets each statement to get/release its own locks. This allows a broader set
of statements to be executed inside PROCEDUREs (but breaks replication)
This patch should fix BUG#8072, BUG#8766, BUG#9563, BUG#11126
crash if referencing a table" and several other related bugs.
Fix for bug #11834 "Re-execution of prepared statement with dropped function
crashes server." which was spotted during work on previous bugs.
Also couple of nice cleanups:
- Replaced two separate hashes for stored routines used by statement with one.
- Now instead of doing one pass through all routines used in statement for
caching them and then doing another pass for adding their tables to table
list, we do only one pass during which do both things.
"Stored procedures: crash with function calling itself".
Disallow recursive stored routines until we either make Item's and LEX
reentrant safe or will use spearate sp_head instances (and thus separate
LEX objects and Item trees) for each routine invocation.
We need every instruction to have its own arena, because we want to
track instruction's state (INITIALIZED_FOR_SP -> EXECUTED). Because of
`if' statements and other conditional instructions used in stored
procedures, not every instruction of a stored procedure gets executed
during the first (or even subsequent) execution of the procedure.
So it's better if we track the execution state of every instruction
independently.
All instructions of a given procedure now also share sp_head's
mem_root, but keep their own free_list.
This simplifies juggling with free Item lists in sp_head::execute.
- free_items() moved to be a member of Query_arena.
- logic of 'backup_arena' debug member of Query_arena has been
changed to support
multi-backups. Until now, TRUE 'backup_arena' meant that there is
exactly one active backup of the THD arena. Now it means simply that
the arena is used for backup, so that we can't accidentally overwrite an
existing backup. This allows doing multiple backups, e.g. in
sp_head::execute and Cursor::fetch, when THD arena is already backed up
but we want to set yet another arena (usually the 'permanent' arena,
to save permanent transformations/optimizations of a parsed tree).