fixes).
The legend: on a replication slave, in case a trigger creation
was filtered out because of application of replicate-do-table/
replicate-ignore-table rule, the parsed definition of a trigger was not
cleaned up properly. LEX::sphead member was left around and leaked
memory. Until the actual implementation of support of
replicate-ignore-table rules for triggers by the patch for Bug 24478 it
was never the case that "case SQLCOM_CREATE_TRIGGER"
was not executed once a trigger was parsed,
so the deletion of lex->sphead there worked and the memory did not leak.
The fix:
The real cause of the bug is that there is no 1 or 2 places where
we can clean up the main LEX after parse. And the reason we
can not have just one or two places where we clean up the LEX is
asymmetric behaviour of MYSQLparse in case of success or error.
One of the root causes of this behaviour is the code in Item::Item()
constructor. There, a newly created item adds itself to THD::free_list
- a single-linked list of Items used in a statement. Yuck. This code
is unaware that we may have more than one statement active at a time,
and always assumes that the free_list of the current statement is
located in THD::free_list. One day we need to be able to explicitly
allocate an item in a given Query_arena.
Thus, when parsing a definition of a stored procedure, like
CREATE PROCEDURE p1() BEGIN SELECT a FROM t1; SELECT b FROM t1; END;
we actually need to reset THD::mem_root, THD::free_list and THD::lex
to parse the nested procedure statement (SELECT *).
The actual reset and restore is implemented in semantic actions
attached to sp_proc_stmt grammar rule.
The problem is that in case of a parsing error inside a nested statement
Bison generated parser would abort immediately, without executing the
restore part of the semantic action. This would leave THD in an
in-the-middle-of-parsing state.
This is why we couldn't have had a single place where we clean up the LEX
after MYSQLparse - in case of an error we needed to do a clean up
immediately, in case of success a clean up could have been delayed.
This left the door open for a memory leak.
One of the following possibilities were considered when working on a fix:
- patch the replication logic to do the clean up. Rejected
as breaks module borders, replication code should not need to know the
gory details of clean up procedure after CREATE TRIGGER.
- wrap MYSQLparse with a function that would do a clean up.
Rejected as ideally we should fix the problem when it happens, not
adjust for it outside of the problematic code.
- make sure MYSQLparse cleans up after itself by invoking the clean up
functionality in the appropriate places before return. Implemented in
this patch.
- use %destructor rule for sp_proc_stmt to restore THD - cleaner
than the prevoius approach, but rejected
because needs a careful analysis of the side effects, and this patch is
for 5.0, and long term we need to use the next alternative anyway
- make sure that sp_proc_stmt doesn't juggle with THD - this is a
large work that will affect many modules.
Cleanup: move main_lex and main_mem_root from Statement to its
only two descendants Prepared_statement and THD. This ensures that
when a Statement instance was created for purposes of statement backup,
we do not involve LEX constructor/destructor, which is fairly expensive.
In order to track that the transformation produces equivalent
functionality please check the respective constructors and destructors
of Statement, Prepared_statement and THD - these members were
used only there.
This cleanup is unrelated to the patch.
This patch fixes problem that LOAD DATA could use different
character sets when loading files on master and on slave sides:
- Adding replication of thd->variables.collation_database
- Adding optional character set clause into LOAD DATA
Note, the second way, with explicit CHARACTER SET clause
should be the recommended way to load data using an alternative
character set.
The old way, using "SET @@character_set_database=xxx" should be
gradually depricated.
Post fix for bug#23800.
The Item_field constructor now increases the select_n_where_fields counter.
sql_yacc.yy:
Post fix for bug#23800.
Take into account fields that might be added by subselects.
sql_lex.h:
Post fix for bug#23800.
Added the select_n_where_fields variable to the st_select_lex class.
sql_lex.cc:
Post fix for bug#23800.
Initialization of the select_n_where_fields variable.
Several problems fixed:
1. There was a "catch-all" context initialization in setup_tables()
that was causing the table that we insert into to be visible in the
SELECT part of an INSERT .. SELECT .. statement with no tables in
its FROM clause. This was making sure all the under-initialized
contexts in various parts of the code are not left uninitialized.
Fixed by removing the "catch-all" statement and initializing the
context in the parser.
2. Incomplete name resolution context when resolving the right-hand
values in the ON DUPLICATE KEY UPDATE ... part of an INSERT ... SELECT ...
caused columns from NATURAL JOIN/JOIN USING table references in the
FROM clause of the select to be unavailable.
Fixed by establishing a proper name resolution context.
3. When setting up the special name resolution context for problem 2
there was no check for cases where an aggregate function without a
GROUP BY effectively takes the column from the SELECT part of an
INSERT ... SELECT unavailable for ON DUPLICATE KEY UPDATE.
Fixed by checking for that condition when setting up the name
resolution context.
operations)
Before this change, the boolean predicates:
- X IS TRUE,
- X IS NOT TRUE,
- X IS FALSE,
- X IS NOT FALSE
were implemented by expanding the Item tree in the parser, by using a
construct like:
Item_func_if(Item_func_ifnull(X, <value>), <value>, <value>)
Each <value> was a constant integer, either 0 or 1.
A bug in the implementation of the function IF(a, b, c), in
Item_func_if::fix_length_and_dec(), would cause the following :
When the arguments b and c are both unsigned, the result type of the
function was signed, instead of unsigned.
When the result of the if function is signed, space for the sign could be
counted twice (in the max() expression for a signed argument, and in the
total), causing the member max_length to be too high.
An effect of this is that the final type of IF(x, int(1), int(1)) would be
int(2) instead of int(1).
With this fix, the problems found in Item_func_if::fix_length_and_dec()
have been fixed.
While it's semantically correct to represent 'X IS TRUE' with
Item_func_if(Item_func_ifnull(X, <value>), <value>, <value>),
there are however more problems with this construct.
a)
Building the parse tree involves :
- creating 5 Item instances (3 ints, 1 ifnull, 1 if),
- creating each Item calls my_pthread_getspecific_ptr() once in the operator
new(size), and a second time in the Item::Item() constructor, resulting
in a total of 10 calls to get the current thread.
Evaluating the expression involves evaluating up to 4 nodes at runtime.
This representation could be greatly simplified and improved.
b)
Transforming the parse tree internally with if(ifnull(...)) is fine as long
as this transformation is internal to the server implementation.
With views however, the result of the parse tree is later exposed by the
::print() functions, and stored as part of the view definition.
Doing this has long term consequences:
1)
The original semantic 'X IS TRUE' is lost, and replaced by the
if(ifnull(...)) expression. As a result, SHOW CREATE VIEW does not restore
the original code.
2)
Should a future version of MySQL implement the SQL BOOLEAN data type for
example, views created today using 'X IS NULL' can be exported using
mysqldump, and imported again. Such views would be converted correctly and
automatically to use a BOOLEAN column in the future version.
With 'X IS TRUE' and the current implementations, views using these
"boolean" predicates would not be converted during the export/import, and
would use integer columns instead.
The difference traces back to how SHOW CREATE VIEW preserves 'X IS NULL' but
does not preserve the 'X IS TRUE' semantic.
With this fix, internal representation of 'X IS TRUE' booleans predicates
has changed, so that:
- dedicated Item classes are created for each predicate,
- only 1 Item is created to represent 1 predicate
- my_pthread_getspecific_ptr() is invoked 1 time instead of 10
- SHOW CREATE VIEW preserves the original semantic, and prints 'X IS TRUE'.
Note that, because of the fix in Item_func_if, views created before this fix
will:
- correctly use a int(1) type instead of int(2) for boolean predicates,
- incorrectly print the if(ifnull(...), ...) expression in SHOW CREATE VIEW,
since the original semantic (X IS TRUE) has been lost.
- except for the syntax used in SHOW CREATE VIEW, these views will operate
properly, no action is needed.
Views created after this fix will operate correctly, and will preserve the
original code semantic in SHOW CREATE VIEW.
Two problems here:
Problem 1:
While constructing the join columns list the optimizer does as follows:
1. Sets the join_using_fields/natural_join members of the right JOIN
operand.
2. Makes a "table reference" (TABLE_LIST) to parent the two tables.
3. Assigns the join_using_fields/is_natural_join of the wrapper table
using join_using_fields/natural_join of the rightmost table
4. Sets join_using_fields to NULL for the right JOIN operand.
5. Passes the parent table up to the same procedure on the upper
level.
Step 1 overrides the the join_using_fields that are set for a nested
join wrapping table in step 4.
Fixed by making a designated variable SELECT_LEX::prev_join_using to
pass the data from step 1 to step 4 without destroying the wrapping
table data.
Problem 2:
The optimizer checks for ambiguous columns while transforming
NATURAL JOIN/JOIN USING to JOIN ON. While doing that there was no
distinction between columns that are used in the generated join
condition (where ambiguity can be checked) and the other columns
(where ambiguity can be checked only when resolving references
coming from outside the JOIN construct itself).
Fixed by allowing the non-USING columns to be present in multiple
copies in both sides of the join and moving the ambiguity check
to the place where unqualified references to the join columns are
resolved (find_field_in_natural_join()).
Before this fix, a IN predicate of the form: "IN (( subselect ))", with two
parenthesis, would be evaluated as a single row subselect: if the subselect
returns more that 1 row, the statement would fail.
The SQL:2003 standard defines a special exception in the specification,
and mandates that this particular form of IN predicate shall be equivalent
to "IN ( subselect )", which involves a table subquery and works with more
than 1 row.
This fix implements "IN (( subselect ))", "IN ((( subselect )))" etc
as per the SQL:2003 requirement.
All the details related to the implementation of this change have been
commented in the code, and the relevant sections of the SQL:2003 spec
are given for reference, so they are not repeated here.
Having access to the spec is a requirement to review in depth this patch.
WL#3681 (ALTER TABLE ORDER BY)
Before this fix, the ALTER TABLE statement implemented an ORDER BY option
with the following characteristics :
1) The order by clause accepts a list of criteria, with optional ASC or
DESC keywords
2) Each criteria can be a general expression, involving operators,
native functions, stored functions, user defined functions, subselects ...
With this fix :
1) has been left unchanged, since it's a de-facto existing feature,
that was already present in the code base and partially covered in the test
suite. Code coverage for ASC and DESC was missing and has been improved.
2) has been changed to limit the kind of criteria that are permissible:
now only a column name is valid.
- Removed not used variables and functions
- Added #ifdef around code that is not used
- Renamed variables and functions to avoid conflicts
- Removed some not used arguments
Fixed some class/struct warnings in ndb
Added define IS_LONGDATA() to simplify code in libmysql.c
I did run gcov on the changes and added 'purecov' comments on almost all lines that was not just variable name changes
Bug#4968 "Stored procedure crash if cursor opened on altered table"
Bug#19733 "Repeated alter, or repeated create/drop, fails"
Bug#19182 "CREATE TABLE bar (m INT) SELECT n FROM foo; doesn't work from
stored procedure."
Bug#6895 "Prepared Statements: ALTER TABLE DROP COLUMN does nothing"
Bug#22060 "ALTER TABLE x AUTO_INCREMENT=y in SP crashes server"
Test cases for bugs 4968, 19733, 6895 will be added in 5.0.
Re-execution of CREATE DATABASE, CREATE TABLE and ALTER TABLE
statements in stored routines or as prepared statements caused
incorrect results (and crashes in versions prior to 5.0.25).
In 5.1 the problem occured only for CREATE DATABASE, CREATE TABLE
SELECT and CREATE TABLE with INDEX/DATA DIRECTOY options).
The problem of bugs 4968, 19733, 19282 and 6895 was that functions
mysql_prepare_table, mysql_create_table and mysql_alter_table were not
re-execution friendly: during their operation they used to modify contents
of LEX (members create_info, alter_info, key_list, create_list),
thus making the LEX unusable for the next execution.
In particular, these functions removed processed columns and keys from
create_list, key_list and drop_list. Search the code in sql_table.cc
for drop_it.remove() and similar patterns to find evidence.
The fix is to supply to these functions a usable copy of each of the
above structures at every re-execution of an SQL statement.
To simplify memory management, LEX::key_list and LEX::create_list
were added to LEX::alter_info, a fresh copy of which is created for
every execution.
The problem of crashing bug 22060 stemmed from the fact that the above
metnioned functions were not only modifying HA_CREATE_INFO structure in
LEX, but also were changing it to point to areas in volatile memory of
the execution memory root.
The patch solves this problem by creating and using an on-stack
copy of HA_CREATE_INFO (note that code in 5.1 already creates and
uses a copy of this structure in mysql_create_table()/alter_table(),
but this approach didn't work well for CREATE TABLE SELECT statement).
Fixed compiler warnings (detected by VC++):
- Removed not used variables
- Added casts
- Fixed wrong assignments to bool
- Fixed wrong calls with bool arguments
- Added missing argument to store(longlong), which caused wrong store method to be called.
Note that we ignore CONCURRENT if LOAD DATA CONCURRENT is used from
inside a stored routine and MySQL is compiled with Query Cache support
(this is not in the manual).
The problem was that the condition test of "we are inside stored routine"
was reversed, thus CONCURRENT _worked only_ from stored routine. The
solution is to use proper condition test.
No test case is provided because the test case would require a large
amount of input, and it's hard to tell is SELECT is really blocked or
just slow (subject to race).
limitation)
Note to the reviewer
====================
Warning: reviewing this patch is somewhat involved.
Due to the nature of several issues all affecting the same area,
fixing separately each issue is not practical, since each fix can not be
implemented and tested independently.
In particular, the issues with
- rule recursion
- nested case statements
- forward jump resolution (backpatch list)
are tightly coupled (see below).
Definitions
===========
The expression
CASE expr
WHEN expr THEN expr
WHEN expr THEN expr
...
END
is a "Simple Case Expression".
The expression
CASE
WHEN expr THEN expr
WHEN expr THEN expr
...
END
is a "Searched Case Expression".
The statement
CASE expr
WHEN expr THEN stmts
WHEN expr THEN stmts
...
END CASE
is a "Simple Case Statement".
The statement
CASE
WHEN expr THEN stmts
WHEN expr THEN stmts
...
END CASE
is a "Searched Case Statement".
A "Left Recursive" rule is like
list:
element
| list element
;
A "Right Recursive" rule is like
list:
element
| element list
;
Left and right recursion produces the same language, the difference only
affects the *order* in which the text is parsed.
In a descendant parser (usually written manually), right recursion works
very well, and is typically implemented with a while loop.
In an ascendant parser (yacc/bison) left recursion works very well,
and is implemented naturally by the parser stack.
In both cases, using the wrong type or recursion is very bad and should be
avoided, as it causes technical issues with the parser implementation.
Before this change
==================
The "Simple Case Expression" and "Searched Case Expression" were both
implemented by the "when_list" and "when_list2" rules, which are left
recursive (ok).
These rules, however, used lex->when_list instead of using the parser stack,
which is more complex that necessary, and potentially dangerous because
of other rules using THD::reset_lex.
The "Simple Case Statement" and "Searched Case Statements" were implemented
by the "sp_case", "sp_whens" and in part by "sp_proc_stmt" rules.
Both cases were right recursive (bad).
The grammar involved was convoluted, and is assumed to be the results of
tweaks to get the code generation to work, but is not what someone would
naturally write.
In addition, using a common rule for both "Simple" and "Searched" case
statements was implemented with sp_head::m_flags |= IN_SIMPLE_CASE,
which is a flag and not a stack, and therefore does not take into account
*nested* case statements. This leads to incorrect generated code, and either
a server crash or an incorrect result.
With regards to the backpatch mechanism, a *different* backpatch list was
created for each jump from "WHEN expr THEN stmt" to "END CASE", which
relied on the grammar to be right recursive.
This is a mis-use of the backpatch list, since this list can resolve
multiple references to the same target at once.
The optimizer algorithm used to detect dead code in the "assembly" SQL
instructions, implemented by sp_head::opt_mark(uint ip), was recursive
in some cases (a conditional jump pointing forward to another conditional
jump).
In case of specially crafted code, like
- a long list of "IF expr THEN stmt END IF"
- a long CASE statement
this would actually cause a server crash with a stack overflow.
In general, having a stack that grows proportionally with user data (the
SQL code given by the client in a CREATE PROCEDURE) is to be avoided.
In debug builds only, creating a SP / SF / Trigger which had a significant
amount of code would spend --literally-- several minutes in sp_head::create,
because of the debug code involved with DBUG_PRINT("info", ("Code %s ...
There are several issues with this code:
- in a CASE with 5 000 WHEN, there are 15 000 instructions generated,
which create a sting representation of the code which is 500 000 bytes
long,
- using a String instead of an io stream causes performances to degrade
to a total server freeze, as time is spent doing realloc of a buffer
always too short,
- Printing a 500 000 long string in the debug log is too verbose,
- Generating this string even when DBUG_PRINT is off is useless,
- Having code that potentially can affect the server behavior, used with
#ifdef / #endif is useful in some cases, but is also a bad practice.
After this change
=================
"Case Expressions" (both simple and searched) have been simplified to
not use LEX::when_list, which has been removed.
Considering all the issues affecting case statements, the grammar for these
has been totally re written.
The existing actions, used to generate "assembly" sp_inst* code, have been
preserved but moved in the new grammar, with the following changes:
a) Bison rules are no longer shared between "Simple" and "Searched" case
statements, because a stack instead of a flag is required to handle them.
Nested statements are handled naturally by the parser stack, which by
definition uses the correct rule in the correct context.
Nested statements of the opposite type (simple vs searched) works correctly.
The flag sp_head::IN_SIMPLE_CASE is no longer used.
This is a step towards resolution of WL#2999, which correctly identified
that temporary parsing flags do not belong to sp_head.
The code in the action is shared by mean of the case_stmt_action_xxx()
helpers.
b) The backpatch mechanism, used to resolve forward jumps in the generated
code, has been changed to:
- create a label for the instruction following 'END CASE',
- register each jump at the end of a "WHEN expr THEN stmt" in a *unique*
backpatch list associated with the 'END CASE' label
- resolve all the forward jumps for this label at once.
In addition, the code involving backpatch has been commented, so that a
reader can now understand by reading matching "Registering" and "Resolving"
comments how the forward jumps are resolved and what target they resolve to,
as this is far from evident when reading the code alone.
The implementation of sp_head::opt_mark() has been revised to avoid
recursive calls from jump instructions, and instead add the jump location
to the list of paths to explore during the flow analysis of the instruction
graph, with a call to sp_head::add_mark_lead().
In addition, the flow analysis will stop if an instruction has already
been marked as reachable, which the previous code failed to do in the
recursive case.
sp_head::opt_mark() is now private, to prevent new calls to this method from
being introduced.
The debug code present in sp_head::create() has been removed.
Considering that SHOW PROCEDURE CODE is also available in debug builds,
and can be used anytime regardless of the trace level, as opposed to
"CREATE PROCEDURE" time and only if the trace was on,
removing the code actually makes debugging easier (usable trace).
Tests have been written to cover the parser overflow (big CASE),
and to cover nested CASE statements.
This change set implements the DROP TRIGGER IF EXISTS functionality.
This fix is considered a bug and not a feature, because without it,
there is no known method to write a database creation script that can create
a trigger without failing, when executed on a database that may or may not
contain already a trigger of the same name.
Implementing this functionality closes an orthogonality gap between triggers
and stored procedures / stored functions (which do support the DROP IF
EXISTS syntax).
In sql_trigger.cc, in mysql_create_or_drop_trigger,
the code has been reordered to:
- perform the tests that do not depend on the file system (access()),
- get the locks (wait_if_global_read_lock, LOCK_open)
- call access()
- perform the operation
- write to the binlog
- unlock (LOCK_open, start_waiting_global_read_lock)
This is to ensure that all the code that depends on the presence of the
trigger file is executed in the same critical section,
and prevents race conditions similar to the case fixed by Bug 14262 :
- thread 1 executes DROP TRIGGER IF EXISTS, access() returns a failure
- thread 2 executes CREATE TRIGGER
- thread 2 logs CREATE TRIGGER
- thread 1 logs DROP TRIGGER IF EXISTS
The patch itself is based on code contributed by the MySQL community,
under the terms of the Contributor License Agreement (See Bug 18161).
select OK.
The SQL parser was using Item::name to transfer user defined function attributes
to the user defined function (udf). It was not distinguishing between user defined
function call arguments and stored procedure call arguments. Setting Item::name
was causing Item_ref::print() method to print the argument as quoted identifiers
and caused views that reference aggregate functions as udf call arguments (and
rely on Item::print() for the text of the view to store) to throw an undefined
identifier error.
Overloaded Item_ref::print to print aggregate functions as such when printing
the references to aggregate functions taken out of context by split_sum_func2()
Fixed the parser to properly detect using AS clause in stored procedure arguments
as an error.
Fixed printing the arguments of udf call to print properly the udf attribute.
should fail to create
The problem was that this type of errors was checked during view
creation, which doesn't happen when CREATE VIEW is a statement of
a created stored routine.
The solution is to perform the checks at parse time. The idea of the
fix is that the parser checks if a construction just parsed is allowed
in current circumstances by testing certain flags, and this flags are
reset for VIEWs.
The side effect of this change is that if the user already have
such bogus routines, it will now get a error when trying to do
SHOW CREATE PROCEDURE proc;
(and some other) and when trying to execute such routine he will get
ERROR 1457 (HY000): Failed to load routine test.p5. The table mysql.proc is missing, corrupt, or contains bad data (internal code -6)
However there should be very few such users (if any), and they may
(and should) drop these bogus routines.
The syntax of the CALL statement, to invoke a stored procedure, has been
changed to make the use of parenthesis optional in the argument list.
With this change, "CALL p;" is equivalent to "CALL p();".
While the SQL spec does not explicitely mandate this syntax, supporting it
is needed for practical reasons, for integration with JDBC / ODBC connectors.
Also, warnings in the sql/sql_yacc.yy file, which were not reported by Bison 2.1
but are now reported by Bison 2.2, have been fixed.
The warning found were:
bison -y -p MYSQL -d --debug --verbose sql_yacc.yy
sql_yacc.yy:653.9-18: warning: symbol UNLOCK_SYM redeclared
sql_yacc.yy:656.9-17: warning: symbol UNTIL_SYM redeclared
sql_yacc.yy:658.9-18: warning: symbol UPDATE_SYM redeclared
sql_yacc.yy:5169.11-5174.11: warning: unused value: $2
sql_yacc.yy:5208.11-5220.11: warning: unused value: $5
sql_yacc.yy:5221.11-5234.11: warning: unused value: $5
conflicts: 249 shift/reduce
"unused value: $2" correspond to the $$=$1 assignment in the 1st {} block
in table_ref -> join_table {} {},
which does not procude a result ($$) for the rule but an intermediate $2
value for the action instead.
"unused value: $5" are similar, with $$ assignments in {} actions blocks
which are not for the final reduce.