Converting the number zero to binary and back yielded the number zero,
but with no digits, i.e. zero precision.
This made the multiply algorithm go haywire in various ways.
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.
Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().
btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.
btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.
btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).
btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).
btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.
btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.
btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.
fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.
fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.
fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".
mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.
row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.
There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.
rb:693 approved by Sunny Bains
Suppress the known warnings generated by filesort().
The real fix belongs to worklog 1509:
Pack values of non-sorted fields in the sort buffer
(which is basically the same issue, but in an optimization context:
We are writing the entire sort buffer to disk,
including un-used space for varchar columns.)
CRASHES SERVER
Flushing of MERGE table or one of its child tables, which was
locked by flushing thread using LOCK TABLES, might have caused
crashes or assertion failures if the thread failed to reopen
child or parent table.
Particularly, this might have happened when another connection
killed this FLUSH TABLE statement/connection.
Also this problem might have occurred when we failed to reopen
MERGE table or one of its children when executing DDL statement
under LOCK TABLES.
The problem was caused by the fact that reopen_tables() might
have failed to reopen child table but still tried to reopen,
reattach children for and re-lock its parent. Vice versa it
might have failed to reopen parent but kept references from
children to parent around. Since reopen_tables() closes table
it has failed to reopen and therefore frees all associated
memory such dangling references led to crashes when followed.
This patch solves this problem by ensuring that we always close
parent table and all its children if we fail to reopen this
table or one of its children. Same happens if we fail to reattach
children to parent.
Affects 5.1 only.
for compressed InnoDB tables
ha_innodb::info_low(): For calculating data_length or index_length,
use the compressed page size for compressed tables instead of UNIV_PAGE_SIZE.
rb:714 approved by Sunny Bains
There is an optimization of DISTINCT in JOIN::optimize()
which depends on THD::used_tables value. Each SELECT statement
inside SP resets used_tables value(see mysql_select()) and it
leads to wrong result. The fix is to replace THD::used_tables
with LEX::used_tables.
The problem is that TIME_FUZZY_DATE is explicitly used for get_arg0_date()
function in Item_date_typecast::get_date method. The fix is to use real
fuzzy_date value.
The server crashes if it processes table map events that are
corrupted, especially if they map different tables to the same
identifier. This could happen, for instance, due to BUG 56226.
We fix this by checking whether the table map has already been
mapped before actually applying the event. If it has been mapped
with different settings an error is raised and the slave SQL
thread stops. If it has been mapped with same settings the event
is skipped. If the table is set to be ignored by the filtering
rules, there is no change in behavior: the event is skipped and
ids are not checked.
When CREATE TABLE wasn't given ENGINE=... it would determine
the default ENGINE at parse-time rather than at execution
time, leading to incorrect behaviour (namely, later changes
to the default engine being ignore) when calling CREATE TABLE
from a stored procedure.
We now defer working out the default engine till execution of
CREATE TABLE.
bug. It added this assert;
ut_ad(ind_field->prefix_len);
before a section of code that assumes there is a prefix_len.
The patch replaced code that explicitly avoided this with a check for
prefix_len. It turns out that the purge thread can get to that assert
without a prefix_len because it does not use a row_ext_t* .
When UNIV_DEBUG is not defined, the affect of this is that the purge thread
sets the dfield->len to zero and then cannot find the entry in the index to
purge. So secondary index entries remain unpurged.
This patch does not do the assert. Instead, it uses
'if (ind_field->prefix_len) {...}'
around the section of code that assumes a prefix_len. This is the way the
patch I provided to Marko did it.
The test case is simply modified to do a sleep(10) in order to give the
purge thread a chance to run. Without the code change to row0row.c, this
modified testcase will assert if InnoDB was compiled with UNIV_DEBUG.
I tried to sleep(5), but it did not always assert.
row_build_index_entry(): In innodb_file_format=Barracuda
(ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED), a secondary index on a
full column can refer to a field that is stored off-page in the
clustered index record. Take that into account.
rb:692 approved by Jimmy Yang
BOGUS "THE TABLE MYSQL.PROC IS MISSING,..."
There was a race condition between loading a stored routine
(function/procedure/trigger) specified by fully qualified name
SCHEMA_NAME.PROC_NAME and dropping the stored routine database.
The problem was that there is a window for race condition when one server
thread tries to load a stored routine being executed and the other thread
tries to drop the stored routine schema.
This condition race window exists in implementation of function
mysql_change_db() called by db_load_routine() during loading of stored
routine to cache. Function mysql_change_db() calls check_db_dir_existence()
that might failed because specified database was dropped during concurrent
execution of DROP SCHEMA statement. db_load_routine() calls mysql_change_db()
with flag 'force_switch' set to 'true' value so when referenced db is not found
then my_error() is not called and function mysql_change_db() returns ok.
This shadows information about schema opening error in db_load_routine().
Then db_load_routine() makes attempt to parse stored routine that is failed.
This makes to return error to sp_cache_routines_and_add_tables_aux() but since
during error generation a call to my_error wasn't made and hence
THD::main_da wasn't set we set the generic "mysql.proc table corrupt" error
when running sp_cache_routines_and_add_tables_aux().
The fix is to install an error handler inside db_load_routine() for
the mysql_op_change_db() call, and check later if the ER_BAD_DB_ERROR
was caught.
TO POSITION FIRST CAN CAUSE DATA TO BE CORRUPTED".
ALTER TABLE MODIFY/CHANGE ... FIRST did nothing except renaming
columns if new version of the table had exactly the same
structure as the old one (i.e. as result of such statement, names
of columns changed their order as specified but data in columns
didn't). The same thing happened for ALTER TABLE DROP COLUMN/ADD
COLUMN statements which were supposed to produce new version of
table with exactly the same structure as the old version of table.
I.e. in the latter case the result was the same as if old column
was renamed instead of being dropped and new column with default
as value being created.
Both these problems were caused by the fact that ALTER TABLE
implementation incorrectly interpreted both these situations as
simple renaming of columns and assumed that in-place ALTER TABLE
algorithm could have been used for them.
This patch fixes this problem by ensuring that in cases when some
column is moved to the first position or some column is dropped
the default ALTER TABLE algorithm involving table copying is
always used. This is achieved by detecting such situations in
mysql_prepare_alter_table() and setting Alter_info::change_level
to ALTER_TABLE_DATA_CHANGED for them.
SYNTAX TRIGGERS IN ANY WAY
Table with triggers which were using deprecated (5.0-only) syntax became
unavailable for any DML and DDL after upgrade to 5.1 version of server.
Attempt to execute any statement on such a table resulted in parsing
error reported. Since this included DROP TRIGGER and DROP TABLE
statements (actually, the latter was allowed but was not functioning
properly for such tables) it was impossible to fix the problem without
manual operations on .TRG and .TRN files in data directory.
The problem was that failure to parse trigger body (due to 5.0-only
syntax) when opening trigger file for a table prevented the table
from being open. This made all operations on the table impossible
(except DROP TABLE which due to peculiarity in its implementation
dropped the table but left trigger files around).
This patch solves this problem by silencing error which occurs when
we parse trigger body during table open. Error message is preserved
for the future use and table is marked as having a broken trigger.
We also try to analyze parse tree to recover trigger name, which
will be needed in order to drop the broken trigger. DML statements
which invoke triggers on the table marked as having broken trigger
are prohibited and emit saved error message. The same happens for
DDL which change triggers except DROP TRIGGER and DROP TABLE which
try their best to do what was requested. Table becomes no longer
marked as having broken trigger when last such trigger is dropped.
THE EVENT STATUS.
Any ALTER EVENT statement on a disabled event enabled it back
(unless this ALTER EVENT statement explicitly disabled the event).
The problem was that during processing of an ALTER EVENT statement
value of status field was overwritten unconditionally even if new
value was not specified explicitly. As a consequence this field
was set to default value for status which corresponds to ENABLE.
The solution is to check if status field was explicitly specified in
ALTER EVENT statement before assigning new value to status field.
Problem: in case of wrong data insert into indexed GEOMETRY fields
(e.g. NULL value for a not NULL field) MyISAM reported
"ERROR 126 (HY000): Incorrect key file for table; try to repair it"
due to misuse of the key deletion function.
Fix: always use R-tree key functions for R-tree based indexes
and B-tree key functions for B-tree based indexes.
FAIL IN EMBEDDED SERVER
FreeBSD 64 bit needs the FP_X_DNML to fpsetmask() to prevent exceptions from
propagating into mysql (as a threaded application).
However fpsetmask() itself is deprecated in favor of fedisableexcept().
1. Fixed the #ifdef to check for FP_X_DNML instead of i386.
2. Added a configure.in check for fedisableexcept() and, if present,
this function is called insted of the fpsetmask().
No need for new tests, as the existing tests cover this already.
Removed the affected tests from the experimental list.
will create multiple running events.
A CREATE IF NOT EXIST on an event that existed and was enabled caused
multiple instances of the event to run. Disabling the event didn't help.
If the event was dropped, the event stopped running, but when created
again, multiple instances of the event were still running. The only way
to get out of this situation was to restart the server.
The problem was that Event_db_repository::create_event() didn't return
enough information to discriminate between situation when event didn't
exist and was created and when event did exist and was not created
(but a warning was emitted). As result in the latter case event
was added to in-memory queue of events second time. And this led to
unwarranted multiple executions of the same event.
The solution is to add out-parameter to Event_db_repository::create_event()
method which will signal that event was not created because it already
exists and so it should not be added to the in-memory queue.
HA_INNOBASE::UPDATE_ROW, TEMPORARY TABLE, TABLE LOCK".
Attempt to update an InnoDB temporary table under LOCK TABLES
led to assertion failure in both debug and production builds
if this temporary table was explicitly locked for READ. The
same scenario works fine for MyISAM temporary tables.
The assertion failure was caused by discrepancy between lock
that was requested on the rows of temporary table at LOCK TABLES
time and by update operation. Since SQL-layer requested a
read-lock at LOCK TABLES time InnoDB engine assumed that upcoming
statements which are going to be executed under LOCK TABLES will
only read table and therefore should acquire only S-lock.
An update operation broken this assumption by requesting X-lock.
Possible approaches to fixing this problem are:
1) Skip locking of temporary tables as locking doesn't make any
sense for connection-local objects.
2) Prohibit changing of temporary table locked by LOCK TABLES ...
READ.
Unfortunately both of these approaches have drawbacks which make
them unviable for stable versions of server.
So this patch takes another approach and changes code in such way
that LOCK TABLES for a temporary table will always request write
lock. In 5.1 version of this patch switch from read lock to write
lock is done inside of InnoDBs handler methods as doing it on
SQL-layer causes compatibility troubles with FLUSH TABLES WITH
READ LOCK.
Problem: MYSQL_BIN_LOG::reset_logs acquires mutexes in wrong order.
The correct order is first LOCK_thread_count and then LOCK_log. This function
does it the other way around. This leads to deadlock when run in parallel
with a thread that takes the two locks in correct order. For example, a thread
that disconnects will take the locks in the correct order.
Fix: change order of the locks in MYSQL_BIN_LOG::reset_logs:
first LOCK_thread_count and then LOCK_log.
Assertion happens due to missing NULL value check in
Item_func_round::fix_length_and_dec() function.
The fix: added NULL value check for second parameter.
VM-WIN2003-32-A, SLES10-IA64-A
The test case waits for master_pos_wait not to timeout, which
means that the deadlock between SQL and IO threads was
succesfully and automatically dealt with.
However, very rarely, master_pos_wait reports a timeout. This
happens because the time set for master_pos_wait to wait was
too small (6 seconds). On slow test env this could be a
problem.
We fix this by setting the timeout inline with the one used
in sync_slave_with_master (300 seconds). In addition we
refactored the test case and refined some comments.
LEAK WITH PARTITIONED ARCHIVE TABLES
CHECK TABLE against archive table, when file descriptors
are exhausted, caused server crash.
Archive didn't handle errors when opening data file for
CHECK TABLE.
There are two problems:
1. There is a missing check for 'year' parameter(year can not be greater than 9999) in
makedate function. fix: added check that year can not be greater than 9999.
2. There is a missing check for zero date in from_days() function.
fix: added zero date check into Item_func_from_days::get_date()
function.
In sql_class.cc, 'row_count', of type 'ha_rows', was used as last argument for
ER_TRUNCATED_WRONG_VALUE_FOR_FIELD which is
"Incorrect %-.32s value: '%-.128s' for column '%.192s' at row %ld".
So 'ha_rows' was used as 'long'.
On SPARC32 Solaris builds, 'long' is 4 bytes and 'ha_rows' is 'longlong' i.e. 8 bytes.
So the printf-like code was reading only the first 4 bytes.
Because the CPU is big-endian, 1LL is 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01
so the first four bytes yield 0. So the warning message had "row 0" instead of
"row 1" in test outfile_loaddata.test:
-Warning 1366 Incorrect string value: '\xE1\xE2\xF7' for column 'b' at row 1
+Warning 1366 Incorrect string value: '\xE1\xE2\xF7' for column 'b' at row 0
All error-messaging functions which internally invoke some printf-life function
are potential candidate for such mistakes.
One apparently easy way to catch such mistakes is to use
ATTRIBUTE_FORMAT (from my_attribute.h).
But this works only when call site has both:
a) the format as a string literal
b) the types of arguments.
So:
func(ER(ER_BLAH), 10);
will silently not be checked, because ER(ER_BLAH) is not known at
compile time (it is known at run-time, and depends on the chosen
language).
And
func("%s", a va_list argument);
has the same problem, as the *real* type of arguments is not
known at this site at compile time (it's known in some caller).
Moreover,
func(ER(ER_BLAH));
though possibly correct (if ER(ER_BLAH) has no '%' markers), will not
compile (gcc says "error: format not a string literal and no format
arguments").
Consequences:
1) ATTRIBUTE_FORMAT is here added only to functions which in practice
take "string literal" formats: "my_error_reporter" and "print_admin_msg".
2) it cannot be added to the other functions: my_error(),
push_warning_printf(), Table_check_intact::report_error(),
general_log_print().
To do a one-time check of functions listed in (2), the following
"static code analysis" has been done:
1) replace
my_error(ER_xxx, arguments for substitution in format)
with the equivalent
my_printf_error(ER_xxx,ER(ER_xxx), arguments for substitution in
format),
so that we have ER(ER_xxx) and the arguments *in the same call site*
2) add ATTRIBUTE_FORMAT to push_warning_printf(),
Table_check_intact::report_error(), general_log_print()
3) replace ER(xxx) with the hard-coded English text found in
errmsg.txt (like: ER(ER_UNKNOWN_ERROR) is replaced with
"Unknown error"), so that a call site has the format as string literal
4) this way, ATTRIBUTE_FORMAT can effectively do its job
5) compile, fix errors detected by ATTRIBUTE_FORMAT
6) revert steps 1-2-3.
The present patch has no compiler error when submitted again to the
static code analysis above.
It cannot catch all problems though: see Field::set_warning(), in
which a call to push_warning_printf() has a variable error
(thus, not replacable by a string literal); I checked set_warning() calls
by hand though.
See also WL 5883 for one proposal to avoid such bugs from appearing
again in the future.
The issues fixed in the patch are:
a) mismatch in types (like 'int' passed to '%ld')
b) more arguments passed than specified in the format.
This patch resolves mismatches by changing the type/number of arguments,
not by changing error messages of sql/share/errmsg.txt. The latter would be wrong,
per the following old rule: errmsg.txt must be as stable as possible; no insertions
or deletions of messages, no changes of type or number of printf-like format specifiers,
are allowed, as long as the change impacts a message already released in a GA version.
If this rule is not followed:
- Connectors, which use error message numbers, will be confused (by insertions/deletions
of messages)
- using errmsg.sys of MySQL 5.1.n with mysqld of MySQL 5.1.(n+1)
could produce wrong messages or crash; such usage can easily happen if
installing 5.1.(n+1) while /etc/my.cnf still has --language=/path/to/5.1.n/xxx;
or if copying mysqld from 5.1.(n+1) into a 5.1.n installation.
When fixing b), I have verified that the superfluous arguments were not used in the format
in the first 5.1 GA (5.1.30 'bteam@astra04-20081114162938-z8mctjp6st27uobm').
Had they been used, then passing them today, even if the message doesn't use them
anymore, would have been necessary, as explained above.
Impementing Test Review Comment.
Bug test scenario:
SELECT is not returning result set for "equal" (=) and "NULL safe equal
operator" (<=>) on BIT data type. Extending this scenario for all data types