This patch adds cost estimation for queries with ORDER BY / GROUP BY
and LIMIT.
Previously, if there was a ref/range access to the table whose rows had
to be ordered in the result set, the optimizer always used that access,
even though a scan of a different index compatible with the required
order could be cheaper for producing the first L rows of the result set.
Now for such queries the optimizer makes a choice between the cheapest
ref/range accesses not compatible with the given order and index scans
compatible with it.
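The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the optimizer's actual formula: the struct, the function names, the per-row sort cost and the assumption that matching rows are spread evenly through the ordered index are all hypothetical.

```c
/* Hypothetical inputs for a toy cost model; these are not the
   optimizer's real data structures. */
typedef struct {
    double table_rows;     /* N: rows in the table */
    double matching_rows;  /* R: rows the ref/range access would return */
    double limit_rows;     /* L: rows requested by LIMIT */
} toy_plan_input;

/* ref/range access not compatible with the order: every matching row
   is read (cost 1.0 each) and then sorted (assumed per-row cost). */
double toy_cost_range_then_sort(const toy_plan_input *in,
                                double sort_cost_per_row)
{
    return in->matching_rows * (1.0 + sort_cost_per_row);
}

/* Scan of an order-compatible index: rows arrive already sorted, so the
   scan can stop after L matching rows. Assuming matching rows are spread
   evenly, about L * N / R rows are read, and never more than N. */
double toy_cost_ordered_scan(const toy_plan_input *in)
{
    double rows_to_scan;

    if (in->matching_rows <= 0.0) {
        return in->table_rows;
    }
    rows_to_scan = in->limit_rows * in->table_rows / in->matching_rows;
    return rows_to_scan < in->table_rows ? rows_to_scan : in->table_rows;
}

/* Returns 1 if the ordered index scan is estimated to be cheaper. */
int toy_prefer_ordered_scan(const toy_plan_input *in,
                            double sort_cost_per_row)
{
    return toy_cost_ordered_scan(in)
        < toy_cost_range_then_sort(in, sort_cost_per_row);
}
```

With N = 1,000,000, R = 100,000 and L = 10 the toy model estimates 100 row reads for the ordered scan, far below the cost of reading and sorting all 100,000 matching rows. With R = 10 the ordered scan degenerates to a full scan and the range access wins.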
After applying the snapshots, ensure that code conforms to the final version
of WL 3914.
It is significant that, after these changes, InnoDB does not define MYSQL_SERVER,
and can be built as an independent storage engine plugin.
Fixes:
Bug#9709: InnoDB inconsistency causes "Operating System Error 32/33"
Bug#18828: If InnoDB runs out of undo slots, it returns misleading 'table is full'
Bug#20090: InnoDB: Error: trying to declare trx to enter InnoDB
Bug#20352: Make ibuf_contract_for_n_pages tunable
Bug#21101: Wrong error on exceeding max row size for InnoDB table
Bug#21293: Deadlock detection prefers to kill long running FOR UPDATE queries
Bug#22819: SHOW INNODB STATUS crashes the server with an assertion failure under high load
Bug#25078: Make the replication thread to ignore innodb_thread_concurrency
Bug#25645: Assertion failure in file srv0srv.c
Bug#28138: indexing column prefixes produces corruption in InnoDB
Implementation of mysql_multi_update did not call the multi_update::send_error() method in some cases
(see the test reported on the bug page and the test cases in the changeset).
Fixed by calling the method in those cases. ::send_error() is refined to include binlogging code which works whenever
a non-transactional table has been modified.
The thd->no_trans_update.stmt flag is set to TRUE to ease testing, although this is the beginning of the related
bug#27417 fix (it addresses a part of those issues).
Also eliminates two minor issues (small bugs) in multi_update methods.
This patch for multi-update also addresses a part of the issues reported in bug#13270 and bug#23333.
Fixes:
- Bug #26662: mysqld assertion when creating temporary (InnoDB) table on a tmpfs filesystem
Fix by not open(2)ing with O_DIRECT but rather calling fcntl(2) to set
this flag immediately after open(2)ing. This way an error caused by
O_DIRECT not being supported can easily be ignored.
- Bug #23313: AUTO_INCREMENT=# not reported back for InnoDB tables
- Bug #21404: AUTO_INCREMENT value reset when Adding FKEY (or ALTER?)
Report the current value of the AUTO_INCREMENT counter to MySQL.
NULL MERGE: this ChangeSet will be null merged into mysql-5.1
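The O_DIRECT approach described for Bug #26662 above can be sketched as follows. This is a minimal illustration, not InnoDB's actual code: the helper name open_try_direct() is hypothetical, and it assumes a POSIX system where O_DIRECT may or may not be defined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the file without O_DIRECT, then try to
   switch the flag on with fcntl(2). A failure to set the flag (for
   example on a tmpfs filesystem) is ignored, so the caller still gets
   a usable descriptor. Returns -1 only if open(2) itself fails. */
int open_try_direct(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);

    if (fd < 0) {
        return -1;
    }
#ifdef O_DIRECT
    {
        int cur = fcntl(fd, F_GETFL);

        if (cur != -1) {
            /* Best effort: the return value is deliberately ignored. */
            (void) fcntl(fd, F_SETFL, cur | O_DIRECT);
        }
    }
#endif
    return fd;
}
```

Passing O_DIRECT straight to open(2) would make the whole open fail on filesystems that do not support it. Setting the flag afterwards turns that case into a soft failure.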
Fixes:
- Bug #26662: mysqld assertion when creating temporary (InnoDB) table on a tmpfs filesystem
Fix by not open(2)ing with O_DIRECT but rather calling fcntl(2) to set
this flag immediately after open(2)ing. This way an error caused by
O_DIRECT not being supported can easily be ignored.
- Bug #23313: AUTO_INCREMENT=# not reported back for InnoDB tables
- Bug #21404: AUTO_INCREMENT value reset when Adding FKEY (or ALTER?)
Report the current value of the AUTO_INCREMENT counter to MySQL.
innodb-5.1-ss1318
innodb-5.1-ss1330
innodb-5.1-ss1332
innodb-5.1-ss1340
Fixes:
- Bug #21409: Incorrect result returned when in READ-COMMITTED with query_cache ON
At low transaction isolation levels we let each consistent read set
its own snapshot.
- Bug #23666: strange Innodb_row_lock_time_% values in show status; also millisecs wrong
On Windows ut_usectime returns secs and usecs relative to the UNIX
epoch (which is Jan 1, 1970).
- Bug #25494: LATEST DEADLOCK INFORMATION is not always cleared
lock_deadlock_recursive(): When the search depth or length is exceeded,
rewind lock_latest_err_file and display the two transactions at the
point of aborting the search.
- Bug #25927: Foreign key with ON DELETE SET NULL on NOT NULL can crash server
Prevent ALTER TABLE ... MODIFY ... NOT NULL on columns for which
there is a foreign key constraint ON ... SET NULL.
- Bug #26835: Repeatable corruption of utf8-enabled tables inside InnoDB
The bug could be reproduced as follows:
Define a table so that the first column of the clustered index is
a VARCHAR or a UTF-8 CHAR in a collation where sequences of bytes
of differing length are considered equivalent.
Insert and delete a record. Before the delete-marked record is
purged, insert another record whose first column is of different
length but equivalent to the first record. Under certain conditions,
the insertion can be incorrectly performed as update-in-place.
Likewise, an operation that could be done as update-in-place can
unnecessarily be performed as delete and insert, which would not
cause corruption, merely degraded performance.
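For the Bug #23666 item above, the arithmetic of expressing a Windows timestamp relative to the UNIX epoch can be sketched as below. This is not the ut_usectime implementation: the helper name is hypothetical, and it assumes the input is a count of 100-nanosecond ticks since Jan 1, 1601 (the Windows FILETIME convention) that is not earlier than Jan 1, 1970.

```c
#include <stdint.h>

/* Seconds between 1601-01-01 (Windows FILETIME epoch) and
   1970-01-01 (UNIX epoch). */
#define EPOCH_DIFF_SECS 11644473600ULL

/* Hypothetical helper: split a count of 100-nanosecond ticks since
   1601-01-01 into seconds and microseconds since the UNIX epoch.
   Assumes the timestamp is not earlier than 1970-01-01. */
void ticks_to_unix_time(uint64_t ticks_100ns, uint64_t *sec, uint64_t *usec)
{
    uint64_t total_usec = ticks_100ns / 10;      /* 100 ns -> 1 us */

    total_usec -= EPOCH_DIFF_SECS * 1000000ULL;  /* shift the epoch */
    *sec = total_usec / 1000000ULL;
    *usec = total_usec % 1000000ULL;
}
```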
Apply the following InnoDB snapshots:
innodb-5.0-ss1319
innodb-5.0-ss1331
innodb-5.0-ss1333
innodb-5.0-ss1341
Fixes:
- Bug #21409: Incorrect result returned when in READ-COMMITTED with query_cache ON
At low transaction isolation levels we let each consistent read set
its own snapshot.
- Bug #23666: strange Innodb_row_lock_time_% values in show status; also millisecs wrong
On Windows ut_usectime returns secs and usecs relative to the UNIX
epoch (which is Jan 1, 1970).
- Bug #25494: LATEST DEADLOCK INFORMATION is not always cleared
lock_deadlock_recursive(): When the search depth or length is exceeded,
rewind lock_latest_err_file and display the two transactions at the
point of aborting the search.
- Bug #25927: Foreign key with ON DELETE SET NULL on NOT NULL can crash server
Prevent ALTER TABLE ... MODIFY ... NOT NULL on columns for which
there is a foreign key constraint ON ... SET NULL.
- Bug #26835: Repeatable corruption of utf8-enabled tables inside InnoDB
The bug could be reproduced as follows:
Define a table so that the first column of the clustered index is
a VARCHAR or a UTF-8 CHAR in a collation where sequences of bytes
of differing length are considered equivalent.
Insert and delete a record. Before the delete-marked record is
purged, insert another record whose first column is of different
length but equivalent to the first record. Under certain conditions,
the insertion can be incorrectly performed as update-in-place.
Likewise, an operation that could be done as update-in-place can
unnecessarily be performed as delete and insert, which would not
cause corruption, merely degraded performance.
The problem happened because those tests were using "cp932" and "ucs2" without checking whether these character sets are available. This fix moves parts of the tests so that character-set-specific parts are run only if the character sets are available:
- some parts were moved to "ctype_ucs.test" and "ctype_cp932.test"
- some parts were moved to the newly added tests "innodb-ucs2.test", "mysqlbinlog-cp932.test" and "sp-ucs2.test"
were evaluated.
According to the new rules for string comparison, partial indexes on text
columns can be used in the same cases as partial indexes on varchar
columns.
Currently SQL_BIG_RESULT is checked only at compile time.
However, additional optimizations may take place after
this check that change the sort method from 'filesort'
to sorting via index. As a result the actual plan
executed is not the one specified by the SQL_BIG_RESULT
hint. Similarly, there is no such test when executing
EXPLAIN, resulting in incorrect output.
The patch corrects the problem by testing for
SQL_BIG_RESULT both during the explain and execution
phases.
All but ss677 are against the mysql-5.1 tree only.
Fixes the following bugs:
- Bug #19834: Using cursors when running in READ-COMMITTED can cause InnoDB to crash
- Bug #20213: DBT2 testing cause mysqld to core using Innodb
- Bug #20493: on partition tables, select and show command cause server crash
- Bug #21113: Duplicate printout in SHOW INNODB STATUS
- Bug #21313: rsql_..._recover_innodb_tmp_table is redundant and broken
- Bug #21467: Manual URL wrong in InnoDB "page corrupted" error report
The Item::tmp_table_field_from_field_type() function creates a Field_datetime
object instead of a Field_timestamp object for a timestamp field, thus always
changing the data type if a tmp table is used.
The Field_blob object constructor which is used in
Item::tmp_table_field_from_field_type() always sets the packlength field of
a newly created blob to 4. This changes the field's data type, for example
from blob to longblob, if a temporary table is used.
The Item::make_string_field() function always converts Field_string objects
to Field_varstring objects. This leads to changing data type from the
char/binary to varchar/varbinary.
Added appropriate Field_timestamp object constructor for using in the
Item::tmp_table_field_from_field_type() function.
Added Field_blob object constructor which sets pack length according to
max_length argument.
The Item::tmp_table_field_from_field_type() function now creates
Field_timestamp object for a timestamp field.
The Item_type_holder::display_length() now returns the correct NULL
length.
The Item::make_string_field() function now doesn't change Field_string to
Field_varstring in the case of Item_type_holder.
The Item::tmp_table_field_from_field_type() function now uses the Field_blob
constructor which sets packlength according to max_length.
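A minimal sketch of choosing the blob pack length from max_length, as described above. The function name is hypothetical and this is not the Field_blob constructor itself. The thresholds are the maximum lengths of TINYBLOB, BLOB and MEDIUMBLOB, and the sketch assumes max_length is already expressed in the unit those limits apply to.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to store the length
   prefix of a blob whose values are at most max_length long. */
unsigned blob_pack_length(uint32_t max_length)
{
    if (max_length <= 255) {
        return 1;   /* fits TINYBLOB */
    }
    if (max_length <= 65535) {
        return 2;   /* fits BLOB */
    }
    if (max_length <= 16777215) {
        return 3;   /* fits MEDIUMBLOB */
    }
    return 4;       /* needs LONGBLOB */
}
```

Hard-coding the result to 4 corresponds to the old behaviour, where a blob column became a longblob once it passed through a temporary table.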
Fixed BUG#19542 "InnoDB doesn't increase the Handler_read_prev counter".
Fixed BUG#19609 "Case sensitivity of innodb_data_file_path gives stupid error".
Fixed BUG#19727 "InnoDB crashed server and crashed tables are not recoverable".
Also:
* Remove remnants of the obsolete concept of memoryfixing tables and indexes.
* Remove unused dict_table_LRU_trim().
* Remove unused 'trx' parameter from dict_table_get_on_id_low(),
dict_table_get(), dict_table_get_and_increment_handle_count().
* Add a normal linked list implementation.
* Add a work queue implementation.
* Add 'level' parameter to mutex_create() and rw_lock_create().
Remove mutex_set_level() and rw_lock_set_level().
* Rename SYNC_LEVEL_NONE to SYNC_LEVEL_VARYING.
* Add support for bound ids in InnoDB's parser.
* Define UNIV_BTR_DEBUG for enabling consistency checks of
FIL_PAGE_NEXT and FIL_PAGE_PREV when accessing sibling
pages of B-tree indexes.
btr_validate_level(): Check the validity of the doubly linked
list formed by FIL_PAGE_NEXT and FIL_PAGE_PREV.
* Adapt InnoDB to the new tablename to filename encoding in MySQL 5.1.
ut_print_name(), ut_print_name1(): Add parameter 'table_id' for
distinguishing names of tables from other identifiers.
New: innobase_convert_from_table_id(), innobase_convert_from_id(),
innobase_convert_from_filename(), innobase_get_charset.
dict_accept(), dict_scan_id(), dict_scan_col(), dict_scan_table_name(),
dict_skip_word(), dict_create_foreign_constraints_low(): Add
parameter 'cs' so that isspace() can be replaced with my_isspace(),
whose operation depends on the connection character set.
dict_scan_id(): Convert identifier to UTF-8.
dict_str_starts_with_keyword(): New extern function, to replace
dict_accept() in row_search_for_mysql().
mysql_get_identifier_quote_char(): Replaced with innobase_print_identifier().
ha_innobase::create(): Remove the thd->convert_string() call. Pass the
statement to InnoDB in the connection character set and let InnoDB
convert the identifier to UTF-8.
* Add max_row_size to dict_table_t.
* btr0cur.c
btr_copy_externally_stored_field(): Only set the 'offset' variable
when needed.
* buf0buf.c
buf_page_io_complete(): Write to the error log if the page number or
the space id on the disk do not match those in memory. Also write to
the error log if a page was read from the doublewrite buffer. The
doublewrite buffer should be only read by the lower-level function
fil_io() at database startup.
* dict0dict.c
dict_scan_table_name(): Remove fallback to differently encoded name
when the table is not found. The encoding is handled at a higher level.
* ha_innodb.cc
Increment statistic counter in ha_innobase::index_prev() (bug 19542).
Add innobase_convert_string wrapper function and a new file
ha_prototypes.h.
innobase_print_identifier(): Remove TODO comment before calling
get_quote_char_for_identifier(). That function apparently assumes
the identifier to be encoded in UTF-8.
* ibuf0ibuf.c|h
ibuf_count_get(), ibuf_counts[], ibuf_count_inited(): Define these
only #ifdef UNIV_IBUF_DEBUG. Previously, when compiled without
UNIV_IBUF_DEBUG, invoking ibuf_count_get() would crash InnoDB.
The function is only being called #ifdef UNIV_IBUF_DEBUG.
* innodb.result
Adjust the results for changes in the foreign key error messages.
* mem0mem.c|h
New: mem_heap_dup(), mem_heap_printf(), mem_heap_cat().
* os0file.c
Check the page trailers also after writing to disk. This improves
chances of diagnosing bug 18886.
os_file_check_page_trailers(): New function for checking that the
two copies of the LSN stamped on the page match.
os_aio_simulated_handle(): Call os_file_check_page_trailers()
before and after os_file_write().
* row0mysql.c
Move trx_commit_for_mysql(trx) calls before calls to
row_mysql_unlock_data_dictionary(trx) (bug 19727).
* row0sel.c
row_fetch_print(): Handle SQL NULL values without crashing.
row_sel_store_mysql_rec(): Remove useless call to rec_get_nth_field
when handling an externally stored column.
Fetch externally stored fields when using InnoDB's internal SQL
parser.
Optimize BLOB selects by using prebuilt->blob_heap directly instead
of first reading BLOB data to a temporary heap and then copying it
to prebuilt->blob_heap.
* srv0srv.c
srv_master_thread(): Remove unreachable code.
* srv0start.c
srv_parse_data_file_paths_and_sizes(): Accept lower-case 'm' and
'g' as abbreviations of megabyte and gigabyte (bug 19609).
srv_parse_megabytes(): New function.
* ut0dbg.c|h
Implement InnoDB assertions (ut_a and ut_error) with abort() when
the code is compiled with GCC 3 or later on other platforms than
Windows or Netware. Also disable the variable ut_dbg_stop_threads
and the function ut_dbg_stop_thread() in this case, unless
UNIV_SYNC_DEBUG is defined. This should allow the compiler to
generate more compact code for assertions.
* ut0list.c|h
Add ib_list_create_heap().
* Fix BUG#15650: "DELETE with LEFT JOIN crashes server with innodb_locks_unsafe_for_binlog"
* Fix BUG#17134: "Partitions: uncommitted changes are visible"
* Fix BUG#17992: "Partitions: InnoDB, somehow rotten table after UPDATE"
row0ins.c: MySQL's partitioned table code does not set prebuilt->sql_stat_start right
if it does an insert in the same statement after doing a search first in the same
partition table. We now always write the trx id to the buffer, not just when the flag
sql_stat_start is on. This will waste CPU time very slightly.
* Fix BUG#18077: "InnoDB uses full explicit table locks in stored FUNCTION"
* Fix BUG#18238: "When locks exhaust the buffer pool, InnoDB does not roll back the trx"
* Fix BUG#18252: "Disk space leak in updates of InnoDB BLOB rows in 5.0 and 5.1"
* Fix BUG#18283: "When InnoDB returns error 'lock table full', MySQL can write to binlog too much"
* Fix BUG#18350: "Use consistent read in CREATE ... SELECT ... if innodb_locks_unsafe_for_binlog"
* Fix BUG#18384: "InnoDB memory leak on duplicate key errors in 5.0 if row has many columns"
* Fix BUG#18934: "InnoDB crashes when table uses column names like DB_ROW_ID"
Refuse tables that use reserved column names.
* InnoDB's SQL parser:
- Add support for UNSIGNED types, EXIT keyword, quoted identifiers, user-function callbacks
for processing results of FETCH statements, bound literals, DATA_VARCHAR for bound literals.
- Allow bound literals of type non-INTEGER to be of length 0.
- Add make_flex.sh and update lexer/parser generation documentation.
- Add comment clarifying the difference between 'alias' and 'indirection' fields in sym_node_t.
- Remove never reached duplicate code in pars_set_dfield_type().
- Rewrite pars_info datatypes and APIs, add a few helper functions.
- Since the function definitions in pars_info_t are accessed after pars_sql() returns
in the query graph execution stage, we can't free pars_info_t in pars_sql(). Instead,
make pars_sql() transfer ownership of pars_info_t to the created query graph, and
make que_graph_free() free it if needed.
- Allow access to system columns like DB_ROW_ID.
* Use bound literals in row_truncate_table_for_mysql, row_drop_table_for_mysql,
row_discard_tablespace_for_mysql, and row_rename_table_for_mysql.
* Setting the isolation level of a transaction to READ COMMITTED weakens the locks for
this session, similarly to the option innodb_locks_unsafe_for_binlog. This patch removes
almost all gap locking (used in next-key locking) and makes MySQL release the row locks
on rows which do not belong to the result set. Additionally, nonlocking selects in
INSERT INTO ... SELECT, UPDATE ... (SELECT ...), and CREATE ... SELECT ... use a nonlocking
consistent read. If a binlog is used, the binlog format should be set to row-based
binlogging for the execution of complex SQL statements.
* Disable the statistic variables btr_search_n_hash_fail and n_hash_succ, n_hash_fail,
n_patt_succ, and n_searches of btr_search_t in builds without #ifdef UNIV_SEARCH_PERF_STAT.
* Make innodb.test faster. Group all consistent read test cases into one test case and
wait for their lock timeouts after all have been sent to the server. Decrease the number of rows
inserted in a certain test - this has no effect on the effectiveness of the test and
reduces the running time by ~10 sec. Remove temporary work-arounds from innodb.result
now that ALTER TABLE DROP FOREIGN KEY works once again.
* Make innodb_unsafe_binlog.test faster. Group all consistent read test cases into one
test case and wait for their lock timeouts after all have been sent to the server. Remove
unnecessary option --loose_innodb_lock_wait_timeout.
* Print dictionary memory size in SHOW INNODB STATUS.
* Fix memory leaks in row_create_table_for_mysql() in rare corner cases.
* Remove code related to clustered tables. They were never implemented, and the
implementation would be challenging with ROW_FORMAT=COMPACT. Remove the table types
DICT_TABLE_CLUSTER_MEMBER and DICT_TABLE_CLUSTER and all related tests and functions.
dict_table_t: Remove mix_id, mix_len, mix_id_len, mix_id_buf, and cluster_name.
plan_t: Remove mixed_index.
dict_create_sys_tables_tuple(): Set MIX_ID=0, MIX_LEN=0, CLUSTER_NAME=NULL when
inserting into SYS_TABLES.
dict_tree_check_search_tuple(): Enclose in #ifdef UNIV_DEBUG.
* Move calling of thr_local_free() from trx_free_for_mysql() to
innobase_close_connection().
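The srv0start.c change for bug 19609 listed above (accepting lower-case 'm' and 'g' as size suffixes) can be sketched roughly as follows. This is an illustration only, not the real srv_parse_megabytes(): the function name and signature are hypothetical, and treating a number without a suffix as a byte count is an assumption of this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: parse "<number>[M|m|G|g]" and store the size in
   megabytes. Returns a pointer to the first character after the parsed
   text. A number without a suffix is assumed to be in bytes. */
const char *parse_megabytes(const char *str, unsigned long *megs)
{
    char *end;
    unsigned long size = strtoul(str, &end, 10);

    switch (*end) {
    case 'G':
    case 'g':
        size *= 1024;         /* gigabytes -> megabytes */
        end++;
        break;
    case 'M':
    case 'm':
        end++;                /* already in megabytes */
        break;
    default:
        size /= 1024 * 1024;  /* bytes -> megabytes (assumption) */
        break;
    }

    *megs = size;
    return end;
}
```

Handling both cases in the same switch arm keeps "10M" and "10m" equivalent, which is the behaviour the bug report asks for.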