described in Mantis#30. Specifically:
- Allow innodb_file_format to take string arguments
- Make innodb_file_format system variable a string instead of a number
- Implement the callback functions
- Update warning messages
Three new functions are implemented:
file_format_name_lookup(): Validate the file format name and return
its corresponding id.
innodb_file_format_check(): Check if it is a valid file format. This
function is registered as a callback with MySQL.
innodb_file_format_update(): Update the global variable using the
"saved" value. This functions is registered as a callback with MySQL.
Introduced a new session level config param innodb_strict_mode.
If it is set we do extra checking at time of table creation
and fail if any invalid combination of KEY_BLOCK_SIZE
or ROW_FORMAT is specified.
Reviewed by: Sunny
in INFORMATION_SCHEMA.INNODB_CMP*.
i_s_cmp_fields_info, i_s_cmpmem_fields_info: Define the field lengths
as MY_INT32_NUM_DECIMAL_DIGITS or MY_INT64_NUM_DECIMAL_DIGITS.
i_s_cmpmem_fields_info: Define "relocation_ops" as MYSQL_TYPE_LONGLONG.
and columns as suggested by Ken.
INNODB_COMPRESSION, INNODB_COMPRESSION_RESET:
Rename to INNODB_CMP, INNODB_CMP_RESET, with the following columns:
page_size
compress_ops
compress_ops_ok
compress_time
uncompress_ops
uncompress_time
INNODB_COMPRESSION_BUDDY, INNODB_COMPRESSION_BUDDY_RESET:
Rename to INNODB_CMPMEM, INNODB_CMPMEM_RESET, with the following columns:
page_size
pages_used
pages_free
relocation_ops
relocation_time
INNODB_COMPRESSION_BUDDY and INNODB_COMPRESSION_BUDDY_RESET.
buf_buddy_stat_struct, buf_buddy_stat_t, buf_buddy_stat[]:
Statistics of the buddy system grouped by block size.
i_s_innodb_compression_buddy, i_s_innodb_compression_buddy_reset:
New INFORMATION_SCHEMA plugins.
i_s_compression_buddy_fields_info[]: Define the fields:
size, used, free, relocated, relocated_sec.
i_s_compression_buddy_fill_low(), i_s_compression_buddy_fill(),
i_s_compression_buddy_reset_fill(): Fill the fields.
i_s_compression_buddy_init(), i_s_compression_buddy_reset_init():
Initialize the tables.
i_s_compression_fill_low(): Do not acquire or release the buffer pool
mutex. The page_zip_stat[] is not protected by any mutex.
i_s_innodb_compression, i_s_innodb_compression_reset: Change the
description to say "compression" instead of "compressed buffer pool".
INNODB_ZIP and INNODB_ZIP_RESET to
INNODB_COMPRESSION and INNODB_COMPRESSION_RESET,
and remove the statistics of the buddy system.
This change was discussed with Ken. It makes the tables shorter
and easier to understand. The removed data will be represented in
the tables INNODB_COMPRESSION_BUDDY and INNODB_COMPRESSION_BUDDY_RESET
that will be added later.
i_s_innodb_zip, i_s_innodb_zip_reset, i_s_zip_fields_info[],
i_s_zip_fill_low(), i_s_zip_fill(), i_s_zip_reset_fill(),
i_s_zip_init(), i_s_zip_reset_init(): Replace "zip" with "compression".
i_s_compression_fields_info[]: Remove "used", "free",
"relocated", "relocated_usec". In "compressed_usec" and "decompressed_usec",
replace microseconds with seconds ("usec" with "sec").
page_zip_decompress(): Correct a typo in the function comment.
PAGE_ZIP_SSIZE_BITS, PAGE_ZIP_NUM_SSIZE: New constants.
page_zip_stat_t, page_zip_stat: Statistics of the compression, grouped
by page size.
page_zip_simple_validate(): Assert that page_zip->ssize is reasonable.
Add the prototype of check_global_access() if MySQL version is less than
5.1.24 to make zip compile with 5.1.23. This was removed when MySQL added
the prototype in their code in 5.1.24 but we still need zip to compile with
older versions.
my_error(ER_TOO_BIG_ROWSIZE, ...). Otherwise, MySQL can report
"Got error 139 from storage engine" instead of the appropriate
error message.
ha_innobase::index_read(), ha_innobase::general_fetch():
Replace if-else if-else with switch-case.
Pass table->flags to convert_error_code_to_mysql().
innodb_check_for_record_too_big_error(). Remove. This code belongs to
convert_error_code_to_mysql().
convert_error_code_to_mysql(): Add the parameter "flags", for table flags.
Translate DB_TOO_BIG_RECORD into ER_TOO_BIG_ROWSIZE.
create_index(): Add the parameter "flags".
create_clustered_index_when_no_primary(): Replace the parameter "comp"
with "flags".
innobase_drop_database(): Remove the #ifdef'd-out call to
convert_error_code_to_mysql().
Throw warnings, not errors for wrong ROW_FORMAT or KEY_BLOCK_SIZE,
so that any table dump can be loaded.
As of this change, InnoDB supports the following table formats:
ROW_FORMAT=REDUNDANT
the only format before MySQL/InnoDB 5.0.3
ROW_FORMAT=COMPACT
the new default format of MySQL/InnoDB 5.0.3
ROW_FORMAT=DYNAMIC
uncompressed, no prefix in the clustered index record for BLOBs
ROW_FORMAT=COMPRESSED
like ROW_FORMAT=DYNAMIC, but zlib compressed B-trees and BLOBs;
the compressed page size is specified by KEY_BLOCK_SIZE in
kilobytes (1, 2, 4, 8, or 16; default 8)
KEY_BLOCK_SIZE=1, 2, 4, 8, or 16: implies ROW_FORMAT=COMPRESSED;
ignored if ROW_FORMAT is not COMPRESSED
KEY_BLOCK_SIZE=anything else: ignored
The InnoDB row format is displayed in the 4th column (Row_format) of
the output of SHOW TABLE STATUS. The Create_options column may show
ROW_FORMAT= and KEY_BLOCK_SIZE=, but they do not necessarily have
anything to do with InnoDB.
The table format can also be queried like this:
SELECT table_schema, table_name, row_format
FROM information_schema.tables
WHERE engine='innodb' and row_format in ('Compressed','Dynamic');
When Row_format='Compressed', KEY_BLOCK_SIZE should usually correspond
to the compressed page size. But the .frm file could be manipulated
to show any KEY_BLOCK_SIZE.
For some reason, INFORMATION_SCHEMA.TABLES.CREATE_OPTIONS does not
include KEY_BLOCK_SIZE. It does include row_format (spelled in
lowercase). This looks like a MySQL bug, because the table
INFORMATION_SCHEMA.TABLES probably tries to replace SHOW TABLE STATUS.
I reported this as Bug #35275 <http://bugs.mysql.com/35275>.
ha_innobase::get_row_type(): Add ROW_TYPE_COMPRESSED, ROW_TYPE_DYNAMIC.
ha_innobase::create(): Implement ROW_FORMAT=COMPRESSED and
ROW_FORMAT=DYNAMIC. Do not throw errors for wrong ROW_FORMAT or
KEY_BLOCK_SIZE, but issue warnings instead.
ha_innobase::check_if_incompatible_data(): Return COMPATIBLE_DATA_NO
if KEY_BLOCK_SIZE has been specified.
innodb.result: Adjust the result for the warning issued for ROW_FORMAT=FIXED.
innodb-zip.test: Add tests. Query INFORMATION_SCHEMA.TABLES for ROW_FORMAT.
variable innodb_file_format. Implement file format version stamping of
*.ibd files and SYS_TABLES.TYPE.
This change breaks introduces an incompatible change for for
compressed tables. We can do this, as we have not released yet.
innodb-zip.test: Add tests for stricter KEY_BLOCK_SIZE and ROW_FORMAT
checks.
DICT_TF_COMPRESSED_MASK, DICT_TF_COMPRESSED_SHIFT: Replace with
DICT_TF_ZSSIZE_MASK, DICT_TF_ZSSIZE_SHIFT.
DICT_TF_FORMAT_MASK, DICT_TF_FORMAT_SHIFT, DICT_TF_FORMAT_51,
DICT_TF_FORMAT_ZIP: File format version, stored in table->flags,
in the .ibd file header, and in SYS_TABLES.TYPE.
dict_create_sys_tables_tuple(): Write the table flags to SYS_TABLES.TYPE
if the format is at least DICT_TF_FORMAT_ZIP. For old formats
(DICT_TF_FORMAT_51), write DICT_TABLE_ORDINARY as the table type.
DB_TABLE_ZIP_NO_IBD: Remove the error code. The error handling is done
in ha_innodb.cc; as a failsafe measure, dict_build_table_def_step() will
silently clear the compression and format flags instead of returning this
error.
dict_mem_table_create(): Assert that no extra bits are set in the flags.
dict_sys_tables_get_zip_size(): Rename to dict_sys_tables_get_flags().
Check all flag bits, and return ULINT_UNDEFINED if the combination is
unsupported.
dict_boot(): Document the SYS_TABLES columns N_COLS and TYPE.
dict_table_get_format(), dict_table_set_format(),
dict_table_flags_to_zip_size(): New accessors to table->flags.
dtuple_convert_big_rec(): Introduce the auxiliary variables
local_len, local_prefix_len. Store a 768-byte prefix locally
if the file format is less than DICT_TF_FORMAT_ZIP.
dtuple_convert_back_big_rec(): Restore the columns.
srv_file_format: New variable: innodb_file_format.
fil_create_new_single_table_tablespace(): Replace the parameter zip_size
with table->flags.
fil_open_single_table_tablespace(): Replace the parameter zip_size_in_k
with table->flags. Check the flags.
fil_space_struct, fil_space_create(), fil_op_write_log():
Replace zip_size with flags.
fil_node_open_file(): Note a TODO item for InnoDB Hot Backup.
Check that the tablespace flags match.
fil_space_get_zip_size(): Rename to fil_space_get_flags(). Add a
wrapper for fil_space_get_zip_size().
fsp_header_get_flags(): New function.
fsp_header_init_fields(): Replace zip_size with flags.
FSP_SPACE_FLAGS: New name for the tablespace flags. This field used
to be called FSP_PAGE_ZIP_SIZE, or FSP_LOWEST_NO_WRITE. It has always
been written as 0 in MySQL/InnoDB versions 4.1 to 5.1.
MLOG_ZIP_FILE_CREATE: Rename to MLOG_FILE_CREATE2. Add a 32-bit
parameter for the tablespace flags.
ha_innobase::create(): Check the table attributes ROW_FORMAT and
KEY_BLOCK_SIZE. Issue errors if they are inappropriate, or warnings
if the inherited attributes (in ALTER TABLE) will be ignored.
PAGE_ZIP_MIN_SIZE_SHIFT: New constant: the 2-logarithm of PAGE_ZIP_MIN_SIZE.
There is one consideration: fil_init() chooses the tablespace hash size
based on the initial value of srv_file_per_table. However, this is nothing
new: InnoDB could be started with innodb_file_per_table=0 even though
*.ibd files exist.
srv_file_per_table: Declare as my_bool instead of ibool, because
MYSQL_SYSVAR_BOOL() expects a pointer to my_bool. Document the
variable also in srv0srv.h.
innobase_start_or_create_for_mysql(): Note why it is OK to temporarily
clear srv_file_per_table.
innobase_file_per_table: Remove.
lock_get_table(), locks_row_eq_lock(), buf_page_get_mutex(): Add return
after ut_error. On Windows, ut_error is not declared as "noreturn".
Add explicit type casts when assigning ulint to byte to get rid of
"possible loss of precision" warnings.
struct i_s_table_cache_struct: Declare rows_used, rows_allocd as ulint
instead of ullint. 32 bits should be enough.
fill_innodb_trx_from_cache(), i_s_zip_fill_low(): Cast 64-bit unsigned
integers to longlong when calling Field::store(longlong, bool is_unsigned).
Otherwise, the compiler would implicitly convert them to double and
invoke Field::store(double) instead.
recv_truncate_group(), recv_copy_group(), recv_calc_lsn_on_data_add():
Cast ib_uint64_t expressions to ulint to get rid of "possible loss of
precision" warnings. (There should not be any loss of precision in
these cases.)
log_close(), log_checkpoint_margin(): Declare some variables as ib_uint64_t
instead of ulint, so that there won't be any potential loss of precision.
mach_write_ull(): Cast the second argument of mach_write_to_4() to ulint.
OS_FILE_FROM_FD(): Cast the return value of _get_osfhandle() to HANDLE.
row_merge_dict_table_get_index(): Cast the parameter of mem_free() to (void*)
in order to get rid of the bogus MSVC warning C4090, which has been reported
as MSVC bug 101661:
<http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=101661>
row_mysql_read_blob_ref(): To get rid of a bogus MSVC warning C4090,
drop a const qualifier.
innobase_raw_format(), move the definition from row0row.c to
ha_innodb.cc. After this change, row0row.c no longer references
system_charset_info (Mantis issue #17). Patch prepared by Vasil,
tested by Calvin, and reviewed by Marko.
also when row_merge_create_temporary_table() fails. Otherwise, an
assertion would fail when the client connection is closed, because
prebuilt->trx would still be holding a table lock on innodb_table.
Use innobase_strcasecmp() insteaed of strcasecmp() in i_s.cc and get rid
of strings.h (that file is not present on Windows).
Move the prototype of innobase_strcasecmp() from ha_innodb.cc and
dict0dict.c to ha_prototypes.h.
Approved by: Heikki
buf_buddy_relocated_duration[],
page_zip_compress_duration[]
page_zip_decompress_duration[]: Record the total duration of the operations.
buf_buddy_relocate(), page_zip_compress(), page_zip_decompress():
Add ut_time_us() instrumentation.
i_s_zip_fields_info[], i_s_zip_fill_low(): Move the columns containing
cumulated statistics last. Add relocated_usec, compressed_usec, and
decompressed_usec.
innobase_check_index_keys(): Remove unused parameters. Use
sql_print_error() for error message output.
ha_innobase::add_index(): When row_merge_rename_tables() fails, do not
allow row_merge_drop_table() to alter the error code returned to MySQL.
in r2276. Now the following symbols will be exported when InnoDB is built
as a dynamic plugin:
* the virtual method pointer table of class ha_innodb
* the three variables that MySQL will reference when linking at runtime:
_mysql_plugin_declarations_
_mysql_plugin_interface_version_
_mysql_sizeof_struct_st_plugin_
Furthermore, the following symbols are weak globals, to allow us to access
the built-in InnoDB in the mysqld executable, in case it contains a statically
linked InnoDB:
builtin_innobase_plugin
innodb_hton_ptr
symbols. Use it for all definitions of non-static variables and functions.
lexyy.c, make_flex.sh: Declare yylex as UNIV_INTERN, not static. It is
referenced from pars0grm.c.
Actually, according to
nm .libs/ha_innodb.so|grep -w '[ABCE-TVXYZ]'
the following symbols are still global:
* The vtable for class ha_innodb
* pars0grm.c: The function yyparse() and the variables yychar, yylval, yynerrs
The required changes to the Bison-generated file pars0grm.c will be addressed
in a separate commit, which will add a script similar to make_flex.sh.
The class ha_innodb is renamed from class ha_innobase by a #define. Thus,
there will be no clash with the builtin InnoDB. However, there will be some
overhead for invoking virtual methods of class ha_innodb. Ideas for making
the vtable hidden are welcome. -fvisibility=hidden is not available in GCC 3.
Require PROCESS privileges instead of SUPER to view INFORMATION_SCHEMA tables.
Suggested by: Sergei Golubchik <serg@mysql.com> (in a private email,
pointed http://bugs.mysql.com/32710)
creating indexes. Lock the user table inside the user transaction.
enum trx_dict_op: Remove TRX_OP_INDEX_MAY_WAIT.
ha_innobase::add_index(): Lock the user tables within prebuilt->trx.
Commit the data dictionary transaction before creating indexes.
ha_innobase::final_drop_index(): Lock the user table within prebuilt->trx.
to the undo log, also store the original length of the column, so that the
changes will be correctly undone in transaction rollback or when fetching
previous versions of the row.
innodb-zip.test: New file, for tests of the compression.
upd_field_t: Add orig_len, the original length of new_val.
btr_push_update_extern_fields(): Restore the original prefix of the column.
Add the parameter heap where memory will be allocated if necessary.
trx_undo_rec_get_col_val(): Add the output parameter orig_len.
trx_undo_page_report_modify_ext(): New function: Write an externally
stored column to the undo log. This is only called from
trx_undo_page_report_modify(), and this is the only caller of
trx_undo_page_fetch_ext().
trx_undo_update_rec_get_update(): Read the original length of the column
prefix to upd_field->orig_len.
Forward port of r2236
Introduce retry/sleep logic as a workaround for a transient bug
where ::open fails for partitioned tables randomly if we are using
one file per table. (Bug #33349)
Reviewed by: Heikki
buf_pool->mutex: Rename to buf_pool_mutex, so that the wrappers will have
to be used when changes are merged from other source trees.
buf_pool->zip_mutex: Rename to buf_pool_zip_mutex.
buf_pool_mutex_own(), buf_pool_mutex_enter(), buf_pool_mutex_exit():
Wrappers for buf_pool_mutex.
ha_innobase::final_drop_index(): If row_merge_drop_table() fails, clear
the to_be_dropped flags. This was the error fixed in this commit; the rest
is just additional safety.
ha_innobase::final_drop_index(): After dropping the flagged indexes,
assert that none of the remaining indexes are flagged to_be_dropped.
ha_innobase::prepare_drop_index(): Assert that no index has been flagged
for deletion. When checking foreign key constraints, simply traverse the
list of indexes and check if any of the indexes that were just flagged
to_be_dropped. On error, clear the to_be_dropped flags with simple list
traversal.
Change the format of TRX_IDs in INFORMATION_SCHEMA tables from DEC to
HEX.
The current TRX_IDs are hard to remember and track down: 426355, 428466,
428566, etc.
In HEX:
* there are less "digits", the strings are shorter;
* since there are 16 instead of 10 "digits", the chance of having
repeating ones are smaller.
The above look like 68173, 689B2, 68A16 in HEX.
Discussed with: Ken
Approved by: Heikki (via IM)
acquiring the table lock. The data dictionary should not be locked for
long periods. Before this change, in the worst case, the dictionary
would be locked until the expiration of innodb_lock_wait_timeout.
Virtually, transaction-level locks (locks on database objects, such
as records and tables) have a latching order level of SYNC_USER_TRX_LOCK,
which is above any InnoDB rw-locks or mutexes. However, the latching
order of SYNC_USER_TRX_LOCK is never checked, not even by UNIV_SYNC_DEBUG.
ha_innobase::add_index(), ha_innobase::final_drop_index(): Invoke
row_mysql_lock_data_dictionary(trx) only after row_merge_lock_table().
row_merge_lock_table().
ha_innobase::final_drop_index(): Set the dictionary operation mode to
TRX_DICT_OP_INDEX_MAY_WAIT for the duration of the row_merge_lock_table()
call.
Active transactions must not switch table or index definitions on the fly,
for several reasons, including the following:
* copied indexes do not carry any history or locking information;
that is, rollbacks, read views, and record locking would be broken
* huge potential for race conditions, inconsistent reads and writes,
loss of data, and corruption
Instead of trying to track down if the table was changed during a transaction,
acquire appropriate locks that protect the creation and dropping of indexes.
innodb-index.test: Test the locking of CREATE INDEX and DROP INDEX. Test
that consistent reads work across dropped indexes.
lock_rec_insert_check_and_lock(): Relax the lock_table_has() assertion.
When inserting a record into an index, the table must be at least IX-locked.
However, when an index is being created, an IS-lock on the table is
sufficient.
row_merge_lock_table(): Add the parameter enum lock_mode mode, which must
be LOCK_X or LOCK_S.
row_merge_drop_table(): Assert that n_mysql_handles_opened == 0.
Unconditionally drop the table.
ha_innobase::add_index(): Acquire an X or S lock on the table, as appropriate.
After acquiring an X lock, assert that n_mysql_handles_opened == 1.
Remove the comments about dropping tables in the background.
ha_innobase::final_drop_index(): Acquire an X lock on the table.
dict_table_t: Remove version_number, to_be_dropped, and prebuilts.
ins_node_t: Remove table_version_number.
enum lock_mode: Move the definition from lock0lock.h to lock0types.h.
ROW_PREBUILT_OBSOLETE, row_update_prebuilt(), row_prebuilt_table_obsolete():
Remove.
row_prebuilt_t: Remove the declaration from row0types.h.
row_drop_table_for_mysql_no_commit(): Always print a warning if a table
was added to the background drop queue.
kernel_mutex must be released before calling this function.
innobase_mysql_end_print_arbitrary_thd(),
innobase_mysql_prepare_print_arbitrary_thd(): Assert that the
kernel_mutex is not being held by the current thread.
Non-functional change:
Move the prototypes of
innobase_mysql_prepare_print_arbitrary_thd() and
innobase_mysql_end_print_arbitrary_thd() from lock0lock.c to
ha_prototypes.h
Suggested by: Marko
Approved by: Marko
* Change terminology:
wait lock -> requested lock
waited lock -> blocking lock
new: requesting transaction (the trx what owns the requested lock)
new: blocking transaction (the trx that owns the blocking lock)
* Add transaction ids to INFORMATION_SCHEMA.INNODB_LOCK_WAITS. This is
somewhat redundant because transaction ids can be found in INNODB_LOCKS
(which can be joined with INNODB_LOCK_WAITS) but would help users to
write shorter joins (one table less) in some cases where they want to
find which transaction is blocking which.
Suggested by: Ken
Approved by: Heikki
for dropping the index trees, and set the dictionary operation flag, similar
to what ha_innobase::add_index() does. This should ensure correct crash
recovery.
Fix the size of the static buffer for lock_table and lock_index.
I was not realizing that NAME_LEN contains the mbmaxlen multiplier and thus
a quote, when converted to 2 quotes, will take 2 bytes while there are 3
bytes reserved.
Spotted by: Marko
Pointyhat to: Vasil