Throw warnings, not errors, for wrong ROW_FORMAT or KEY_BLOCK_SIZE,
so that any table dump can be loaded.
As of this change, InnoDB supports the following table formats:
ROW_FORMAT=REDUNDANT
the only format before MySQL/InnoDB 5.0.3
ROW_FORMAT=COMPACT
the new default format of MySQL/InnoDB 5.0.3
ROW_FORMAT=DYNAMIC
uncompressed, no prefix in the clustered index record for BLOBs
ROW_FORMAT=COMPRESSED
like ROW_FORMAT=DYNAMIC, but zlib compressed B-trees and BLOBs;
the compressed page size is specified by KEY_BLOCK_SIZE in
kilobytes (1, 2, 4, 8, or 16; default 8)
KEY_BLOCK_SIZE=1, 2, 4, 8, or 16: implies ROW_FORMAT=COMPRESSED;
ignored if ROW_FORMAT is not COMPRESSED
KEY_BLOCK_SIZE=anything else: ignored
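The attribute rules listed above can be sketched roughly as follows. This is only an illustration, not the code in ha_innobase::create(): the type and function names are hypothetical, and it assumes that "implies ROW_FORMAT=COMPRESSED" applies when no ROW_FORMAT clause was given, while an explicit non-compressed ROW_FORMAT causes KEY_BLOCK_SIZE to be ignored with a warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers in ha_innodb.cc. */
enum ex_row_format {
    EX_ROW_UNSPECIFIED,     /* no ROW_FORMAT clause was given */
    EX_ROW_REDUNDANT,
    EX_ROW_COMPACT,
    EX_ROW_DYNAMIC,
    EX_ROW_COMPRESSED
};

struct ex_table_format {
    enum ex_row_format format;  /* effective row format */
    unsigned zip_kb;            /* compressed page size in KB, 0 if none */
    bool warned;                /* true if some attribute was ignored */
};

static bool ex_key_block_size_is_valid(unsigned kbs)
{
    return kbs == 1 || kbs == 2 || kbs == 4 || kbs == 8 || kbs == 16;
}

/* kbs == 0 means that KEY_BLOCK_SIZE was not specified. */
static struct ex_table_format
ex_resolve_table_format(enum ex_row_format requested, unsigned kbs)
{
    struct ex_table_format r = { requested, 0, false };

    if (kbs != 0 && !ex_key_block_size_is_valid(kbs)) {
        /* KEY_BLOCK_SIZE=anything else: ignored, with a warning */
        r.warned = true;
        kbs = 0;
    }

    if (requested == EX_ROW_UNSPECIFIED) {
        /* A valid KEY_BLOCK_SIZE implies COMPRESSED;
        otherwise fall back to the default format, COMPACT. */
        r.format = kbs ? EX_ROW_COMPRESSED : EX_ROW_COMPACT;
    }

    if (r.format == EX_ROW_COMPRESSED) {
        r.zip_kb = kbs ? kbs : 8;   /* default 8 */
    } else if (kbs != 0) {
        /* KEY_BLOCK_SIZE is ignored if ROW_FORMAT is not COMPRESSED */
        r.warned = true;
    }

    return r;
}
```

For example, under these assumptions ex_resolve_table_format(EX_ROW_DYNAMIC, 8) yields an uncompressed DYNAMIC table with the warning flag set, so a dump containing such a definition can still be loaded.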
The InnoDB row format is displayed in the 4th column (Row_format) of
the output of SHOW TABLE STATUS. The Create_options column may show
ROW_FORMAT= and KEY_BLOCK_SIZE=, but they do not necessarily have
anything to do with InnoDB.
The table format can also be queried like this:
SELECT table_schema, table_name, row_format
FROM information_schema.tables
WHERE engine='innodb' and row_format in ('Compressed','Dynamic');
When Row_format='Compressed', KEY_BLOCK_SIZE should usually correspond
to the compressed page size. But the .frm file could be manipulated
to show any KEY_BLOCK_SIZE.
For some reason, INFORMATION_SCHEMA.TABLES.CREATE_OPTIONS does not
include KEY_BLOCK_SIZE. It does include row_format (spelled in
lowercase). This looks like a MySQL bug, because the table
INFORMATION_SCHEMA.TABLES probably tries to replace SHOW TABLE STATUS.
I reported this as Bug #35275 <http://bugs.mysql.com/35275>.
ha_innobase::get_row_type(): Add ROW_TYPE_COMPRESSED, ROW_TYPE_DYNAMIC.
ha_innobase::create(): Implement ROW_FORMAT=COMPRESSED and
ROW_FORMAT=DYNAMIC. Do not throw errors for wrong ROW_FORMAT or
KEY_BLOCK_SIZE, but issue warnings instead.
ha_innobase::check_if_incompatible_data(): Return COMPATIBLE_DATA_NO
if KEY_BLOCK_SIZE has been specified.
innodb.result: Adjust the result for the warning issued for ROW_FORMAT=FIXED.
innodb-zip.test: Add tests. Query INFORMATION_SCHEMA.TABLES for ROW_FORMAT.
variable innodb_file_format. Implement file format version stamping of
*.ibd files and SYS_TABLES.TYPE.
This introduces an incompatible change for
compressed tables. We can do this, as we have not released yet.
innodb-zip.test: Add tests for stricter KEY_BLOCK_SIZE and ROW_FORMAT
checks.
DICT_TF_COMPRESSED_MASK, DICT_TF_COMPRESSED_SHIFT: Replace with
DICT_TF_ZSSIZE_MASK, DICT_TF_ZSSIZE_SHIFT.
DICT_TF_FORMAT_MASK, DICT_TF_FORMAT_SHIFT, DICT_TF_FORMAT_51,
DICT_TF_FORMAT_ZIP: File format version, stored in table->flags,
in the .ibd file header, and in SYS_TABLES.TYPE.
dict_create_sys_tables_tuple(): Write the table flags to SYS_TABLES.TYPE
if the format is at least DICT_TF_FORMAT_ZIP. For old formats
(DICT_TF_FORMAT_51), write DICT_TABLE_ORDINARY as the table type.
DB_TABLE_ZIP_NO_IBD: Remove the error code. The error handling is done
in ha_innodb.cc; as a failsafe measure, dict_build_table_def_step() will
silently clear the compression and format flags instead of returning this
error.
dict_mem_table_create(): Assert that no extra bits are set in the flags.
dict_sys_tables_get_zip_size(): Rename to dict_sys_tables_get_flags().
Check all flag bits, and return ULINT_UNDEFINED if the combination is
unsupported.
dict_boot(): Document the SYS_TABLES columns N_COLS and TYPE.
dict_table_get_format(), dict_table_set_format(),
dict_table_flags_to_zip_size(): New accessors to table->flags.
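A rough sketch of how such accessors can pack a compressed page size and a file format version into one flags word. The EX_ constants, the bit positions, and the 1024-byte minimum compressed page size are assumptions chosen for illustration; they are not necessarily the values of the DICT_TF_ constants in the source tree.

```c
#include <assert.h>

/* Assumed bit layout, for illustration only. */
#define EX_TF_COMPACT           1u
#define EX_TF_ZSSIZE_SHIFT      1
#define EX_TF_ZSSIZE_MASK       (15u << EX_TF_ZSSIZE_SHIFT)
#define EX_TF_FORMAT_SHIFT      5
#define EX_TF_FORMAT_MASK       (3u << EX_TF_FORMAT_SHIFT)
#define EX_TF_FORMAT_51         0u
#define EX_TF_FORMAT_ZIP        1u
#define EX_ZIP_MIN_SIZE_SHIFT   10  /* 2-logarithm of a 1024-byte page */

/* Read the file format version from the flags. */
static unsigned ex_flags_get_format(unsigned flags)
{
    return (flags & EX_TF_FORMAT_MASK) >> EX_TF_FORMAT_SHIFT;
}

/* Return a copy of the flags with the file format version replaced. */
static unsigned ex_flags_set_format(unsigned flags, unsigned format)
{
    assert(format <= (EX_TF_FORMAT_MASK >> EX_TF_FORMAT_SHIFT));
    return (flags & ~EX_TF_FORMAT_MASK) | (format << EX_TF_FORMAT_SHIFT);
}

/* Shift size 0 means uncompressed; shift size n means 512 << n bytes. */
static unsigned ex_flags_to_zip_size(unsigned flags)
{
    unsigned ssize = (flags & EX_TF_ZSSIZE_MASK) >> EX_TF_ZSSIZE_SHIFT;

    return ssize ? (1u << (EX_ZIP_MIN_SIZE_SHIFT - 1)) << ssize : 0;
}

/* Build the flags of a compressed table; zip_kb is 1, 2, 4, 8, or 16. */
static unsigned ex_flags_make_zip(unsigned zip_kb)
{
    unsigned ssize = 1;

    while ((1u << (ssize - 1)) < zip_kb) {
        ssize++;
    }

    return EX_TF_COMPACT
        | (ssize << EX_TF_ZSSIZE_SHIFT)
        | (EX_TF_FORMAT_ZIP << EX_TF_FORMAT_SHIFT);
}
```

Storing the 2-logarithm rather than the size itself keeps the field at four bits, and leaves the all-zero value free to mean "uncompressed, old format".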
dtuple_convert_big_rec(): Introduce the auxiliary variables
local_len, local_prefix_len. Store a 768-byte prefix locally
if the file format is less than DICT_TF_FORMAT_ZIP.
dtuple_convert_back_big_rec(): Restore the columns.
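The role of local_len and local_prefix_len can be illustrated with a small sketch. The EX_ names are hypothetical, and the 20-byte size of the reference to the externally stored part is an assumption made for this example, not something stated in this commit message.

```c
#include <assert.h>

#define EX_FORMAT_51        0u
#define EX_FORMAT_ZIP       1u
#define EX_LOCAL_PREFIX     768u    /* prefix kept in the record by old formats */
#define EX_FIELD_REF_SIZE   20u     /* assumed size of the external reference */

struct ex_big_rec_len {
    unsigned local_prefix_len;  /* column bytes that stay in the record */
    unsigned local_len;         /* prefix plus the external reference */
};

/* Decide how much of an externally stored column remains in the record. */
static struct ex_big_rec_len ex_big_rec_local_len(unsigned format)
{
    struct ex_big_rec_len len;

    /* Formats older than EX_FORMAT_ZIP keep a 768-byte prefix locally;
    newer formats store the whole column externally. */
    len.local_prefix_len = format < EX_FORMAT_ZIP ? EX_LOCAL_PREFIX : 0;
    len.local_len = len.local_prefix_len + EX_FIELD_REF_SIZE;

    return len;
}
```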
srv_file_format: New variable: innodb_file_format.
fil_create_new_single_table_tablespace(): Replace the parameter zip_size
with table->flags.
fil_open_single_table_tablespace(): Replace the parameter zip_size_in_k
with table->flags. Check the flags.
fil_space_struct, fil_space_create(), fil_op_write_log():
Replace zip_size with flags.
fil_node_open_file(): Note a TODO item for InnoDB Hot Backup.
Check that the tablespace flags match.
fil_space_get_zip_size(): Rename to fil_space_get_flags(). Add a
wrapper for fil_space_get_zip_size().
fsp_header_get_flags(): New function.
fsp_header_init_fields(): Replace zip_size with flags.
FSP_SPACE_FLAGS: New name for the tablespace flags. This field used
to be called FSP_PAGE_ZIP_SIZE, or FSP_LOWEST_NO_WRITE. It has always
been written as 0 in MySQL/InnoDB versions 4.1 to 5.1.
MLOG_ZIP_FILE_CREATE: Rename to MLOG_FILE_CREATE2. Add a 32-bit
parameter for the tablespace flags.
ha_innobase::create(): Check the table attributes ROW_FORMAT and
KEY_BLOCK_SIZE. Issue errors if they are inappropriate, or warnings
if the inherited attributes (in ALTER TABLE) will be ignored.
PAGE_ZIP_MIN_SIZE_SHIFT: New constant: the 2-logarithm of PAGE_ZIP_MIN_SIZE.
There is one consideration: fil_init() chooses the tablespace hash size
based on the initial value of srv_file_per_table. However, this is nothing
new: InnoDB could be started with innodb_file_per_table=0 even though
*.ibd files exist.
srv_file_per_table: Declare as my_bool instead of ibool, because
MYSQL_SYSVAR_BOOL() expects a pointer to my_bool. Document the
variable also in srv0srv.h.
innobase_start_or_create_for_mysql(): Note why it is OK to temporarily
clear srv_file_per_table.
innobase_file_per_table: Remove.
lock_get_table(), locks_row_eq_lock(), buf_page_get_mutex(): Add return
after ut_error. On Windows, ut_error is not declared as "noreturn".
Add explicit type casts when assigning ulint to byte to get rid of
"possible loss of precision" warnings.
struct i_s_table_cache_struct: Declare rows_used, rows_allocd as ulint
instead of ullint. 32 bits should be enough.
fill_innodb_trx_from_cache(), i_s_zip_fill_low(): Cast 64-bit unsigned
integers to longlong when calling Field::store(longlong, bool is_unsigned).
Otherwise, the compiler would implicitly convert them to double and
invoke Field::store(double) instead.
recv_truncate_group(), recv_copy_group(), recv_calc_lsn_on_data_add():
Cast ib_uint64_t expressions to ulint to get rid of "possible loss of
precision" warnings. (There should not be any loss of precision in
these cases.)
log_close(), log_checkpoint_margin(): Declare some variables as ib_uint64_t
instead of ulint, so that there won't be any potential loss of precision.
mach_write_ull(): Cast the second argument of mach_write_to_4() to ulint.
OS_FILE_FROM_FD(): Cast the return value of _get_osfhandle() to HANDLE.
row_merge_dict_table_get_index(): Cast the parameter of mem_free() to (void*)
in order to get rid of the bogus MSVC warning C4090, which has been reported
as MSVC bug 101661:
<http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=101661>
row_mysql_read_blob_ref(): To get rid of a bogus MSVC warning C4090,
drop a const qualifier.
innobase_raw_format(): Move the definition from row0row.c to
ha_innodb.cc. After this change, row0row.c no longer references
system_charset_info (Mantis issue #17). Patch prepared by Vasil,
tested by Calvin, and reviewed by Marko.
also when row_merge_create_temporary_table() fails. Otherwise, an
assertion would fail when the client connection is closed, because
prebuilt->trx would still be holding a table lock on innodb_table.
Use innobase_strcasecmp() instead of strcasecmp() in i_s.cc and get rid
of strings.h (that file is not present on Windows).
Move the prototype of innobase_strcasecmp() from ha_innodb.cc and
dict0dict.c to ha_prototypes.h.
Approved by: Heikki
buf_buddy_relocated_duration[],
page_zip_compress_duration[],
page_zip_decompress_duration[]: Record the total duration of the operations.
buf_buddy_relocate(), page_zip_compress(), page_zip_decompress():
Add ut_time_us() instrumentation.
i_s_zip_fields_info[], i_s_zip_fill_low(): Move the columns containing
cumulated statistics last. Add relocated_usec, compressed_usec, and
decompressed_usec.
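The instrumentation pattern described above amounts to reading a microsecond clock before and after the operation and adding the difference to a running total. A minimal sketch follows; ex_time_us() stands in for ut_time_us() and is built on the C11 timespec_get() call (wall-clock time, so not guaranteed monotonic), and ex_sample_op() is a placeholder for an operation such as page_zip_compress(). All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Cumulated statistics for one kind of operation. */
struct ex_op_stat {
    uint64_t count;     /* number of operations */
    uint64_t usec;      /* total duration in microseconds */
};

/* Stand-in for ut_time_us(): current time in microseconds. */
static uint64_t ex_time_us(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);

    assert(base == TIME_UTC);
    (void) base;

    return (uint64_t) ts.tv_sec * 1000000u + (uint64_t) ts.tv_nsec / 1000u;
}

/* Placeholder for the instrumented operation. */
static int ex_sample_op(void* arg)
{
    (void) arg;
    return 0;
}

/* Run op(arg) and add its duration to the statistics. */
static int ex_timed_call(struct ex_op_stat* stat, int (*op)(void*), void* arg)
{
    uint64_t start = ex_time_us();
    int ret = op(arg);
    uint64_t end = ex_time_us();

    stat->count++;
    /* Guard against the wall clock stepping backwards. */
    stat->usec += end >= start ? end - start : 0;

    return ret;
}
```

Dividing usec by count then gives the average cost per operation, which is what columns such as compressed_usec make available to the user.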
innobase_check_index_keys(): Remove unused parameters. Use
sql_print_error() for error message output.
ha_innobase::add_index(): When row_merge_rename_tables() fails, do not
allow row_merge_drop_table() to alter the error code returned to MySQL.
in r2276. Now the following symbols will be exported when InnoDB is built
as a dynamic plugin:
* the virtual method pointer table of class ha_innodb
* the three variables that MySQL will reference when linking at runtime:
_mysql_plugin_declarations_
_mysql_plugin_interface_version_
_mysql_sizeof_struct_st_plugin_
Furthermore, the following symbols are weak globals, to allow us to access
the built-in InnoDB in the mysqld executable, in case it contains a statically
linked InnoDB:
builtin_innobase_plugin
innodb_hton_ptr
symbols. Use it for all definitions of non-static variables and functions.
lexyy.c, make_flex.sh: Declare yylex as UNIV_INTERN, not static. It is
referenced from pars0grm.c.
Actually, according to
nm .libs/ha_innodb.so|grep -w '[ABCE-TVXYZ]'
the following symbols are still global:
* The vtable for class ha_innodb
* pars0grm.c: The function yyparse() and the variables yychar, yylval, yynerrs
The required changes to the Bison-generated file pars0grm.c will be addressed
in a separate commit, which will add a script similar to make_flex.sh.
The class ha_innodb is renamed from class ha_innobase by a #define. Thus,
there will be no clash with the builtin InnoDB. However, there will be some
overhead for invoking virtual methods of class ha_innodb. Ideas for making
the vtable hidden are welcome. -fvisibility=hidden is not available in GCC 3.
Require PROCESS privileges instead of SUPER to view INFORMATION_SCHEMA tables.
Suggested by: Sergei Golubchik <serg@mysql.com> (in a private email,
pointing to http://bugs.mysql.com/32710)
creating indexes. Lock the user table inside the user transaction.
enum trx_dict_op: Remove TRX_OP_INDEX_MAY_WAIT.
ha_innobase::add_index(): Lock the user tables within prebuilt->trx.
Commit the data dictionary transaction before creating indexes.
ha_innobase::final_drop_index(): Lock the user table within prebuilt->trx.
to the undo log, also store the original length of the column, so that the
changes will be correctly undone in transaction rollback or when fetching
previous versions of the row.
innodb-zip.test: New file, for tests of the compression.
upd_field_t: Add orig_len, the original length of new_val.
btr_push_update_extern_fields(): Restore the original prefix of the column.
Add the parameter heap where memory will be allocated if necessary.
trx_undo_rec_get_col_val(): Add the output parameter orig_len.
trx_undo_page_report_modify_ext(): New function: Write an externally
stored column to the undo log. This is only called from
trx_undo_page_report_modify(), and this is the only caller of
trx_undo_page_fetch_ext().
trx_undo_update_rec_get_update(): Read the original length of the column
prefix to upd_field->orig_len.
Forward port of r2236
Introduce retry/sleep logic as a workaround for a transient bug
where ::open fails for partitioned tables randomly if we are using
one file per table. (Bug #33349)
Reviewed by: Heikki
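The retry/sleep workaround follows a common pattern, sketched below. This is not the code of the fix: the function names, the callback-based pause, and the attempt limit are all assumptions for illustration. ex_flaky_open() merely simulates an operation that fails a fixed number of times.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ex_try_fn)(void* ctx);    /* returns 0 on success */
typedef void (*ex_pause_fn)(void* ctx); /* sleeps between attempts */

/* Try an operation up to max_attempts times, pausing between failures.
Returns the result of the last attempt (or -1 if none was made). */
static int ex_retry(ex_try_fn try_op, ex_pause_fn pause, void* ctx,
                    unsigned max_attempts, unsigned* attempts_made)
{
    int err = -1;
    unsigned i;

    for (i = 0; i < max_attempts; i++) {
        err = try_op(ctx);
        if (err == 0) {
            i++;    /* count the successful attempt */
            break;
        }
        if (i + 1 < max_attempts) {
            pause(ctx); /* no pause after the final failure */
        }
    }

    *attempts_made = i;
    return err;
}

/* Simulated transient failure, for demonstration only. */
struct ex_flaky {
    int failures_left;
    int pauses;
};

static int ex_flaky_open(void* ctx)
{
    struct ex_flaky* f = ctx;

    if (f->failures_left > 0) {
        f->failures_left--;
        return -1;
    }
    return 0;
}

static void ex_count_pause(void* ctx)
{
    ((struct ex_flaky*) ctx)->pauses++;
}
```

A real caller would sleep in the pause callback; bounding the number of attempts keeps a persistent failure from turning into an endless loop.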
buf_pool->mutex: Rename to buf_pool_mutex, so that the wrappers will have
to be used when changes are merged from other source trees.
buf_pool->zip_mutex: Rename to buf_pool_zip_mutex.
buf_pool_mutex_own(), buf_pool_mutex_enter(), buf_pool_mutex_exit():
Wrappers for buf_pool_mutex.
ha_innobase::final_drop_index(): If row_merge_drop_table() fails, clear
the to_be_dropped flags. This was the error fixed in this commit; the rest
is just additional safety.
ha_innobase::final_drop_index(): After dropping the flagged indexes,
assert that none of the remaining indexes are flagged to_be_dropped.
ha_innobase::prepare_drop_index(): Assert that no index has been flagged
for deletion. When checking foreign key constraints, simply traverse the
list of indexes and check if any of the indexes that were just flagged
to_be_dropped. On error, clear the to_be_dropped flags with simple list
traversal.
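The "simple list traversal" for clearing and checking the to_be_dropped flags could look roughly like this. The ex_index structure is a hypothetical, much reduced stand-in for the real index object and its list linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for an index in a table's index list. */
struct ex_index {
    const char* name;
    int to_be_dropped;          /* nonzero if flagged for deletion */
    struct ex_index* next;
};

/* Return nonzero if any index in the list is flagged to_be_dropped. */
static int ex_any_index_flagged(const struct ex_index* first)
{
    const struct ex_index* index;

    for (index = first; index != NULL; index = index->next) {
        if (index->to_be_dropped) {
            return 1;
        }
    }
    return 0;
}

/* On error, clear the to_be_dropped flag of every index in the list. */
static void ex_clear_drop_flags(struct ex_index* first)
{
    struct ex_index* index;

    for (index = first; index != NULL; index = index->next) {
        index->to_be_dropped = 0;
    }

    assert(!ex_any_index_flagged(first));
}
```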
Change the format of TRX_IDs in INFORMATION_SCHEMA tables from DEC to
HEX.
The current TRX_IDs are hard to remember and track down: 426355, 428466,
428566, etc.
In HEX:
* there are fewer "digits", so the strings are shorter;
* since there are 16 instead of 10 "digits", the chance of having
repeating ones is smaller.
The above look like 68173, 689B2, 68A16 in HEX.
Discussed with: Ken
Approved by: Heikki (via IM)
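The conversion of the example IDs above can be reproduced with a small formatting helper. The function name is hypothetical; the sketch only shows uppercase hexadecimal formatting with snprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a transaction id as uppercase hexadecimal, without a prefix.
Returns the value of snprintf(): the number of characters needed. */
static int ex_format_trx_id(char* buf, size_t buf_size,
                            unsigned long long trx_id)
{
    assert(buf != NULL);
    return snprintf(buf, buf_size, "%llX", trx_id);
}
```

With this helper, 426355 is rendered as 68173, 428466 as 689B2, and 428566 as 68A16.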
acquiring the table lock. The data dictionary should not be locked for
long periods. Before this change, in the worst case, the dictionary
would be locked until the expiration of innodb_lock_wait_timeout.
Virtually, transaction-level locks (locks on database objects, such
as records and tables) have a latching order level of SYNC_USER_TRX_LOCK,
which is above any InnoDB rw-locks or mutexes. However, the latching
order of SYNC_USER_TRX_LOCK is never checked, not even by UNIV_SYNC_DEBUG.
ha_innobase::add_index(), ha_innobase::final_drop_index(): Invoke
row_mysql_lock_data_dictionary(trx) only after row_merge_lock_table().
row_merge_lock_table().
ha_innobase::final_drop_index(): Set the dictionary operation mode to
TRX_DICT_OP_INDEX_MAY_WAIT for the duration of the row_merge_lock_table()
call.
Active transactions must not switch table or index definitions on the fly,
for several reasons, including the following:
* copied indexes do not carry any history or locking information;
that is, rollbacks, read views, and record locking would be broken
* huge potential for race conditions, inconsistent reads and writes,
loss of data, and corruption
Instead of trying to track down if the table was changed during a transaction,
acquire appropriate locks that protect the creation and dropping of indexes.
innodb-index.test: Test the locking of CREATE INDEX and DROP INDEX. Test
that consistent reads work across dropped indexes.
lock_rec_insert_check_and_lock(): Relax the lock_table_has() assertion.
When inserting a record into an index, the table must be at least IX-locked.
However, when an index is being created, an IS-lock on the table is
sufficient.
row_merge_lock_table(): Add the parameter enum lock_mode mode, which must
be LOCK_X or LOCK_S.
row_merge_drop_table(): Assert that n_mysql_handles_opened == 0.
Unconditionally drop the table.
ha_innobase::add_index(): Acquire an X or S lock on the table, as appropriate.
After acquiring an X lock, assert that n_mysql_handles_opened == 1.
Remove the comments about dropping tables in the background.
ha_innobase::final_drop_index(): Acquire an X lock on the table.
dict_table_t: Remove version_number, to_be_dropped, and prebuilts.
ins_node_t: Remove table_version_number.
enum lock_mode: Move the definition from lock0lock.h to lock0types.h.
ROW_PREBUILT_OBSOLETE, row_update_prebuilt(), row_prebuilt_table_obsolete():
Remove.
row_prebuilt_t: Remove the declaration from row0types.h.
row_drop_table_for_mysql_no_commit(): Always print a warning if a table
was added to the background drop queue.
kernel_mutex must be released before calling this function.
innobase_mysql_end_print_arbitrary_thd(),
innobase_mysql_prepare_print_arbitrary_thd(): Assert that the
kernel_mutex is not being held by the current thread.
Non-functional change:
Move the prototypes of
innobase_mysql_prepare_print_arbitrary_thd() and
innobase_mysql_end_print_arbitrary_thd() from lock0lock.c to
ha_prototypes.h.
Suggested by: Marko
Approved by: Marko
* Change terminology:
wait lock -> requested lock
waited lock -> blocking lock
new: requesting transaction (the trx that owns the requested lock)
new: blocking transaction (the trx that owns the blocking lock)
* Add transaction ids to INFORMATION_SCHEMA.INNODB_LOCK_WAITS. This is
somewhat redundant because transaction ids can be found in INNODB_LOCKS
(which can be joined with INNODB_LOCK_WAITS) but would help users to
write shorter joins (one table less) in some cases where they want to
find which transaction is blocking which.
Suggested by: Ken
Approved by: Heikki
for dropping the index trees, and set the dictionary operation flag, similar
to what ha_innobase::add_index() does. This should ensure correct crash
recovery.
Fix the size of the static buffer for lock_table and lock_index.
I did not realize that NAME_LEN already contains the mbmaxlen multiplier; thus
a quote, when converted to 2 quotes, will take 2 bytes while there are 3
bytes reserved.
Spotted by: Marko
Pointyhat to: Vasil
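The buffer size argument above can be checked with a sketch of identifier quoting. The limits used here (64 characters of up to 3 bytes each) are assumptions for illustration, and ex_quote_identifier() is a hypothetical helper, not innobase_convert_identifier().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_MAX_CHARS    64  /* assumed identifier limit, in characters */
#define EX_MBMAXLEN     3   /* assumed maximum bytes per character */
#define EX_NAME_LEN     (EX_MAX_CHARS * EX_MBMAXLEN)

/* Quote an identifier, doubling every embedded quote character.
Returns the length of the output, or 0 if it does not fit in out. */
static size_t ex_quote_identifier(char* out, size_t out_size,
                                  const char* id, char q)
{
    size_t need = 2 + 1;    /* enclosing quotes and the terminating NUL */
    size_t n = 0;
    const char* p;

    for (p = id; *p != '\0'; p++) {
        need += (*p == q) ? 2 : 1;
    }
    if (need > out_size) {
        return 0;
    }

    out[n++] = q;
    for (p = id; *p != '\0'; p++) {
        if (*p == q) {
            out[n++] = q;   /* double the quote */
        }
        out[n++] = *p;
    }
    out[n++] = q;
    out[n] = '\0';

    return n;
}
```

Under these assumptions, an identifier consisting of 64 one-byte quote characters expands to 128 bytes plus the two enclosing quotes, which is still well below EX_NAME_LEN + 2, because each quote had 3 bytes reserved but needs only 2 after doubling.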
Bugfix1: Set innodb_locks.lock_index to NOT NULL.
If a column in INFORMATION_SCHEMA table has the flag
MY_I_S_MAYBE_NULL and it is not explicitly marked as NOT NULL
with the method ::set_notnull() then it is always rendered as
NULL by MySQL.
Bugfix2: Avoid crashes if lock_index is NULL. It is NULL for table
level locks.
Pointyhat to: Marko
innodb_information_schema.test: Add tests that display most columns from
INFORMATION_SCHEMA.INNODB_LOCKS. Test that quoting of table names works
and respects SQL_MODE='ANSI_QUOTES'.
innobase_print_identifier(): Remove.
innobase_convert_identifier(): New function,
based on innobase_print_identifier().
innobase_convert_name(): New function, similar to ut_print_namel(), but
using a memory buffer.
ut_print_namel(): Use innobase_convert_name().
fill_innodb_locks_from_cache(): Convert lock_table and lock_index by
calling innobase_convert_name().
Implement a limit on the memory used by the INNODB_TRX, INNODB_LOCKS and
INNODB_LOCK_WAITS tables. The maximum allowed memory is defined with the
macro TRX_I_S_MEM_LIMIT.
Approved by: Marko (via IM)
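One way to enforce such a limit is to account for every allocation made on behalf of the cache and to refuse allocations that would exceed the limit. The sketch below is an assumption about the general approach, not the actual implementation: EX_MEM_LIMIT stands in for TRX_I_S_MEM_LIMIT with an arbitrary value, and the truncated flag is a hypothetical way of recording that some rows were left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EX_MEM_LIMIT    (16u * 1024u)   /* arbitrary value for illustration */

struct ex_cache {
    size_t mem_used;    /* bytes allocated so far, never above the limit */
    int truncated;      /* nonzero if an allocation was refused */
};

/* Allocate memory for the cache, or return NULL if the limit would be
exceeded or if malloc() fails. */
static void* ex_cache_alloc(struct ex_cache* cache, size_t size)
{
    void* ptr;

    assert(cache->mem_used <= EX_MEM_LIMIT);

    if (size > EX_MEM_LIMIT - cache->mem_used) {
        cache->truncated = 1;
        return NULL;
    }

    ptr = malloc(size);
    if (ptr != NULL) {
        cache->mem_used += size;
    }

    return ptr;
}
```

Comparing size against the remaining budget, rather than adding first, avoids integer overflow for very large requests.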
Add the query in information_schema.innodb_trx.trx_query. Add it even
though it is available in information_schema.processlist.info to make
inconsistencies between those two tables obvious.
It is rather confusing to see a transaction shown in innodb_trx and
innodb_locks that holds a lock on one table while the corresponding query
in processlist is executing INSERT on another table. We do not want users
to contact us asking to explain that. It is caused by the fact that the
data for innodb_* tables and processlist is fetched at different times.
Approved by: Marko
Introduce a generic solution to the common problem that MySQL does not add
functions needed by us in a reasonable time.
Start with a function that retrieves THD::thread_id; this is needed for
the information_schema.innodb_trx.mysql_thread_id column.
Approved by: Marko
a compressed table in the system tablespace.
db0err.h: Introduce the error code DB_TABLE_ZIP_NO_IBD. Replace the
#define directives with an enum, to ease future code merges. These
error codes are never written out to files or displayed to the user.
Thus they need not remain constant.
dict_build_table_def_step(): Return DB_TABLE_ZIP_NO_IBD instead of DB_ERROR.
create_table_def(): Report ER_ILLEGAL_HA_CREATE_OPTION "KEY_BLOCK_SIZE"
when the table creation fails with DB_TABLE_ZIP_NO_IBD.
redefined so that the dynamic plugin can replace the builtin InnoDB
in MySQL 5.1.
ha_innodb.cc, handler0alter.cc: #include "univ.i" before any other InnoDB
header files or before defining any symbols.
innodb_redefine.h: New file, to contain a mapping of symbols. The idea
is that this file will be replaced in the build process; because this
is a large file that can be generated automatically, it does not make sense
to keep it under version control.
univ.i: #include "innodb_redefine.h" and #define ha_innobase ha_innodb
Makefile.am (ha_innodb_la_CXXFLAGS): Remove -Dha_innobase=ha_innodb
NOTE: there are still some issues in the source code. One known issue is
the #undef mutex_free in sync0sync.h, which will cause the plugin to call the
function mutex_free in the builtin InnoDB. The preprocessor symbols defined
in innodb_redefine.h must not be undefined or redefined anywhere in the code.
plugin "InnoDB", not "InnoDBzip".
We can disable the builtin InnoDB by mysqld --skip-innodb. If the
builtin InnoDB is not disabled, installing the InnoDB plugin by the same
name will not work.
innodb_plugin_init(): Ignore differences in the PLUGIN_VAR_READONLY flag.